Network Render Crashes / Issues

gecko
Posts: 31
Joined: Mon Jan 01, 2018 11:10 pm

Network Render Crashes / Issues

Post by gecko »

Just found myself with a scene complex enough that I wanted to bother with network rendering again, and I can't seem to get it to work. I think I'm encountering two separate issues, but not sure. Any ideas?
  • I'm using Blender 2.82 and LuxCore 2.3
  • I'm running Windows 10 on both computers
  • Rendering inside Blender with LuxCore works fine
  • Generating the BCF file appears to work fine
  • Running both Console and Node on the same computer results in the Node randomly stopping work a few seconds into receiving the files. It doesn't crash, as I can still interact with the window (e.g. scroll through the log). It just stops working, and the status reverts to "Waiting for new connection". The Console keeps running, waiting indefinitely for the Node to have a film ready.
    • For the complex scene, processing stops in the middle of receiving one of the mesh files.
    • If I export just the basic startup scene (single cube), processing stops later (after receiving the mesh file, but before rendering starts). It's roughly the same amount of time that passes (5-10sec) in both scenarios.
  • If I run Node on a networked computer only, I get the error "Permission denied" on the Node when trying to receive the BCF file (and the Console reports that the Node shut down the connection). It makes several attempts at this with the same errors before giving up.
    • I've expressly allowed both inbound and outbound connections on the port.
    • I've expressly allowed both public and private network communication for pyluxcoretool on both computers (maybe something else also needs permission?).
    • For what it's worth, in all cases I was hosting the BCF and related files on my network file share (a 3rd computer running FreeNAS), but both Windows PCs running LuxCore have the file share mapped as a network drive (so shouldn't behave any differently than a local file), and both were connected to the drive at the time of testing. When rendering locally in Blender all files (textures, .blend, caches, etc) are located on this same network file share.
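One way to separate firewall problems from other causes is to test whether the Node's TCP port is reachable from the Console machine. The following is a minimal sketch using only the Python standard library; the helper name `can_connect` is hypothetical, and port 18018 is taken from the "Discovered new node" log line later in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address and port as they appear in the logs of this thread.
    print(can_connect("192.168.200.218", 18018))
```

A successful connection only shows that something is listening and that the firewall lets the traffic through; it says nothing about what the Node does with the data afterwards.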
Dade
Developer
Posts: 5672
Joined: Mon Dec 04, 2017 8:36 pm
Location: Italy

Re: Network Render Crashes / Issues

Post by Dade »

It is a somewhat known problem: https://github.com/LuxCoreRender/LuxCore/issues/100

I have never been able to replicate the problem on Linux, but it happens for some users on Windows. Due to lack of interest in network rendering, no one has fixed it or further developed the related Python code.

If you are using only 2 nodes it may be a lot simpler to just do network rendering "by hand":

1) export the scene in .bcf format (a complete stand-alone format that includes everything required for rendering, so there is no need for a shared file system, etc.);
2) start the rendering with stand alone LuxCore on each node using a different random number generator seed;
3) save the film on each node;
4) merge the saved films with pyluxcoretools and save the PNG/JPG/EXR/whatever.

I can further explain each single step if you are interested.
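To illustrate why the seeds must differ and why the merge in step 4 works, here is a conceptual sketch. It assumes a film can be modelled as per-pixel accumulated radiance plus a sample count; the `Film` class and its fields are hypothetical and do not mirror the pyluxcore API or the actual film file format.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Film:
    """Hypothetical stand-in for a saved film (one channel for brevity)."""
    radiance_sum: List[float]   # accumulated radiance per pixel
    samples_per_pixel: float    # how many samples went into each pixel


def merge_films(films: Sequence[Film]) -> Film:
    """Combine films rendered with different seeds by summing their accumulators."""
    if not films:
        raise ValueError("need at least one film")
    pixel_count = len(films[0].radiance_sum)
    if any(len(f.radiance_sum) != pixel_count for f in films):
        raise ValueError("all films must have the same resolution")
    total = [sum(f.radiance_sum[i] for f in films) for i in range(pixel_count)]
    return Film(total, sum(f.samples_per_pixel for f in films))


def resolve(film: Film) -> List[float]:
    """Turn the accumulators into displayable pixel values (average per sample)."""
    return [value / film.samples_per_pixel for value in film.radiance_sum]
```

Under this model, a node that rendered 30 samples per pixel contributes three times the weight of a node that rendered 10. Two nodes using the same seed would produce identical samples, so merging them would add no new information.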
acasta69
Developer
Posts: 472
Joined: Tue Jan 09, 2018 3:45 pm
Location: Italy

Re: Network Render Crashes / Issues

Post by acasta69 »

I have tried it here, both with startup cube scene and something more complex, but it's working fine.
Maybe you could send the node and console log?

Windows 10 64 bits, i7-4770 3.4 GHz, RAM 16 GB, GTX 970 4GB v445.87
gecko
Posts: 31
Joined: Mon Jan 01, 2018 11:10 pm

Re: Network Render Crashes / Issues

Post by gecko »

Well, the plot thickens. I'd be fine with manually starting the render on both PCs instead of using the network render capability. My only concern is that if both PCs are on the same network anyway, aren't they going to detect each other's nodes and cause problems? But we can deal with that later. Right now I'm still having the same two (different) issues on each PC when using the external renderer.

On my slave PC, here are the logs when trying to start up the job locally (I moved the BCF and all supporting files for the basic cube scene to the local PC's desktop, and am running both Console and Node on this PC which I previously had configured as only a Node). These look identical to the logs I was getting when trying to use this PC as a Node for a network render. My main PC is having the other issue where the Node seems to time out on itself after a few seconds. At the moment I have a render running inside Blender though, and I don't want to screw it up running tests in the external renderer. Hopefully will be able to post some logs from that PC tomorrow though.

I'm slightly surprised that no one else is clamoring to get network rendering working, as I consider it one of the main reasons for using LuxCore (the other being caustics). Of course after years of using Lux I'm pretty comfortable with its material pipeline, but with the switch to node-based materials and the fact that I run most of my texturing through Substance these days... it's really distributed network rendering and caustics that keep me on Lux.

Console:


[MainThread][2020-04-13 20:40:50,342] LuxCore 2.3
[NetBeaconReceiverThread][2020-04-13 20:40:50,343] NetBeaconReceiver thread started.
[NetBeaconReceiverThread][2020-04-13 20:40:52,084] Discovered new node: 192.168.200.218:18018
[MainThread][2020-04-13 20:41:02,913] Creating single image render farm job: C:/Users/Andrew/Desktop/Untitled_LuxCore/00001.bcf
[MainThread][2020-04-13 20:41:02,913] New render farm job: C:/Users/Andrew/Desktop/Untitled_LuxCore/00001.bcf
[MainThread][2020-04-13 20:41:02,914] Job file md5: eb59ee38e35900900dd44b3d78d3da60
[MainThread][2020-04-13 20:41:02,915] -------------------------------------------------------
[MainThread][2020-04-13 20:41:02,915] Job started: C:/Users/Andrew/Desktop/Untitled_LuxCore/00001.bcf
[MainThread][2020-04-13 20:41:02,915] -------------------------------------------------------
[RenderFarmNodeThread-192.168.200.218:18018][2020-04-13 20:41:02,916] Node thread started
[FilmMergeThread][2020-04-13 20:41:02,917] Film merge thread started
[RenderFarmNodeThread-192.168.200.218:18018][2020-04-13 20:41:02,918] Remote node has the same pyluxcore verison
[RenderFarmNodeThread-192.168.200.218:18018][2020-04-13 20:41:02,918] Sending file: C:/Users/Andrew/Desktop/Untitled_LuxCore/00001.bcf
[RenderFarmNodeThread-192.168.200.218:18018][2020-04-13 20:41:02,920] [WinError 10054] An existing connection was forcibly closed by the remote host
Traceback (most recent call last):
  File "C:\Users\Andrew\AppData\Local\Temp\_MEI134522\pyluxcoretools.zip\pyluxcoretools\renderfarm\renderfarmjobsingleimage.py", line 409, in NodeThread
    socketutils.SendFile(nodeSocket, self.jobSingleImage.GetRenderConfigFileName())
  File "C:\Users\Andrew\AppData\Local\Temp\_MEI134522\pyluxcoretools.zip\pyluxcoretools\utils\socket.py", line 90, in SendFile
    RecvOk(soc)
  File "C:\Users\Andrew\AppData\Local\Temp\_MEI134522\pyluxcoretools.zip\pyluxcoretools\utils\socket.py", line 62, in RecvOk
    line = RecvLine(soc)
  File "C:\Users\Andrew\AppData\Local\Temp\_MEI134522\pyluxcoretools.zip\pyluxcoretools\utils\socket.py", line 42, in RecvLine
    data = soc.recv(BUFF_SIZE)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
[RenderFarmNodeThread-192.168.200.218:18018][2020-04-13 20:41:02,921] Node thread done
[NetBeaconReceiverThread][2020-04-13 20:41:04,085] Retrying node: 192.168.200.218:18018
Node:


[MainThread][2020-04-13 20:40:35,422] LuxCore 2.3
[NetBeaconSenderThread][2020-04-13 20:40:37,081] NetBeaconSender thread started.
[Thread-1][2020-04-13 20:40:37,081] Waiting for a new connection
[Thread-1][2020-04-13 20:41:02,917] Received connection from: ('192.168.200.218', 62539)
[Thread-1][2020-04-13 20:41:02,918] Remote pyluxcore version: 2.3
[Thread-1][2020-04-13 20:41:02,918] Local pyluxcore version: 2.3
[Thread-1][2020-04-13 20:41:02,918] Receiving RenderConfig serialized file: renderfarmnode-e8d7161a-4d16-4899-a579-3aac381f0846.bcf
[Thread-1][2020-04-13 20:41:02,918] Receiving file: renderfarmnode-e8d7161a-4d16-4899-a579-3aac381f0846.bcf
[Thread-1][2020-04-13 20:41:02,919] [Errno 13] Permission denied: 'renderfarmnode-e8d7161a-4d16-4899-a579-3aac381f0846.bcf'
Traceback (most recent call last):
  File "C:\Users\Andrew\AppData\Local\Temp\_MEI139562\pyluxcoretools.zip\pyluxcoretools\renderfarm\renderfarmnode.py", line 152, in __HandleConnection
    socketutils.RecvFile(clientSocket, renderConfigFile)
  File "C:\Users\Andrew\AppData\Local\Temp\_MEI139562\pyluxcoretools.zip\pyluxcoretools\utils\socket.py", line 108, in RecvFile
    with open(fileName, "wb") as f:
PermissionError: [Errno 13] Permission denied: 'renderfarmnode-e8d7161a-4d16-4899-a579-3aac381f0846.bcf'
[Thread-1][2020-04-13 20:41:02,920] Connection done: ('192.168.200.218', 62539)
[Thread-1][2020-04-13 20:41:02,920] Waiting for a new connection
Dade
Developer
Posts: 5672
Joined: Mon Dec 04, 2017 8:36 pm
Location: Italy

Re: Network Render Crashes / Issues

Post by Dade »

gecko wrote: Tue Apr 14, 2020 12:53 am Well, the plot thickens. I'd be fine with manually starting the render on both PCs instead of using the network render capability. My only concern is that if both PCs are on the same network anyway, aren't they going to detect each other's nodes and cause problems?
You can use luxcoreui.exe or the "console" mode of pyluxcoretools; there is no need to use the network rendering script.
gecko wrote: Tue Apr 14, 2020 12:53 am


PermissionError: [Errno 13] Permission denied: 'renderfarmnode-e8d7161a-4d16-4899-a579-3aac381f0846.bcf'
I assume you have installed BlendLuxCore with Admin rights and/or in a directory where it lacks filesystem write permission. It cannot create the temporary file 'renderfarmnode-e8d7161a-4d16-4899-a579-3aac381f0846.bcf' so it throws an error. You should fix the filesystem permissions.
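The traceback shows `open(fileName, "wb")` being called with a bare file name, so the file is created in whatever directory the Node process was started from. A quick way to check whether a given directory is writable is sketched below; the helper names `can_write_in` and `pick_working_dir` are hypothetical and not part of pyluxcoretools.

```python
import os
import tempfile
import uuid


def can_write_in(directory: str) -> bool:
    """Return True if a file can be created (and removed) in the directory."""
    probe = os.path.join(directory, f"write-probe-{uuid.uuid4().hex}.tmp")
    try:
        with open(probe, "wb") as f:
            f.write(b"ok")
    except OSError:
        # PermissionError and FileNotFoundError are both OSError subclasses.
        return False
    os.remove(probe)
    return True


def pick_working_dir(preferred: str) -> str:
    """Use the preferred directory if writable, else the system temp directory."""
    return preferred if can_write_in(preferred) else tempfile.gettempdir()


if __name__ == "__main__":
    # Check the directory the current process would write relative paths into.
    print(os.getcwd(), can_write_in(os.getcwd()))
```

If the check fails for the directory the Node is launched from, starting the Node from a writable directory should have the same effect as changing the permissions on the original one.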
gecko
Posts: 31
Joined: Mon Jan 01, 2018 11:10 pm

Re: Network Render Crashes / Issues

Post by gecko »

Dade wrote: Tue Apr 14, 2020 7:41 am You can use luxcoreui.exe or the "console" mode of pyluxcoretools; there is no need to use the network rendering script.
Ah, ok got it.
Dade wrote: Tue Apr 14, 2020 7:41 am I assume you have installed BlendLuxCore with Admin rights and/or in a directory where it lacks filesystem write permission. It cannot create the temporary file 'renderfarmnode-e8d7161a-4d16-4899-a579-3aac381f0846.bcf' so it throws an error. You should fix the filesystem permissions.
Ok, this seems really strange to me - LuxCore writes the temp BCF file to its own installation directory? That might explain the issue with Windows installations - if Lux is installed in Program Files (which would be the logical place to stick it), it won't get write access to that directory without explicitly setting it. Maybe also only an issue for standalone Lux (which is what I installed) - I'm pretty sure Blender sticks addons in the AppData folder, which should have write access enabled by default (I think).

Either way, the slave PC is running. Now I just need to figure out why my main PC won't render outside of Blender...
gecko
Posts: 31
Joined: Mon Jan 01, 2018 11:10 pm

Re: Network Render Crashes / Issues

Post by gecko »

Ok, here are the logs from my main PC launching both the console and node on this computer from a BCF file on the local desktop. For what it's worth, I used the button inside Blender to launch LuxCore so I wouldn't need to go digging for wherever Blender decided to install it. I didn't notice this before, but the command line window is completely blank in this scenario, different from when I launch in standalone mode on my slave PC. So these logs are pulled from the interface window. Again, not sure if this matters.

Node logs. Note that this is for the startup cube scene. Loading a more complex scene, it stops at one of the "Loading serialized mesh" steps. Same amount of total runtime before the hang. I've waited over an hour for it to progress in both cases. It's not locked up, just stops progressing.


LuxCore 2.3
Waiting for configuration...
Started
NetBeaconSender thread started.
Waiting for a new connection
Received connection from: ('192.168.200.46', 50223)
Remote pyluxcore version: 2.3
Local pyluxcore version: 2.3
Receiving RenderConfig serialized file: renderfarmnode-6a1a5c0f-a36b-4371-9fed-f24437117d8e.bcf
Receiving file: renderfarmnode-6a1a5c0f-a36b-4371-9fed-f24437117d8e.bcf
Transfered 2.06 Kbytes in 00:00:00 (2.02 Mbytes/sec)
Receiving RenderConfig serialized MD5: eb59ee38e35900900dd44b3d78d3da60
Received seed: 1
Reading RenderConfig serialized file: renderfarmnode-6a1a5c0f-a36b-4371-9fed-f24437117d8e.bcf
[SDL][55.437] Loading serialized mesh: Mesh_Cube2075980532072000
[SDL][55.437] Material definition: Material2075979363528
[SDL][55.437] Camera type: perspective
[SDL][55.437] Camera position: Point[7.35889, -6.92579, 4.95831]
[SDL][55.437] Camera target: Point[6.70733, -6.31162, 4.51304]
[SDL][55.437] Camera clipping plane disabled
[SDL][55.437] Scene objects count: 1
[SDL][55.437] Light definition: __WORLD_BACKGROUND_LIGHT__
[SDL][55.453] Light definition: 2075980533528
OpenCL render engines available
[LuxCore][55.453] Film resolution: 1920x1080
[SDL][55.453] Film output definition: RGB_IMAGEPIPELINE [image.png]
[SDL][55.453] Image pipeline: film.imagepipelines.0
[SDL][55.453] Image pipeline step 0: NOP
[SDL][55.453] Image pipeline step 1: TONEMAP_LINEAR
[SDL][55.453] Image pipeline step 2: GAMMA_CORRECTION
[SDL][55.453] Film output definition: RGB_IMAGEPIPELINE [RGB_IMAGEPIPELINE_0.png]
[LuxRays][55.484] OpenCL Platform 0: NVIDIA Corporation
[LuxRays][55.484] Device 0 name: NativeThread
[LuxRays][55.484] Device 0 type: NATIVE_THREAD
[LuxRays][55.484] Device 0 compute units: 1
[LuxRays][55.484] Device 0 preferred float vector width: 4
[LuxRays][55.484] Device 0 max allocable memory: 0MBytes
[LuxRays][55.484] Device 0 max allocable memory block size: 0MBytes
[LuxRays][55.484] Device 1 name: GeForce GTX 1060 6GB
[LuxRays][55.484] Device 1 type: OPENCL_GPU
[LuxRays][55.484] Device 1 compute units: 10
[LuxRays][55.484] Device 1 preferred float vector width: 1
[LuxRays][55.484] Device 1 max allocable memory: 6144MBytes
[LuxRays][55.484] Device 1 max allocable memory block size: 1536MBytes
[LuxRays][55.484] Creating 12 intersection device(s)
[LuxRays][55.484] Allocating intersection device 0: NativeThread (Type = NATIVE_THREAD)
[LuxRays][55.484] Allocating intersection device 1: NativeThread (Type = NATIVE_THREAD)
Console logs. The console will eventually announce that no film files were received from the node and continue waiting for it indefinitely.


LuxCore 2.3
NetBeaconReceiver thread started.
Discovered new node: 192.168.200.46:18018
Creating single image render farm job: C:/Users/sauer/Desktop/Untitled_LuxCore/00001.bcf
New render farm job: C:/Users/sauer/Desktop/Untitled_LuxCore/00001.bcf
Job file md5: eb59ee38e35900900dd44b3d78d3da60
-------------------------------------------------------
Job started: C:/Users/sauer/Desktop/Untitled_LuxCore/00001.bcf
-------------------------------------------------------
Node thread started
Film merge thread started
Remote node has the same pyluxcore verison
Sending file: C:/Users/sauer/Desktop/Untitled_LuxCore/00001.bcf
Transfered 2.06 Kbytes in 00:00:00 (0 bytes/sec)
Sending seed: 1
Waiting for node rendering start
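
Both logs above report the same MD5 for the job file (eb59ee38e35900900dd44b3d78d3da60), which suggests the transfer itself arrived intact. To check a .bcf file on disk against that value independently, a minimal sketch with the standard library could look like this; the function name `file_md5` is hypothetical.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Path as it appears in the Console log above.
    print(file_md5("C:/Users/sauer/Desktop/Untitled_LuxCore/00001.bcf"))
```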
gecko
Posts: 31
Joined: Mon Jan 01, 2018 11:10 pm

Re: Network Render Crashes / Issues

Post by gecko »

I've confirmed that the issue with the network render node randomly stopping during load is specific to launching PyLuxCoreTool from inside Blender. I downloaded the standalone version of LuxCore (and set filesystem permissions to allow it to write to its install directory), exported my complex scene from BlendLuxCore, and opened the BCF in standalone LuxCore. This works both locally and across the network. Definitely annoying, but at least it works.