(Note up front: Should we move this discussion to the OIDN thread in the dev forum? Might be better suited there.)
I got the test scenes working now (using the command-line tool, in the end).
For the implementation, we should discuss/decide some points:
1) As the purpose is to make large scenes fit into RAM and avoid the speed penalty of swapping, it would technically be ideal to check the available memory and then decide whether to do a split or a single OIDN pass. I don't know how feasible such a check is, and I don't think I could implement it myself (let alone cross-platform), but I could prepare the code for it. Otherwise, we could hardcode it so that it always splits for images >= 8 Mpix (i.e. 4K)?
2) Input parameters: number of slices: technically this should also depend on RAM and image size. But we could choose a hardcoded value like 8, which should cover most reasonable combinations people will actually use. If someone really wants to run OIDN on a 200 Mpix render, well...
3) Input parameters: number of pixels of overlap: this can probably be hardcoded; my tests yesterday used only 50 pixels. Any objections, or reasons why it should be variable?
4) B.Y.O.B., you said something about not needing buffers if you do horizontal stripes. I might understand what you mean once I look further into the current implementation, but could you elaborate for me anyway?
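To make points 1–3 concrete, here is a minimal sketch of how the slicing could be planned: skip splitting below a pixel threshold, otherwise cut the image into horizontal stripes that each extend a fixed number of overlap pixels into their neighbours. The function name, the 8 Mpix threshold, the slice count of 8, and the 50-pixel overlap are just the hypothetical values from the points above, not anything final.

```python
# Hypothetical planning step for split OIDN passes (not the actual
# implementation): decide single pass vs. split, then compute horizontal
# stripe bounds with a fixed pixel overlap for later seam blending.

def plan_slices(width, height, split_threshold_px=8_000_000,
                num_slices=8, overlap=50):
    """Return a list of (y_start, y_end) stripe bounds, end exclusive.

    Images below the threshold (~8 Mpix, i.e. 4K) are denoised in one
    pass; larger ones are split into `num_slices` horizontal stripes,
    each extended by `overlap` pixels into its neighbours.
    """
    if width * height < split_threshold_px:
        return [(0, height)]  # small enough for a single OIDN pass
    stripe_h = height // num_slices
    bounds = []
    for i in range(num_slices):
        core_start = i * stripe_h
        # last stripe absorbs the remainder from integer division
        core_end = height if i == num_slices - 1 else (i + 1) * stripe_h
        bounds.append((max(0, core_start - overlap),
                       min(height, core_end + overlap)))
    return bounds
```

A RAM check, if we ever get one, would only have to replace the fixed `split_threshold_px` (and possibly derive `num_slices` from available memory); the stripe computation itself stays the same.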