In my previous posts of this series I analyzed the basic idea behind depth-aware upsampling techniques. In the first post [1], I implemented the nearest depth sampling algorithm [3] from NVIDIA, and in the second one [2], I compared some methods that improve the quality of the downsampled z-buffer data that I use with the nearest depth. The conclusion was that nearest depth sampling alone is not good enough to reduce the artifacts of Iago Toral’s SSAO implementation in VKDF [4] to an acceptable level. So, in this post, I am going to talk about my early experiments to further improve the upsampling and the logic behind each one. I named it part 3.1 because, since starting the series, I’ve found that some combinations of these methods with other ones can give much better visual results, and as my experiments with the upsampling techniques cannot fit in one blog post, I am going to split the upscaling improvements (part 3) into sub-parts.
As we have seen, improving the SSAO texture upscaling with a depth-aware technique involves 2 parts:
- Improving the z-buffer downsampling (so that it contains the depths that are most representative of the original depths of the scene).
- Improving the upsampling of the SSAO texture using information from the depth buffer and maybe other resources.
The previous post was about optimizing the downsampling part. We saw that selecting depths by taking the maximum in each 2×2 neighborhood, or by alternating between the minimum and the maximum in a checkerboard pattern, can improve the nearest depth upsampling and reduce the artifacts where we have depth discontinuities (samples that belong to different surfaces), but it doesn’t improve the overall quality significantly. We also saw that linear interpolation causes too many artifacts at discontinuities but works well on surfaces, because it is a form of weighted average. Finally, we rejected downsampling by taking the minimum depth everywhere: although it reduces the artifacts on the nearest surfaces (information we might use again in later posts), it cannot compete with the maximum depth where we have depth discontinuities.
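As a reminder of what that downsampling pass roughly looks like, here is a minimal sketch of the checkerboard min/max variant from the previous post. It is only illustrative: the texture and variable names are mine and not the actual VKDF shaders.

/* Rough sketch of the 1/2 x 1/2 z-buffer downsampling that alternates
 * between the minimum and the maximum depth in a checkerboard pattern.
 * Resource names are illustrative, not the actual VKDF code. */
#version 450

layout(set = 0, binding = 0) uniform sampler2D tex_depth; /* full resolution z-buffer */

layout(location = 0) in vec2 in_uv;
layout(location = 0) out float out_depth;

void main()
{
    /* the 2x2 full-resolution neighborhood that maps to this low-res texel */
    vec4 d = textureGather(tex_depth, in_uv, 0);

    float d_min = min(min(d.x, d.y), min(d.z, d.w));
    float d_max = max(max(d.x, d.y), max(d.z, d.w));

    /* checkerboard: even output texels keep the minimum, odd ones the maximum */
    ivec2 p = ivec2(gl_FragCoord.xy);
    out_depth = ((p.x + p.y) % 2 == 0) ? d_min : d_max;
}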
This post is about improving the upsampling of the SSAO texture but the conclusions above about the nearest depth capabilities will still be useful to understand the method that follows.
Sample classification using depth information
We’ve seen that the nearest depth works better where we have depth discontinuities and that linear interpolation works better on surfaces; wouldn’t it be nice to combine the two methods and see if we can improve the upscaling part?
I will refer again to this analysis [5] of the upscaling techniques used in Call of Duty: Black Ops 3. The author suggests that the following methods can give an insight into whether all samples of the neighborhood lie on the same surface (in order to average them) or there is a depth discontinuity (not all samples belong to the same surface):
lerp(bilinear_weights, depth_weights, f(depth_discontinuity)) * four_samples
lerp(bilinear_sample, best_depth_sample, f(depth_discontinuity))
bilinear_fetch(lerp(bilinear_texcoords, best_depth_texcoords, f(depth_discontinuity)))
The suggestions above are all equivalent, so I will only explain the second one, which is the shortest of the three.
The idea here is that we use some depth-based metric function to understand whether a 2×2 neighborhood can be classified as part of a continuous surface or as part of a region where we have some discontinuity (not all samples fall on one surface); this metric is the f(depth_discontinuity). Then, depending on the return value of this function, we either select the best depth sample (which is the result of the nearest depth in our case) or the bilinear texture coordinates (which give the result of the linear interpolation, i.e. what the GLSL texture2D lookup returns). In our case, we will obviously select the nearest depth where we detect discontinuities and the linear interpolation where we detect surfaces.
So what would be a good depth-based metric? I don’t know exactly what they used in Black Ops, as it was not explained in detail in the article, but this is what I tried and I suppose they did something similar:
In each 2×2 neighborhood of the downscaled z-buffer, I calculated the distance between the maximum and the minimum depth. When this distance was small, I assumed that all 4 samples belong to the same surface (the depths are all close to each other); when it was above a certain threshold, I assumed that there is some sort of discontinuity in the region. When a discontinuity was detected, I selected the SSAO texture sample that corresponds to the depth of the neighborhood whose value is closest to the original depth (the nearest depth algorithm). In all other cases, I performed linear interpolation (averaged the colors).
In GLSL that would be:
float min_depth = min(min(depth1, depth2), min(depth3, depth4));
float max_depth = max(max(depth1, depth2), max(depth3, depth4));
float step_distance = max_depth - min_depth;
float s = step(0.0000013, step_distance);
and the selection after performing nearest depth and lerp (see the first post for the shader) would be something like this:
mix(texture(ssao_texture, in_uv), nearest_depth, s);
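To make the above more concrete, here is a rough, self-contained sketch of how the whole selection could fit in the upsampling fragment shader. It assumes a single-channel half-resolution SSAO texture (sampled with linear filtering for the bilinear path) and a downsampled z-buffer of the same size; the resource names are illustrative and the nearest-depth part is simplified (see the first post for the actual shader):

/* Hedged sketch of the combined selection: nearest depth where the 2x2
 * low-res depths differ too much, plain bilinear filtering otherwise.
 * Resource names are illustrative, not the actual VKDF shaders. */
#version 450

layout(set = 0, binding = 0) uniform sampler2D tex_ssao;         /* half-res SSAO, linear filtering */
layout(set = 0, binding = 1) uniform sampler2D tex_lowres_depth; /* downsampled z-buffer */
layout(set = 0, binding = 2) uniform sampler2D tex_depth;        /* full-res z-buffer */

layout(location = 0) in vec2 in_uv;
layout(location = 0) out float out_ao;

void main()
{
    /* 2x2 low-res neighborhood: depths and the matching SSAO samples */
    vec4 d  = textureGather(tex_lowres_depth, in_uv, 0);
    vec4 ao = textureGather(tex_ssao, in_uv, 0);
    float depth_hi = texture(tex_depth, in_uv).x; /* original full-res depth */

    /* nearest depth: the SSAO sample whose low-res depth is closest to
     * the full-resolution depth of this pixel */
    int best = 0;
    float best_dist = abs(d.x - depth_hi);
    for (int i = 1; i < 4; i++) {
        float dist = abs(d[i] - depth_hi);
        if (dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    float nearest_depth = ao[best];

    /* discontinuity metric: distance between the min and max depth */
    float min_depth = min(min(d.x, d.y), min(d.z, d.w));
    float max_depth = max(max(d.x, d.y), max(d.z, d.w));
    float s = step(0.0000013, max_depth - min_depth);

    /* surfaces (s = 0): bilinear sample, discontinuities (s = 1): nearest depth */
    out_ao = mix(texture(tex_ssao, in_uv).x, nearest_depth, s);
}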
Although this method sounds simple and reasonable, I quickly realized that it could never work well with a demo like Sponza. Let’s see why:
The value 0.0000013 that I use as a threshold above was the one that resulted in the best separation of surfaces and discontinuities for my scene. It was found by binary-search-like trial and error, and it generated the following image:
(Actually, the image above is the result of 1.0 - step(0.0000013, step_distance); because I wanted to have black edges on a white background, but it can still demonstrate why this algorithm couldn’t work well.)
Here are some of the problems with this “edge detection”-like result:
- First of all, it is obvious that the algorithm couldn’t detect all the depth discontinuities. It barely detected some corners and edges, not all the points where the depth changes because the samples lie on different surfaces.
- Second, I noticed that slight modifications to the value (that 0.0000013 found by trial and error), like adding or subtracting 0.0000001, could cause whole surfaces like the floor to become black, so it is heavily dependent on what was visible on the screen at the time I performed that trial and error.
- Third and most important, this method could not be applied to every scene without modifications, as it heavily depends on where we’ve placed the near and far clipping planes (the depth values also depend on that).
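That last dependence could probably be reduced by comparing linearized depths instead of the raw non-linear z-buffer values, so that the threshold is expressed in view-space units. I haven’t tried that here; the snippet below is only a sketch of the idea, assuming a standard perspective projection with a [0, 1] depth range, where near and far are hypothetical uniforms holding the clipping plane distances:

/* Sketch only: map a non-linear [0, 1] depth-buffer value back to a
 * view-space distance between the near and far clipping planes, so that
 * the min/max distance (and its threshold) has a physical meaning.
 * `near` and `far` are hypothetical uniforms, not part of the shaders above. */
float linearize_depth(float depth, float near, float far)
{
    return near * far / (far - depth * (far - near));
}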
But the idea of using one algorithm on the surfaces and another at the discontinuities is a very interesting one. And as the article mentions later, one doesn’t have to use only depths but can try other resources as well. This gave me the idea to use the normals. The normal directions can give very good insight into the shape of a surface and help us detect edges and corners, and they don’t depend at all on the visible parts of the scene, the clipping planes or anything like that…
But this post is already too long. So, I am going to analyze my idea to use the normals in Part 3.2.
For the moment, I will only add one video that shows the ambient occlusion using the algorithm described above. You can see that, although the combination sometimes seems to give very good results, sudden artifacts appear, like misplaced parts of walls in some corners, and these artifacts are very visible.
Closing
Applying one algorithm on the surfaces and a different one where we detect discontinuities seems to be a very promising idea, but we certainly need to fix the sample classification before attempting further improvements.
[1]: https://eleni.mutantstargoat.com/hikiko/on-depth-aware-upsampling
[2]: https://eleni.mutantstargoat.com/hikiko/depth-aware-upsampling-2
[3]: http://developer.download.nvidia.com/assets/gamedev/files/sdk/11/OpacityMappingSDKWhitePaper.pdf
[4]: https://blogs.igalia.com/itoral/2018/04/17/frame-analysis-of-a-rendering-of-the-sponza-model/
[5]: http://c0de517e.blogspot.com/2016/02/downsampled-effects-with-depth-aware.html
I wonder if the small threshold value is because you’re trying to determine the depth differences using non-linear z values instead of linearized z values. I think such a small difference in linearized Z would make your screen go completely black. Maybe you would get more consistent (less scene-dependent) results that way. Very interesting read regardless!
Good idea! I haven’t tried to use the depths in linear space to see if I could get better results, TBH. Instead, I tried to use the normals after I saw that this method doesn’t work well for my scenes. Thank you! (And sorry for the very late reply, I hadn’t seen this comment for ages…)