Replies: 13 comments 6 replies
-
The most obvious optimization is what libfive and others do: have a small virtual machine that reads in the SDF and renders it. I understand now what @doug-moen told me, that Curv is more powerful than these tools because there is no limitation on the SDF you can create. I think we should stick to this strength.
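For concreteness, here is a minimal sketch of that virtual-machine strategy: the SDF is compiled to a flat list of instructions, and a tiny interpreter evaluates it at a point. The instruction set and names here are illustrative, not libfive's actual design.

```python
import math

# A tiny SDF "virtual machine": the shape is a flat instruction list, and
# each instruction appends one value to a register file. This is a sketch
# of the libfive-style strategy; opcodes and encoding are made up.
def eval_sdf(program, x, y, z):
    regs = []
    for op, *args in program:
        if op == "x":
            regs.append(x)
        elif op == "y":
            regs.append(y)
        elif op == "z":
            regs.append(z)
        elif op == "const":
            regs.append(args[0])
        elif op == "add":
            regs.append(regs[args[0]] + regs[args[1]])
        elif op == "sub":
            regs.append(regs[args[0]] - regs[args[1]])
        elif op == "mul":
            regs.append(regs[args[0]] * regs[args[1]])
        elif op == "sqrt":
            regs.append(math.sqrt(regs[args[0]]))
        elif op == "min":  # union of two sub-shapes
            regs.append(min(regs[args[0]], regs[args[1]]))
    return regs[-1]

# Distance to a unit sphere at the origin: sqrt(x^2 + y^2 + z^2) - 1
sphere = [
    ("x",), ("y",), ("z",),
    ("mul", 0, 0), ("mul", 1, 1), ("mul", 2, 2),
    ("add", 3, 4), ("add", 6, 5),
    ("sqrt", 7),
    ("const", 1.0),
    ("sub", 8, 9),
]
print(eval_sdf(sphere, 2.0, 0.0, 0.0))  # 1.0: one unit outside the surface
```

Because the "program" is just data, any SDF expressible in the instruction set can be rendered by the same fixed interpreter, which is what makes the approach GPU-friendly.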
-
First, let's define the "operation tree" as the tree of operations used to specify a shape in Curv. We need a hybrid geometry engine that supports multiple strategies for representing and rendering shapes. Different shape representations have different tradeoffs with respect to rendering speed, for different operation trees.

In traditional low-level computer programming ("procedural programming"), what you are trying to accomplish and how you accomplish it are entwined in the same code. This creates complexity and inflexibility, and leads to a lot of work if you want to change the "how" without changing the "what", because a lot of code needs to change. But it is initially the simplest way to design a programming language. Using the procedural approach, we will provide multiple different ways to represent and render the same shape, and you will explicitly choose a strategy when designing a shape.

In declarative programming, which is high-level, it is possible to separate the "what" and the "how" into two separate pieces of code that are not intertwined. In some cases, the developer is responsible for writing the "how" code. (An example is proof assistant systems, where you write a proof in pure first-order logic or the like, clean code that just expresses the "what", and then separately you define "tactics", which are the implementation strategy for proving the proof.)

Even better and more advanced is when the system examines the operation tree of a shape and automatically chooses an optimal implementation strategy for representing and rendering that shape. One example is a query optimizer in a relational database. Writing an optimizer is traditionally difficult, but there is a relatively new technique, equality saturation over e-graphs, that automates the complex coding for the "how" part of optimization: you specify only the "what", using declarative code, as a set of transformation/equivalence rules over expressions.
Not only is the coding much simpler, but it is no longer necessary to hard-code all of the transformations in the optimizer. Transformation rules over specific operation tree patterns could also be specified using Curv code. But this is for later.

What's the relevance to making unions faster? I claim that, depending on the operation tree for a unioned shape, there are different shape representations and rendering algorithms appropriate for that operation tree. Next, what are these different representations and rendering algorithms that speed up different special cases of unions? There is some exploration of this question in the directory ideas/v-rep, which contains my initial starting points for speeding up large unions.
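To illustrate the declarative flavor, here is a sketch in which transformation rules are plain data and a generic driver applies them. This is a greedy rewriter, not equality saturation: a real engine (e.g. the egg library) would keep all equivalent forms in an e-graph and extract the cheapest. All names are made up.

```python
# Expressions are nested tuples: ("move", d, shape), ("union", a, b), ...
# Rules are functions from an expression to a replacement (or None).

def rewrite(expr, rules):
    """Apply the rules bottom-up until no rule matches (illustrative)."""
    if not isinstance(expr, tuple):
        return expr
    expr = tuple(rewrite(e, rules) for e in expr)
    for rule in rules:
        out = rule(expr)
        if out is not None:
            return rewrite(out, rules)
    return expr

def move_over_union(expr):
    """One declarative rule: move d (union [a, b]) => union of moved parts,
    which exposes the top-level union to the renderer."""
    if expr[0] == "move" and isinstance(expr[2], tuple) and expr[2][0] == "union":
        d, (_, a, b) = expr[1], expr[2]
        return ("union", ("move", d, a), ("move", d, b))
    return None

tree = ("move", (1, 0), ("union", "cube", "sphere"))
result = rewrite(tree, [move_over_union])
print(result)
# → ('union', ('move', (1, 0), 'cube'), ('move', (1, 0), 'sphere'))
```

The point is the separation of concerns: `move_over_union` states only the "what" (an algebraic identity); the driver owns the "how" of applying it.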
-
I remembered some more stuff. Different shape representations for speeding up union rendering:
A common theme of §2 and §3 is two-phase rendering, where we "compile" the shape into a data structure during evaluation, then we render using that data structure in the GPU viewer. The tradeoff may be that it's slow to update the data structure when a parameter changes by dragging a slider.
Two-phase rendering is also my idea for splines.
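A minimal sketch of the two-phase idea, assuming a union-of-spheres shape and illustrative names: phase 1 "compiles" the operation tree into a flat array (the kind of data structure a GPU viewer could read from a buffer), and phase 2 evaluates the SDF from that array without walking the tree.

```python
import math

def compile_tree(tree, out=None):
    """Phase 1: flatten a nested union tree into a list of (center, radius)."""
    if out is None:
        out = []
    if tree[0] == "sphere":
        out.append((tree[1], tree[2]))   # (center, radius)
    elif tree[0] == "union":
        for child in tree[1:]:
            compile_tree(child, out)
    return out

def render_distance(spheres, p):
    """Phase 2: evaluate the SDF at a point from the compiled array."""
    return min(math.dist(p, c) - r for c, r in spheres)

tree = ("union",
        ("sphere", (0.0, 0.0, 0.0), 1.0),
        ("union",
         ("sphere", (3.0, 0.0, 0.0), 1.0),
         ("sphere", (0.0, 3.0, 0.0), 0.5)))
spheres = compile_tree(tree)
print(len(spheres))                               # 3 primitives in the flat list
print(render_distance(spheres, (2.0, 0.0, 0.0)))  # 0.0: on the second sphere's surface
```

This also shows the tradeoff mentioned above: dragging a slider that changes the tree forces phase 1 to run again before the viewer can redraw.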
-
In the Raph Levien essay, I'm trying to understand Raph's advanced ideas for fast and flexible GPU rendering of 2D graphics. He's using a multi-phase rendering approach where, for performance reasons, each phase is implemented on the GPU, using a compute shader for each stage. This is why I was talking about the need to switch Curv over to using compute shaders for everything back in 2021. The mTec system, which efficiently renders 3D signed distance fields (unlike Raph's 2D work), also uses multistage rendering with a chain of compute shaders.
-
If you put this all together, it points towards a complete redesign of Curv (Curv 2), in which:
The idea of "multiple shape representations", a hybrid operation tree, needs more elaboration.
-
Very nice ideas 🙂 I have one question: How do the expressed ideas affect using curv as a REPL?
This sounds the most appealing to me out of everything. The simplest approach is the approach I'd really like to invest my time in. At the end of the day, I'd like anyone to be able to re-implement Curv with some confidence... Optimizing code output and linear_union sounds like it would get Curv to the level of performance I'm looking for, which is about ~500 unions in less than a few seconds. Right now it has a hard time doing ~20.
-
Speaking of hybrid shape representations, what about triangle meshes? It should be possible to read an STL file and use it with Curv shape operators. For this to work, we need some way to create an SDF for the mesh. The mTec project has a tool that converts a mesh into a voxel grid. You do this offline, then import the voxel grid. There is another project (a research paper I read years ago) that creates a BVH for the mesh, then interprets the BVH to produce an SDF. It's GPU friendly, and I recall they reported good performance.
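A sketch of how an imported voxel grid could be turned back into an SDF at render time, by trilinear interpolation of the eight surrounding samples. The grid layout and names are assumptions for illustration, not mTec's actual format.

```python
def sample_voxel_sdf(grid, p):
    """grid[i][j][k] holds the signed distance at integer lattice point
    (i, j, k); p is a point in grid coordinates (inside the grid)."""
    x, y, z = p
    i, j, k = int(x), int(y), int(z)
    fx, fy, fz = x - i, y - j, z - k

    def g(di, dj, dk):
        return grid[i + di][j + dj][k + dk]

    # Interpolate along x, then y, then z.
    c00 = g(0, 0, 0) * (1 - fx) + g(1, 0, 0) * fx
    c10 = g(0, 1, 0) * (1 - fx) + g(1, 1, 0) * fx
    c01 = g(0, 0, 1) * (1 - fx) + g(1, 0, 1) * fx
    c11 = g(0, 1, 1) * (1 - fx) + g(1, 1, 1) * fx
    c0 = c00 * (1 - fy) + c10 * fy
    c1 = c01 * (1 - fy) + c11 * fy
    return c0 * (1 - fz) + c1 * fz

# A 2x2x2 grid sampling the field d(x, y, z) = x:
grid = [[[0.0, 0.0], [0.0, 0.0]],
        [[1.0, 1.0], [1.0, 1.0]]]
print(sample_voxel_sdf(grid, (0.25, 0.5, 0.5)))  # 0.25
```

Note the resulting field is only an approximation of the mesh's true distance function; accuracy depends on the voxel resolution chosen offline.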
-
The fastest way I know to render meshes within an SDF scene only works in a special case, where the mesh is a member of a top-level union: the root of the operation tree is a union, and the mesh is one of the union's arguments. A shape optimizer could convert a more general operation tree into this special case.
-
The mTec project experimented with a number of ways to speed up SDF rendering on a GPU. For each optimization, they give the % speedup. The optimization with the greatest benefit is what is sometimes called "tile based rendering" or "beam optimization", and what they call "culling". You partition the display into small tiles (often 16x16 pixels, but they used 8x8). A compute shader creates a display list for each tile: the display list contains only the shapes in the top-level union that are visible in that tile. A separate compute shader is dispatched to render the display list for each tile. This speeds up a large top-level union because you don't have to evaluate every SDF in the top-level union for every pixel in the display. Since top-level unions are important for optimization, a shape optimizer could use algebraic identities to push a non-union top-level operation farther down into the tree. For example, `move d (union [s1, s2])` could be transformed to `union [move d s1, move d s2]`.
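A CPU sketch of that per-tile culling pass (mTec runs it in a compute shader). The bounding-box representation and all names are illustrative.

```python
TILE = 8  # tile edge in pixels; mTec used 8x8

def build_display_lists(width, height, shapes):
    """shapes: list of (shape_id, (xmin, ymin, xmax, ymax)) screen-space
    bounding boxes in pixels. Returns {(tx, ty): [shape_id, ...]} mapping
    each tile to the shapes that can possibly affect it (empty tiles omitted)."""
    lists = {}
    for shape_id, (x0, y0, x1, y1) in shapes:
        # Range of tiles the bounding box overlaps, clamped to the screen.
        tx0, ty0 = max(0, int(x0) // TILE), max(0, int(y0) // TILE)
        tx1 = min((width - 1) // TILE, int(x1) // TILE)
        ty1 = min((height - 1) // TILE, int(y1) // TILE)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                lists.setdefault((tx, ty), []).append(shape_id)
    return lists

# 32x16 screen: shape "a" covers the left edge, "b" a small box on the right.
lists = build_display_lists(32, 16, [
    ("a", (0, 0, 7, 15)),
    ("b", (20, 4, 27, 11)),
])
print(lists[(0, 0)])  # ['a'] — only "a" must be evaluated in this tile
```

The rendering pass then evaluates, per pixel, only the SDFs on its tile's list, which is where the speedup for large top-level unions comes from.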
-
There are many ideas here; they illustrate the scope of what's possible. For a "Curv 2" project I would focus on a small, high-value subset, comprising 1. a new language, 2. a new SubCurv compiler, 3. a new graphics engine.
-
Curv has an AST; that's easy. It has a tree-walking AST interpreter; that's also easy.
-
I discussed this large-union problem with Doug years ago, and I just went through that discussion thread. I can share some of what we previously discovered, especially if you want to accelerate things without a complete compiler rewrite or the like. The main problem is the length of the compiled code. Here are two little tricks:
If you only want to export the shape to STL, without needing to render it, then you can take the union of a very large number of shapes, even more than 100,000 (I don't know the exact limit). For exporting, you can accelerate further by manually separating the model into regions ahead of time, importing the region information into the Curv file, and then writing your make_shape function region by region, going through only the elements involved in each region. I eventually got this pretty fast, and large-union exporting is no longer a concern for me: for a large number of shapes (several thousand to hundreds of thousands; I don't remember the exact number) it usually takes only several minutes to export. The largest case I have run took maybe several hours. BTW, I used multi-threading when exporting, and the -O jit flag.
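One way to picture the "separate regions ahead of time" trick: bucket shapes into a coarse grid by bounding box, so the distance function for a query point only loops over nearby shapes instead of the whole union. This sketch uses 2D circles and made-up names, not the actual Curv code.

```python
import math

REGION = 10.0  # region edge length, chosen to suit the model

def bucket_shapes(circles):
    """circles: list of (center, radius). Returns {region_key: [circle...]};
    a circle is added to every region its bounding box touches."""
    regions = {}
    for c, r in circles:
        lo = [math.floor((ci - r) / REGION) for ci in c]
        hi = [math.floor((ci + r) / REGION) for ci in c]
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                regions.setdefault((i, j), []).append((c, r))
    return regions

def region_distance(regions, p):
    """Distance at p, looking only at shapes bucketed into p's region."""
    key = (math.floor(p[0] / REGION), math.floor(p[1] / REGION))
    local = regions.get(key, [])
    if not local:
        return float("inf")  # nothing nearby
    return min(math.dist(p, c) - r for c, r in local)

circles = [((1.0, 1.0), 1.0), ((25.0, 25.0), 1.0)]
regions = bucket_shapes(circles)
print(region_distance(regions, (2.0, 1.0)))  # 0.0: on the first circle
```

The per-region distance function is only a correct SDF near that region, which is fine for meshing/export where each sample point knows which region it lies in.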
-
I was reading about SDFs and came across this discussion. From a compiler perspective, floating-point operations are hard to optimize because the operations are generally not associative. For floating-point operations, it is sometimes possible to convert one expression into another that is faster and has bounded error: https://herbie.uwplse.org/
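A two-line demonstration of the non-associativity: regrouping a floating-point sum changes the result, which is why a compiler cannot freely re-bracket SDF arithmetic.

```python
# IEEE 754 doubles: 1e16 has a spacing (ulp) of 2, so adding 1.0 to it
# is absorbed, but adding 2.0 is not. Regrouping changes the answer.
a, b, c = 1e16, 1.0, 1.0
left = (a + b) + c   # each 1.0 is absorbed separately -> 1e16
right = a + (b + c)  # the 2.0 survives -> 1.0000000000000002e16
print(left == right)  # False
```

Tools like Herbie search for regroupings and algebraic rewrites whose rounding error is provably acceptable, which is exactly what a naive optimizer cannot assume.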
-
The biggest blocker for SDFs, for me, is the union of shapes. This causes massive amounts of computation per pixel in the fragment shader. We need to come up with a way to still have unions, but without the massive computation.
At its core, we need
`min(a, b)`
to be minimized in some way. Maybe `min` is actually bad overall for shader size? Maybe we should be using `if`? Etc. I'm open to all ideas. Remember that it has to work for both rendering and meshing!
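One direction for reducing the per-sample cost of `min` over a large union is to skip children that provably cannot win. A CPU sketch using bounding-sphere lower bounds; the names are illustrative, and this is not Curv's implementation:

```python
import math

def naive_union(spheres, p):
    """Evaluates every child at every sample: cost grows linearly."""
    return min(math.dist(p, c) - r for c, r in spheres)

def pruned_union(spheres, p):
    """Skips children whose cheap lower bound cannot beat the best so far."""
    best = float("inf")
    evaluated = 0
    for c, r in spheres:
        # Cheap conservative bound: exact for spheres; for a complicated
        # child this would be the distance to its bounding sphere.
        bound = math.dist(p, c) - r
        if bound >= best:
            continue  # cannot improve the minimum: skip the full SDF
        evaluated += 1
        best = min(best, bound)  # full child SDF evaluation would go here
    return best, evaluated

spheres = [((i * 3.0, 0.0), 1.0) for i in range(100)]
best, evaluated = pruned_union(spheres, (0.0, 0.0))
print(best, evaluated)  # -1.0 1 — only one of 100 children fully evaluated
```

This works for both rendering and meshing, since it returns the same distance as the naive `min`; it only avoids evaluating children that cannot affect the result.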