<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://blog.voxagon.se/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.voxagon.se/" rel="alternate" type="text/html" /><updated>2026-03-13T15:34:21+01:00</updated><id>https://blog.voxagon.se/feed.xml</id><title type="html">Voxagon Blog</title><subtitle>A game technology blog by Dennis Gustafsson</subtitle><entry><title type="html">The unlikely story of Teardown Multiplayer</title><link href="https://blog.voxagon.se/2026/03/13/teardown-multiplayer.html" rel="alternate" type="text/html" title="The unlikely story of Teardown Multiplayer" /><published>2026-03-13T00:00:00+01:00</published><updated>2026-03-13T00:00:00+01:00</updated><id>https://blog.voxagon.se/2026/03/13/teardown-multiplayer</id><content type="html" xml:base="https://blog.voxagon.se/2026/03/13/teardown-multiplayer.html"><![CDATA[<p>Multiplayer has been the single most requested feature for Teardown ever since before its initial release. Synchronizing physics over the network is already known to be hard, and on top of that we have a completely dynamic, destructible world with full modding support. For a long time, we considered the whole idea unrealistic.</p>

<p>Despite the scepticism, we did an internal experiment back in 2021, using a naive approach to synchronize moving objects and send altered voxel data as objects were destroyed. It used enormous amounts of bandwidth and completely choked the connection when large objects were destroyed. It was purely a learning project and never reached a usable state, but it taught us where the bottlenecks were.</p>

<p>Around the same time, a community project called <a href="https://github.com/TDMP-Team" target="_blank">TDMP</a> added rudimentary multiplayer support through reverse engineering and DLL injection. Despite being a bit janky, it completely blew my mind. It was an incredible technical achievement by the people involved. The mod mostly synchronized player position and player input, and since the engine isn’t deterministic, it could easily get out of sync, especially with destruction.</p>

<h1 id="a-semi-deterministic-approach">A semi-deterministic approach</h1>
<p><img src="/assets/2026-03-13-multiplayer.jpg" alt="" /></p>

<p>As we started bringing more people on board, we did a more serious investigation into a proper multiplayer implementation in late 2022. We knew we wanted perfect world sync. Anything else would quickly make simulations diverge in the chaotic world of Teardown. Sending large amounts of voxel data wasn’t an option because of bandwidth, so we had to rely on determinism. Early on, I dismissed the idea of full determinism for the entire engine (a view I have since reevaluated), so it needed to be a hybrid approach: destruction done deterministically, while most other things use state synchronization.</p>

<p>For the longest time (and for good reasons), floating point operations were considered unsafe for deterministic purposes. That is still true to some extent, but the picture is more nuanced than that. I have since learned a lot about floating point determinism, and these days I know it is mostly safe if you know how to navigate around the pitfalls. I won’t cover them all here, but I hope to do that in another post, because there’s a lot of confusion around this topic.</p>

<p>At the time, I decided to rewrite the destruction logic in fixed-point integer math, which is fairly straightforward given that we’re dealing with discrete voxel volumes. But there’s much more to destruction logic than cutting out voxels on a regular grid. Object hierarchies may separate, new objects can be created and joints can be affected or reattached. A lot of this still involves floating point math, so each breakage event is split into a stream of deterministic commands that are replicated on all clients: “cut hole in this shape at voxel coord x,y,z”, “change ownership of that shape”, “reconnect joint to this shape”, etc.</p>
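
<p>To make the idea of a deterministic command stream concrete, here is a minimal sketch. It is not Teardown’s code: the names <code class="language-plaintext highlighter-rouge">Command</code>, <code class="language-plaintext highlighter-rouge">Shape</code>, <code class="language-plaintext highlighter-rouge">World</code> and <code class="language-plaintext highlighter-rouge">applyCommands</code> are hypothetical, and the two command types are simplified stand-ins for the ones quoted above. The point is only that commands carry integer payloads and are applied strictly in order, so two machines starting from the same scene end up with the same result.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical command types. Payloads are integers only, so applying a
// command gives bit-identical results on every machine.
enum class CommandType : uint8_t { CutVoxel, ChangeOwner };

struct Command {
    CommandType type;
    int32_t shape;   // index of the shape the command targets
    int32_t x, y, z; // voxel coordinate (CutVoxel); x is the new owner id (ChangeOwner)
};

struct Shape {
    int32_t sizeX, sizeY, sizeZ;
    int32_t owner;
    std::vector<uint8_t> voxels; // 0 = empty, anything else = solid

    Shape(int32_t sx, int32_t sy, int32_t sz, int32_t ownerId)
        : sizeX(sx), sizeY(sy), sizeZ(sz), owner(ownerId),
          voxels(static_cast<std::size_t>(sx) * sy * sz, 1) {}
};

struct World {
    std::vector<Shape> shapes;
};

// Applies the commands strictly in stream order. No floating point is involved.
void applyCommands(World& world, const std::vector<Command>& stream) {
    for (const Command& c : stream) {
        assert(c.shape >= 0 && c.shape < static_cast<int32_t>(world.shapes.size()));
        Shape& s = world.shapes[c.shape];
        switch (c.type) {
        case CommandType::CutVoxel: {
            bool inside = c.x >= 0 && c.x < s.sizeX &&
                          c.y >= 0 && c.y < s.sizeY &&
                          c.z >= 0 && c.z < s.sizeZ;
            if (inside) {
                std::size_t index = (static_cast<std::size_t>(c.z) * s.sizeY + c.y) * s.sizeX + c.x;
                s.voxels[index] = 0;
            }
            break;
        }
        case CommandType::ChangeOwner:
            s.owner = c.x;
            break;
        }
    }
}
```

<p>Note that the size of a command does not depend on the size of the shape it modifies, which is what keeps the bandwidth low.</p>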

<p>Our implementation does not use dedicated servers. The player hosting a game also acts as server for that session, so every mention of the server below really refers to the player who hosts the session.</p>

<h1 id="reliable-and-unreliable">Reliable and unreliable</h1>
<p>As long as the deterministic commands are applied to the world in exactly the same way, in exactly the same order, the resulting changes will be identical across all machines. The bandwidth requirements are small because a command takes the same amount of data regardless of the size of the object it affects. Anything that modifies the scene content, such as spawning new objects or recoloring objects, is implemented using the same approach. We put all these commands on a <em>reliable</em> network stream, where everything is guaranteed to arrive in order and nothing is missed, just like a traditional data stream.</p>

<p>For anything that doesn’t affect the structure or contents of the scene, such as object transforms, velocities, and player positions, we use state synchronization with eventual consistency. For every update, the server selects a number of objects that should be synchronized and sends their state to the clients. The server keeps a priority queue to ensure everything is eventually sent, prioritizing objects visible to the player while staying within the allowed data budget (in Teardown, around one Mbit per client). Because nearby objects differ per client depending on player position, the server has to maintain this queue and make these decisions per client.</p>
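
<p>One way to picture the per-client queue is the sketch below. It is an assumption about how such a scheme can work, not a description of Teardown’s implementation: every object accumulates priority each update (faster if it is visible to that client), the highest-priority objects that fit in the budget are sent, and sent objects start over from zero. <code class="language-plaintext highlighter-rouge">SyncEntry</code>, <code class="language-plaintext highlighter-rouge">urgency</code> and <code class="language-plaintext highlighter-rouge">selectForClient</code> are hypothetical names.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Per-client bookkeeping for one object. All names are hypothetical.
struct SyncEntry {
    int objectId;
    int stateBytes;   // size of the serialized state for this object
    int urgency;      // priority gained per update, e.g. higher when visible (must be >= 1)
    int priority = 0; // accumulated since the object was last sent
};

// Picks the objects to send to ONE client this update without exceeding
// budgetBytes. Objects that are not sent keep accumulating priority, so any
// object that fits in the budget is eventually selected.
std::vector<int> selectForClient(std::vector<SyncEntry>& entries, int budgetBytes) {
    for (SyncEntry& e : entries) {
        assert(e.urgency >= 1);
        e.priority += e.urgency;
    }

    // Visit entries from highest to lowest accumulated priority.
    std::vector<std::size_t> order(entries.size());
    for (std::size_t i = 0; i < order.size(); i++)
        order[i] = i;
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return entries[a].priority > entries[b].priority;
    });

    std::vector<int> selected;
    for (std::size_t i : order) {
        SyncEntry& e = entries[i];
        if (e.stateBytes > budgetBytes)
            continue; // does not fit in what is left; try smaller entries
        budgetBytes -= e.stateBytes;
        e.priority = 0; // just sent, so it goes to the back of the line
        selected.push_back(e.objectId);
    }
    return selected;
}
```

<p>The server would keep one such list per connected client and call the function once per client per update, since both visibility and send history differ between clients.</p>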

<p>These packets are sent <em>unreliable</em>, meaning they are not guaranteed to arrive, and those that do arrive are not guaranteed to arrive in order. This is what the messy reality of internet packets looks like. Protocols like TCP are layers on top of that, responsible for maintaining ordering and resending data that never arrived. With unreliable state synchronization, you have to handle some of that yourself, but in many situations you don’t need strict ordering. If a packet gets lost, a newer packet with more recent state will arrive soon anyway. Many other games use a similar approach, so this aspect of our implementation is fairly traditional.</p>
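
<p>A common way to handle out-of-order arrival for this kind of data is to tag each state update with a sequence number and simply ignore anything older than what has already been applied. The sketch below shows that general technique under my own assumptions (a 16-bit counter, hypothetical <code class="language-plaintext highlighter-rouge">ObjectState</code> and <code class="language-plaintext highlighter-rouge">receiveState</code>); it is not taken from the Teardown source.</p>

```cpp
#include <cstdint>

// Hypothetical state payload for one object.
struct ObjectState {
    float position[3];
    float velocity[3];
};

struct ReplicatedObject {
    ObjectState state{};
    uint16_t lastSequence = 0;
    bool hasState = false;
};

// True if sequence a is newer than b, tolerating 16-bit wrap-around.
// Works as long as the two numbers are less than 32768 steps apart.
bool isNewer(uint16_t a, uint16_t b) {
    uint16_t diff = static_cast<uint16_t>(a - b);
    return diff != 0 && diff < 0x8000;
}

// Applies an incoming state unless a more recent one was already applied.
// Lost packets need no special handling: the next one that arrives wins.
bool receiveState(ReplicatedObject& obj, uint16_t sequence, const ObjectState& incoming) {
    if (obj.hasState && !isNewer(sequence, obj.lastSequence))
        return false; // stale or duplicate packet, drop it
    obj.state = incoming;
    obj.lastSequence = sequence;
    obj.hasState = true;
    return true;
}
```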

<p>Each client runs a local simulation, as it normally would, but once new state arrives from the server the affected objects are corrected to keep everything in sync. In many cases, locally simulated objects are nearly identical to what comes from the server and the correction is invisible. But in complex situations with many simulated objects, the priority queue has to work harder and more objects get corrected at a lower frequency, which can cause visible snapping.</p>

<h1 id="scripting">Scripting</h1>
<p>We knew from the start that the multiplayer version had to support scripting and modding, but scripts now had to be aware of the new architecture, where scene changes happen on the server and are automatically distributed to clients. Some script parts still have to run on clients, especially UI and overlay graphics. We ran experiments to automate this by running the exact same script on both server and clients, while ignoring certain API calls where they weren’t relevant. That turned out rather clunky, and a bad fit for certain use cases, so it was eventually dropped.</p>

<p>We also didn’t want to split everything into separate server and client scripts, so we landed somewhere in the middle: client and server parts exist in the same script, with some machinery in place (shared state table and remote calls) to simplify communication between them. It’s a pretty unusual multiplayer scripting approach, but it has served us well and the modding community seems to get the concept. There’s a tutorial available <a href="https://www.youtube.com/watch?v=qggrJi4GLEE&amp;list=PLFDYx1brPkKTeEMUhXqg7cbwZKtNuuVkl" target="_blank">here</a> with more details.</p>

<h1 id="terminal-and-ui">Terminal and UI</h1>
<p><img src="/assets/2026-03-13-screen.jpg" alt="" /></p>

<p>The in-game terminals were tricky to get right, because they don’t follow the conventional flow. Terminals are part of the scene, not just a UI layer on top, and can be controlled by any player, yet the interaction should be visible for everyone and the resulting actions should happen on the server.</p>

<p>We solved this by running terminal scripts entirely on the server, recording draw commands which are transferred to the clients using delta-compression. In other words, we stream the resulting terminal image to the clients, but not in the form of compressed video. Instead we stream the UI draw commands that build up the image, or to be more specific: the draw command <em>delta</em> from the previous frame to the next. The idea is similar to what the <a href="https://en.wikipedia.org/wiki/X_Window_System" target="_blank">X Window System</a> uses, allowing graphical user interfaces for remote applications on thin clients. The draw commands are often similar frame-to-frame even while animating, so the delta is usually tiny.</p>
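
<p>As an illustration of what a draw command delta can look like, here is a deliberately simple sketch. The command layout and the slot-by-slot comparison are my assumptions for the example (<code class="language-plaintext highlighter-rouge">DrawCommand</code>, <code class="language-plaintext highlighter-rouge">FrameDelta</code>, <code class="language-plaintext highlighter-rouge">makeDelta</code> and <code class="language-plaintext highlighter-rouge">applyDelta</code> are hypothetical); a production encoder would likely be smarter about insertions and removals.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical fixed-size draw command.
struct DrawCommand {
    uint8_t op; // e.g. 0 = filled rect, 1 = text
    int16_t x, y, w, h;
    uint32_t color;

    bool operator==(const DrawCommand& o) const {
        return op == o.op && x == o.x && y == o.y && w == o.w && h == o.h && color == o.color;
    }
};

// Everything the client needs to turn the previous frame into the next one.
struct FrameDelta {
    std::size_t commandCount = 0;                             // length of the new command list
    std::vector<std::pair<std::size_t, DrawCommand>> changed; // slots that differ from the previous frame
};

// Server side: compare slot by slot and keep only what changed.
FrameDelta makeDelta(const std::vector<DrawCommand>& prev, const std::vector<DrawCommand>& next) {
    FrameDelta d;
    d.commandCount = next.size();
    for (std::size_t i = 0; i < next.size(); i++) {
        if (i >= prev.size() || !(prev[i] == next[i]))
            d.changed.emplace_back(i, next[i]);
    }
    return d;
}

// Client side: patch the previous frame in place.
void applyDelta(std::vector<DrawCommand>& frame, const FrameDelta& d) {
    frame.resize(d.commandCount);
    for (const auto& change : d.changed)
        frame[change.first] = change.second;
}
```

<p>Because the delta only makes sense relative to the previous frame, it has to travel on the reliable, ordered stream; a single lost delta would leave the client with a corrupted command list.</p>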

<p>Once the system was in place, we started using it on the main menu as well, so that everyone can see what the host is doing when selecting level, game modes and mods for a session.</p>

<h1 id="the-big-merge">The big merge</h1>
<p>Our initial idea was that multiplayer Teardown should be a separate game. So while implementing the first version, we cleaned up the codebase and made reasonable adaptations to suit the new architecture. Meanwhile, our parent company at the time, Saber Interactive, was hard at work on console ports, rewriting many engine internals, adding localization support, reworking the UI framework, and optimizing many subsystems for better performance. On our end, we were also adding support for a third-person camera controller and an animation system. The situation was already messy, but it was about to get much worse.</p>

<p>As time went by, we reevaluated whether multiplayer really should be a separate game after all, and eventually concluded it would be better to retrofit multiplayer into the existing game. If successful, it would keep the community unified around one game and simplify porting existing mods. On the other hand, our multiplayer version had <em>intentionally</em> diverged substantially from the main branch with console ports, optimizations and third-person camera support. Merging them would be a herculean task and a long-running effort. The console and DLC release lineup was scheduled for at least another year on the single-player version. To this day, I’m still unsure if it was the right decision to merge, but that’s the road we chose.</p>

<p>The merge itself took almost three months to complete, and for more than a year we had to manually merge changes weekly from the main branch onto our multiplayer branch to keep everything in sync. It was initially done by us, but eventually more people from Saber got involved in the multiplayer development.</p>

<p>For a long time we used Saber’s backend for network transport (shared with other Saber games), but following the switch in ownership from Saber to Coffee Stain, we swapped it out for the Steam Networking backend.</p>

<h1 id="backwards-compatibility">Backwards compatibility</h1>
<p>The fact that Teardown is a released game and a modding platform with tons of existing content is something we couldn’t ignore. Backwards compatibility was a requirement, and it was also the single most time-consuming part of the implementation. Because of the deep conceptual changes that supporting multiple players entails, there was simply no way to make older scripts automatically support multiplayer, but the game still had to load and run existing mods in single-player mode (we actually do support loading levels from old mods into multiplayer, but scripts are disabled).</p>

<p>We wanted to avoid maintaining multiple script implementations, so we did our best to keep the existing API and add optional player-id parameters where applicable. There are a few exceptions, but for the most part the API is backwards compatible and old scripts can still run on the multiplayer code path.</p>

<h1 id="join-in-progress">Join in progress</h1>
<p>The next big hurdle was late joins. For a long time we dismissed this as unrealistic too. The scene in Teardown changes constantly, and since parts of the network implementation rely on determinism, it is critical that new clients join on <em>exactly</em> the same scene. There are essentially three ways to solve this:</p>

<ol>
  <li>
    <p>Serialize the entire scene, compress the data, and pass it to the joining client. We already do full scene serialization for quicksave and quickload, so this is possible, but the files are large: 30-50 MB is common, often more, so transfer would take a while.</p>
  </li>
  <li>
    <p>Serialize only the objects that changed since the scene was loaded, then compress and transfer those to the client. This is more complex and requires careful tracking of changes - another potential source of bugs. It would reduce data size, but it can still be quite large depending on the level of mayhem before the join.</p>
  </li>
  <li>
    <p>Record the deterministic command stream, pass it to the joining client, and have that client apply all changes to the loaded scene before joining the game. The amount of data is much smaller than in option 2 since we’re not sending any voxel data, but applying the changes can take a while since it involves a lot of computation.</p>
  </li>
</ol>

<p>Once we started investigating option 3 we realized it was actually less data than we anticipated, but we still limit the buffer size and disable join-in-progress when it fills up. This allows late joins up to a certain amount of scene changes, beyond which applying the commands would simply take an unreasonably long time.</p>
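
<p>The buffer limit described above could be modeled roughly like this. The sketch assumes commands are already serialized to bytes and uses hypothetical names (<code class="language-plaintext highlighter-rouge">CommandLog</code>, <code class="language-plaintext highlighter-rouge">record</code>, <code class="language-plaintext highlighter-rouge">canJoinInProgress</code>); the actual limit and bookkeeping in Teardown are not specified here.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Records the serialized deterministic commands issued since the scene was
// loaded, up to a fixed capacity. Hypothetical sketch.
struct CommandLog {
    std::vector<uint8_t> bytes;
    std::size_t capacity;
    bool overflowed = false;

    explicit CommandLog(std::size_t maxBytes) : capacity(maxBytes) {}

    // Called on the host for every command put on the reliable stream.
    void record(const uint8_t* data, std::size_t size) {
        if (overflowed)
            return; // late joins are already disabled for this session
        if (bytes.size() + size > capacity) {
            // A partial log is useless for reproducing the scene, so drop it.
            overflowed = true;
            bytes.clear();
            bytes.shrink_to_fit();
            return;
        }
        bytes.insert(bytes.end(), data, data + size);
    }

    // A joining client loads the original scene and replays `bytes` in order.
    bool canJoinInProgress() const { return !overflowed; }
};
```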

<h1 id="development-tools">Development tools</h1>
<p>Multiplayer testing and debugging can be awful for developers. What used to be a single button press to debug the game quickly turns into a repetitive dance involving launching multiple clients and clicking through menus to connect them. On top of that, with multiple processes, debugging gets much more complex. You are either limited to debugging a single instance, have to attach to a running process (and guess which process belongs to which instance), or have to set up a more complex workflow where you can debug multiple processes at the same time.</p>

<p>To avoid this, we tried running several game instances in the same process and window, automatically connecting at startup and ticking them sequentially each frame, redirecting input to the active instance. It was a big relief, both because it made debugging simple again and because it gave the content team a reasonable test environment for new game modes. However, due to historical design choices and ongoing merge complexity, at one point we just couldn’t maintain this path and had to fall back to separate processes and multiple windows again. They still connect automatically (using a TCP layer instead of Steam Networking), and we do what we can to keep it coherent, but it pains me to see the single-window implementation go away because it was so much nicer to work with.</p>

<h1 id="conclusion">Conclusion</h1>
<p><img src="/assets/2026-03-13-multiplayer2.jpg" alt="" /></p>

<p>Looking back at the complexity of the task and the unfavorable circumstances we had at hand, I’m really proud that we finally pulled it off. That said, I’m not the one to take credit for this achievement. I was involved in the initial design and implementation, as well as the big merge, but for the lion’s share of the work, the really hard and tedious parts that actually made it work, I’ve only been peripherally involved, shifting my focus towards our new engine.</p>

<p>The multiplayer implementation in Teardown isn’t particularly elegant; it’s just <em>a lot of hard work</em> and <em>a lot of code</em>. It’s a mix of many techniques with tons of special cases for backwards compatibility. It has been a useful learning experience, one that made me think deeply about simplified approaches to multiplayer implementations, which we’re now trying out in the new engine. This post is already long, so I’ll save the details for the next one. Meanwhile, please enjoy a game of Teardown multiplayer and reflect on the hurdles we’ve gone through to make it happen!</p>]]></content><author><name>Dennis Gustafsson</name></author><category term="gamedev," /><category term="multiplayer," /><category term="network," /><category term="teardown" /><summary type="html"><![CDATA[Multiplayer has been the single most requested feature for Teardown ever since before its initial release. Synchronizing physics over the network is already known to be hard, and on top of that we have a completely dynamic, destructible world with full modding support. For a long time, we considered the whole idea unrealistic.]]></summary></entry><entry><title type="html">Thoughts on ECS</title><link href="https://blog.voxagon.se/2025/03/28/thoughts-on-ecs.html" rel="alternate" type="text/html" title="Thoughts on ECS" /><published>2025-03-28T00:00:00+01:00</published><updated>2025-03-28T00:00:00+01:00</updated><id>https://blog.voxagon.se/2025/03/28/thoughts-on-ecs</id><content type="html" xml:base="https://blog.voxagon.se/2025/03/28/thoughts-on-ecs.html"><![CDATA[<p>I mentioned in my year summary that I have a lot to say about ECS, and got several requests to write more about it, so I’ll attempt to write up my thoughts here.</p>

<p>First and foremost, I have very limited experience with ECS. I have implemented some prototypes using it, but my experience comes primarily from reading about it online and talking to people who have actually used it, so take all of this with a grain of salt. I’d be very interested to hear other people’s thoughts on it and how they align with my own conclusions.</p>

<p>The Entity Component System model, or ECS for short, has been around for a few decades and it comes in many different flavors. In most implementations, an entity is just a unique identifier (a number). One or more components can be attached to each entity, but usually only one component of the same type per entity. A component is typically just data (a struct). Systems implement all logic and operate on one or several component types. Systems have an update function that gets called periodically, filters out relevant components to operate on and performs some operation. Systems are generally not aware of each other, though there are probably a few exceptions in most real-world scenarios.</p>

<p>It is often claimed that ECS leads to increased performance because of data locality and cache coherency (more on that later). Others use ECS primarily for code structure and separation of responsibilities, which could lead to improved reusability and code quality.</p>

<h1 id="the-problem-at-hand">The problem at hand</h1>

<p>ECS is often seen as the universal solution to deep, complex class hierarchies using inheritance. The argument usually goes something like this: “What if a monster in my game has the ability to walk, but also the ability to fly? Which base class should it inherit from? With ECS an entity can be given different capabilities arbitrarily by using composition instead of inheritance.”</p>

<p>It’s absolutely true that deep class hierarchies often run into these sorts of issues and should be avoided (especially multiple inheritance, which I think everyone now finally agrees was a bad idea to begin with). However, the claim that ECS is the <em>only</em> alternative is simply not true. I have not seen anyone implement behaviors with deep class hierarchies for decades, at least not in the games industry. There are many other ways to do it. One pretty common alternative is to use a much more shallow class hierarchy with one common base class and all the different game objects immediately below that, occasionally using a shared base class where it makes sense, or using composition where appropriate. Others (myself included) primarily use scripts and tags to implement and composite behaviors.</p>

<h1 id="ecs-performance">ECS performance</h1>
<p>ECS is often seen as a technique that is inherently associated with good performance. I don’t think this is necessarily the case. ECS is usually implemented with components being packed nicely together in memory for good data-locality when looping over them. In reality though, any meaningful system will operate on multiple components, now scattered in memory instead of being close together. Components are only packed tightly in memory if you read from <em>one</em> component type. As soon as you read from multiple, this is no longer the case. A movement system might read colliders from the collision component, velocity from the physics component and modify the position from the transform component. That’s three potential cache misses instead of one.</p>

<p>Some ECS implementations have a solution to this called <em>archetypes</em>, which is a rather complex system to store all components that belong to the same entity close in memory. Hang on a second, isn’t that exactly what the good old struct already does?</p>

<p>For archetypes to work with dynamic composition, data has to move around when new components are added or removed. This, of course, implies a performance hit, but it also means that any pointers to component data will be invalid after adding or removing components. The latter is particularly problematic and can lead to complicated bugs in systems altering components while iterating.</p>

<p>ECS encourages frequent iteration over combinations of components. In some game genres, like simulation games for instance, this is probably a common pattern, but in most games I have worked on, looping over lots of objects is generally something that can and should be avoided altogether. Instead of looping over all tripwires in a level to see if the player crosses one, it’s more efficient to use a spatial query around the player, collect all tripwires that are close, and check only those.</p>
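
<p>For readers unfamiliar with spatial queries, a uniform grid is one of the simplest structures that supports them. The sketch below is a generic 2D example with hypothetical names (<code class="language-plaintext highlighter-rouge">SpatialGrid</code>, <code class="language-plaintext highlighter-rouge">queryNear</code>), not something lifted from any of our engines. It returns candidates from the grid cells around a point, so the cost depends on how many objects are nearby rather than on how many exist in the level.</p>

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec2 { float x, y; };

// Uniform grid keyed by integer cell coordinates. Hypothetical sketch.
class SpatialGrid {
public:
    explicit SpatialGrid(float cellSize) : cellSize_(cellSize) {}

    // Registers an object id (e.g. a tripwire) at a position.
    void insert(int id, Vec2 p) {
        cells_[key(cellOf(p.x), cellOf(p.y))].push_back(id);
    }

    // Returns ids stored in every cell overlapping the square of half-size
    // `radius` around `center`. These are candidates; the caller still
    // performs the exact test on each of them.
    std::vector<int> queryNear(Vec2 center, float radius) const {
        std::vector<int> result;
        for (int cy = cellOf(center.y - radius); cy <= cellOf(center.y + radius); cy++) {
            for (int cx = cellOf(center.x - radius); cx <= cellOf(center.x + radius); cx++) {
                auto it = cells_.find(key(cx, cy));
                if (it != cells_.end())
                    result.insert(result.end(), it->second.begin(), it->second.end());
            }
        }
        return result;
    }

private:
    int cellOf(float v) const { return static_cast<int>(std::floor(v / cellSize_)); }

    // Packs two signed cell coordinates into one 64-bit map key.
    static uint64_t key(int cx, int cy) {
        return (static_cast<uint64_t>(static_cast<uint32_t>(cx)) << 32) |
               static_cast<uint64_t>(static_cast<uint32_t>(cy));
    }

    float cellSize_;
    std::unordered_map<uint64_t, std::vector<int>> cells_;
};
```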

<p>On top of this, ECS requires lots and lots of lookups for entity-component mappings. It’s usually a hashmap or some form of indirection table, but in either case it certainly doesn’t come for free. However, in defense of ECS I must add that if it’s used as an alternative to implementing behaviors in scripts, it’s likely a big performance win regardless of implementation.</p>

<h1 id="cold-data">Cold data</h1>
<p>Having the ability to associate cold data (data that is rarely used) with any object is a great idea. Memory locality is a real issue. Keeping your hot data small is a big performance win and ECS makes this very easy. Just attach any cold data as extra components to your objects and they will never be touched unless queried. Unfortunately this clashes completely with the idea of archetypes, where components are laid out linearly in memory, just as if they were a struct. The ideal scenario would be to use archetype storage for hot components and sparse storage for cold components, but that would be a very complex ECS implementation.</p>

<h1 id="data-driven-design">Data-driven design</h1>
<p>In my opinion, this is the best (and possibly even the only) case for ECS, where it really has a clear advantage. Say the intent is to composite different behaviors and let the designer mix and match at runtime in an editor. This maps very well to ECS, since the dynamic nature of the system usually does not require entity-component relationships to be hard-coded. That said, a lot of boilerplate components would likely be needed for any meaningful game object, and crucial systems might expect certain combinations of components to do any meaningful work, so a true mix-and-match-anything-you-want might not be as attainable as you might first think.</p>

<h1 id="debugging">Debugging</h1>
<p>One area of ECS that is rarely discussed is how complicated it is to debug when something goes sour. Visual debuggers are excellent at displaying values in structs and following pointers. With ECS you have integer identifiers that are associated with component data through indirection. If using archetypes, the component data is dynamically composed into a memory blob with no debug info whatsoever. While it <em>may</em> be possible to achieve walkable data structures using natvis magic, it would certainly be a very complex and fragile setup.</p>

<h1 id="pros-and-cons">Pros and cons</h1>
<p>To summarize, here are the pros and cons of ECS, in my opinion:</p>

<p><strong>Pros</strong></p>
<ul>
  <li>
    <p>Entities can be dynamically composed at runtime in a data-driven way, for example via an editor. It also means that you can <em>alter</em> capabilities of existing entities at runtime by adding or removing components.</p>
  </li>
  <li>
    <p>It provides a framework for cold or optional data. It’s a great way to avoid expanding your base classes with data that only a fraction of all entities use or data that is rarely used. This can reduce the size of your objects and improve cache locality. Unfortunately this is rather complicated to combine with the archetype model.</p>
  </li>
  <li>
    <p>It provides a mental model that forces you to structure your code in a data-oriented way (this can also be done without ECS, but may require some discipline).</p>
  </li>
</ul>

<p><strong>Cons</strong></p>
<ul>
  <li>
    <p>Added complexity. You need to understand the details and limitations of your ECS architecture in order to implement features and understand the code.</p>
  </li>
  <li>
    <p>The dynamic nature of ECS requires a lot of lookups between entities and their associated components. Depending on the method and implementation this can lead to performance problems that are difficult to detect, since they are scattered all over the code.</p>
  </li>
  <li>
    <p>Hard to debug.</p>
  </li>
</ul>

<p><strong>Neutral</strong></p>
<ul>
  <li>
    <p>Performance can be better or worse depending on the implementation and the alternative considered.</p>
  </li>
  <li>
    <p>The code structure can be better or worse depending on the implementation.</p>
  </li>
</ul>

<h1 id="sensible-ecs">Sensible ECS</h1>
<p>I’d argue that a large portion of what ECS brings to the table can be implemented in a much simpler way that is equally data-oriented, but does not require the complex, dynamic database that ECS really is. If your game does not require dynamic composition at run-time, you can simply create your entity types statically using composition in the following manner:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>struct Monster
{
	Transform transform;
	Velocity velocity;
	Collider collider;
	PathFinding pathFinding;
};
</code></pre></div></div>

<p>There is no need for inheritance here; just include whatever components make sense for what you are implementing. Note that all data for a monster is laid out linearly in memory instead of being scattered.</p>

<p>Then instead of a function updating monsters:
<code class="language-plaintext highlighter-rouge">void updateMonster(Monster* monster);</code></p>

<p>You can create reusable, system-like update functions for relevant combinations of components:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void updateMovement(Transform* t, Velocity* v, Collider* c);
void updatePathFinding(Transform* t, Collider* c, PathFinding* p);
</code></pre></div></div>

<p>Note that these functions operate only on components, without knowing which entity they belong to or how they are stored. Hence they can be reused for any entity type that uses the same components without using inheritance.</p>

<p>This simple approach requires more explicit function calling. Nothing is automatic, so if you want to update movement, you have to manually do that for all the relevant entities:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>//Update movement for monsters
for(int i=0; i&lt;monsterCount; i++)
	updateMovement(&amp;monster[i].transform, &amp;monster[i].velocity, &amp;monster[i].collider);

//Update movement for peasants
for(int i=0; i&lt;peasantCount; i++)
	updateMovement(&amp;peasant[i].transform, &amp;peasant[i].velocity, &amp;peasant[i].collider);
</code></pre></div></div>

<p>This can get very explicit if you have a lot of entity types, but in my experience explicit is a <em>good thing</em>. The code is easy to read and understand: it literally does exactly what it says from top to bottom, instead of implicitly calling things automatically. It interacts perfectly with any debugger, and most importantly you are in control over the exact control flow of the update loop and the order in which everything is called. What if you have an entity type with the same components, but that doesn’t need a movement update? No problem, you can easily do that by simply not calling updateMovement for such entities.</p>

<p>What about cold/optional data that ECS does so well? There is nothing stopping us from adding a similar system alongside the one described above. We can make all entities inherit from a common base with a unique identifier and create the same indirection tables as you normally would in ECS, but only use it for cold data, instead of using it for everything. Then we have hot data aggregated tightly in memory, while cold data is floating around somewhere else and only accessed when needed.</p>
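
<p>A minimal sketch of such a cold-data side table could look like the following. The names (<code class="language-plaintext highlighter-rouge">Entity</code>, <code class="language-plaintext highlighter-rouge">ColdTable</code>, <code class="language-plaintext highlighter-rouge">Description</code>) are made up for the example, and a hash map is just one possible choice of indirection table.</p>

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

struct Transform { float x, y, z; };
struct Velocity { float x, y, z; };

// Common base that only carries the unique identifier.
struct Entity {
    uint32_t id;
};

// Hot data stays inline in the entity struct, exactly as before.
struct Monster : Entity {
    Transform transform;
    Velocity velocity;
};

// Example of rarely used data that only some entities have.
struct Description {
    std::string name;
    std::string lore;
};

// Side table mapping entity id to one kind of cold data. Hypothetical sketch.
template <typename T>
class ColdTable {
public:
    void set(uint32_t id, T value) { data_[id] = std::move(value); }

    // Returns nullptr if the entity has no data of this kind.
    T* find(uint32_t id) {
        auto it = data_.find(id);
        return it == data_.end() ? nullptr : &it->second;
    }

    // Must be called when the entity is destroyed.
    void erase(uint32_t id) { data_.erase(id); }

private:
    std::unordered_map<uint32_t, T> data_;
};
```

<p>The update loops never touch the table, so the lookup cost is only paid in the rare places that actually need the cold data.</p>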

<p>The main limitation is of course that entity types are locked down at compile-time and cannot be changed dynamically. Whether this is problematic or not depends on the use case. If you are making an actual game, not a generic engine, this simple approach will probably take you all the way to the finish line. If the goal is to create a data-driven, generic engine where designers can dream up composite behaviors in an editor, a full ECS implementation is probably the better choice, but be aware of the complexity it brings.</p>]]></content><author><name>Dennis Gustafsson</name></author><category term="gamedev," /><category term="architecture," /><category term="ecs," /><category term="engine" /><summary type="html"><![CDATA[I mentioned in my year summary that I have a lot to say about ECS, and got several requests to write more about it, so I’ll attempt to write up my thoughts here.]]></summary></entry><entry><title type="html">Year summary</title><link href="https://blog.voxagon.se/2024/12/29/year-summary.html" rel="alternate" type="text/html" title="Year summary" /><published>2024-12-29T00:00:00+01:00</published><updated>2024-12-29T00:00:00+01:00</updated><id>https://blog.voxagon.se/2024/12/29/year-summary</id><content type="html" xml:base="https://blog.voxagon.se/2024/12/29/year-summary.html"><![CDATA[<p>As the year concludes I wanted to give a small update on what I’ve been working on. From a technology perspective this has been one of the most intense years in a long time! I’ve had the opportunity to spend most of my time researching new technologies and algorithms for our next voxel engine. Here’s a summary:</p>

<h1 id="new-renderer">New renderer</h1>
<p>The first half of this year I spent researching advancements in path tracing, learning about Vulkan and hardware raytracing. I implemented my findings as a prototype renderer for the new engine. It relies heavily on hardware raytracing (using intersection shaders, still no triangles!), importance sampling and removes a lot of the limitations we’re seeing with our current engine. For instance it no longer uses a shadow volume, which enables unlimited world size, sharp shadows, no light leakage and vast amounts of moving objects at a very small cost. It also enables true reflections and global illumination, since all scene information is available on the GPU. For denoising I’ve been using NVIDIA DLSS Ray Reconstruction, and overall I’ve been very pleased with the results. It tends to be a bit aggressive in some scenarios, but still very impressive technology. The prototype I made has since been taken over by my colleague <a href="https://gaberundlett.com/" target="_blank">Gabe Rundlett</a>, who has done a lot of optimizations, improvements and cleanup.</p>

<p><img src="/assets/2024-12-29-path-tracing.png" alt="" />
<em>Screenshot from original prototype</em></p>

<h1 id="deformable-voxels">Deformable voxels</h1>
<p>I’ve had this idea for some time, that voxels wouldn’t necessarily need to conform to a perfectly orthogonal box. I remember doing some experiments for Teardown to enable trees swaying in the wind, but I could never really get it to work properly with lighting because of the shadow volume. Raytracing voxels inside an arbitrarily skewed bounding box was a <em>much</em> harder problem than I anticipated. The only related information on this problem that I could find was <a href="https://www.geometrictools.com/Documentation/PerspectiveMappings.pdf">this</a> paper by David Eberly of Geometric Tools. If you haven’t yet checked out <a href="https://www.geometrictools.com/">Geometric Tools</a>, it’s an invaluable source of information for all kinds of geometric problems. After e-mail conversations with Mr Eberly we came to the conclusion that an invertible affine transformation between a unit cube and an arbitrarily deformed cuboid does not exist, even if faces are co-planar (paper in link now updated). However, I managed to come up with an approximate solution that is good enough for my use case. It has limitations, but I’m quite happy with the results and can’t wait to integrate it properly into the new engine!</p>

<p><img src="/assets/2024-12-29-deformable.png" alt="" />
<em>A static image doesn’t really do these sausages any justice. <a href="https://x.com/voxagonlabs/status/1800149271918059937" target="_blank">Link to video.</a></em></p>

<h1 id="framework-update">Framework update</h1>
<p>The base framework, including vector math, containers, IO streams, threading, etc., has gotten a major overhaul and cleanup. We had long internal discussions on whether we should reduce it in some areas and rely more on STL. I did some comparisons and the implementations in STL are for the most part really good these days (at least on MSVC), but the long compile times, terrible debug mode performance and lack of control over memory allocations made us stick to our own implementations also going forward. One area I spent a lot of time with is the task manager, which distributes tasks on multiple threads. The Teardown engine is based around data parallelism, which I think has served us well and is something we want to keep, but in order to be efficient on modern machines it requires a task manager with fast synchronization and low overhead. In order to improve on this and fully embrace data parallelism going forward, I rewrote the task manager to use atomics and lock-less programming for fine-grained task distribution and synchronization (instead of relying on operating system primitives, such as mutexes and condition variables). Lock-less programming is hard, and you never really know if you got it right, but it does make a big difference when used correctly.</p>
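
<p>To illustrate the general idea of handing out work with atomics instead of locks, here is a minimal sketch. It is not the engine’s task manager: the parallelFor helper is a hypothetical name, and it uses the C++ standard library purely for brevity, even though the framework described above deliberately avoids it. Each thread claims the next item index with a single atomic increment, so no mutex or condition variable is involved in distributing the work.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;atomic&gt;
#include &lt;functional&gt;
#include &lt;thread&gt;
#include &lt;vector&gt;

// Hypothetical helper, not the engine's actual task manager.
// Calls body(i) once for every i in [0, count), spread over threadCount threads.
void parallelFor(int count, int threadCount, const std::function&lt;void(int)&gt;&amp; body)
{
	std::atomic&lt;int&gt; next(0); // index of the next unclaimed item

	auto worker = [&amp;]()
	{
		for (;;)
		{
			// fetch_add hands out every index exactly once, without a mutex
			int i = next.fetch_add(1, std::memory_order_relaxed);
			if (i &gt;= count)
				break;
			body(i);
		}
	};

	std::vector&lt;std::thread&gt; threads;
	for (int t = 1; t &lt; threadCount; t++)
		threads.emplace_back(worker);
	worker(); // the calling thread takes part in the work too
	for (std::thread&amp; th : threads)
		th.join(); // join makes the workers' writes visible to the caller
}
</code></pre></div></div>

<p>A real task manager would keep its worker threads alive between calls rather than spawning them every time, which is where most of the difficulty with lock-less synchronization comes in.</p>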

<h1 id="new-engine-architecture">New engine architecture</h1>
<p>We have a couple of really talented programmers on the team who speak very fondly of ECS and convinced me to read up on it and give it a try. For a long time I’ve not been that keen on ECS and I still can’t say that I am, but I have really appreciated learning about it. It does have some very compelling properties, both from an architectural and performance perspective, and while the new engine is not a “pure ECS” implementation, it does use the concept of components and systems where it makes sense. I have a lot to say about ECS and my experiences with it, but that’s not the topic of this post. The new engine architecture is also centered around a new “virtual property” system that connects entities to the editor UI, serialization, undo/redo and the scripting system in a relatively clean and unified manner.</p>

<h1 id="sparse-voxel-objects">Sparse voxel objects</h1>
<p>Teardown uses a dense voxel format, where every shape stores voxel data in an uncompressed, regular 3D grid. This is very fast to access, but since it’s just a grid, regions of empty space within an object take up the same amount of memory as regions that actually contain voxels. This requires assets to be split up into multiple smaller shapes in order to get a better bounding box fit. The downside of this is of course that you end up with a lot of shapes, which also has an overhead. The new engine features a sparse voxel format that splits each shape into 8x8x8 voxel “chunks”, which are tracked with a 3D bitmap. This saves a lot of memory and enables shapes to put voxels everywhere inside the shape without worrying too much about empty space. From a technical perspective, the chunk-based format is more complex to work with, but it offers a lot of advantages over the dense format - updates are fast, most algorithms can be implemented more efficiently, and it removes a lot of restrictions on voxel content. A welcome side effect is that shapes can now have voxels on negative coordinates, which greatly simplifies both modeling and procedural modification to existing objects.</p>
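
<p>As a rough illustration of the chunk idea, here is a minimal sketch of a shape that only allocates storage for 8x8x8 chunks that contain something, with one bit per chunk recording whether it exists. The layout and all names (SparseShape, slot, data) are assumptions made for the example, not the engine’s actual format, and the sketch ignores negative coordinates.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;vector&gt;

// Hypothetical layout for illustration, not the engine's actual format.
const int CHUNK = 8;                            // voxels per chunk side
const int CHUNK_VOXELS = CHUNK * CHUNK * CHUNK; // 512 voxels per chunk

struct SparseShape
{
	int sx, sy, sz;                    // shape size in chunks along each axis
	std::vector&lt;unsigned char&gt; bitmap; // 3D bitmap, one bit per chunk: 1 = allocated
	std::vector&lt;int&gt; slot;             // per chunk: index into data, or -1 if empty
	std::vector&lt;unsigned char&gt; data;   // one material byte per voxel, allocated chunks only

	SparseShape(int x, int y, int z)
		: sx(x), sy(y), sz(z), bitmap((x * y * z + 7) / 8, 0), slot(x * y * z, -1) {}

	// Which chunk a voxel coordinate falls into (non-negative, in-range coordinates only)
	int chunkId(int x, int y, int z) const
	{
		return ((z / CHUNK) * sy + (y / CHUNK)) * sx + (x / CHUNK);
	}

	// Position of a voxel inside its chunk
	static int localId(int x, int y, int z)
	{
		return ((z % CHUNK) * CHUNK + (y % CHUNK)) * CHUNK + (x % CHUNK);
	}

	bool hasChunk(int id) const
	{
		return ((bitmap[id / 8] &gt;&gt; (id % 8)) &amp; 1) != 0;
	}

	void set(int x, int y, int z, unsigned char material)
	{
		int id = chunkId(x, y, z);
		if (!hasChunk(id))
		{
			// First voxel in this region: allocate a chunk and mark it in the bitmap
			bitmap[id / 8] |= (unsigned char)(1 &lt;&lt; (id % 8));
			slot[id] = int(data.size() / CHUNK_VOXELS);
			data.resize(data.size() + CHUNK_VOXELS, 0);
		}
		data[slot[id] * CHUNK_VOXELS + localId(x, y, z)] = material;
	}

	unsigned char get(int x, int y, int z) const
	{
		int id = chunkId(x, y, z);
		if (!hasChunk(id))
			return 0; // empty regions have no voxel storage at all
		return data[slot[id] * CHUNK_VOXELS + localId(x, y, z)];
	}
};
</code></pre></div></div>

<p>In this sketch a 32x32x32 shape with voxels in only two of its 64 chunks stores 1024 voxel bytes instead of 32768.</p>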

<h1 id="physics-overhaul">Physics overhaul</h1>

<p>Most recently and still ongoing I’ve been working on improving the physics simulation in the new engine. There have been several advancements in the field of game physics since I designed the physics engine for Teardown and I’ve been incorporating a selection of these improvements into the new engine. <a href="https://x.com/erin_catto">Erin Catto</a>, author of Box2D, has recently done very extensive research in the field and his findings have been extremely valuable. If you are interested, you can read about his work in great detail on the <a href="https://box2d.org/posts/">Box2D blog</a>. The new physics engine now uses substepping instead of solver iteration (a method described <a href="https://mmacklin.com/smallsteps.pdf" target="_blank">here</a> and sometimes referred to as “Temporal Gauss Seidel”) and features a parallel solver that can solve large piles of objects on multiple threads. There have also been improvements to contact generation and broad phase.</p>

<p><img src="/assets/2024-12-29-physics.png" alt="" />
<em>A large pile of voxel objects solved on multiple threads</em></p>

<p>If there is any particular area you’d like to know more about, let me know and I’ll go over it in a future blog post. As always, you can reach me on X <a href="https://x.com/voxagonlabs">@voxagonlabs</a></p>]]></content><author><name>Dennis Gustafsson</name></author><category term="gamedev," /><category term="engine," /><category term="graphics," /><category term="vulkan," /><category term="physics" /><summary type="html"><![CDATA[As the year concludes I wanted to give a small update on what I’ve been working on. From a technology perspective this has been one of the most intense years in a long time! I’ve had the opportunity to spend most of my time researching new technologies and algorithms for our next voxel engine. Here’s a summary:]]></summary></entry><entry><title type="html">Streaming containers</title><link href="https://blog.voxagon.se/2023/06/01/streaming-containers.html" rel="alternate" type="text/html" title="Streaming containers" /><published>2023-06-01T00:00:00+02:00</published><updated>2023-06-01T00:00:00+02:00</updated><id>https://blog.voxagon.se/2023/06/01/streaming-containers</id><content type="html" xml:base="https://blog.voxagon.se/2023/06/01/streaming-containers.html"><![CDATA[<p>This blog post is about an issue I’ve run into several times, but am still unsure how to solve in a nice way. Consider this more of an “organize-my-thoughts” type post, rather than a solution to the problem.</p>

<p>In many cases I end up in a situation where I want to pass a bunch of data from one subsystem to another. The data is organized into different types using structs or classes. Since there is more than one type of data, they may have different sizes, and the most common use case, at least for me, is that such objects share a common base. Examples include the different types of constraints generated by collision detection and joints that are fed into the rigid body solver, or draw calls fed into the graphics system, but let’s take a traditional event system as a simplified example:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>struct Event
{
	unsigned char type;
	int flags;
};

struct SoundEvent : public Event
{
	Sound* sound;
	Vec3 worldPos;
	float volume;
};

struct CollideEvent : public Event
{
	Object* a;
	Object* b;
};
</code></pre></div></div>

<p>These events can be produced from various locations in the code and some data structure is needed to store them. At some point we need to traverse and dispatch them to listeners. In this case, one could imagine some direct callback approach, but it can get messy in a threaded environment, so let’s say we decide it’s a good idea to queue them up and dispatch them later.</p>

<p>There are multiple ways we could store these events in memory. We could, for instance, allocate each new event on the heap, store their pointers in an array, go through that when we dispatch and then delete them, but assuming this happens every frame in a game loop, that could potentially be a lot of allocations every frame, and they would all be scattered in memory, leading to a lot of cache misses.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void addEvent(Event* event)
{
	events.push_back(event);
}

void dispatch()
{
	for(int i=0; i&lt;events.size(); i++)
	{
		dispatch(events[i]);
		T_DELETE(events[i]);
	}
}
</code></pre></div></div>

<p>Another solution is to keep a separate array for each object type, so each subclass of Event would have its own container in EventSystem. This will keep all events of the same type nicely packed in memory, but it leads to more code. Probably not a big problem if the number of different types is small, but once it gets bigger, the code will be harder to maintain:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void addEvent(Event&amp; event)
{
	if (event.type == SOUND)
		soundEvents.push_back((SoundEvent&amp;)event);
	if (event.type == COLLIDE)
		collideEvents.push_back((CollideEvent&amp;)event);
}

void dispatch()
{
	for(int i=0; i&lt;soundEvents.size(); i++)
		dispatch(soundEvents[i]);
	for(int i=0; i&lt;collideEvents.size(); i++)
		dispatch(collideEvents[i]);
}
</code></pre></div></div>

<p>One could of course also imagine multiple addEvent methods that take different types, but the concept and the amount of code duplication is similar. This solution also has the drawback of not maintaining the order in which the events were issued, which may or may not be a problem.</p>

<p>If the order is important, we could keep a separate array of pointers into the other arrays that can be traversed when dispatching, adding an extra level of indirection, but it’s a bit awkward and sort of counteracts the whole idea of packing the objects tightly in memory.</p>

<p>A third option would be to group the different types into a single uber-type, for instance using a union of structs:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>struct Event
{
	unsigned char type;
	int flags;
	union
	{
		struct 
		{
			Sound* sound;
			Vec3 worldPos;
			float volume;
		} sound;
		struct 
		{
			Object* a;
			Object* b;
		} collide;
	} data;
};
</code></pre></div></div>

<p>This would allow us to easily put them all in the same vector and traverse that linearly when dispatching. But this obviously has the downside of all events now being as large as the largest subtype, wasting a lot of memory.</p>

<p>What we ideally want is actually really simple. We want to store differently sized objects in a tightly packed lump of memory that can later be traversed linearly. Think of it as a stream of data, very similar to how state is usually serialized to disk.</p>

<p>The problem is that C/C++ is ill-equipped for this access pattern, since built-in arrays and containers usually operate on one specific type. I believe the historical reason for this is that older CPUs couldn’t access certain types unless they were properly aligned in memory. Unaligned access on Alpha or SPARC CPUs (and ARM CPUs up until ARMv6) may cause undefined behavior or simply crash. Struct/class members in C and C++ are automatically aligned (padded with unused data) to circumvent this issue, and the size of a struct itself is also adjusted so that the largest member will always be aligned when structs are stacked on top of each other. This assumes that only structs of the same type are stacked, so if we manually stack structs of different types on top of each other in memory, alignment goes out the door.</p>
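
<p>The padding rules can be checked at compile time. The sketch below uses the base Event struct from earlier and adds a small alignUp helper (a hypothetical name, not from any library) showing the rounding a mixed-type stream would need before each write if it wanted to keep every object aligned.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cstddef&gt;

struct Event
{
	unsigned char type;
	int flags;
};

// The compiler inserts padding after 'type' so that 'flags' starts on an int boundary...
static_assert(offsetof(Event, flags) % alignof(int) == 0, "flags is aligned");
// ...and rounds the struct size up so the next Event in an array is aligned too.
static_assert(sizeof(Event) % alignof(Event) == 0, "array elements stay aligned");

// Rounds offset up to the next multiple of alignment (which must be a power of two).
// A mixed-type stream would have to do this before every write to keep objects aligned.
inline std::size_t alignUp(std::size_t offset, std::size_t alignment)
{
	return (offset + alignment - 1) &amp; ~(alignment - 1);
}
</code></pre></div></div>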

<p>On modern 64-bit architectures, alignment is less of an issue, <a href="https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/">if any at all</a>. The CPU will gladly perform unaligned reads and writes (it may sound simple, but there’s a good amount of complex circuitry to make it happen), but there might be a slight performance penalty in some specific cases. I have not done enough research on this, but it seems that unaligned reads and writes only incur a performance penalty if the access happens to straddle two cache lines. On the other hand, performance for a lot of software running on modern CPUs is limited by memory access. Reducing memory footprint will generally increase performance, so it’s a balancing act.</p>

<p>So, if we want to put our events of different types in a contiguous lump of memory <em>and</em> need to respect alignment requirements, it could be implemented with something like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>template &lt;class T&gt; void addEvent(const T&amp; event)
{
	memcpy(mBuffer+mOffset, &amp;event, sizeof(T));
	mOffset += sizeof(T);
}

addEvent(mySoundEvent);
</code></pre></div></div>

<p>This solves the problem of lining up objects of different sizes in memory, but how do we get them back when it’s time to dispatch? Since the type is stored in the first member of the base type, we can peek at the first byte to determine the type and use a similar template method for retrieving the data.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>unsigned char getNextType()
{
	return mBuffer[mOffset];
}

template &lt;class T&gt; void getNextEvent(T&amp; event)
{
	memcpy(&amp;event, mBuffer+mOffset, sizeof(T));
	mOffset += sizeof(T);
}

unsigned char t = getNextType();
switch(t)
{
	case SOUND:
	{
		SoundEvent sndEvent;
		getNextEvent(sndEvent);
		dispatch(sndEvent);
		break;
	}
	...
}
</code></pre></div></div>
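
<p>Putting the write and read halves together, a self-contained version of the copying approach might look like the sketch below. It deviates from the snippets above in two ways, both of which are assumptions made for the example: the type tag is written as a separate byte in front of each payload instead of being peeked from the base struct, and the payload structs are simplified stand-ins without pointers or a common base. The EventStream class and its method names are hypothetical.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cstddef&gt;
#include &lt;cstring&gt;
#include &lt;type_traits&gt;
#include &lt;vector&gt;

enum EventType { SOUND, COLLIDE };

// Simplified payloads for the sketch (no pointers, no common base)
struct SoundEvent { float volume; };
struct CollideEvent { int a; int b; };

class EventStream
{
public:
	// Appends a one-byte type tag followed by the raw bytes of the event
	template &lt;class T&gt; void add(unsigned char type, const T&amp; event)
	{
		static_assert(std::is_trivially_copyable&lt;T&gt;::value, "memcpy needs a trivially copyable type");
		std::size_t pos = mBuffer.size();
		mBuffer.resize(pos + 1 + sizeof(T));
		mBuffer[pos] = type;
		std::memcpy(&amp;mBuffer[pos + 1], &amp;event, sizeof(T));
	}

	bool atEnd() const { return mRead &gt;= mBuffer.size(); }

	// Peeks at the tag of the next event without consuming it
	unsigned char nextType() const { return mBuffer[mRead]; }

	// Copies the next event out into a properly aligned local and advances
	template &lt;class T&gt; T next()
	{
		T event;
		std::memcpy(&amp;event, &amp;mBuffer[mRead + 1], sizeof(T));
		mRead += 1 + sizeof(T);
		return event;
	}

private:
	std::vector&lt;unsigned char&gt; mBuffer;
	std::size_t mRead = 0;
};
</code></pre></div></div>

<p>Because every access goes through memcpy into or out of an ordinary local variable, nothing is ever read through a misaligned pointer, at the cost of one copy on each side.</p>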

<p>First copying the event into the memory stream and then copying it again to get it back somewhat counteracts the whole idea of being really efficient, but if we disregard alignment and rework the interface we can read and write directly into the stream instead with something like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>template &lt;class T&gt; T&amp; addEvent()
{
	mOffset += sizeof(T);
	return *reinterpret_cast&lt;T*&gt;(mBuffer+mOffset-sizeof(T));
}

unsigned char getNextType()
{
	return mBuffer[mOffset];
}

template &lt;class T&gt; const T&amp; getNextEvent()
{
	mOffset += sizeof(T);
	return *reinterpret_cast&lt;const T*&gt;(mBuffer+mOffset-sizeof(T));
}

//Produce events
SoundEvent&amp; sndEvent = addEvent&lt;SoundEvent&gt;();
sndEvent.volume = 0.5f;
...

//Dispatch events
unsigned char t = getNextType();
switch(t)
{
	case SOUND:
	dispatch(getNextEvent&lt;SoundEvent&gt;());
	break;
	...
}
</code></pre></div></div>

<p>This would allow us to place all objects of different types linearly in a tightly packed chunk of memory and then retrieve them with no copying, no wasted memory or other overhead. Would this be faster than other methods, or would the unaligned access counteract the smaller memory footprint? Only one way to find out…</p>]]></content><author><name>Dennis Gustafsson</name></author><category term="gamedev," /><category term="containers," /><category term="memory" /><summary type="html"><![CDATA[This blog post is about an issue I’ve run into several times, but am still unsure how to solve in a nice way. Consider this more of an “organize-my-thoughts” type post, rather than a solution to the problem.]]></summary></entry><entry><title type="html">My journey into game development</title><link href="https://blog.voxagon.se/2021/02/22/background.html" rel="alternate" type="text/html" title="My journey into game development" /><published>2021-02-22T00:00:00+01:00</published><updated>2021-02-22T00:00:00+01:00</updated><id>https://blog.voxagon.se/2021/02/22/background</id><content type="html" xml:base="https://blog.voxagon.se/2021/02/22/background.html"><![CDATA[<p>I often get the question how I got into game development and if I have any tips for beginners. Here’s my story and thoughts about getting into game development.</p>

<h1 id="childhood">Childhood</h1>
<p>I’ve never been particularly interested in playing games myself. I never had a gaming console as a kid, but ever since I was very young I’ve had a strong interest in engineering and technology. My early interest in computers was entirely centered around programming, and not playing games.</p>

<p>I somehow convinced my parents to get me a Commodore VIC 64, because that was what one of my friends had. I’m not sure how old I was, but I must have been eleven or twelve. Back then, the printed manual for a computer was an introduction to programming (BASIC, in this case). When turning the computer on, there was a prompt where you could start programming. Overall, the bar to enter programming was way lower than now. No choice of programming language, no game engine, no downloading and installing stuff, you just turned the computer on and could instantly start programming (like, literally instantly, the interpreter was burnt into a ROM chip).</p>

<p><a href="/assets/2021-02-22-vic64.jpg"><img src="/assets/2021-02-22-vic64.jpg" alt="" /></a></p>

<p>Programming languages sucked, performance was terrible and debuggers non-existent. If you made an error, the computer froze and you had to turn it off and back on again and start over. It was frustrating, tedious and very unintuitive, but at the same time an excellent introduction to how computers work. In order to put a sprite on the screen, you had no choice but to map out each pixel on paper, learn binary numbers, convert that to decimal and load it into a specific memory address. Since there were no tools, everything was cumbersome, but at the same time, everything also seemed within reach without having to learn that much. There was only one way to do things – the hard way.</p>

<p>A few years later I upgraded to a Commodore Amiga 1000 and a whole new world opened up. This was much more similar to computers as we know them today, with a proper desktop, multi-tasking, a file system, etc. It shipped with a programming language (AmigaBASIC), but for some reason I never really got into it. Instead, I got introduced to the AMOS programming language, which I remember as an absolutely fantastic environment for learning to make games. It had a lot of built-in functionality for doing the most basic things, like loading images, playing sounds, drawing lines, etc. It also had the ability to execute inline assembly code which made it very powerful.</p>

<p>Getting better at programming and learning the hardware, I got more and more comfortable programming directly in assembly language instead of AMOS and finally switched over to using AsmOne as my default programming environment. In retrospect this was a terrible move, because writing everything in assembly language is overly complicated compared to using something like C and only dropping down to assembly where needed. I think this poor decision was mostly because I simply didn’t know that C existed, nor how to combine it with assembly. Remember that this was before the Internet was a thing, so the only knowledge you had access to was through your friends and a good dose of curiosity and trial-and-error.</p>

<h1 id="university">University</h1>
<p>There were no game educations available in Sweden at the time, and I’m not sure I would have chosen one even if there was. At this point I had not decided on a career in game development, maybe because game development wasn’t really seen as a career option at all, so I went for a more traditional engineering program – Master of Science in Media Technology at Linköping University. This is where I first got in contact with object oriented programming through Java and later C++. I took classes in linear algebra, data structures, 3D rendering, physical modelling and animation, physics, acoustics, etc. It was definitely a good foundation for a game developer, even though this wasn’t a game centric education.</p>

<p><a href="/assets/2021-02-22-imp.jpg"><img src="/assets/2021-02-22-imp.jpg" alt="" /></a></p>

<p>It was at university I developed a passion for game physics. I can’t remember exactly what caught my attention, but I wrote my first rigid body simulator in 1998, inspired by the papers on impulse based dynamics by Brian Mirtich. At this time physics was rarely seen in video games. The only one I remember studying intensely was Carmageddon 2, which featured incredibly sophisticated rigid body simulation for a game at that time. My first simulator was written in Java, with collision detection in C through the JNI interface. It was later rewritten in C++ and featured a wrecking ball machine at a building site.</p>

<h1 id="game-physics-and-middleware">Game physics and middleware</h1>
<p>For the final exam project at Linköping University I decided to make a game physics SDK with Marcus Lysén. It never really reached a usable state, but was enough to encourage us to form a company around it and develop it further. We teamed up with Jonas Lindqvist and founded Meqon Research. Around the same time, other physics SDKs started popping up. The first version of Havok got released. Mathengine was already on the market, and there was Ipion (mostly known for being used in Half-Life 2), PhysX by Novodex, and the open source project ODE. Even though I wouldn’t admit it at the time, we had the weakest product, no experience and no money, but somehow we managed to release the Meqon SDK a few years later and got a couple of customers. Most notably 3D Realms licensed our technology for Duke Nukem Forever, which gave us the confidence and credibility to push forward and grow the team to about a dozen people. All in all a very fun and intense period of my career, but completely unsustainable, stressful and unhealthy.</p>

<p><a href="/assets/2021-02-22-meqon.jpg"><img src="/assets/2021-02-22-meqon.jpg" alt="" /></a></p>

<p>In 2005, Meqon was acquired by AGEIA and the whole team was integrated into the PhysX machinery. I worked as one of three software architects and got the chance to work with some incredibly talented people across the world, many of whom I’m still in contact with today. This was a fantastic journey and undoubtedly an important cornerstone of my career. The people I worked with at AGEIA also influenced my coding style in a very important way. Coming from an academic, object oriented programming background, I started to question everything when I got in contact with experienced game developers who routinely rejected most of that in favor of a more direct C-like programming style that I slowly started adopting myself and still use today.</p>

<p>I left AGEIA in 2007, just before they got acquired by NVIDIA, to work on scientific visualization. At this point I also started working on my own C++ framework to use for future projects. It wasn’t a game engine, but more of a low level framework with the functionality needed to make a game engine, such as vector math, file IO, compression, geometry, input, audio, rendering, scripting, etc. Creating your own tech was already at the time considered doomed to failure (even more so today), but doing it was a lot of fun and was undoubtedly an important key decision in my career. With a programming framework that I wrote from scratch, thus knowing inside out, I could very quickly implement new ideas and projects on top of that without ever running into any limitations.</p>

<p>One of the first projects I created with the new framework was Dresscode, a game engine profiling tool that I later sold to RAD Game Tools (now reworked into a product called Telemetry). Even if the framework has been rewritten and improved upon in several iterations, I’m still using it today for almost everything I do.</p>

<h1 id="indie-game-development">Indie game development</h1>
<p>Up until this point I never really made an actual, released game, but that changed in 2010, when I teamed up with Henrik Johansson (one of the people I hung out with in the Amiga days) and founded Mediocre. Going from game technology and middleware to making actual games was equal parts fun and frustration. I had no experience with game design, but started to appreciate it more than I thought I would. An interesting thing to note here is that both Henrik and I had very little interest in playing games. We were not gamers, which I think is quite unusual for game developers, but it is my firm belief that playing games is orthogonal to being successful at making them. There are great developers who play a lot of games and there are great developers who never play games. Playing a lot of games is not a bad thing, but it does not make you good at making them, it makes you good at playing them (and this probably applies to a lot more than game development).</p>

<p>We did our first game, Sprinkle, as a part-time project while still doing contract work on the side to sustain our living and I think this was a really wise decision which allowed us to experiment and iterate on the game design to find something unique, with no real time pressure. It also allowed us to spend that extra time polishing the game prior to release.</p>

<p><a href="/assets/2021-02-22-sprinkle.jpg"><img src="/assets/2021-02-22-sprinkle.jpg" alt="" /></a></p>

<p>I think the primary reason Sprinkle became successful was because we found something unique, but as always it’s hard to pinpoint one single factor. We had good timing, both because the App Store and mobile gaming in general were still young and not very exploited, but also because there was a general interest in indie games at the time. Previous connections from NVIDIA and Meqon also contributed to getting us introduced to Apple and Google prior to release, thus increasing our chances of getting featured.</p>

<p>There’s a lot more to the story, including the other Mediocre games and everything that led up to Teardown, but I think I’ll stop here, since at this point I’m already a full-time indie game developer.</p>

<h1 id="advice">Advice</h1>
<p>For learning programming and game development today I don’t really feel like I’m in a position to give beginner advice, because the conditions today are so different from when I started, but for programming in particular, it is my firm belief that experience is the most important factor. Write a lot of code and you’ll eventually get good at it. A good way to do this is to find a way to enjoy it rather than just trying to learn it. My career took a giant leap when I finally embraced that and focused on what I love the most – doing low level stuff and building things from scratch, but it may very well be something else for you.</p>

<p>As an indie developer you will often hear the advice to focus on marketing, otherwise you’re doomed. I don’t agree with that. Making an indie game is hard. Marketing an indie game is even harder. Marketing a <em>mediocre</em> indie game is nearly impossible. If you’re good at making games, focus your efforts on making a unique, fun and polished game instead. If your game isn’t appealing, make another iteration on the design, revisit the mechanics, the art style or whatever you’re good at until it has something that other games don’t. I think indie developers generally have a better chance of making a game that markets itself rather than trying to market a game that just isn’t very good.</p>

<p>Making something unique is probably the most important aspect. As a small developer you cannot realistically create a clone or even a variation of an existing game and expect it to be better than what’s already out there. With a tiny team you also cannot compete with large amounts of content, nor with technology. Originality is pretty much the only aspect of a game that works in your favor, so embrace it. Keep the scope as small as possible and polish until it shines in the dark.</p>]]></content><author><name>Dennis Gustafsson</name></author><category term="gamedev" /><summary type="html"><![CDATA[I often get the question how I got into game development and if I have any tips for beginners. Here’s my story and thoughts about getting into game development.]]></summary></entry><entry><title type="html">The Spraycan</title><link href="https://blog.voxagon.se/2020/12/03/spraycan.html" rel="alternate" type="text/html" title="The Spraycan" /><published>2020-12-03T00:00:00+01:00</published><updated>2020-12-03T00:00:00+01:00</updated><id>https://blog.voxagon.se/2020/12/03/spraycan</id><content type="html" xml:base="https://blog.voxagon.se/2020/12/03/spraycan.html"><![CDATA[<p>Teardown uses an 8-bit color palette for voxel materials, so any voxel volume can have up to 255 different materials and the representation per voxel is then just a single byte to save memory. A material specifies not only the color, but also things such as roughness, emissiveness, reflectivity and physical material type (wood, metal, foliage, etc). Each object can have a unique palette, but a lot of them share the same one. When something breaks, all the pieces inherit the original object palette, so the number of palettes does not increase over time. I pack all palettes in a texture that is 256 in width and number of materials in height and keep that on the GPU for rendering. This way of handling materials conflicts with one particular feature that I really wanted in the game - the spraycan.</p>

<p>If each voxel stored RGB values, recoloring them would be trivial, but doing it with a fixed palette is a whole different story, especially since I wanted the ability to paint with two different colors (yellow for spraycan and black from fire and explosions) and also allow recoloring in several shades to do gradients and antialiased edges. Here is what the end result looks like and how I solved the problem.</p>

<p><a href="/assets/2020-12-03-spraycan.png"><img src="/assets/2020-12-03-spraycan.png" alt="" /></a></p>

<p>The basic idea is to create color variations of all used materials in the palette and populate the unused areas with these variations as a precomputation step at load time. Allowing two color shades in four steps requires eight empty slots in the palette per used material, cutting the usable number of entries in the palette down from 255 to a mere 28.</p>
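
<p>The arithmetic behind that number can be written out as a small compile-time check. The function names are made up for the example; the figures (255 usable entries, two colors, four steps) are the ones given above.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// One slot for the material itself plus one per shade of each paint color
constexpr int slotsPerMaterial(int colors, int steps)
{
	return 1 + colors * steps;
}

// How many distinct materials fit in a palette with the given number of usable entries
constexpr int usableMaterials(int entries, int colors, int steps)
{
	return entries / slotsPerMaterial(colors, steps);
}

static_assert(slotsPerMaterial(2, 4) == 9, "material + 2 colors * 4 steps");
static_assert(usableMaterials(255, 2, 4) == 28, "255 / 9, rounded down");
</code></pre></div></div>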

<p>Most objects in Teardown actually only use a handful of materials, so this is rarely a problem. A simple prop, like a chair or a table might even use just a single material, but the more complex ones, like a large boat or a house might use dozens of materials. On top of this, small objects are often merged into larger volumes to improve performance and at this merge step, materials from all merged objects must be combined into the same palette, so it can fill up pretty quickly.</p>

<p>When the palette runs out of empty slots, I search for visually similar materials and try to squeeze as many of them as possible into the palette. Afterwards I create a translation table for each shade that can be used as a lookup when recoloring (I know DOOM did a similar thing back in the day to emulate lighting with a fixed palette by using a translation table to pick darker variants of existing colors by referencing the best match out of all existing colors in the palette). For each palette there is a translation table like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>unsigned char yellowVariant[256][STEPS];
</code></pre></div></div>

<p>So, for instance if I want to tint a voxel one step towards yellow I do this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>yellow = yellowVariant[original][0];
</code></pre></div></div>

<p>And then if I want to make the voxel even more yellow I can do the same one more time, but now with the new index:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>moreYellow = yellowVariant[yellow][0];
</code></pre></div></div>

<p>This would have been the same thing as using the second step of the table from the beginning (this is actually not always true when running out of empty slots):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>moreYellow = yellowVariant[original][1];
</code></pre></div></div>

<p>There is a similar table for the black shades as well, and this is where it gets complex. Say you use the spraycan to paint something yellow, then you blast a bomb near that area to tint the yellow paint black. This is where palette indices start running out quickly, because we need a black variant, not only for all the original materials, but also for each yellow variant of each unique material, and the same of course applies the other way around.</p>

<p>Teardown uses a rather naive implementation that merges visually similar colors and simply stops adding new colors when the palette is full. (It actually prioritizes opaque colors and adds transparent colors only if there are available slots in the palette, since blending transparency doesn’t work well in the engine anyway.)</p>

<p>The translation table is done last in a separate pass when all new colors have been added. It can choose freely from all available materials and pick the best match, so it is totally possible that one original material gets translated to another original material if it happens to be a yellow or black variant.</p>
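
<p>A minimal sketch of such a best-match pass for a single tint step is shown below. It assumes plain squared RGB distance and a linear blend towards the tint color. The <code class="language-plaintext highlighter-rouge">Color</code> struct and the function names are invented for this example, and the real engine may well measure similarity differently.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Color { float r, g, b; };

// Squared RGB distance (an assumption; a perceptual metric may work better).
inline float distanceSq(const Color& a, const Color& b)
{
	const float dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
	return dr * dr + dg * dg + db * db;
}

// Blend a color a fraction `amount` of the way towards `tint`.
inline Color mix(const Color& c, const Color& tint, float amount)
{
	return { c.r + (tint.r - c.r) * amount,
	         c.g + (tint.g - c.g) * amount,
	         c.b + (tint.b - c.b) * amount };
}

// For every palette entry, store the index of the entry that is closest to
// the ideally tinted color. Any entry may be chosen, including the entry
// itself when nothing closer exists.
inline std::vector<uint8_t> buildStepTable(const std::vector<Color>& palette,
                                           const Color& tint, float amount)
{
	assert(!palette.empty() && palette.size() <= 256);
	std::vector<uint8_t> table(palette.size());
	for (size_t i = 0; i < palette.size(); i++) {
		const Color target = mix(palette[i], tint, amount);
		size_t best = 0;
		for (size_t j = 1; j < palette.size(); j++)
			if (distanceSq(palette[j], target) < distanceSq(palette[best], target))
				best = j;
		table[i] = static_cast<uint8_t>(best);
	}
	return table;
}
```

<p>Because the search runs over the whole palette, an entry with no good variant simply maps to itself, which leaves the voxel unchanged when it is painted.</p>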

<p>At some point I would like to improve the generation of new materials and use some kind of optimization algorithm to select the materials that generate the best translation table given the constraints, but it’s a non-trivial task that might be quite hard to pull off.</p>

<p><a href="/assets/2020-12-03-castle.png"><img src="/assets/2020-12-03-castle.png" alt="" /></a></p>

<p>If you want to see the limitations of the current approach, open the Castle example we ship in Create mode and bring out the spray can. The castle level is built as one huge scene in MagicaVoxel and therefore uses a single palette for the entire level. You’ll notice that some materials get a brown tint instead of yellow (probably because there already are a lot of brown shades in the palette) and a couple of materials that won’t even change at all, most likely because the palette filled up before reaching that index in the palette, forcing the best yellow variant of that color to simply become the original color itself.</p>]]></content><author><name>Dennis Gustafsson</name></author><category term="teardown" /><category term="game" /><category term="palette" /><category term="spraycan" /><summary type="html"><![CDATA[Teardown uses an 8-bit color palette for voxel materials, so any voxel volume can have up to 255 different materials and the representation per voxel is then just a single byte to save memory. A material specifies not only the color, but also things such as roughness, emissiveness, reflectivity and physical material type (wood, metal, foliage, etc). Each object can have a unique palette, but a lot of them share the same one. When something breaks, all the pieces inherit the original object palette, so the number of palettes do not increase over time. I pack all palettes in a texture that is 256 in width and number of materials in height and keep that on the GPU for rendering. 
This way of handling materials conflicts with one particular feature that I really wanted in the game - the spraycan.]]></summary></entry><entry><title type="html">Teardown quicksave</title><link href="https://blog.voxagon.se/2020/11/18/teardown-quicksave.html" rel="alternate" type="text/html" title="Teardown quicksave" /><published>2020-11-18T00:00:00+01:00</published><updated>2020-11-18T00:00:00+01:00</updated><id>https://blog.voxagon.se/2020/11/18/teardown-quicksave</id><content type="html" xml:base="https://blog.voxagon.se/2020/11/18/teardown-quicksave.html"><![CDATA[<p>Saving the complete state of a game at any time and then restoring to that state is hard for any game, but in a fully dynamic voxel world that constantly changes, controlled by dozens of lua scripts, all with their own internal state, implementing this was quite a challenge.</p>

<p>The quicksave feature in Teardown is central to gameplay and needs to be extremely robust for the game to be playable, so I knew early on this had to work flawlessly. It also had to be relatively fast. A long delay would be annoying and cause players to use it less often, limiting creativity and experimentation. Furthermore, it is one of those features that’s quite unrewarding to work on, because no matter how good it is, it doesn’t really add anything to the game, other than working as expected, while even the smallest error instantly results in corrupt state and most likely a crash.</p>

<p>Let’s start with the world itself. It consists of thousands of individual voxel volumes, each with anything from a couple of hundred up to millions of voxels. These volumes are altered dynamically as the player causes destruction. Both voxel content and the size of the volumes change, new volumes are being added and others are removed. In theory, it would probably be possible to keep a diff for the world and use that for tracking state, but for robustness purposes I wanted to save the entire state of the world for each save. The larger levels contain roughly half a billion voxels, so I first thought it would be unrealistic to save all that state, but since the voxel data compresses very well it turned out to actually be a viable option. Proper entropy coding like zlib can easily get the size down to a few percent of the original size, but compressing half a gigabyte of data takes a while even on a fast machine. Instead I’m using simple <a href="https://en.wikipedia.org/wiki/Run-length_encoding">run-length encoding</a> which has almost zero cost, both on the compressing and the decompressing end. Using this gets the size down to 15-20% of the original size, making quicksave files on the larger levels around 80 MB in size. It’s still a lot, but acceptable and very quick to load. The binary mission content files that we ship with the game are actually just an initial quicksave snapshot of each level, but I run these through zlib at the bake step, cutting the size down to about 20 MB per mission.</p>
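
<p>For reference, a byte-oriented run-length coder can be as small as the sketch below. The (count, value) pair layout and the 255 run limit are assumptions made for the example, not the actual quicksave file format.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode as (count, value) byte pairs; runs longer than 255 are split.
inline std::vector<uint8_t> rleEncode(const std::vector<uint8_t>& voxels)
{
	std::vector<uint8_t> out;
	size_t i = 0;
	while (i < voxels.size()) {
		const uint8_t value = voxels[i];
		size_t run = 1;
		while (i + run < voxels.size() && voxels[i + run] == value && run < 255)
			run++;
		out.push_back(static_cast<uint8_t>(run));
		out.push_back(value);
		i += run;
	}
	return out;
}

// Expand every (count, value) pair back into `count` copies of `value`.
inline std::vector<uint8_t> rleDecode(const std::vector<uint8_t>& encoded)
{
	assert(encoded.size() % 2 == 0);
	std::vector<uint8_t> out;
	for (size_t i = 0; i + 1 < encoded.size(); i += 2)
		out.insert(out.end(), static_cast<size_t>(encoded[i]), encoded[i + 1]);
	return out;
}
```

<p>Both directions are a single linear pass without any tables or bit twiddling, which is why the cost is close to that of a plain memory copy.</p>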

<p>Compared to the voxel data, all other game state is tiny, but equally important. I’m using an explicit form of serialization, where each object has callbacks for saving and loading state. This means each object can choose freely what to save, and might leave out cached state or temporary acceleration structures. The voxel objects, for instance, have a separate physics representation, which is also voxel data, but in a different format. This is 100% reproducible from the main voxel data, so instead of saving it, it’s generated at load time. The same goes for spatial acceleration structures for physics, culling, rendering and lighting. A serialization context is passed around to each object, and it keeps track of pointer serialization by iterating over the objects twice at load time.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>//Code example for saving the state of a physical body
void Body::saveState(TSerializeOutput&amp; ser)
{
	Entity::saveState(ser);
	ser.writeTransform(mTransform);
	ser.writeVec3(mLinVel);
	ser.writeVec3(mAngVel);
	ser.writeBool(mDynamic);
	ser.writeUInt8(mActive);
}
</code></pre></div></div>

<p>This explicit form of serialization is flexible, but also error-prone. I tend to group class members in stateful and stateless sections in the header. This makes it a bit more explicit what needs to be saved and easier to check if everything is being saved, but I wish I had a better system in place to verify that all state is indeed saved.</p>
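
<p>The two passes over the objects at load time, mentioned earlier for pointer serialization, could work roughly like the sketch below. It assumes that pointers are written to disk as object ids. <code class="language-plaintext highlighter-rouge">Node</code>, <code class="language-plaintext highlighter-rouge">SavedNode</code> and <code class="language-plaintext highlighter-rouge">loadNodes</code> are hypothetical names and not part of the engine.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a game object that points at another object.
struct Node {
	uint32_t id = 0;         // stable id written to the save file
	uint32_t parentId = 0;   // id read from the save file, 0 means "no parent"
	Node* parent = nullptr;  // live pointer, rebuilt at load time
};

// What ends up on disk: ids instead of raw pointers.
struct SavedNode { uint32_t id; uint32_t parentId; };

inline std::vector<Node> loadNodes(const std::vector<SavedNode>& saved)
{
	std::vector<Node> nodes(saved.size());
	std::unordered_map<uint32_t, Node*> byId;

	// Pass 1: create every object and register its address under its id.
	for (size_t i = 0; i < saved.size(); i++) {
		nodes[i].id = saved[i].id;
		nodes[i].parentId = saved[i].parentId;
		byId[saved[i].id] = &nodes[i];
	}

	// Pass 2: all objects exist now, so every id can be turned into a pointer.
	for (Node& n : nodes) {
		if (n.parentId == 0) continue;
		auto it = byId.find(n.parentId);
		assert(it != byId.end());
		n.parent = it->second;
	}
	return nodes;
}
```

<p>The second pass is what makes forward references work: an object can refer to another object that appears later in the file.</p>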

<p>Scripting was by far the hardest part to serialize. All gameplay logic in Teardown is implemented in lua scripts and since scripts can be written by anyone, even outside the development team (there is already an active modding community) I wanted state serialization for scripts to be fully automated. Hence, in contrast to the engine-side serialization, there should be no need for callbacks or explicit state serialization. One would have hoped that the lua library offered some way to serialize global state, but unfortunately there is not much in there to help (at least not in lua version 5.1 that I’m using, correct me if I’m wrong). Fortunately, traversing the global state of a lua context is relatively easy. All global variables show up in the globals table (_G), so if all handles and engine interaction is handled properly, serializing that table is enough. There is still a lot of non-trivial code to untangle table references and types correctly and the end result is not perfect, but it works for all our own scripts. 
There are corner cases, like multiple tables referencing other tables and circular dependencies that will not serialize correctly, so I’ll have to go over that at some point, but for the most part it works really well.</p>]]></content><author><name>Dennis Gustafsson</name></author><category term="teardown" /><category term="game" /><category term="quicksave" /><category term="serialization" /><summary type="html"><![CDATA[Saving the complete state of a game at any time and then restoring to that state is hard for any game, but in a fully dynamic voxel world that constantly changes, controlled by dozens of lua scripts, all with their own internal state, implementing this was quite a challenge.]]></summary></entry><entry><title type="html">Teardown design notes</title><link href="https://blog.voxagon.se/2020/11/05/teardown-design-notes.html" rel="alternate" type="text/html" title="Teardown design notes" /><published>2020-11-05T00:00:00+01:00</published><updated>2020-11-05T00:00:00+01:00</updated><id>https://blog.voxagon.se/2020/11/05/teardown-design-notes</id><content type="html" xml:base="https://blog.voxagon.se/2020/11/05/teardown-design-notes.html"><![CDATA[<p>Teardown started as a technology experiment and it’s one of those games where gameplay was designed to fit the technology, rather than the other way around. It’s not the first time I’ve been involved in such projects (Sprinkle, Smash Hit), and probably not the last, but Teardown was by far the most frustrating experience yet.</p>

<p>The idea of a fully destructible environment is compelling for the player but a nightmare for the game designer. Walls can no longer be used as obstacles, key objects that the player might need to complete an objective can break, and the designer is no longer in control of a player’s path through the game, potentially breaking the intended progression. Not to mention all the technical hurdles a fully destructible environment implies when it comes to physics, lighting, scripting, etc., but more on that in a future blog post.</p>

<p>Destruction is often used in games as a decorative special effect, but for Teardown the intention was always to use destruction as the key element in gameplay and with a limited amount of action, allowing the player to do detailed precision work rather than total mayhem.</p>

<p><a href="/assets/2020-11-05-teardown.png"><img src="/assets/2020-11-05-teardown.png" alt="" /></a></p>

<p>After nearly a full year of experimentation and many failed prototypes, the idea of a two-phase heist setting was born. It’s compatible with all the limitations (or lack thereof) that a fully destructible environment imposes, while still offering an interesting challenge. It allows the player to move around freely in a fully accessible environment, carefully planning the heist and creating shortcuts using destruction, vehicles and objects from the environment in a creative way. The player chooses when, and I think it’s important that this is the player’s decision, to go into action mode and try out the created path.</p>

<h1 id="level-design">Level design</h1>
<p>Allowing the player to destroy everything has a huge impact on level design. Since any wall can be torn down, the only true obstacles at our disposal are elevation, distance, water and unbreakable objects. We could use unbreakable objects more, but it would make the environment harder to read and imply a failure to deliver on the promise of a fully destructible environment. Therefore unbreakable objects are only used for rock formations and the ground you’re standing on.</p>

<p>The relatively small level size started as a technical limitation, but I don’t think the game would benefit from larger levels even if it was technically possible. Villa Gordon is currently the largest level in the game, and it can already be a bit tedious to walk around during the preparation phase. Personally I think the game shines in a more compact and cluttered environment like Hollowrock Island, with some verticality to allow for more interesting shortcuts.</p>

<p>The only place we found the level size to be a limitation was the end chase on the Frustrum level. We originally anticipated it to be twice as long, but due to a 3D texture size limitation on AMD graphics cards, we had to restrict it to 400 meters. We could have made it twice as long using a U-shaped level, but we also wanted to keep the level straight to have the goal direction consistently aligned with the sun.</p>

<h1 id="the-timer-and-the-chopper">The timer and the chopper</h1>
<p>Nobody likes a timer, and in previous iterations of the game idea there was no timer. Since the game offers so much player freedom, the only viable option to impose any form of challenge would be resource limitations, and for a sandbox game where destruction plays a central role, adding restrictive resource limitations just doesn’t make the game fun. The goal with the alarm timer has always been to offer a challenge even with a generous amount of tools and resources. While I can agree that a timer is usually a bad idea in game design, I’m really happy the way it turned out in Teardown.</p>

<p>Along the way we’ve mixed up the timed missions with other types where the challenge comes more from moving heavy objects, demolishing buildings or putting out fires, but I’m not convinced that alone could support a whole game. In several missions there are alarmed targets attached to something heavy, allowing it to be moved around to some extent, which I think is a good mix, letting the player choose whether to tinker with the environment or just make a run for it.</p>

<p><a href="/assets/2020-11-05-chopper.png"><img src="/assets/2020-11-05-chopper.png" alt="" /></a></p>

<p>A popular suggestion has been to have the security chopper chase the player after arriving at the scene instead of instant failure, but as a general solution I don’t think it’s a good idea. It would introduce an element of randomness that would discourage the strategic thinking and careful planning that this game is all about, in favor of just replaying the mission until reaching the escape vehicle before dying. So instead we added a separate mission type that still allows the player to make preparations, but the chopper shows up shortly after clearing the first target, effectively replacing the timer with an enemy. I think both mission types work well, but that doesn’t necessarily mean it’s a good idea to combine them.</p>

<h1 id="trial-and-error">Trial and error</h1>
<p>Quicksave can be a sensitive topic in game design. For linear games it’s often a tough decision whether to offer quicksave at any time or save progression only at certain times or locations. Some players refuse to use a generous quicksave feature, as it could be considered cheating.</p>

<p>This is something I think turned out particularly well in Teardown - allowing just one save slot, freely available at any time during preparation, but disabled as soon as the alarm goes off. It encourages player experimentation during the preparation phase, but since there is only one slot, it must still be used wisely. Even with quicksave available, a major change of plans often requires a full restart anyway due to resource limitations, vehicle condition or broken objects.</p>

<p>Trying out a route and then going back to the planning phase for improvement is a key part of the core loop and so intrinsic to the game that we actually enforce it in the third mission to communicate that this is the intended way to play the game.</p>

<h1 id="replayability">Replayability</h1>
<p>Since any mission can be played in an infinite number of ways there is already a natural incentive for replayability, but there are a couple of things in the game specifically designed to increase replayability. Most missions have optional targets that will increase the score. These optional targets are often placed in strategic locations that break up the most efficient path of the required targets, encouraging the player to use a different strategy and/or starting location.</p>

<p>New tools and upgrades introduced later in the game make all earlier missions easier to complete. It gives a natural incentive to go back and replay missions with better tools, clearing more optional targets, which increases score and gives even better tools, forming an outer game loop that can be quite rewarding. Admittedly, for this to have a strong impact on the game, there would need to be more optional targets. However, introducing a lot of optional targets early in the game can be quite overwhelming, so the whole thing might need to be redesigned a bit to work as intended.</p>

<h1 id="story">Story</h1>
<p>Let’s be honest - no one plays Teardown for the story, but I think it serves an important role to frame the missions and as an incentive for progression. It was an early decision to deliver the story in the form of one-way e-mail communication and I’m quite happy with the way it turned out. Since the player can go back and read old e-mails, it’s possible to catch up on the story when coming back after taking a break from the game. This is something I miss in a lot of other games - the ability to recap the story when coming back to them.</p>

<p>The reason e-mails cannot be replied to is part of the bigger goal of making the player fully anonymous. The main character in Teardown is intentionally lacking name, age, gender and personality traits to fully leave that up to the player’s imagination.</p>

<p><a href="/assets/2020-11-05-mail.png"><img src="/assets/2020-11-05-mail.png" alt="" /></a></p>

<p>The story is also told through the environments, how they progress, descriptions of objects in them, themed valuables and last but not least the television. There’s a lot of room for improvement here. I originally envisioned many more environmental changes when coming back to the same environment (also involving procedural changes based on the player’s actions) but for several reasons we had to cut back on that.</p>

<p>Missions are kept separate from the e-mails on the Missions tab to give the player an overview of available missions for a particular location. This is to further incentivize replayability and make it clear where improvements are possible to increase score and rank.</p>

<h1 id="progression">Progression</h1>
<p>Whether to have sandbox levels directly accessible or tied to campaign progression has been a long internal discussion. Knowing that a lot of people would want to play Teardown just for the sandbox experience, it may seem a bit inconsiderate to enforce a complete playthrough to make everything accessible. On the other hand, keeping all environments and tools available in sandbox mode from the beginning would ruin the experience for campaign players.</p>

<p>The route we chose was to keep them locked, but introduce new environments and tools relatively early in the campaign. The first three environments can be unlocked after completing just five missions while the fourth one requires a bit more work. While not suiting everyone, I think it turned out quite well, and I hope more people play and enjoy the campaign because of this decision.</p>

<p>Tool upgrades also carry over to the sandbox mode, which gives a stronger incentive to scavenge valuables and upgrade tools in the campaign.</p>]]></content><author><name>Dennis Gustafsson</name></author><category term="teardown" /><category term="game" /><category term="design" /><summary type="html"><![CDATA[Teardown started as a technology experiment and it’s one of those games where gameplay was designed to fit the technology, rather than the other way around. It’s not the first time I’ve been involved in such projects (Sprinkle, Smash Hit), and probably not the last, but Teardown was by far the most frustrating experience yet.]]></summary></entry><entry><title type="html">The importance of good noise</title><link href="https://blog.voxagon.se/2018/12/07/the-importance-of-good-noise.html" rel="alternate" type="text/html" title="The importance of good noise" /><published>2018-12-07T00:00:00+01:00</published><updated>2018-12-07T00:00:00+01:00</updated><id>https://blog.voxagon.se/2018/12/07/the-importance-of-good-noise</id><content type="html" xml:base="https://blog.voxagon.se/2018/12/07/the-importance-of-good-noise.html"><![CDATA[<p>There are many articles to read about noise functions in computer graphics, especially now that a lot of people recently got interested in ray tracing, but it took me a long time to fully understood <em>why</em> noise characteristics are so important and I didn’t find a good resource on the Internet explaining it, so I’ll give it a shot.</p>

<h1 id="why-noise-is-needed">Why noise is needed</h1>

<p>Noise is used to generate sequences of semi-random numbers. I use these random numbers at several places in the rendering system, but here are a few examples:</p>

<ul>
  <li>
    <p>Soft shadows from light sources that are not a single point. In this case, the light source is a sphere, so for each ray I trace towards a random point on that sphere.</p>
  </li>
  <li>
    <p>Blurry reflections. For materials that are not perfect mirrors, I alter the surface normal a small amount for each ray. This gives the appearance of a rough surface.</p>
  </li>
  <li>
    <p>Ambient occlusion, which darkens concave areas which are blocked from the incoming environment light. I shoot a number of rays on the hemi-sphere for each surface point and randomize the direction.</p>
  </li>
  <li>
    <p>Volumetric fog, or god rays. In order to approximate lit fog I shoot rays along the line of sight towards each light source. Both the sample points along the line of sight and the direction towards the light source (if it’s not a point light) need noise.</p>
  </li>
</ul>

<p>I use noise in other places as well, but these are probably the easiest to explain. In some cases, several rays are shot for each pixel and the result is just the average of all samples. In other cases, there is just a single sample. In either case, the visual result will be more or less noisy. I denoise it by blurring spatially and accumulating the result over time.</p>
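
<p>As an illustration of the soft shadow case in the list above, two noise values can be turned into a point on a spherical light as sketched below. This is the standard uniform sphere mapping, written with made-up names. It is not necessarily the exact formula used in the renderer.</p>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Map two numbers in [0, 1] to a uniformly distributed point on the unit sphere.
inline Vec3 pointOnUnitSphere(float u, float v)
{
	assert(u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f);
	const float z = 1.0f - 2.0f * u;                          // height, uniform in [-1, 1]
	const float r = std::sqrt(std::fmax(0.0f, 1.0f - z * z)); // circle radius at that height
	const float phi = 6.28318530718f * v;                     // angle around the axis
	return { r * std::cos(phi), r * std::sin(phi), z };
}

// Pick the point on a spherical light that a shadow ray should aim at.
inline Vec3 sampleSphereLight(const Vec3& center, float radius, float u, float v)
{
	const Vec3 d = pointOnUnitSphere(u, v);
	return { center.x + radius * d.x, center.y + radius * d.y, center.z + radius * d.z };
}
```

<p>The quality of the two input numbers decides how noisy the shadow looks, which is what the rest of this post is about.</p>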

<h1 id="noise-characteristics">Noise characteristics</h1>

<p>In white noise, each sample is just a random number, without any consideration of the sequence as a whole, very much like rolling a die. Imagine the following sequence of random numbers between 0 and 9:</p>

<p>2 9 7 1 3 5 6 1 0 1 8 9 2 4 4</p>

<p>Since each sample has no “memory” of what has already been generated, it can generate similar numbers several times in a row, like the “1 0 1” found in the middle.</p>

<p>Now instead consider the following sequence, which are the exact same numbers but in a different order:</p>

<p>2 9 3 7 1 5 1 4 0 8 4 1 9 2 6</p>

<p>Since it’s the same numbers, they have the same average, but I swizzled them around so that adjacent numbers are always reasonably far apart. You can think of this as a signal with higher frequency. This is roughly what blue noise is trying to achieve, and that’s why noise functions are so intimately related to frequency spectrums.</p>
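
<p>One crude way to put a number on the difference between the two sequences is to look at the smallest gap between neighboring samples. The helper below is only an illustration and not a real measure of blue noise quality.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Smallest absolute difference between two neighboring samples.
inline int minAdjacentGap(const std::vector<int>& samples)
{
	assert(samples.size() >= 2);
	int smallest = std::abs(samples[1] - samples[0]);
	for (size_t i = 2; i < samples.size(); i++) {
		const int gap = std::abs(samples[i] - samples[i - 1]);
		if (gap < smallest) smallest = gap;
	}
	return smallest;
}
```

<p>For the first sequence the result is 0, because of the “4 4” at the end, while the reordered sequence never has neighbors closer than 3.</p>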

<h1 id="why-blue-noise-is-desirable">Why blue noise is desirable</h1>

<p>When working with computer graphics, blue noise is desirable, because we don’t want the same (or similar) result in two adjacent pixels on the screen, because that will make the spatial denoising filter less efficient.</p>

<p>Even without filtering, blue noise gives a smoother characteristic and more visually pleasing image, but that’s because there is a certain amount of filtering going on in our eyes and brains. Apparently, our retinal cells are arranged in a blue noise-like pattern. Pretty cool!</p>

<p><a href="/assets/2018-12-07-the-importance-of-good-noise-1.png"><img src="/assets/2018-12-07-the-importance-of-good-noise-1.png" alt="" /></a></p>

<p>Unfiltered ambient occlusion with white noise</p>

<p><a href="/assets/2018-12-07-the-importance-of-good-noise-2.png"><img src="/assets/2018-12-07-the-importance-of-good-noise-2.png" alt="" /></a></p>

<p>Same ambient occlusion with blue noise</p>

<p>To put it in simpler words, we want semi-random numbers that are as “spread out” as possible both vertically and horizontally. In one dimension, like the numbers above, it’s fairly easy, but when doing it in two dimensions it’s actually much harder, because you need to consider not only what’s before and after each sample, but also what’s above and below. This is what two-dimensional blue noise does.</p>

<h1 id="temporal-aspect">Temporal aspect</h1>

<p>So far we haven’t considered time. Now imagine we have a good distribution of random numbers so that the number for a specific pixel is not similar to any of its neighbors. Let’s also add time.</p>

<p>Just like with pixel neighbors, we don’t want the random number for a specific pixel to be the same for two consecutive frames, because that will make the temporal filtering less efficient. You can think of this as just another dimension, so what we want is really three dimensional noise that is spread out both horizontally, vertically <em>and over time</em>.</p>

<p>This is where it gets complicated, because apparently, when generating blue noise in three dimensions it loses some of its nice properties in two dimensions, so a 2D slice of 3D blue noise will <em>not</em> be as good as pure 2D blue noise. I’m not good enough at math to fully understand why, but there is an in-depth article about it <a href="http://momentsingraphics.de/?p=148">here</a>.</p>

<p>To overcome this issue, I use a trick based on the golden ratio, which is the most irrational number there is. The golden ratio is very useful for many things, and if you haven’t added that to your bag of tricks, you should. There is a really cool video explaining why it is so irrational <a href="https://www.youtube.com/watch?v=sj8Sg8qnjOg">here</a></p>

<p>Irrational numbers in general, and the golden ratio in particular, have the property that if you add one to a number between zero and one and take the fractional part of the result, you get a new number that is far from the old one, and the sequence never repeats when you do it over and over. This is exactly what we want! So, instead of using 3D blue noise, I use 2D blue noise and animate it using the golden ratio. Note that this is not a novel idea, a lot of people have been doing this before, for instance <a href="https://blog.demofox.org/2017/10/31/animating-noise-for-integration-over-time/">here</a>.</p>

<h1 id="how-to-use-it">How to use it</h1>

<p>The most efficient way of using blue noise is probably to use a precomputed texture with blue noise samples in it. It doesn’t have to be the same resolution as the render target. It’s fine to repeat it a few times before it gets visually noticeable. I use a 512x512 sized blue noise texture. In order to achieve all the good blue noise properties it is important to line up the blue noise texture <em>exactly</em> with the fragments in the render target and not use any filtering when sampling it. You need a pixel-perfect mapping for it to work efficiently.</p>

<p>To animate the noise over time, just do <code class="language-plaintext highlighter-rouge">noise=mod(blueNoise+GOLDEN_RATIO*frameNumber, 1.0)</code> and you’re done. Make sure that frameNumber doesn’t get too big over time, or you’ll lose floating point precision, so <code class="language-plaintext highlighter-rouge">noise=mod(blueNoise+GOLDEN_RATIO*(frameNumber%100), 1.0)</code> or similar will work fine.</p>
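
<p>The same expression written out as a small CPU-side function, for anyone who wants to experiment with it outside a shader. The function names are made up, and the wrap at 100 frames is just the value from the example above.</p>

```cpp
#include <cassert>
#include <cmath>

// Fractional part, like fract() in GLSL.
inline float fract(float x) { return x - std::floor(x); }

// Offset a blue noise sample in [0, 1) by a multiple of the golden ratio.
inline float animateNoise(float blueNoise, int frameNumber)
{
	assert(frameNumber >= 0);
	const float goldenRatio = 1.61803398875f;
	const int wrappedFrame = frameNumber % 100; // keep the offset small to preserve precision
	return fract(blueNoise + goldenRatio * static_cast<float>(wrappedFrame));
}
```

<p>A sample of 0.5 becomes roughly 0.118 on the next frame, since 0.5 + 1.618 = 2.118 and only the fractional part is kept.</p>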

<p><strong>Update:</strong> There has been a discussion on <a href="https://twitter.com/tuxedolabs/status/1070987893970223104">twitter</a> where <a href="https://twitter.com/TastyTexel">@tastytexel</a> pointed out that as the blue noise sample wraps around it can introduce low frequency components. Imagine two consecutive samples that are 0.0 and 0.9 (good distribution). When adding the same offset 0.1 to both with fraction wrapping, the result will be 0.1 and 0.0 (bad distribution). The suggested fix is to reshape the noise through a triangular filter so that it wraps around nicely. Will update with more findings as I have tried this.</p>

<h1 id="more-dimensions">More dimensions</h1>

<p>In many cases you want not just a single random number, but a 2D or 3D random vector. I use an RGB blue noise texture for that, so each pixel actually has three random numbers, where each channel has blue noise properties. You could just add the golden ratio to animate those, but I came to think of this <a href="http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/">blog post</a> about the R2 sequence, which is a multi-dimensional generalization of the golden ratio.</p>

<p>So instead of adding the golden ratio to each component when animating 3D noise, I add three <em>different</em> irrational numbers, one to each channel. In my experience this gives a better distribution of 2D and 3D vectors over time, since all components aren’t shifting by the same amount.</p>
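
<p>One possible choice of offsets is sketched below, using the three-dimensional constants from the linked post: 1/g, 1/g² and 1/g³, where g ≈ 1.2207 is the real root of x⁴ = x + 1. Which three numbers the renderer actually uses is not stated here, so treat both the constants and the names as assumptions.</p>

```cpp
#include <cassert>
#include <cmath>

struct Noise3 { float r, g, b; };

// Keep only the fractional part so the result stays in [0, 1).
inline float wrap01(float x) { return x - std::floor(x); }

// Shift each channel by a different irrational step per frame.
// The constants are an assumption taken from the three-dimensional R sequence.
inline Noise3 animateNoise3(const Noise3& blueNoise, int frameNumber)
{
	assert(frameNumber >= 0);
	const float g = 1.22074408f;
	const float a1 = 1.0f / g, a2 = 1.0f / (g * g), a3 = 1.0f / (g * g * g);
	const float t = static_cast<float>(frameNumber % 100); // wrapped to preserve precision
	return { wrap01(blueNoise.r + a1 * t),
	         wrap01(blueNoise.g + a2 * t),
	         wrap01(blueNoise.b + a3 * t) };
}
```

<p>Since the three steps differ, two channels that happen to start at the same value drift apart immediately instead of moving in lockstep.</p>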

<p>Finally <a href="http://momentsingraphics.de/?p=127">here</a> is a link to a very useful <a href="http://momentsingraphics.de/Media/BlueNoise/FreeBlueNoiseTextures.zip">database</a> with free, precomputed blue noise textures of different sizes.</p>]]></content><author><name>Dennis Gustafsson</name></author><category term="noise" /><category term="rendering" /><category term="3d" /><category term="raytracing" /><summary type="html"><![CDATA[There are many articles to read about noise functions in computer graphics, especially now that a lot of people recently got interested in ray tracing, but it took me a long time to fully understood why noise characteristics are so important and I didn’t find a good resource on the Internet explaining it, so I’ll give it a shot.]]></summary></entry><entry><title type="html">From screen space to voxel space</title><link href="https://blog.voxagon.se/2018/10/17/from-screen-space-to-voxel-space.html" rel="alternate" type="text/html" title="From screen space to voxel space" /><published>2018-10-17T00:00:00+02:00</published><updated>2018-10-17T00:00:00+02:00</updated><id>https://blog.voxagon.se/2018/10/17/from-screen-space-to-voxel-space</id><content type="html" xml:base="https://blog.voxagon.se/2018/10/17/from-screen-space-to-voxel-space.html"><![CDATA[<p>There has been quite a few changes to my rendering pipeline over the last couple of months. The biggest being that I now do full raytracing in voxel space instead of the screen space counterpart.</p>

<p>This may sound like a major rewrite, but a lot of the pipeline actually stays the same. I simply trace rays in a huge 3D texture instead of screen space. This obviously has a number of benefits, like real ambient occlusion, long shadows, specular occlusion and no screen space artefacts, but it also comes with a number of drawbacks, the biggest one probably being memory consumption. I chose a texture resolution of 5 cm, and combined with a world size of 100x100x25 meters this gives two billion voxels, or two gigabytes if storing one byte per voxel.</p>

<p>Since each object has its own transform and can be freely moved and rotated, I have to rasterize each object into the big world texture continuously. This is done on the CPU and the relevant parts of the texture are updated with <code class="language-plaintext highlighter-rouge">glTexSubImage3D</code>. For large objects, this can be rather slow, so the technique is not for everyone. I’ve been surprised by how well it works in practice though, since most dynamic objects are usually rather small. If there are several objects moving at the same time, I cluster them and send updates in larger chunks where they are needed.</p>

<h1 id="voxel-storage">Voxel storage</h1>
<p>Note that the big world space texture, the <em>shadow texture</em>, only requires one bit per voxel. Using a whole byte means using eight times the memory we really need. Therefore I store eight neighboring voxels per byte in an octree fashion, so each bit represents one octant in a 10 cm cube. If the byte is zero it means there are no voxels in any octant, and this can be exploited later to speed up raytracing. In addition to the base level 10 cm resolution (or 5 cm if you count the octants) I also store two mip levels, one for 20 cm and one for 40 cm. This gives a total of four mip levels, including the octant bits. The shadow texture then takes 256 + 32 + 4 = 292 MB instead of the two gigabytes, including two mip levels which can be used to speed up raytracing.</p>
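<p>A small sketch of the octant packing and the memory arithmetic, under a few assumptions: the bit layout inside each byte, the dictionary used as storage and all function names are illustrative only, not the actual texture format.</p>

```python
def octant_bit(x, y, z):
    """Bit index (0-7) of a 5 cm voxel inside its 10 cm cell (layout is an assumption)."""
    return (x & 1) | ((y & 1) << 1) | ((z & 1) << 2)

def set_voxel(cells, x, y, z):
    """Mark the 5 cm voxel (x, y, z) as solid in a dict of 10 cm cells."""
    key = (x >> 1, y >> 1, z >> 1)
    cells[key] = cells.get(key, 0) | (1 << octant_bit(x, y, z))

def is_solid(cells, x, y, z):
    """A zero byte means the whole 10 cm cell is empty, so the bit test can be skipped."""
    byte = cells.get((x >> 1, y >> 1, z >> 1), 0)
    if byte == 0:
        return False
    return bool(byte & (1 << octant_bit(x, y, z)))

def shadow_texture_bytes(size_cm=(10000, 10000, 2500), cell_cm=10, levels=3):
    """Bytes for the 10 cm base level plus coarser mips, one byte per cell."""
    total = 0
    for level in range(levels):
        edge = cell_cm << level
        nx, ny, nz = (-(-s // edge) for s in size_cm)  # ceiling division
        total += nx * ny * nz
    return total

cells = {}
set_voxel(cells, 3, 4, 5)
print(is_solid(cells, 3, 4, 5), is_solid(cells, 2, 4, 5))
print(shadow_texture_bytes() / 1e6, "MB")
```

<p>For exactly 100x100x25 meters this comes out at about 285 MB, in the same ballpark as the rounded 256 + 32 + 4 figure above.</p>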

<h1 id="raytracing-the-shadow-texture">Raytracing the shadow texture</h1>
<p>Raytracing in voxel space is actually much simpler than in screen space. Just start at the camera and walk along the ray direction in voxel space until hitting something that isn’t zero. Note that walking the voxel space in the same direction in fixed steps will <strong>not</strong> give a watertight result. Light can leak through voxels in certain scenarios as described in <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.42.3443&amp;rep=rep1&amp;type=pdf">this paper</a>, making implementation a little trickier. The paper also presents a solution to the problem, an algorithm sometimes referred to as “supercover” traversal.</p>
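<p>To illustrate the difference, here is a rough sketch that compares fixed-step sampling with a common incremental grid walk that advances from one cell boundary to the next. It assumes unit-sized cells, and it resolves exact corner crossings by picking a single axis, which is a simplification; the function names are made up for the example.</p>

```python
import math

def walk_grid(origin, direction, max_t):
    """Yield every unit cell the ray passes through, in order, up to parameter max_t."""
    cell = [math.floor(c) for c in origin]
    step = [0, 0, 0]
    t_next = [math.inf] * 3   # ray parameter of the next boundary crossing per axis
    t_delta = [math.inf] * 3  # parameter distance between two crossings per axis
    for axis in range(3):
        d = direction[axis]
        if d > 0:
            step[axis] = 1
            t_next[axis] = (cell[axis] + 1 - origin[axis]) / d
            t_delta[axis] = 1 / d
        elif d < 0:
            step[axis] = -1
            t_next[axis] = (cell[axis] - origin[axis]) / d
            t_delta[axis] = -1 / d
    t = 0.0
    while t <= max_t:
        yield tuple(cell)
        axis = t_next.index(min(t_next))  # cross the nearest boundary first
        t = t_next[axis]
        t_next[axis] += t_delta[axis]
        cell[axis] += step[axis]

def fixed_step_cells(origin, direction, max_t, step_length=1.0):
    """Cells found by sampling the ray at regular intervals (can skip cells)."""
    cells = []
    t = 0.0
    while t <= max_t:
        cells.append(tuple(math.floor(o + d * t) for o, d in zip(origin, direction)))
        t += step_length
    return cells

origin, direction = (0.5, 0.5, 0.5), (1.0, 0.5, 0.0)
print(list(walk_grid(origin, direction, 2.0)))
print(fixed_step_cells(origin, direction, 2.0))
```

<p>In this example the fixed-step version never visits cell (1, 0, 0), so a voxel there would let light leak through, while the boundary walk reports it.</p>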

<p>I actually use both supercover and the cheaper fixed step tracing depending on the use case. Ambient occlusion, for instance, or volumetric fog, don’t require exact tracing. This is where voxel raytracing really shines. In triangle raytracing each ray has a fixed cost, while in voxel raytracing you can choose which fidelity you need for each particular ray and even change fidelity while walking the ray.</p>

<p>To further speed up raytracing I also utilize the other mip levels, starting in the coarsest mip level and, if it hits, switching progressively to the finer ones. The base mip level requires some bit masking to find out if it really hits, but the general algorithm is the same.</p>
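<p>The coarse-to-fine idea could be sketched roughly like this. It is not the actual shader: Python sets stand in for the mip textures, the base-level bit masking is left out, and the small <code class="language-plaintext highlighter-rouge">eps</code> nudge past cell boundaries is an assumption that trades a little accuracy for simplicity.</p>

```python
import math

def build_mips(solid_cells, levels=3):
    """Level 0 is the input set; a coarser cell is solid if any of its children is."""
    mips = [set(solid_cells)]
    for _ in range(levels - 1):
        mips.append({(x >> 1, y >> 1, z >> 1) for x, y, z in mips[-1]})
    return mips

def exit_distance(point, direction, cell, size):
    """Distance along the ray from point to where it leaves the given cell."""
    t_exit = math.inf
    for axis in range(3):
        d = direction[axis]
        if d > 0:
            t_exit = min(t_exit, ((cell[axis] + 1) * size - point[axis]) / d)
        elif d < 0:
            t_exit = min(t_exit, (cell[axis] * size - point[axis]) / d)
    return t_exit

def trace(mips, origin, direction, max_t, eps=1e-4):
    """Return the approximate hit distance, or None if nothing is hit within max_t."""
    t = 0.0
    level = len(mips) - 1
    while t <= max_t:
        point = [o + d * t for o, d in zip(origin, direction)]
        size = 1 << level
        cell = tuple(math.floor(p / size) for p in point)
        if cell in mips[level]:
            if level == 0:
                return t
            level -= 1                 # something is here: look closer
        else:
            t += exit_distance(point, direction, cell, size) + eps
            level = len(mips) - 1      # empty: skip the cell, restart at the coarsest level
    return None

mips = build_mips({(10, 0, 0)})
print(trace(mips, (0.5, 0.5, 0.5), (1.0, 0.0, 0.0), 50.0))
```

<p>Empty regions are crossed in a few large steps at the coarsest level, and the finer levels are only consulted close to geometry.</p>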

<p>Here is a test scene without shadows and ambient occlusion. It’s a bit unfortunate that these are voxel objects themselves. It’s not really necessary, since the shadow volume rasterization would work on any closed mesh, but I don’t have it in my code yet, since I’m currently working on a game with voxel graphics.</p>

<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-1.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-1.png" alt="" /></a></p>

<p>Let’s add ambient occlusion with five rays per pixel in the most naive way possible. Just walk from the pixel in a semi-random direction on the hemisphere.</p>
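<p>One way to pick such directions is sketched below, assuming a unit-length surface normal and two random numbers per ray. Cosine weighting is just one common choice and an assumption here, as are the helper names; in a renderer the random numbers would typically come from a noise texture rather than a software generator.</p>

```python
import math
import random

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def hemisphere_direction(normal, u1, u2):
    """Map two numbers in [0, 1) to a unit direction on the hemisphere around normal."""
    # Build two tangent vectors perpendicular to the normal.
    helper = (1.0, 0.0, 0.0) if abs(normal[0]) < 0.9 else (0.0, 1.0, 0.0)
    tangent = normalize(cross(helper, normal))
    bitangent = cross(normal, tangent)
    # Sample a disk and project it up onto the hemisphere (cosine weighted).
    radius = math.sqrt(u1)
    angle = 2.0 * math.pi * u2
    x, y = radius * math.cos(angle), radius * math.sin(angle)
    z = math.sqrt(max(0.0, 1.0 - u1))
    return tuple(x * tangent[i] + y * bitangent[i] + z * normal[i] for i in range(3))

rng = random.Random(0)
normal = (0.0, 0.0, 1.0)
rays = [hemisphere_direction(normal, rng.random(), rng.random()) for _ in range(5)]
for ray in rays:
    print(ray)
```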

<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-2.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-2.png" alt="" /></a></p>

<p>As seen in the image, this won’t work very well, because the shadow volume is not perfectly aligned with the object surface. You can think of this as an extreme version of <a href="https://digitalrune.github.io/DigitalRune-Documentation/html/3f4d959e-9c98-4a97-8d85-7a73c26145d7.htm">shadow acne</a> found in regular shadow mapping. To overcome this problem, I don’t start tracing at the pixel position, but offset the ray origin a safe distance, based on the normal and ray direction. This will prevent the ray from hitting the shadow voxel that comes from the pixel surface.</p>

<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-3.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-3.png" alt="" /></a></p>

<p>While this captures the overall occlusion very well, you might notice that it lacks fine detail. This is an artefact from the ray origin offset.</p>

<h1 id="combining-with-screen-space">Combining with screen space</h1>
<p>To capture finer detail, we need a backup method for the distance from the pixel out to the new ray origin. Fortunately there is another technique that works really well at short distances – screen space raytracing! It turns out all my previous work in screen space raytracing now comes in handy. Here is the contribution from a short distance of screen space raytracing:</p>

<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-4.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-4.png" alt="" /></a></p>

<p>And finally, the two are combined by starting in screen space, tracing the safe distance and, unless the ray hits something, switching over to voxel raytracing and continuing through the world:</p>

<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-5.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-5.png" alt="" /></a></p>

<h1 id="light-sources">Light sources</h1>

<p>The exact same technique can also be used for light sources. Here is the scene with a spotlight with zero radius. One ray per pixel is shot from the pixel towards the light source, starting in screen space and then moving over to voxel space.</p>

<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-6.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-6.png" alt="" /></a></p>

<p>As you might notice, the voxel grid becomes pretty noticeable with sharp shadows and particularly so with light coming in at a sharp angle. However, since we are now raytracing the shadows, it’s really easy to make soft shadows by just jittering the light position. This will effectively hide artefacts from the voxel grid, at the same time producing accurate soft shadows “for free”. Here is the same light but with a 30 cm radius:</p>
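<p>The jittering can be sketched as follows, assuming a spherical light and a caller-supplied occlusion test. Here <code class="language-plaintext highlighter-rouge">is_occluded</code> is a hypothetical stand-in for the actual ray trace, and averaging several samples in a loop only illustrates the effect; a renderer would more likely shoot one jittered ray per pixel and rely on denoising.</p>

```python
import random

def jittered_light_position(center, radius, rng):
    """Uniform random point inside a sphere around the light (rejection sampling)."""
    while True:
        offset = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        if sum(c * c for c in offset) <= 1.0:
            return tuple(p + radius * c for p, c in zip(center, offset))

def soft_shadow(point, light_center, light_radius, is_occluded, samples=16, seed=1):
    """Fraction of jittered shadow rays that reach the light (1.0 means fully lit)."""
    rng = random.Random(seed)
    visible = 0
    for _ in range(samples):
        target = jittered_light_position(light_center, light_radius, rng)
        if not is_occluded(point, target):
            visible += 1
    return visible / samples

# Hypothetical occluder that hides everything on one side of the light's center.
def half_blocked(point, target):
    return target[0] < 0.0

print(soft_shadow((0.0, 0.0, 0.0), (0.0, 3.0, 0.0), 0.3, half_blocked, samples=64))
```

<p>With a radius of zero every ray goes to the same point and the result falls back to a hard shadow, which matches the zero-radius spotlight shown earlier.</p>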

<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-7.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-7.png" alt="" /></a></p>

<p>I also use the same raytracing technique for reflections and volumetric fog (god rays). For reflections, I use screen space reflections for the reflected image where available. Where it’s not available I fade to black, simply because I don’t have any information about the hit surface. The shadow volume is binary and knows nothing about materials. However, and more importantly, since I do the ray tracing in voxel space, I get <em>specular occlusion</em> for everything, not just what’s visible on screen. Blurry reflections generally look better for the same reason that soft shadows look better than sharp – because they hide blocky artefacts better.</p>

<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-8.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-8.png" alt="" /></a></p>

<h1 id="performance">Performance</h1>

<p>Voxel raytracing performance depends largely on the length of the ray, the resolution and the desired quality (step length, mip map, etc), but it is generally really, <em>really</em> fast compared to polygon ray tracing. I don’t have any exact measurements of how many rays per pixel I shoot, but for comparison I do all ambient occlusion, lighting, fog and reflections in this scene in about 9 ms, including denoising. The resolution is full HD and the scene contains about ten light sources, all with volumetric fog, soft shadows and no precomputed lighting. Timings were taken on a GTX 1080.</p>

<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-9.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-9.png" alt="" /></a></p>]]></content><author><name>Dennis Gustafsson</name></author><category term="voxels" /><category term="rendering" /><category term="3d" /><category term="raytracing" /><summary type="html"><![CDATA[There have been quite a few changes to my rendering pipeline over the last couple of months. The biggest is that I now do full raytracing in voxel space instead of in screen space.]]></summary></entry></feed>