Welcome to part 2 of this series. In the previous post we setup our problem which is to speed up the
SliceMap family of types for sparse data. We created benchmarks and measured the performance of the current implementation. I gave a brief overview of a new approach I had come up with and showed how it failed miserably.
We were in a depressing place at the end of the last post but hope burns eternal! I have already been researching approaches for this problem on and off for a year, so I didn’t expect the problem to be slain in a day. Rather than giving up, I went searching for answers.
Enter Data-Oriented Design
Recently I have been researching Data-Oriented Design. My first introduction to it was a great talk by Mike Acton as CppCon. I regularly watch talks on other languages and paradigms to grow my understanding of the field and this talk in particular struck a chord. While some may find Mike’s delivery a little brusque, I found it refreshing. The talk is littered with great lines, but the following is one of my favorite.
Reality is not a hack you’re forced to deal with to solve your abstract, theoretical problem. Reality is the actual problem.
Overall, Data-Oriented Design emphasizes data and its transformation as the key thing to design around. It generally eschews Object Orientation as a means of decomposing problems and instead looks at what data layouts and access patterns allow us to extract the maximum performance. This talk sent me deep down a rabbit hole. Eventually I found my way to Jonathan Blow who has given many great talks online. I decided to pick up the book “Data-Oriented Design” by Richard Fabian.
I’m still struggling with how I could use Data-Oriented Design to solve my slicing problem when I came to chapter 6 which discusses Searching. On page 114 of the paperback Richard describes how we can have data structures for looking up data that keep track of the query patterns being used. Once a threshold is met, the data could be re-ordered to better suit how the data is being accessed.
This was the moment of insight for me. “Wait!” I said to myself. “In real world use cases, you are often slicing across 1 dimension of the data many times in a row. Then you may start slicing across another dimension many times in a tight loop. Why not have the SliceMap re-order it’s data to be optimal for the types of lookups that are being performed!”
Idea 3: Reorganizing Internals
I went back to the drawing board and reworked how data was being stored in the
SliceMap types. The internal fields of the 1 dimensional
SliceMap remained simple. We give the
SliceMap a comparer for comparing the keys when performing the Hadamard Product.
keys is just a chunk of memory that is sorted.
values is contiguous memory where the position is what determines the key it goes with.
SliceMap2D gets a little more interesting. We need to remember that a
SliceMap2D can be thought of as a table in a database where the primary key is made up of two fields: Key1 and Key2. Here is what some example data could look like.
Since we are trying to optimize the speed of slicing the data, we are willing to do some work up front to organize the data. When we initially create the
SliceMap2D, we will sort the data by Key1 then Key2. This will allow us to use Run Length Encoding on the outer keys, Key1 in this case. We will store the length of the runs of the outer key in an
We will use two arrays for storing Key1 data. One array for the values of Key1, another for the
IndexRange that corresponds to the key. We will call these fields
OuterKeyRanges respectively. Key2 and Values will be stored in a
ReadOnlyMemory of their respective types. Key2 and Values have a 1 to 1 matching based on their location in their containers. We can now define
SliceMap2DInternals for storing this information.
Now, you may notice that I was talking about Key1 and Key2 but then switched to talking about OuterKey and InnerKey. This is where things may get confusing but trust me, we’ll get there! We need
SliceMap2D to be able to restructure itself in order to provide fast slicing across Key1 or Key2. If Key1 data is stored in the OuterKey fields, then it is much faster to slice along Key1 because all we need to do it find the range of values it applies to and simply just slice the memory for
Values to create a
If Key2 is stored in the
InnerKeyValues field, it is difficult to slice because a particular value of Key2 could occur in multiple places in
InnerKeyValues. But what if we were able to flip which key was stored in the
OuterKeyRanges fields and which one was stored in
InnerKeyValues? Well, then we could slice along the Key2 dimension quickly since all its values would be contiguous after flipping.
The “problem” is that F# is statically typed and doesn’t like you changing the type of fields. Fortunately, every problem in F# is solved with another type. Enter the
What this Discriminated Union is doing is containing the information for how the keys are stored in the
SliceMap2DInternals. It tells us if Key1 is in the outer fields or if Key2 is. Now we can define
SliceMap2D is storing its state in a mutable field so it can change it when it wants. When you go to slice along a dimension, it will check how the data is laid out. If the data is not laid out for efficient slicing, it will swap the keys around. Here is what the slicing method looks like.
The reason this is a valid optimization is that
SliceMap was never intended as a general-purpose data structure. It was built to make composing Mathematical Planning problems clean and simple. When creating constraints, the dominant usage pattern is to perform the same slice many times for different values. Honestly, I put too much functionality into the original
SliceMap. I lost focus on what the real problem was. You can see our solution up to this point at this repo and branch.
Did We Get Faster?
In the previous post we ran our benchmarks against the current implementation, and we got the following timings.
|DenseData||7.993 s||0.0748 s||0.0700 s|
|MediumSparsity||2.154 s||0.0176 s||0.0156 s|
|HighSparsity||1.209 s||0.0134 s||0.0126 s|
These are the timings we get for our new version of
SliceMap2D with self-adjusting internals.
|DenseData||379.05 ms||5.281 ms||4.940 ms|
|MediumSparsity||113.99 ms||0.647 ms||0.574 ms|
|HighSparsity||71.89 ms||0.636 ms||0.595 ms|
It looks a little better when we plot the performance against each other.
So, it got a little faster 😊. I almost cried when I saw this. The fact that this problem has been tormenting me for over a year problem had something to do with it. We have even more gains on the horizon! There are several other things we can do to speed this up. Feel free to check out this repo and branch to see what all of the code looks like and run the benchmarks for yourself! I welcome feedback and ideas!
Please send me an email at firstname.lastname@example.org if you have any questions and subscribe so you can stay on top new posts and products I am offering.