Web Performance Calendar

The speed geek's favorite time of year
2016 Edition

Peter van der Zee (@kuvos) is a freelance JavaScript engineer (c80.nl) specializing in vanilla JavaScript, optimization, code rewriting, and parsers.

When your library spends half its runtime in the GC it’s usually a bad sign. But sometimes … well, sometimes it’s unavoidable. This is a story about finite domain solving: the quest of finding a valuation for a set of variables such that a set of constraints still holds. An easy problem conceptually, but a hard problem to actually solve generically.

My current contract is to improve a library called “finitedomain” (github) which is used by the Grid. It’s a fork of “FD.js” (github) which, I believe, was the result of a research paper. As such it was sound, but not very fast. My task was to improve performance because at the time the runtime was dramatic for even relatively small problems. I say relatively because those problems were already pretty big, but they pale in comparison to what the library can handle these days.

One example of the first low hanging fruit I picked was the initial representation of domains. A domain, in this context, is the list of values that a variable “might” attain as long as the constraints allow it. There is no guarantee that this is possible and that’s exactly what this library is supposed to determine. In “finite domain solving” the domain consists of integers. I know, shocking. In this library we only concern ourselves with non-negative integers up to an arbitrary integer value.

You could put all the numbers of the domain individually in an array, but that could mean arrays with millions of values for no good reason. So we need to compress them. Instead we use an array of <lo, hi> pairs, inclusive on both ends. For example, <10, 340> would represent a domain of 331 elements (not 330!): all numbers from 10 through 340. Initially these domains were represented as arrays of arrays like [[0, 5], [20, 25]], meaning the set {0, 1, 2, 3, 4, 5, 20, 21, 22, 23, 24, 25}.
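To make that concrete, here is a minimal sketch of pair-based domains. The helper names are mine, purely for illustration; they are not the library’s actual API.

```javascript
// A domain as an array of [lo, hi] pairs, inclusive on both ends.
const arrdom = [[0, 5], [20, 25]]; // the set {0..5, 20..25}

// Count the elements of a pair-based domain.
function domainSize(dom) {
  let size = 0;
  for (const [lo, hi] of dom) size += hi - lo + 1; // inclusive bounds, so +1
  return size;
}

// Check whether a value is in the domain.
function domainContains(dom, value) {
  return dom.some(([lo, hi]) => value >= lo && value <= hi);
}
```

Note the +1 in the size computation: that’s exactly why <10, 340> has 331 elements, not 330.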

While the “arrays of arrays” approach makes sense it’s far from efficient. Especially when you realize these arrays were being copied a few thousand times during a search. So whenever you copy a domain you’re generating a couple more arrays. I did outline the GC being a central problem, did I not?

The representation of domains went through a few stages during the course of refactoring. The first change was pretty simple: flatten the arrays. Instead of [[0, 5], [20, 25]], simply make it [0, 5, 20, 25] and have the code be “aware” of the semantic structure of the array. If I remember correctly this cut the runtime in half or something. That was crazy because the change was relatively simple, even though the core domain manipulation methods had to be adjusted, of course.
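A sketch of the flattened form, again with illustrative helper names of my own:

```javascript
// The same domain flattened: even indexes are los, odd indexes are his.
const flat = [0, 5, 20, 25]; // {0..5, 20..25}

function flatDomainSize(dom) {
  let size = 0;
  for (let i = 0; i < dom.length; i += 2) {
    size += dom[i + 1] - dom[i] + 1; // inclusive bounds
  }
  return size;
}

// Copying a flat domain allocates one array, instead of one outer
// array plus one array per pair. Fewer allocations, less GC pressure.
function flatDomainClone(dom) {
  return dom.slice();
}
```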

The next representations were a little more involved and not as profitable. One is the domain as a bitfield number. The idea is that the domain is a single number, and every bit in that number indicates whether the value at that bit’s index is included. In JS the bitwise operators (&, |, ^, ~, <<, >>, >>>) only work on 32-bit numbers, so you have 32 bits. That means these numbers could only be used for domains whose largest value was below 32. That turned out to be the majority of the domains, although your mileage may vary depending on your input problem. This representation was super fast for certain operations, like intersecting two domains (which would now be a simple bitwise & operation), while others suffered (checking whether the domain was solved, or counting the number of elements in a domain).
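A sketch of the idea, showing both the fast case and a slow case. These functions are illustrative, not the library’s real ones.

```javascript
// Bitfield domain ("numdom"): bit i set means value i is in the domain.
// Values are limited to 0..31 by JS's 32-bit bitwise operators.
function numdomFromRange(lo, hi) {
  let dom = 0;
  for (let i = lo; i <= hi; i++) dom |= 1 << i;
  return dom;
}

// Intersection is a single bitwise AND: the fast case.
function numdomIntersect(a, b) {
  return a & b;
}

// Counting elements needs a loop (a popcount): one of the slow cases.
function numdomSize(dom) {
  let count = 0;
  while (dom) {
    dom &= dom - 1; // clear the lowest set bit
    count++;
  }
  return count;
}
```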

Another representation, meant to combat the GC time, was to encode the arrays as strings. This is a relatively straightforward change where you take each number in the domain and encode it as two characters. We need two chars because characters in JS strings are 16-bit, while the largest valid value in our library could be 27-bit. This means domain manipulation of those kinds of domains consists of doing a .charCodeAt() and .fromCharCode(). Those are actually quite fast so it’s not a big deal.
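The two-character encoding can be sketched like this; the function names are mine:

```javascript
// Encode one (up to 27-bit) value as two 16-bit characters.
function encodeValue(value) {
  return String.fromCharCode(value >>> 16) +
         String.fromCharCode(value & 0xffff);
}

// Decode the value whose first character sits at offset i in the string.
function decodeValue(str, i) {
  return (str.charCodeAt(i) << 16) | str.charCodeAt(i + 1);
}

// A <lo, hi> pair then occupies four characters of the string.
function encodePair(lo, hi) {
  return encodeValue(lo) + encodeValue(hi);
}
```

So the flat domain [10, 340] becomes a four-character string, and reading it back is two .charCodeAt() calls per value.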

The last representation was to encode “solved” domains, those with exactly one value left, as a number whose most significant bit is set. This overlaps with the other number representation, so for those we reduced the number of usable bits from 32 to 31. That way we can “safely” check whether the last bit is set and branch accordingly.
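A sketch of the flag-bit trick. In JS a 32-bit number with bit 31 set is negative, which makes the check a cheap sign test; again, these names are illustrative.

```javascript
// Tag a solved domain by setting bit 31. This is why the bitfield
// representation had to give up that bit.
const SOLVED_FLAG = 1 << 31; // the most significant bit

function soldom(value) {
  return SOLVED_FLAG | value; // value must fit in bits 0..30
}

function isSoldom(dom) {
  return dom < 0; // bit 31 set <=> negative as a 32-bit integer
}

function soldomValue(dom) {
  return dom ^ SOLVED_FLAG; // clear the flag to recover the value
}
```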

This encoding works great but it makes the code a little convoluted. Every domain has three possible representations (dubbed “strdom”, “numdom”, and “soldom”, oh and don’t forget “arrdom” for api input) internally so code manipulating domains has three different paths to handle these cases. This leads to a lot of duplicate code both literally as well as semantically. That’s still a problem that needs to be resolved and this may result in dropping the “numdom” in favor of simpler code.

Getting back to the GC problem: domains have been dealt with, so what’s next? Variables. Not JS vars, but rather the variables within the library (you know, those things that have a domain you’re trying to solve). These were initially each represented by a class instance. And that makes sense, don’t get me wrong. The “fdvar” class would have a name, a domain, a flag indicating the last update index, and some meta stats. The class instances had the advantage of being passed around by reference and having all the data “bundled” in a single object. The flag was important for optimization: detecting whether another pass was required without a full rescan.

This class works okay if your problems only have a handful of variables, but it scales terribly. The search will clone all vars at each search step because it needs to be able to backtrack. So not only are you cloning the domains, but also the vars. That means duplicating the var name. Which shouldn’t be a huge problem when the engine “interns” the string, but the class footprint will still be bigger for it. And you have this extra object to begin with. GCs don’t like having many objects and we were generating thousands of them very quickly.

Fast forward to today; the class has been dropped and fdvars are now represented by an array of variable names and an(other) array of domains. Actually, the array of names has been centralized, so it’s not duped at every branch. Only the list of domains (strings and numbers) gets duplicated. That’s much much better! As a bonus the vars are internally identified by their index on these arrays. That means passing around numbers rather than arbitrary strings which gives a small performance boost as well. Comparing numbers is much cheaper than comparing strings (relatively speaking, on this level).
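The shape of that change, sketched (not the library’s actual internals):

```javascript
// One shared list of names; one per-branch list of domains, same index.
const varNames = ['x', 'y', 'z']; // centralized, never copied per branch

// A name is resolved to its index once, at the API boundary; internally
// everything passes these cheap integer indexes around instead.
const indexOfY = varNames.indexOf('y'); // 1

// Branching only clones the flat array of domains (numdoms/strdoms).
function branchDomains(domains) {
  return domains.slice();
}

const domains = [12, 3, 7]; // e.g. three numdoms, one per var
const copy = branchDomains(domains);
copy[1] = 99; // the branch diverges without touching its parent
```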

Getting back to the scaling, these arrays eventually exposed a significant problem. While profiling I kept seeing the internal code for indexOf flaring up in the profile. It was taking a lot of time for no good reason. At this point we had scaled the problems up from a few dozen variables to thousands of vars. Tens of thousands of vars, actually. Turns out the code wasn’t quite ready for this scale and the tools were trying to tell me that while profiling. The indexOf calls turned out to be caused by variable name lookups. The public API endpoints require referring to vars by their name, as the index is intentionally kept internal to the library. So whenever you declare a new constraint, the library first has to figure out the index of those vars before doing anything else. You’d have that unordered list of variable names and you’d do an indexOf on it to resolve whether a name exists and, if so, what its internal index should be. That’s fine to do on tens or even a few hundreds of elements. Not multiple times per element on thousands of elements. Wow!

The solution wasn’t very straightforward either. You could sort the array and do a binary search, but the sorting would be expensive (heavy array operations on thousands of elements, ugh) and the binary search would need a manual implementation. Okay, that part is actually simple enough to do, but still, there’s no out-of-the-box solution for it. You might think “just use an object”, but that doesn’t scale well for objects with thousands of arbitrary keys. We needed a different data structure for name lookups. I landed on a trie.

A trie, not a misspelling of “tree”, is a structure where each step down the tree branches on one character of the string. Oh, just check Wikipedia; there’s little point in explaining it in depth here. The result is that the lookup time for an arbitrary string among 20,000 elements was reduced from a worst case of 20,000 steps to twice the length of the string (in other words, from O(n) down to O(len), completely different orders). These strings aren’t exactly 20,000 characters long, so you can see that the perf gain is enormous. And that showed. The library runtime went down to something like 40% or 50% of what it was before.
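A minimal trie mapping names to indexes, just to illustrate the idea (the library’s actual trie differs in its details):

```javascript
function createTrieNode() {
  return { children: Object.create(null), index: -1 };
}

// Register a name under a given var index, one node per character.
function trieSet(root, name, index) {
  let node = root;
  for (const ch of name) {
    node = node.children[ch] || (node.children[ch] = createTrieNode());
  }
  node.index = index;
}

// Look a name up: one step per character instead of an O(n) indexOf
// scan over the whole array of names.
function trieGet(root, name) {
  let node = root;
  for (const ch of name) {
    node = node.children[ch];
    if (!node) return -1; // unknown name
  }
  return node.index;
}
```

The lookup cost now depends only on the length of the name, no matter how many thousands of vars are registered.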

As it stands, the benchmark I use for making sure nothing breaks is a real-world input problem of 40,000 vars and 35,000 constraints (generating about 65,000 propagators). That’s huge, even by the problem’s theoretical standards. Yet it solves this in under a second. Heck, in almost half a second on my machine.

Yet, about half this time is spent in the GC. The reason is that for every step of the search, all domains are cloned. Those domains are then reduced during the next step, but the length of the domain array needs to persist throughout the search because the var indexes rely on it. So ultimately the domains may start out as strdoms but should end up as soldoms, at least for the targeted vars. Be that as it may, you’re still copying arrays of 40,000 elements on every step. That just doesn’t come cheap.

The domains need to be copied because everything internally uses indexes. If you copied only the “unsolved” variables, you would need to update the indexes throughout the library to make sure they point to the right new place. These indexes are used in many places, some even central to the search config, which means you’d need to deal with that too. So that’s a problem. You could try to clone the entire class and have the clone reduce its internals, but that incurs a lot of overhead of a different kind that’s not trivial to change either.

There is also the list of unsolved variables to track. As the name says, unsolved variables have a domain of at least two elements that still require reduction. We need a list of these because at every search branch we need to determine the next variable to “force” a decision on. We only pick from the vars that still need to be decided, and we don’t want to search through 40,000 vars if there are only 50 left to solve. This too could be solved in a different way, but every alternative has its drawbacks short of a major rewrite. And those are expensive, not without danger, and may end up not being any better anyway.

The library in question, finitedomain, is open source and can be found on github: https://github.com/the-grid/finitedomain/

I really like working on this library. It poses an interesting challenge, it’s low-level vanilla JS, it has zero dependencies, and it does a crazy fast job on a problem of academic-level difficulty.