Pitfalls of adding multithreading to a Node.js project using threads.js
Multithreading can be complicated; here are three mistakes I made trying to incorporate it into a Node.js project using threads.js.
As part of my series on random number generation, I mentioned a suite of statistical tests used to determine whether a set of numbers is independent and identically distributed. I wrote these tests using Node.js and TypeScript — for convenience — without much thought given to performance and whether this approach was truly the best available to me.
When I started actually running the tests on real datasets containing a million values, I was hit with a completion time nearing three hours. I thought a possible “easy” fix would be to utilise more CPU power, as it was clear that the bulk of the time was spent running each permutation of the dataset through the statistical tests. This didn’t have to happen sequentially; each array could have its test statistics calculated independently of the others. Since Node.js v12, the
worker_threads module has been a stable feature and this seemed like a good opportunity to try it out. I knew this would not be a panacea for my performance woes, but it couldn't make things worse.
Of course, the first thing I did was look for a higher-level library that abstracts away much of the worker threads syntax, and provides the thread-pooling logic I required. The titular threads.js was the package of choice. The documentation is light but functional, and the initial implementation was straightforward. I created a worker class which takes an array of values and runs the statistical tests on said array. A thread pool can then be initialised and tasks can be queued, executing when threads become available. I began to run the program on various datasets and everything looked great — it was utilising all of my CPU’s threads, memory usage was stable, and it finished nearly four times faster in comparison to the original implementation. A closer look at the actual results, however, revealed a series of mistakes which meant I was not nearly done.
Mistake one — shuffling the array in the parent thread
The test suite needs to be run on 10,000 permutations of the original dataset, and this permuting is done using the Fisher-Yates shuffle algorithm. The main loop, after implementing the workers and pools, looked like this:
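The snippet below is a reconstruction rather than the project’s exact code: a minimal sketch of that first attempt, assuming a worker file at `./stats-worker` exposing a `runTests` method (both illustrative names).

```javascript
// In-place Fisher-Yates shuffle: walk backwards, swapping each element
// with a randomly chosen earlier one.
function fisherYatesShuffle(arr) {
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

async function main(dataset) {
  const { Pool, Worker, spawn } = require("threads");
  const pool = Pool(() => spawn(new Worker("./stats-worker")));

  for (let i = 0; i < 10000; i++) {
    // Mistake: the parent thread shuffles a copy and sends it to a worker.
    const shuffled = fisherYatesShuffle([...dataset]);
    pool.queue((worker) => worker.runTests(shuffled));
  }

  await pool.completed();
  await pool.terminate();
}
```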
When I ran it, I noticed my results were extremely uniform, i.e. every permutation produced exactly the same comparison against the original dataset. Prefixing
DEBUG=threads:* to my run command enabled threads.js’ internal logging, and I could see clearly that the first shuffled array sent to each worker was being reused ad infinitum. I cannot explain why this occurs (and if someone can, I'd love to know), but the resolution was to expose the shuffling method on the worker class and ask the worker to perform it instead of the parent thread:
Problem solved, and onto the next.
Mistake two — updating the counters on Promise fulfilment
This one was mostly due to wishful thinking. As a way of keeping memory efficient, the original, synchronous code would run the statistical tests on a permutation, and when complete it would immediately compare against the original dataset’s results to determine which counter (if any) to increment. I simply moved the same logic to the fulfilment handler:
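A minimal illustration of the shape of that mistake, with a stand-in `queueTask` in place of the pool (names and result fields are illustrative):

```javascript
// Stand-in for pool.queue(...): an asynchronous task resolving to a result.
function queueTask(statistic) {
  return Promise.resolve({ statistic });
}

let above = 0;
const originalStatistic = 3;

for (const s of [1, 5, 4]) {
  // Mistake: update a shared counter inside each fulfilment handler.
  queueTask(s).then((result) => {
    if (result.statistic > originalStatistic) above += 1;
  });
}

// The handlers have not run yet; any code here that reads `above` without
// waiting on every promise sees a stale value.
console.log(above); // prints 0, not 2
```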
The counters did not end up holding the values they should have. One way to coordinate shared state like this is the Atomics object, but I opted for the simpler strategy of collecting every result and incrementing the counters only after all of the tasks had completed.
It does mean that the 10,000 result objects need to be processed synchronously after all the statistical tests have completed, but in my case this has negligible performance impact:
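The pattern, sketched with plain promises — threads.js task promises can be awaited the same way; the counter and field names here are illustrative:

```javascript
// Keep every queued task's promise, wait for all of them, then tally the
// counters sequentially from the collected result objects.
async function tallyResults(taskPromises, originalStatistic) {
  const results = await Promise.all(taskPromises);
  let above = 0;
  let below = 0;
  for (const result of results) {
    if (result.statistic > originalStatistic) above += 1;
    else if (result.statistic < originalStatistic) below += 1;
  }
  return { above, below };
}
```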
The counters will now increment sequentially and contain the correct values when all is said and done.
Mistake three — passing objects between parent and workers
If you have some familiarity with worker threads (or possibly multithreading in general), you might have noticed something missing from the previous code snippets. Running the program on small datasets worked well. However, as soon as I started using my dataset of a million numbers, I would see the memory usage climb before eventually causing an error:
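The crash ends with V8 reporting that it has hit its heap limit, with output along these lines:

```
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
```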
The heap runs out of space and the program crashes. Even when raising the limit with the V8 option
--max-old-space-size, unused memory is not freed fast enough and the same error occurs. When objects are sent between worker threads, their structure is copied over. It also seems like these copied objects are not cleared off the heap once they've been used. Node has the function
process.memoryUsage() which provides a snapshot of your program's memory usage. Some analysis — excessive use of
console.log() and breakpoints — showed that whenever something was returned from a worker to the parent thread, the heap usage increased, eventually resulting in the fatal error.
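That instrumentation can be as simple as the following (the labels are ad hoc):

```javascript
// Print heap usage in megabytes at interesting points — e.g. before and
// after worker results return — to see whether the heap keeps growing
// without being reclaimed.
const toMb = (bytes) => Math.round(bytes / 1024 / 1024);

function logHeap(label) {
  const { heapUsed, heapTotal } = process.memoryUsage();
  console.log(`${label}: ${toMb(heapUsed)} / ${toMb(heapTotal)} MB`);
}

logHeap("before queueing tasks");
// ...queue tasks, await results...
logHeap("after results returned");
```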
threads.js provides the function
Transfer(), which takes a transferable object (an
ArrayBufferLike) and marks it to be transferred between threads rather than copied. The following shows this in action when returning objects from the worker thread to the parent:
For an object that doesn’t naturally fit into an
ArrayBufferLike, the V8
serialize() and deserialize() methods work to reshape it into something that will, and eventually restore it to its former glory. Now when I ran the code, the heap usage no longer grew faster than the garbage collector could reclaim it, the results were accurate, and order seemed to be restored — until I noticed a spike in memory usage to 10 GB.
The culprit this time was the input: the million-value dataset was still being copied to every worker. Passing it through
Uint32Array's constructor converts it into an
ArrayBufferLike which can then be used with
Transfer(). If the functions your worker runs rely on non-
ArrayBufferLike variables, then you need to make sure to convert the input back into the type expected by those functions:
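A sketch of both conversions, assuming the statistical test functions expect a plain number array (helper names are illustrative):

```javascript
// Parent side: pack the dataset into a typed array so its underlying
// ArrayBuffer can be handed to Transfer() instead of being copied.
function toTransferableBuffer(values) {
  return new Uint32Array(values).buffer;
}

// Worker side: wrap the received buffer, then convert back to the plain
// number[] the statistical test functions expect.
function toPlainArray(arrayBuffer) {
  return Array.from(new Uint32Array(arrayBuffer));
}
```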
With this final piece of duct tape applied to my leaky code, the program’s memory usage hovers around 1 GB with twelve worker threads.
The end result is a script which can employ multiple workers, each picking away at the dataset. Running the program on 1,000,000 values with one thread takes about two hours and forty-five minutes to complete. With twelve threads, it takes about forty-five minutes, roughly 3.7 times faster. A solid gain, but also indicative of other efficiencies that can be made in the statistical test implementations themselves. Retrospectively, opting for convenience and familiarity in using Node.js might not have been the best choice — perhaps another language with fewer multithreading idiosyncrasies would have provided a better overall experience. However, that investigation is for another time and possibly another post.
The code snippets in this post omit a lot of detail in my worker thread usage. If you would like to see how it actually all works together, feel free to check out the project’s source code.
Originally published at https://www.zaljubouri.com/ on April 25, 2021.