Now we must decide whether to use work distribution and/or work balancing in addition to work-requesting. Work balancing is too cumbersome to implement, and my educated guess is that it would not provide any significant speedup (it is better suited to work DAGs that lack the “busy leaves” property).
Work distribution is a different story: it is easier to implement and may provide some speedup. I decided to apply work distribution only at the initial stage: all initial game states are distributed evenly across all worker threads in a round-robin fashion. Each worker thread therefore starts with plenty of work, and work-requesting comes into play only toward the end of the computation.
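To make the initial round-robin step concrete, here is a minimal sketch. The names `GameState`, `Worker` and `distribute_round_robin` are hypothetical placeholders, not taken from the actual scheduler, and a plain `std::deque` stands in for each worker's queue on the assumption that distribution happens once, before the worker threads start.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a game state; the real type is not shown here.
struct GameState {
    int id;
};

// Hypothetical per-worker queue. A real scheduler would use a concurrent
// deque, but this sketch assumes distribution runs before threads start,
// so no synchronization is needed yet.
struct Worker {
    std::deque<GameState> queue;
};

// Round-robin distribution: initial state i goes to worker (i % N).
// Each worker ends up with either floor(M/N) or ceil(M/N) states.
void distribute_round_robin(const std::vector<GameState>& initial,
                            std::vector<Worker>& workers) {
    const std::size_t n = workers.size();
    if (n == 0) return;  // nothing to distribute to
    for (std::size_t i = 0; i < initial.size(); ++i) {
        workers[i % n].queue.push_back(initial[i]);
    }
}
```

With 10 initial states and 3 workers, worker 0 receives states 0, 3, 6, 9 and workers 1 and 2 receive three states each. Once a worker drains its own queue, it would fall back to work-requesting.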