The ongoing explosion in variety and horsepower of commodity parallel hardware has engendered a renaissance in parallel computing. Once exclusively the domain of specialists, parallel computing has become a staple among developers, with numerous educational and popular sources devoting resources to the topic. Countering the zeal to parallelize anything and everything is the spectre of Amdahl’s Law (see Jace Mogill’s handy primer), which cautions that the sequential portion of a code is its ultimate barrier to speedup. A large class of codes, though, is invoked iteratively via loops. For at least some such codes, the sequential portions are similar enough across iterations to be profitably hoisted outside the loop, streamlining an implicit “loopless” parallelization of the underlying body. Appealing as this might be, there is no common framework for implementing the idea, nor even a common phrase to describe it, at the application level.
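The hoisting idea can be sketched in a few lines of Python. The function names and the workload here are purely illustrative, not drawn from the article: the point is only that setup work repeated identically on every iteration can be performed once, ahead of the loop, leaving only the parallelizable body to repeat.

```python
def stage(data):
    """Sequential setup: e.g. parsing and normalizing the input."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]

def body(staged):
    """The (potentially parallelizable) kernel invoked each iteration."""
    return sum(x * x for x in staged)

def run_naive(data, n_iter):
    # Sequential staging is needlessly repeated inside the loop.
    return [body(stage(data)) for _ in range(n_iter)]

def run_hoisted(data, n_iter):
    # Staging is hoisted: it runs once, ahead of the loop.
    staged = stage(data)
    return [body(staged) for _ in range(n_iter)]
```

Both versions compute the same results; the hoisted one simply stops paying the sequential cost on every pass, which is exactly the cost Amdahl's Law says will dominate as the body is parallelized.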
Loopless reinvocation is not a new idea, and often occurs as a consequence of the expressive power of the programming language at hand. Parallelizing compilers have been able to optimize across entire multidimensional array sections for years. The R language, popular among statisticians and data modelers, achieves loopless reinvocation of key functions by means of its “apply” family of operators. These advantages derive primarily from the embarrassingly-parallel nature of the invoked workload, as well as from the known structure, from the perspective of the caller, of the domain of invocation. As noted, foreknowledge of parallelization-inhibiting barriers, or “critical sections”, could yield further benefits.
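The apply-family pattern can be transposed into Python as a rough sketch. The workload and names below are illustrative: the caller hands the runtime an entire enumerable domain of independent cases in a single call, with no explicit loop in user code, leaving the runtime free to dispatch the embarrassingly-parallel cases as it sees fit.

```python
from concurrent.futures import ThreadPoolExecutor

def model_fit(seed):
    """Stand-in for an independent per-case workload: no case
    depends on any other, so dispatch order is immaterial."""
    x = seed
    for _ in range(10):
        x = (x * x + 1) % 97
    return x

# The known, enumerable domain of invocation.
cases = list(range(8))

# pool.map plays the role of R's apply family: one invocation over the
# whole domain, rather than a user-level loop of single-case calls.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(model_fit, cases))
```

Because the caller exposes the whole domain up front, the runtime sees the structure of the iteration rather than a sequence of opaque single-shot calls.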
The idea of critical sections has been familiar for decades. Developers have long understood how to isolate critical sections from parallelizable segments of code within an application. With the advent of whole-program compilation, moreover, it has become practical to parallelize loops containing calls into libraries. It is now possible for library and application developers to collaborate, realizing more fully the potential for loop-level parallelization. When application and tool developers work independently, though, it is much harder to exploit this potential. The tool developer, for example, may not wish to expose internal details and may be largely unaware of actual real-world scenarios for invoking the tool. The application developer, similarly, may be reluctant to share these scenarios, for fear of revealing “secret sauce”. There may yet be a middle ground, a sort of contract, from which both groups can profit.
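The familiar isolation discipline looks like this in a minimal Python sketch (names and workload are again illustrative): the bulk of each task runs in parallel, and only the small shared-state update is serialized behind a lock.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

totals = {"sum": 0}
lock = threading.Lock()

def task(n):
    # Parallelizable segment: independent per-task computation.
    local = sum(i * i for i in range(n))
    # Critical section: kept short, and the only serialized part.
    with lock:
        totals["sum"] += local

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(task, [10, 20, 30, 40]))
```

The narrower the critical section, the smaller the sequential fraction Amdahl's Law penalizes; the contract envisioned above amounts to the tool developer telling the application developer where such sections lie.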
Our own product, the Arborist, is a data-mining tool that benefits from such an approach. The Arborist builds and evaluates decision trees to construct predictive models. For a growing collection of domains and invocation scenarios, we allow the user to specify entire ranges of cases on which to iterate looplessly. For example, the Arborist can be invoked in variational and resampling settings without the need to reprocess arguments and restage data. We enforce a contract in the sense that the user can specify only a currently-supported domain or scenario, and we in turn do our best to guarantee full performance. More solutions to the challenge of iterative invocation are undoubtedly possible, and we would like to hear yours.
Mark Seligman is a Principal at Rapidics, which offers accelerated, scalable decision-tree solutions either in the Nimbix cloud or in the user’s own environment.