FastFlow (C++)

FastFlow is a parallel programming framework for multi-core platforms based on non-blocking, lock-free/fence-free synchronization mechanisms. The framework is composed of a stack of layers that progressively abstract the programming of shared-memory parallel applications. The stack has two goals: to ease the development of applications and to make them fast and scalable. FastFlow particularly targets the development of streaming applications.
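To make the idea of lock-free synchronization between two streaming stages concrete, the following is a minimal sketch of a bounded single-producer/single-consumer ring buffer in standard C++. It is not FastFlow's own queue: the names `SpscRing` and `stream_sum` are hypothetical, and the sketch relies on C++11 acquire/release atomics, which is an assumption made for portability rather than a description of FastFlow's internals.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Bounded single-producer/single-consumer ring buffer (hypothetical sketch,
// not the FastFlow implementation). No mutex is taken: each index is written
// by exactly one thread and only read by the other.
template <typename T>
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity)
        : buf_(capacity + 1), head_(0), tail_(0) {
        assert(capacity > 0);  // one slot is kept empty to tell full from empty
    }

    // Producer side only. Returns false if the buffer is full.
    bool push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (t + 1) % buf_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[t] = v;
        tail_.store(next, std::memory_order_release);  // publish the item
        return true;
    }

    // Consumer side only. Returns false if the buffer is empty.
    bool pop(T& out) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;
        out = buf_[h];
        head_.store((h + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_;  // next slot to read (written by consumer)
    std::atomic<std::size_t> tail_;  // next slot to write (written by producer)
};

// Streams the integers 1..n from a producer thread to a consumer thread
// and returns the sum computed by the consumer.
long long stream_sum(int n) {
    SpscRing<int> q(64);
    long long sum = 0;
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i)
            while (!q.push(i)) std::this_thread::yield();  // retry when full
    });
    std::thread consumer([&] {
        int v = 0;
        for (int got = 0; got < n;) {
            if (q.pop(v)) { sum += v; ++got; }
            else std::this_thread::yield();                // retry when empty
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

Calling `stream_sum(1000)` pushes 1000 items through the buffer while neither thread ever blocks on a lock; a full or empty buffer is handled by retrying.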

Vanilla or other flavours. In FastFlow, different layers support different kinds of programmer. At the low-level programming layer, FastFlow can be used directly to set up an arbitrary network of parallel activities; at this level, as when programming with POSIX threads, any orchestration of parallel activities can be expressed. However, as with POSIX threads, writing a correct and efficient program is a non-trivial activity. FastFlow synchronizations are usually faster than POSIX ones.

At the next layer up (the high-level programming layer), FastFlow provides programmers with a number of pre-defined parametric programming patterns (i.e. skeletons); at this level, as when programming with Intel TBB, only some orchestrations of parallel activities can be expressed: programs are composed by configuring and combining patterns (skeletons), each of which has an optimised implementation. Writing a correct and efficient program is relatively easy (see also the tutorial page). In the benchmarks reported below, FastFlow-based applications are faster than TBB-based ones.
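The difference between the two layers can be illustrated with a small sketch in standard C++. The `farm` function below is a hypothetical stand-in for a parametric pattern, not part of the FastFlow API: the caller supplies only the input items, the worker function, and the number of workers, while thread creation and work distribution stay hidden inside the pattern. It assumes the result type is default-constructible and is not `bool` (because `std::vector<bool>` does not allow concurrent writes to distinct elements).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical "farm" pattern: applies `worker` to every element of `inputs`
// using `nworkers` threads and returns the results in input order.
template <typename In, typename F>
auto farm(const std::vector<In>& inputs, F worker, unsigned nworkers)
    -> std::vector<decltype(worker(inputs[0]))> {
    using Out = decltype(worker(inputs[0]));
    if (nworkers == 0) nworkers = 1;          // always run at least one worker

    std::vector<Out> results(inputs.size());
    std::atomic<std::size_t> next{0};         // index of the next unclaimed item

    // Each worker repeatedly claims one index and processes that item;
    // no two workers ever write the same slot of `results`.
    auto body = [&] {
        for (;;) {
            const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= inputs.size()) return;
            results[i] = worker(inputs[i]);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < nworkers; ++w) pool.emplace_back(body);
    for (auto& t : pool) t.join();            // results are complete after join
    return results;
}

// Example use: square a few integers with three workers.
std::vector<int> squares_demo() {
    const std::vector<int> in{1, 2, 3, 4, 5};
    return farm(in, [](const int& x) { return x * x; }, 3);
}
```

Writing the same computation at the low level would mean creating the threads, distributing the items, and collecting the results by hand; with the pattern, that orchestration is written once and reused, at the cost of only being able to express what the pattern supports.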

The FastFlow high-level skeletal layer can be further abstracted (using skeletons as object factories) to define Problem Solving Environments (PSEs), which are programming frameworks designed to ease the development of efficient parallel applications in a specific domain. As examples, the following PSEs are either completed or under development:

- FastFlow accelerator and offloading (completed);

- Parallel Monte Carlo and Gillespie simulations (ongoing);

- Parallel macro data-flow interpretation with automatic parallelization features supporting skeletal programming (ongoing);

- An extension of Intel TBB supporting general streaming networks (ongoing);

- A (blazing fast) parallel memory allocator that is faster than the Hoard and TBB allocators (completed).

The three layers described above (low-level, high-level, and PSE) are intended for three kinds of user, respectively: FastFlow designers, skilled programmers (with some knowledge of parallel programming), and casual programmers (e.g. application domain experts).

FastFlow is fast. We experimentally demonstrate that FastFlow is always more efficient than state-of-the-art multi-core programming frameworks on a set of micro-benchmarks and on a real-world application; the speedup edge of FastFlow over other solutions can be substantial for fine-grained tasks: for example, +35% over OpenMP, +226% over Cilk, and +96% over TBB for the alignment of protein P01111 against the UniProt database using the Smith-Waterman algorithm.

For more information refer to the official FastFlow website.