In this section I am not going to provide lengthy tutorials, it's easy to find a lot of them on the Internet (once you know what you are looking for). I am going to provide a brief overview of what is available out there along with short descriptions and references, so that you will get some starting point for further study.
The first thing to say is that there are two main categories of libraries and technologies. The first category is targeted at parallel computations (HPC), that is, it will help you to parallelize computations (like matrix multiplication, sorting, etc). OpenMP, Cilk++, TBB, QuickThread, Ct, PPL, .NET TPL, Java Fork/Join fall into this category.
The second category is targeted at general concurrency, that is, it will help to implement things like multi-threaded servers, rich client application, games, etc. TBB, QuickThread, just::thread, PPL, AAL, AppCore fall into this category. .NET and Java standard libraries also include some things that may be of help (like concurrent containers, synchronization primitives, thread pools, etc).
As you may notice, some libraries are mentioned in both categories. For example, Intel TBB contains some primitives that help with parallel computations (parallel algorithms and task scheduler), and some things that help with general concurrency (threads, synchronization primitives, concurrent containers, atomics).
Efficient and scalable parallel computations is a rather difficult field, it's easy to create a parallel program that executes slower than original single-threaded version (and the more cores you add the slower it executes). So, if you are not doing something extraordinary, you may consider using internally parallelized libraries like Intel MKL, Intel IPP, AMD ACML. They are not only internally parallelized, they are also highly optimized and will be updates for future architectures. So you only need to call a function for matrix multiplication, FFT, video stream decoding, image processing or whatever you need.