Template Metaprogramming for Massively Parallel Scientific Computing
List of references, interesting links, and code examples with further details about the lecture series "Template Metaprogramming for Massively Parallel Scientific Computing", presented at the inverted CERN School of Computing 2016.
Modern C++ features
- A nice, often updated cheatsheet of C++11/14/17/20 features
L1 - Expression Templates
Sources
Sources for the simple Expression Template library built according to the slides. You can directly reproduce the performance tests from these sources. Try adding more operators and building more complicated expressions, and see how the performance compares to the simple OO version.
cms_page_media/3/l1-simpletest.zip
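For orientation, here is a minimal sketch of the Expression Template idiom. It is not the library from the archive: the names `Expr`, `Vec`, `Sum`, `Prod` and the helper `et_demo` are assumptions made up for this illustration. The operators only build a lightweight expression tree; the arithmetic happens in a single loop when the tree is assigned to a `Vec`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: every expression type E derives from Expr<E>.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Concrete vector that owns its data (hypothetical name).
class Vec : public Expr<Vec> {
    std::vector<double> data_;
public:
    explicit Vec(std::size_t n, double value = 0.0) : data_(n, value) {}

    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // Assigning an expression evaluates it element by element in one loop,
    // without creating temporary vectors for the intermediate results.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = expr[i];
        return *this;
    }
};

// Node for lhs + rhs. It stores references only, so a nested expression must
// be consumed within the statement that builds it: `auto e = a + b + c;`
// would keep a reference to a destroyed temporary node.
template <typename L, typename R>
class Sum : public Expr<Sum<L, R>> {
    const L& lhs_;
    const R& rhs_;
public:
    Sum(const L& l, const R& r) : lhs_(l), rhs_(r) {}
    double operator[](std::size_t i) const { return lhs_[i] + rhs_[i]; }
    std::size_t size() const { return lhs_.size(); }
};

// Node for the element-wise product lhs * rhs.
template <typename L, typename R>
class Prod : public Expr<Prod<L, R>> {
    const L& lhs_;
    const R& rhs_;
public:
    Prod(const L& l, const R& r) : lhs_(l), rhs_(r) {}
    double operator[](std::size_t i) const { return lhs_[i] * rhs_[i]; }
    std::size_t size() const { return lhs_.size(); }
};

// The operators build tree nodes; they compute nothing themselves.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

template <typename L, typename R>
Prod<L, R> operator*(const Expr<L>& l, const Expr<R>& r) {
    return Prod<L, R>(l.self(), r.self());
}

// Illustration only: evaluates r = a * b + c and returns element i of r.
inline double et_demo(std::size_t n, std::size_t i) {
    Vec a(n, 2.0), b(n, 3.0), c(n, 1.0), r(n);
    r = a * b + c;  // the type of the right-hand side is Sum<Prod<Vec, Vec>, Vec>
    return r[i];
}
```

Adding another operator amounts to one more node class plus one more free operator template, which makes this a convenient starting point for the experiments suggested above.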
Links
- A page about Template Metaprogramming, includes the source of the first C++ metaprogram
- C++11 move semantics vs. Expression Templates, includes an ET library implementation which served as basis for this lecture
- A presentation and an article about "Smart Expression Templates" show that the simple ET idiom isn't a silver bullet and doesn't really deliver, especially performance-wise
- Nice tutorial on Expression Evaluation (not Templates) and Abstract Syntax Tree parsing in C++. You can imagine this implemented using the ET idiom for a full ET library
L2 - Vectorization with Expression Templates
Sources
Sources for an integrator of the equations of motion of a system of particles in an electromagnetic field. There are two versions, both built using the provided Makefile. One is a purely serial version with an AoS data layout; the other uses an explicit AoSoA layout, where you have to manually specify the size of the register in the header file. You can reproduce the performance tests using these sources.
cms_page_media/3/simd-particles.zip
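The difference between the two layouts can be sketched as follows. This is not the code from the archive: the struct names, the constant `kLanes` (standing in for the register size you would set by hand), and the simple drift step `x += v * dt` are assumptions chosen to keep the example short. The real integrator also handles the field terms.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: floats per SIMD register (8 for 256-bit AVX, 4 for 128-bit SSE).
constexpr std::size_t kLanes = 8;

// AoS: one struct per particle. The x components of neighbouring particles
// are separated by the other five fields in memory.
struct ParticleAoS {
    float x, y, z;
    float vx, vy, vz;
};

// AoSoA: one struct per block of kLanes particles. Within a block, each
// component is stored contiguously, so one register load fetches kLanes values.
struct ParticleBlock {
    float x[kLanes], y[kLanes], z[kLanes];
    float vx[kLanes], vy[kLanes], vz[kLanes];
};

// Repack AoS data into blocks; unused lanes of the last block stay zero.
inline std::vector<ParticleBlock> to_aosoa(const std::vector<ParticleAoS>& p) {
    std::vector<ParticleBlock> blocks((p.size() + kLanes - 1) / kLanes);
    for (std::size_t i = 0; i < p.size(); ++i) {
        ParticleBlock& b = blocks[i / kLanes];
        const std::size_t l = i % kLanes;
        b.x[l] = p[i].x;   b.y[l] = p[i].y;   b.z[l] = p[i].z;
        b.vx[l] = p[i].vx; b.vy[l] = p[i].vy; b.vz[l] = p[i].vz;
    }
    return blocks;
}

// Serial drift step on the AoS layout.
inline void drift_aos(std::vector<ParticleAoS>& p, float dt) {
    for (auto& q : p) {
        q.x += q.vx * dt;
        q.y += q.vy * dt;
        q.z += q.vz * dt;
    }
}

// Drift step on the AoSoA layout. The inner loop has a fixed trip count and
// unit stride, which makes it a good candidate for auto-vectorization.
inline void drift_aosoa(std::vector<ParticleBlock>& blocks, float dt) {
    for (auto& b : blocks) {
        for (std::size_t l = 0; l < kLanes; ++l) {
            b.x[l] += b.vx[l] * dt;
            b.y[l] += b.vy[l] * dt;
            b.z[l] += b.vz[l] * dt;
        }
    }
}
```

Particle `i` lives in block `i / kLanes`, lane `i % kLanes`. The padding lanes of the last block are updated too, which is harmless here because they hold zeros.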
Simplified and super-simplified versions of Vincenzo Innocente's "UltimateSoA", illustrating the core concepts as I understood them and explained them in the lectures
cms_page_media/3/ultimatesoa-simplified.zip
Links
- TACC presentation about vectorization, including info about aliasing, dependencies, and auto-vectorization
- Notes on CPU caches by Scott Meyers
- Gallery of Processor Cache Effects - with samples and performance analysis
- Discussion about AoS-SoA transformations (note the links therein)
- Possible approach to implement SoA storage, tutorial-style
- An interesting paper about implementing SoA storage with focus on CUDA applications
- Arrow Street - a C++ template library for semi-automatic SoA / AoSoA storage by Intel (and a presentation)
- UltimateSoA by Vincenzo Innocente - my favourite C++ SoA binding which served as the basis of presented examples
L3 - Templates for Iteration; Thread-level Parallelism
Sources
Sources for the FDTD Maxwell equation integrator, which incorporates Expression Templates, SoA storage (easily swapped for AoS by using an STL container), and functional-style iteration. It requires the Boost libraries and is built using CMake. It has been ripped out of a much larger codebase and simplified. I haven't ported the unit tests, so don't use it in important projects, as there might be bugs. If you do want to use it though, don't hesitate to contact me at jiri@vysko.cz; I'd be more than happy to help you adapt the sources for your purpose.
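To give a rough idea of what functional-style iteration means here, the sketch below separates the traversal of a grid from the work done per cell, which is passed in as a lambda. It does not reproduce the interface of the integrator: `for_each_interior` and `laplacian` are hypothetical names, and a five-point Laplacian stands in for the actual field update.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls f(i, j) for every interior cell of an
// nx-by-ny grid, skipping the one-cell boundary layer.
template <typename F>
void for_each_interior(std::size_t nx, std::size_t ny, F&& f) {
    for (std::size_t i = 1; i + 1 < nx; ++i) {
        for (std::size_t j = 1; j + 1 < ny; ++j) {
            f(i, j);
        }
    }
}

// Illustration only: five-point Laplacian on a row-major grid with unit
// spacing. Boundary cells of the result are left at zero.
inline std::vector<double> laplacian(const std::vector<double>& u,
                                     std::size_t nx, std::size_t ny) {
    assert(u.size() == nx * ny);
    std::vector<double> out(u.size(), 0.0);
    for_each_interior(nx, ny, [&](std::size_t i, std::size_t j) {
        const std::size_t c = i * ny + j;  // flat index of cell (i, j)
        out[c] = u[c - ny] + u[c + ny] + u[c - 1] + u[c + 1] - 4.0 * u[c];
    });
    return out;
}
```

Because the loop structure lives in one place, the traversal order can later be changed or distributed over threads without touching the per-cell code.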
Notes
- Beware of the slides about the multi-dimensional container implementation, which I skipped during the lecture due to lack of time. They were only meant to illustrate an approach to chaining operator[], and they ignored the problem that nested std::vectors do not store data compactly. Had you tried implementing it without switching to compact storage, the performance would have been horrible.
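One possible way to keep the chained operator[] syntax while storing the data compactly is sketched below. This is not the implementation from the slides: `Grid2D` and its row proxies are hypothetical names. A single contiguous std::vector holds all elements in row-major order, and the first operator[] returns a small proxy pointing at the start of a row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D container: one contiguous buffer, row-major order.
// Not intended for T = bool, since std::vector<bool> has no data().
template <typename T>
class Grid2D {
    std::size_t rows_, cols_;
    std::vector<T> data_;
public:
    // Proxy returned by the first operator[]; the second operator[]
    // indexes within the row it points to.
    class Row {
        T* row_;
    public:
        explicit Row(T* row) : row_(row) {}
        T& operator[](std::size_t j) const { return row_[j]; }
    };

    // Read-only counterpart used for const grids.
    class ConstRow {
        const T* row_;
    public:
        explicit ConstRow(const T* row) : row_(row) {}
        const T& operator[](std::size_t j) const { return row_[j]; }
    };

    Grid2D(std::size_t rows, std::size_t cols, T init = T())
        : rows_(rows), cols_(cols), data_(rows * cols, init) {}

    Row operator[](std::size_t i) { return Row(data_.data() + i * cols_); }
    ConstRow operator[](std::size_t i) const {
        return ConstRow(data_.data() + i * cols_);
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    const T* data() const { return data_.data(); }
};
```

With this layout, `g[i][j]` refers to element `i * cols + j` of the flat buffer, so iterating over `j` in the inner loop walks through memory with unit stride. Nested std::vectors would instead allocate every row separately.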
Links
- Nice tutorials on Lambda Expressions in C++
- C++ Core Guidelines, and Microsoft's Guidelines Support Library
- Recent development of the C++17 standard, note especially the Parallelism TS, and Ranges proposals
- "array_view" proposal (recently dropped in favour of the "span" concept - see GSL)
- Interesting multi-dimensional array implementations (also see "span" from the GSL)
- HPX - a general purpose C++ runtime system for parallel and distributed applications