Please help transcribe this video using our simple transcription tool. You need to be logged in to do so.


Large-scale parallelism is a common feature of many neuro-inspired algorithms. In this short paper, we present a practical tutorial on ways that metaprogramming techniques – dynamically generating specialized code at runtime and compiling it just-in-time – can be used to greatly accelerate a large data-parallel algorithm. We use filter-bank convolution, a key component of many neural networks for vision, as a case study to illustrate these tech- niques. We present an overview of several key themes in template metaprogramming, and culminate in a full example of GPU auto-tuning in which an instrumented GPU kernel template is built and the space of all possible instantiations of this kernel is automatically grid- searched to find the best implementation on various hardware/software platforms. We show that this method can, in concert with traditional hand-tuning techniques, achieve significant speed-ups, particularly when a kernel will be run on a variety of hardware platforms.

Questions and Answers

You need to be logged in to be able to post here.