TechTalks from event: Big Learning: Algorithms, Systems, and Tools for Learning at Scale

We are still uploading slides and videos for this event. Please excuse any discrepancies.

Day 1 Morning Session

  • Opening Remarks Authors: Organizers
  • GPU Metaprogramming: A Case Study in Large-Scale Convolutional Neural Networks Authors: Nicolas Pinto
    Large-scale parallelism is a common feature of many neuro-inspired algorithms. In this short paper, we present a practical tutorial on ways that metaprogramming techniques – dynamically generating specialized code at runtime and compiling it just-in-time – can be used to greatly accelerate a large data-parallel algorithm. We use filter-bank convolution, a key component of many neural networks for vision, as a case study to illustrate these techniques. We present an overview of several key themes in template metaprogramming, and culminate in a full example of GPU auto-tuning in which an instrumented GPU kernel template is built and the space of all possible instantiations of this kernel is automatically grid-searched to find the best implementation on various hardware/software platforms. We show that this method can, in concert with traditional hand-tuning techniques, achieve significant speed-ups, particularly when a kernel will be run on a variety of hardware platforms. (An illustrative sketch of this auto-tuning pattern appears after the session listing below.)
  • Poster Spotlights Authors: Poster presenters
  • A Common GPU n-Dimensional Array for Python and C Authors: Arnaud Bergeron
    Currently there are multiple incompatible array/matrix/n-dimensional base object implementations for GPUs. This hinders the sharing of GPU code and causes duplicate development work. This paper proposes and presents a first version of a common GPU n-dimensional array (tensor) named GpuNdArray that works with both CUDA and OpenCL. It will be usable from Python, C, and possibly other languages. (A sketch of the metadata such an array must carry appears after the session listing below.)
  • NeuFlow: A Runtime Reconfigurable Dataflow Processor for Vision Authors: Yann LeCun (with Clement Farabet)
    We present a scalable hardware architecture to implement general-purpose systems based on convolutional networks. We will first review some of the latest advances in convolutional networks, their applications and the theory behind them, then present our dataflow processor, a highly-optimized architecture for large vector transforms, which represent 99% of the computations in convolutional networks. It was designed with the goal of providing a high-throughput engine for highly-redundant operations, while consuming little power and remaining completely runtime reprogrammable. We present performance comparisons between software versions of our system executing on CPU and GPU machines, and show that our FPGA implementation can outperform these standard computing platforms. (The filter-bank convolution that dominates these workloads is sketched after the session listing below.)
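
Illustrative Code Sketches

The metaprogramming-and-auto-tuning pattern from Pinto's talk boils down to: render a kernel template with concrete parameter values, compile each instantiation just-in-time, time it, and keep the fastest. The following is a minimal sketch of that pattern, assuming PyCUDA and a CUDA-capable device are available; the toy kernel and its UNROLL parameter are hypothetical stand-ins, not code from the talk.

    import numpy as np
    import pycuda.autoinit              # creates a CUDA context on import
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    # Kernel template: UNROLL is substituted into the source before
    # compilation, so every candidate is a distinct specialized kernel.
    KERNEL_TEMPLATE = """
    __global__ void scale(float *out, const float *in, int n)
    {
        int base = (blockIdx.x * blockDim.x + threadIdx.x) * %(UNROLL)d;
        #pragma unroll
        for (int k = 0; k < %(UNROLL)d; ++k) {
            int i = base + k;
            if (i < n) out[i] = 2.0f * in[i];
        }
    }
    """

    def time_variant(unroll, x, iters=50):
        # Generate specialized source and JIT-compile it with nvcc.
        mod = SourceModule(KERNEL_TEMPLATE % {"UNROLL": unroll})
        kernel = mod.get_function("scale")
        out = np.empty_like(x)
        threads = 256
        blocks = (x.size + threads * unroll - 1) // (threads * unroll)
        start, end = drv.Event(), drv.Event()
        start.record()
        for _ in range(iters):
            # drv.In/drv.Out copy host<->device on every call, so this
            # simplified timing includes transfer overhead.
            kernel(drv.Out(out), drv.In(x), np.int32(x.size),
                   block=(threads, 1, 1), grid=(blocks, 1))
        end.record()
        end.synchronize()
        return start.time_till(end) / iters     # milliseconds per launch

    x = np.random.randn(1 << 20).astype(np.float32)
    # Grid-search the (tiny) space of instantiations; a real auto-tuner
    # would also search block shapes, tile sizes, memory layouts, etc.
    timings = {u: time_variant(u, x) for u in (1, 2, 4, 8)}
    best = min(timings, key=timings.get)
    print("fastest unroll factor: %d (%.3f ms)" % (best, timings[best]))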
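
The GpuNdArray proposal above hinges on a shared array descriptor that both CUDA and OpenCL code can interpret. The sketch below is purely illustrative (it is not the GpuNdArray API) and shows the minimal metadata such a backend-agnostic n-dimensional array must carry: an opaque device allocation, the element size, a shape, and per-axis byte strides.

    from dataclasses import dataclass
    from typing import Any, Tuple

    @dataclass
    class DeviceNdArray:
        """Hypothetical backend-agnostic n-d array descriptor."""
        data: Any                  # opaque device allocation (CUdeviceptr / cl_mem)
        dtype_size: int            # bytes per element
        shape: Tuple[int, ...]     # extent along each dimension
        strides: Tuple[int, ...]   # byte step along each dimension
        backend: str               # "cuda" or "opencl"

        @classmethod
        def c_contiguous(cls, data, dtype_size, shape, backend):
            # Row-major strides: the last axis steps by one element.
            strides, step = [], dtype_size
            for extent in reversed(shape):
                strides.append(step)
                step *= extent
            return cls(data, dtype_size, shape, tuple(reversed(strides)), backend)

    # A 4x3 float32 matrix: rows are 12 bytes apart, elements 4 bytes apart.
    desc = DeviceNdArray.c_contiguous(data=None, dtype_size=4,
                                      shape=(4, 3), backend="cuda")
    print(desc.strides)   # (12, 4)

Because shape and strides travel with the buffer, views and transposes can be expressed without copying data, which is one reason a single descriptor can plausibly serve multiple languages and backends.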
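
The "large vector transforms" that LeCun and Farabet cite as 99% of convolutional-network computation are filter-bank convolutions: each output feature map accumulates the convolution of every input map with its own kernel. The NumPy sketch below defines that operation (as cross-correlation, the usual convention in convolutional networks); all shapes are illustrative, and it says nothing about NeuFlow's dataflow implementation.

    import numpy as np

    def filter_bank_conv(inputs, filters):
        """Valid-mode filter-bank convolution (cross-correlation).
        inputs:  (n_in, H, W) input feature maps
        filters: (n_out, n_in, kh, kw) kernel bank
        returns: (n_out, H - kh + 1, W - kw + 1) output feature maps
        """
        n_in, H, W = inputs.shape
        n_out, _, kh, kw = filters.shape
        out_h, out_w = H - kh + 1, W - kw + 1
        out = np.zeros((n_out, out_h, out_w), dtype=inputs.dtype)
        for o in range(n_out):
            for i in range(n_in):            # accumulate over input maps
                for y in range(kh):
                    for x in range(kw):
                        patch = inputs[i, y:y + out_h, x:x + out_w]
                        out[o] += filters[o, i, y, x] * patch
        return out

    # 3 input maps and a bank of 8x3 5x5 kernels -> 8 output maps.
    maps = np.random.randn(3, 16, 16).astype(np.float32)
    bank = np.random.randn(8, 3, 5, 5).astype(np.float32)
    print(filter_bank_conv(maps, bank).shape)   # (8, 12, 12)

The nested inner loops are exactly the kind of highly redundant, regular, data-parallel work the abstract describes: no branches and heavy reuse of both filter weights and input pixels, which is what makes a streaming dataflow engine attractive for this workload.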