Description
Recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex
a plausible, attractive, fault-tolerant, and energy-efficient possibility. Such attributes have once again sparked an interest in
creating learning algorithms that aspire to reverse-engineer many of the abilities of the brain.
In this paper we describe a GPGPU-accelerated extension to an intelligent learning model inspired by the structural
and functional properties of the mammalian neocortex. Our cortical network, like the brain, exhibits massive amounts of
processing parallelism, making today’s GPGPUs a highly attractive and readily-available hardware accelerator for such a
model.
Furthermore, we consider two inefficiencies inherent to our initial design: multiple kernel-launch overhead and poor
utilization of GPGPU resources. We propose optimizations such as a software work-queue structure and pipelining the
hierarchical layers of the cortical network to mitigate such problems. Our analysis provides important insight into the GPU
architecture details including the number of cores, the memory system, and the global thread scheduler. Additionally, we
create a runtime profiling tool for our parallel learning algorithm which proportionally distributes the cortical network across
the host CPU as well as multiple GPUs, whether homogeneous or heterogeneous, that may be available to the system. Using
the profiling tool with these optimizations on Nvidia’s CUDA framework, we achieve up to 60x speedup over a single-threaded CPU implementation of the model.
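
The abstract does not say how the profiling tool divides the network, so the following is only a minimal sketch of one way a proportional split could work, assuming each device's throughput has already been measured by a short profiling run. The function name `partition_by_throughput`, the device labels, and the throughput figures are hypothetical and are not taken from the paper.

```python
def partition_by_throughput(num_units, throughputs):
    """Split num_units work items (e.g. cortical columns) across devices
    in proportion to each device's measured throughput.

    throughputs maps a device label to a positive number, such as
    work items processed per second during a profiling run (assumed input).
    """
    if num_units < 0:
        raise ValueError("num_units must be non-negative")
    if not throughputs or any(t <= 0 for t in throughputs.values()):
        raise ValueError("every device needs a positive throughput")

    total = sum(throughputs.values())
    # Ideal (fractional) share for each device.
    ideal = {name: num_units * t / total for name, t in throughputs.items()}
    # Start from the rounded-down shares.
    shares = {name: int(x) for name, x in ideal.items()}

    # Hand out the units lost to rounding, largest fractional part first,
    # so that the shares always sum to num_units.
    leftover = num_units - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda n: ideal[n] - shares[n], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical measurements: one CPU and two unequal GPUs.
    measured = {"cpu": 1.0, "gpu0": 6.0, "gpu1": 3.0}
    print(partition_by_throughput(1000, measured))
```

A split of this kind handles homogeneous and heterogeneous devices the same way: a faster GPU simply reports a higher throughput and receives a larger share of the network.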