Description
We have recently started investigating how to scale deep learning
techniques to much larger models in an effort to improve the accuracy
of such models in the domains of computer vision, speech recognition,
and natural language processing. Our largest models to date have more
than 1 billion parameters, and we use both supervised and
unsupervised training in our work. To train models of this
scale, we use clusters of thousands of machines and exploit both
model parallelism (by distributing computation within a single replica
of the model across multiple cores and multiple machines) and data
parallelism (by distributing computation across many replicas of these
distributed models). In this talk I'll describe the progress we've
made on building training systems for models of this scale, and also
highlight a few results from applying these models to tasks that are
important to improving Google's products.
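The two forms of parallelism named above can be illustrated with a toy, single-process sketch in pure Python. This is not the training system described in the talk. The function names, the single linear unit with squared-error loss, the strided sharding of the batch, and the synchronous averaging of gradients are all assumptions chosen only to make the two ideas concrete: model parallelism splits one copy of the parameters into partitions whose partial results are combined, and data parallelism gives each replica a full copy of the parameters and a different slice of the data.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def model_parallel_predict(w: Vector, x: Vector, num_parts: int) -> float:
    """Toy model parallelism (hypothetical helper): split one copy of the
    parameter vector into contiguous partitions, as if each partition lived
    on a different core or machine. Each partition computes a partial dot
    product over its slice, and the partial results are summed."""
    n = len(w)
    size = max(1, (n + num_parts - 1) // num_parts)  # ceil(n / num_parts)
    partials = [dot(w[s:s + size], x[s:s + size]) for s in range(0, n, size)]
    return sum(partials)


def shard_gradient(w: Vector, xs: List[Vector], ys: List[float]) -> Vector:
    """Sum (not mean) of squared-error gradients over one shard of examples."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = dot(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi
    return grad


def data_parallel_step(w: Vector, xs: List[Vector], ys: List[float],
                       num_replicas: int, lr: float) -> Vector:
    """Toy synchronous data parallelism (hypothetical helper): every replica
    holds a full copy of w and computes gradients on its own strided slice
    of the batch. The per-replica sums are added, divided by the total
    batch size to get the mean gradient, and applied in one SGD update."""
    shards: List[Tuple[List[Vector], List[float]]] = [
        (xs[r::num_replicas], ys[r::num_replicas]) for r in range(num_replicas)
    ]
    total = [0.0] * len(w)
    for shard_xs, shard_ys in shards:
        g = shard_gradient(w, shard_xs, shard_ys)
        total = [a + b for a, b in zip(total, g)]
    n = len(xs)
    return [wi - lr * gi / n for wi, gi in zip(w, total)]


if __name__ == "__main__":
    w = [0.5, -1.0, 2.0, 0.25]
    x = [1.0, 2.0, 3.0, 4.0]
    print(model_parallel_predict(w, x, 2), dot(w, x))

    xs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    ys = [1.0, 2.0, 3.0, 4.0]
    print(data_parallel_step([0.1, 0.2], xs, ys, num_replicas=1, lr=0.01))
    print(data_parallel_step([0.1, 0.2], xs, ys, num_replicas=2, lr=0.01))
```

Because the per-replica gradients are summed before being divided by the total batch size, the data-parallel update matches the single-replica update up to floating-point rounding, and the partitioned prediction matches the unpartitioned one. A real distributed system must also move these partial results between machines, which this sketch ignores entirely.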
This talk describes joint work with Kai Chen, Greg Corrado, Matthieu
Devin, Quoc Le, Rajat Monga, Andrew Ng, Marc'Aurelio Ranzato, Paul
Tucker, and Ke Yang.