Description
The high performance and relatively low cost of GPU-based platforms make them an attractive alternative for general-purpose high performance computing (HPC). However, emerging HPC applications usually have stricter output-correctness requirements than typical GPU applications (e.g., 3D graphics). This paper first analyzes the error resiliency of GPGPU platforms using a fault injection tool we developed for commodity GPU devices. On average, 16-33% of injected faults cause silent data corruption (SDC) errors in HPC programs executing on the GPU. This SDC ratio is significantly higher than that measured in CPU programs (<2.3%). To tolerate SDC errors, customized error detectors are strategically placed in the source code of target GPU programs so as to minimize performance impact and error propagation and to maximize recoverability. The presented Hauberk technique is deployed in seven HPC benchmark programs and evaluated using fault injection. The results show a high average error detection coverage (87%) with a small performance overhead (15%).
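The abstract relies on two metrics: the SDC ratio, measured by fault injection, and detection coverage, measured by adding source-level error detectors. The sketch below is a toy Python model of both measurements. It is not Hauberk and not the authors' GPU fault injector. It makes the following assumptions:

- A fault is a single bit flip in the IEEE-754 double that holds a loop accumulator.
- An SDC is any run whose final output is not bit-identical to the fault-free ("golden") output.
- The detector is a hypothetical range check. Its bounds come from profiling the golden run, padded by an assumed 10% margin.
- Coverage is the fraction of SDC runs in which that detector fired.

All function names (`flip_bit`, `running_sum`, `profile_bounds`, `campaign`) are illustrative.

```python
import struct
from dataclasses import dataclass


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit (0 = LSB, 63 = sign) of the IEEE-754 double representation of x."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y


def running_sum(data, fault=None, bounds=None):
    """Toy 'kernel' that sums data into an accumulator.

    fault:  optional (iteration, bit); flips `bit` of the accumulator after that iteration.
    bounds: optional (lo, hi); hypothetical range detector checked on every iteration.
    Returns (result, detected).
    """
    acc = 0.0
    detected = False
    for i, v in enumerate(data):
        acc += v
        if fault is not None and fault[0] == i:
            acc = flip_bit(acc, fault[1])
        # NaN fails both comparisons, so a NaN accumulator is also flagged.
        if bounds is not None and not (bounds[0] <= acc <= bounds[1]):
            detected = True
    return acc, detected


def profile_bounds(data, margin=0.1):
    """Derive detector bounds from the fault-free run, widened by an assumed margin."""
    acc, lo, hi = 0.0, float("inf"), float("-inf")
    for v in data:
        acc += v
        lo, hi = min(lo, acc), max(hi, acc)
    pad = margin * (hi - lo)
    return lo - pad, hi + pad


@dataclass
class CampaignResult:
    total: int         # number of injected faults
    sdc: int           # runs whose output differs from the golden output
    sdc_detected: int  # SDC runs in which the range detector fired

    @property
    def sdc_ratio(self) -> float:
        return self.sdc / self.total

    @property
    def coverage(self) -> float:
        return self.sdc_detected / self.sdc if self.sdc else 1.0


def campaign(data, bits=64):
    """Inject one single-bit fault per (iteration, bit) pair and classify each run."""
    golden, _ = running_sum(data)
    bounds = profile_bounds(data)
    total = sdc = sdc_detected = 0
    for i in range(len(data)):
        for b in range(bits):
            out, detected = running_sum(data, fault=(i, b), bounds=bounds)
            total += 1
            if out != golden:  # bit-exact comparison; NaN != golden is True
                sdc += 1
                sdc_detected += int(detected)
    return CampaignResult(total, sdc, sdc_detected)


if __name__ == "__main__":
    res = campaign([1.0, 2.0, 3.0, 4.0, 5.0])
    print(f"SDC ratio {res.sdc_ratio:.0%}, detection coverage {res.coverage:.0%}")
```

Sign-bit and large exponent flips push the accumulator outside the profiled range, so the detector catches them. Low-order mantissa flips, and exponent flips that shrink the value toward zero, still corrupt the output without leaving the range. As a result, a simple range check yields coverage below 100%. Reducing that kind of gap at low overhead is the problem the abstract's detector-placement strategy addresses.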