Description
Synchronization is of paramount importance to exploit thread-level parallelism on many-core CMPs. In these architectures, synchronization mechanisms usually rely on shared variables to coordinate multithreaded access to shared data
structures thus avoiding data dependency conflicts. Lock synchronization is known to be a key limitation to performance and
scalability. On the one hand, lock acquisition through busy waiting on shared variables generates additional coherence activity which interferes with applications. On the other hand, lock contention causes serialization which results in performance
degradation. This paper proposes and evaluates GLocks, a hardware-supported implementation for highly-contended locks
in the context of many-core CMPs. GLocks use a token-based message-passing protocol over a dedicated network built on
state-of-the-art technology. This approach skips the memory hierarchy to provide a non-intrusive, extremely efficient and fair
lock implementation with negligible impact on energy consumption or die area.
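
To make the busy-waiting problem concrete, here is a minimal sketch of a conventional shared-memory spinlock (test-and-test-and-set), the kind of baseline the abstract contrasts GLocks with. It is not the GLocks mechanism, which is implemented in hardware over a dedicated network. The names `TtasLock` and `run_counter_demo` are hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Test-and-test-and-set spinlock on a shared variable.
// Assumption: a generic software baseline, not the lock evaluated in the paper.
class TtasLock {
public:
    void lock() {
        for (;;) {
            // Busy-wait on a plain load so waiters mostly read a cached copy.
            while (flag_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
            // The atomic exchange is a write to the shared variable, so it
            // causes coherence activity every time a waiter retries.
            if (!flag_.exchange(true, std::memory_order_acquire)) {
                return;
            }
        }
    }

    void unlock() { flag_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> flag_{false};
};

// Hypothetical helper: several threads increment one counter under the lock.
// The critical section is serialized, which is where contention costs appear.
long run_counter_demo(int num_threads, int iters) {
    TtasLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling `run_counter_demo(4, 10000)` should return 40000, since the lock makes each increment mutually exclusive. Every waiting thread polls and writes the same shared flag, which illustrates the two costs named above: extra coherence traffic from busy waiting and serialization under contention.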
A comprehensive comparison against the most efficient shared-memory-based lock implementation for a set of microbenchmarks and real applications quantifies the benefits of GLocks. Performance results show an average reduction
of 42% and 14% in execution time, an average reduction of 76% and 23% in network traffic, and also an average reduction of
78% and 28% in energy-delay^2 product (ED^2P) metric for the full CMP for the microbenchmarks and the real applications,
respectively. In light of our performance results, we can conclude that GLocks satisfy our initial working hypothesis. GLocks
minimize cache-coherence network traffic due to lock synchronization which translates into reduced power consumption and
execution time.
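
For readers unfamiliar with the metric, the energy-delay^2 product is conventionally defined as energy multiplied by the square of the delay. The symbols E (energy consumed) and D (execution time) below are assumed notation, not taken from the abstract.

```latex
\mathrm{ED^2P} = E \times D^{2}
```

Because delay is squared, the metric weights execution time more heavily than energy, so a design that saves energy at the cost of longer runtime scores poorly.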