Synchronization is of paramount importance for exploiting thread-level parallelism on many-core CMPs. In these architectures, synchronization mechanisms usually rely on shared variables to coordinate multithreaded access to shared data structures, thus avoiding data dependency conflicts. Lock synchronization is known to be a key limitation to performance and scalability. On the one hand, lock acquisition through busy waiting on shared variables generates additional coherence activity that interferes with applications. On the other hand, lock contention causes serialization, which degrades performance. This paper proposes and evaluates GLocks, a hardware-supported implementation for highly contended locks in the context of many-core CMPs. GLocks use a token-based message-passing protocol over a dedicated network built on state-of-the-art technology. This approach bypasses the memory hierarchy to provide a non-intrusive, extremely efficient and fair lock implementation with negligible impact on energy consumption or die area. A comprehensive comparison against the most efficient shared-memory-based lock implementation, using a set of microbenchmarks and real applications, quantifies the benefits of GLocks. Performance results show an average reduction of 42% and 14% in execution time, an average reduction of 76% and 23% in network traffic, and an average reduction of 78% and 28% in the energy-delay^2 product (ED^2P) metric for the full CMP, for the microbenchmarks and the real applications, respectively. In light of these results, we conclude that GLocks satisfy our initial working hypothesis: they minimize the cache-coherence network traffic due to lock synchronization, which translates into reduced power consumption and execution time.
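
For reference, the ED^2P metric quoted above is conventionally defined as energy multiplied by the square of execution time (delay). The symbols E and D below are notation assumed here, not taken from the paper. Because delay enters squared, a saving in execution time weighs more heavily than an equal relative saving in energy. The second expression is a rough illustration only, under the assumption that energy stays unchanged:

```latex
% Conventional definition (E = energy, D = execution time; assumed notation)
\[
  \mathrm{ED^2P} = E \cdot D^{2}
\]
% Illustration, assuming E is unchanged: a 14% reduction in D alone gives
\[
  \frac{E\,(0.86\,D)^{2}}{E\,D^{2}} = 0.86^{2} \approx 0.74,
\]
% that is, an ED^2P roughly 26% lower.
```

The reported figures are averages over benchmarks, so they need not compose exactly in this way.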