IEEE IPDPS 2011
TechTalks from event: IEEE IPDPS 2011
Note 1: Only plenary sessions (keynotes, panels, and best papers) are accessible without requiring log-in. For other talks, you will need to log-in using the email you registered for IPDPS 2011. Note 2: Many of the talks (those without a thumbnail next to the their description below) are yet to be uploaded. Some of them were not recorded because of technical problems. We are working with the corresponding authors to upload the self-recorded versions here. We sincerely thank all authors for their efforts in making their videos available.
SESSION 3: Hardware-Software Interaction
A Novel Power management for CMP Systems in Data-intensive EnvironmentThe emerging data-intensive applications of today are comprised of non-uniform CPU and I/O intensive workloads, thus imposing a requirement to consider both CPU and I/O effects in the power management strategies. Only scaling down the processorâ€™s frequency based on its busy/idle ratio cannot fully exploit opportunities of saving power. Our experiments show that besides the busy and idle status, each processor may also have I/O wait phases waiting for I/O operations to complete. During this period, the completion time is decided by the I/O subsystem rather than the CPU thus scaling the processor to a lower frequency will not affect the performance but save more power. In addition, the CPUâ€™s reaction to the I/O operations may be signi?cantly affected by several factors, such as I/O type (sync or unsync), instruction/job level parallelism; it cannot be accurately modeled via physics laws like mechanical or chemical systems. In this paper, we propose a novel power management scheme called MAR (modeless, adaptive, rule-based) in multiprocessor systems to minimize the CPU power consumption under performance constraints. By using richer feedback factors, e.g. the I/O wait, MAR is able to accurately describe the relationships among core frequencies, performance and power consumption.We adopt a modeless control model to reduce the complexity of system modeling. MAR is designed for CMP (Chip Multi Processor) systems by employing multi-input/multi-output (MIMO) theory and percore level DVFS (Dynamic Voltage and Frequency Scaling). Our extensive experiments on a physical test bed demonstrate that, for the SPEC benchmark and data-intensive (TPC-C) benchmark, the ef?ciency of MAR is 93.6-96.2% accurate to the ideal power saving strategy calculated off-line. Compared with baseline solutions, MAR could save 22.5-32.5% more power while keeping the comparable performance loss of about 1.8-2.9%. In addition, simulation results show the ef?ciency of our design for various CMP con?gurations.
Characterization of System Services and Their Performance Impact in Multicore NodesThe performance of parallel applications on large scale systems is shown to disproportionately degrade due to interference from system services. This interference from system services is also known as jitter. However, there is limited understanding of sources and patterns of jitter on multi-core systems. In this paper, we identify and characterize jitter sources in terms of their amplitude and execution interval distributions on multi-core IBM Power systems with UNIX-based general purpose operating systems: AIX and Linux. Our analysis shows that there are various kinds of jitter sources and their execution varies drastically between different cores and between hardware threads within each core for practical reasons. This in-depth knowledge of jitter events is leveraged to devise effective approaches to mitigate the jitter impact on application performance in large scale systems. Moreover, such knowledge would provide useful insights to a new generation of operating system designs such as multikernel or satellite kernel for multi-core systems.
Automatic Recognition of Performance Idioms in Scientific ApplicationsBasic data ?ow patterns that we call performance idioms, such as stream, transpose, reduction, random access and stencil, are common in scienti?c numerical applications. We hypothesize that a small number of idioms can cover most programming constructs that dominate the execution time of scienti?c codes and can be used to approximate the application performance. To check these hypotheses, we proposed an automatic idioms recognition method and implemented the method, based on the open source compiler Open64. With the NAS Parallel Benchmark (NPB) as a case study, the prototype system is about 90% accurate compared with idiom classi?cation by a human expert. Our results showed that the above ?ve idioms suf?ce to cover 100% of the six NPB codes (MG, CG, FT, BT, SP and LU). We also compared the performance of our idiom benchmarks with their corresponding instances in the NPB codes on two different platforms with different methods. The approximation accuracy is up to 96:6%. The contribution is to show that a small set of idioms can cover more complex codes, that idioms can be recognized automatically, and that suitably de?ned idioms may approximate application performance.
Iso-energy-efficiency: An approach to power-constrained parallel computationFuture large scale high performance supercomputer systems require high energy ef?ciency to achieve exa?ops computational power and beyond. Despite the need to understand energy ef?ciency in high-performance systems, there are few techniques to evaluate energy ef?ciency at scale. In this paper, we propose a system-level iso-energy-ef?ciency model to analyze, evaluate and predict energy-performance of data intensive parallel applications with various execution patterns running on large scale power-aware clusters. Our analytical model can help users explore the effects of machine and application dependent characteristics on system energy ef?ciency and isolate ef?cient ways to scale system parameters (e.g. processor count, CPU power/frequency, workload size and network bandwidth) to balance energy use and performance. We derive our iso-energy-ef?ciency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model for various application contexts and in scalability decision-making.