TechTalks from event: IEEE IPDPS 2011

Note 1: Only plenary sessions (keynotes, panels, and best papers) are accessible without requiring log-in. For other talks, you will need to log-in using the email you registered for IPDPS 2011. Note 2: Many of the talks (those without a thumbnail next to the their description below) are yet to be uploaded. Some of them were not recorded because of technical problems. We are working with the corresponding authors to upload the self-recorded versions here. We sincerely thank all authors for their efforts in making their videos available.

Intel Platinum Patron Night

  • Architecting Parallel Software: Design patterns in practice and teaching Authors: Michael Wrinn, Intel
    Design patterns can systematically identify reusable elements in software engineering, and have been particularly effective in codifying practice in object-oriented software. A team of researchers centered at UC Berkeley’s Parallel Computing Laboratory continues to investigate a design pattern approach to parallel software; the effort has matured to the point that an undergraduate course was delivered on the topic in Fall 2010. This talk will briefly describe the pattern language itself, then demonstrate its application in examples from both image processing and game design.
  • Teaching Parallelism Using Games Authors: Ashish Amresh, Intel; Amit Jindal, Intel
    Academic institutions do not have to spend expensive multi-core hardware to support game-based courses to teach parallelism. We will discuss what teaching methodologies educators can use for integrating parallel computing curriculum inside a game engine. We will talk about the full game development process, from game design to game engineering and how parallelism is critical. We will show five game demos that mirror current trends in the industry and how educators can use in these games in the classroom. We will also show the learning outcomes, what parallelism topics are appropriate to teach students at various levels. We will demonstrate how to take games running serially and modify them to run parallel.
  • Starting Your Future Career at Intel Authors: Dani Napier, Intel; Lauren Dankiewicz, Intel
    Intel's Dani Napier will introduce why Intel is a great place to work-- it's challenging, has great benefits and is abundant with rewarding growth opportunities. She will expand on why parallelism is crucial to Intel's growth strategy and give an overview of the various types of jobs in which knowledge of parallel and distributed processing apply at Intel. Finally, Dani will explain the new hire development process and why Intel is the company that will help you become successful in your desired career path. Lauren Dankiewicz will discuss her background from the University of California, Berkeley. She gives an insightful and humorous commentary on the interview process at Intel, drawing similarities to dating. Lauren describes the excitement, the uncertainty, and what it takes to make the right choice! Listen to this fun and engaging real-life clip of how an intern became a full-time employee at Intel.
  • Opening Remarks Authors:
    Intel Platinum Patron Night will be held on Thursday evening, 5:30-8:30pm, in the Kuskokwim Ballroom. This will be an exciting opportunity for IPDPS attendees to network and learn about the Intel Academic Community’s free resources to support parallel computing research and teaching. Intel recruiters will share information about engineering internships and careers for recent college graduates.

25th Year IPDPS Celebration

SESSION 2: Communication & I/O Optimization

  • Communication-Avoiding QR Decomposition for GPUs Authors: Michael Anderson (University of California, Berkeley, USA); Grey Ballard (UC Berkeley, USA); James Demmel (University of Califo
    The increasing energy demand coupled with emerging sustainability concerns requires a re-examination of power/thermal issues in data centers from the perspective of short term energy de?ciencies. Such energy de?cient scenarios arise for a variety of reasons including variable energy supply from renewable sources and inadequate power, thermal and cooling capacities. In this paper we propose a hierarchical control scheme to adapt assignments of tasks to servers in a way that can cope with the varying energy limitations and still provide necessary QoS . The rescheduling of tasks on different servers has direct (migration related) and indirect (changed traf?c patterns) network energy impacts that we also consider. We show the stability of our scheme and evaluate its performance via detailed simulations and experiments.
  • Overlapping Computation and Communication for Advection on Hybrid Parallel Computers Authors: James B White (National Center for Atmospheric Research, USA); Jack Dongarra (University of Tennessee, Knoxville, USA)
    We describe computational experiments exploring the performance improvements from overlapping computation and communication on hybrid parallel computers. Our test case is explicit time integration of linear advection with constant uniform velocity in a three-dimensional periodic domain. The test systems include a Cray XT5, a Cray XE6, and two multicore In?niband clusters with different generations of NVIDIA graphics processing units (GPUs). We describe results for Fortran implementations using various combinations of MPI, OpenMP, and CUDA, with and without overlap of computation and communication. We ?nd that overlapping CPU computation, GPU computation, parallel communication, and CPU-GPU communication can provide performance improvements of more than a factor of two.
  • VisIO: Enabling Interactive Visualization of Ultra-Scale, Time Series Data via High-Bandwidth Distributed I/O Systems Authors: Christopher Mitchell (University of Central Florida, USA); James Ahrens (Los Alamos National Laboratory, USA); Jun Wang (Univer
    Petascale simulations compute at resolutions ranging into billions of cells and write terabytes of data for visualization and analysis. Interactive visualization of this time series is a desired step before starting a new run. The I/O subsystem and associated network often are a signi?cant impediment to interactive visualization of time-varying data; as they are not con?gured or provisioned to provide necessary I/O read rates. In this paper, we propose a new I/O library for visualization applications: VisIO. Visualization applications commonly use N-to-N reads within their parallel enabled readers which provides an incentive for a shared-nothing approach to I/O, similar to other data-intensive approaches such as Hadoop. However, unlike other data-intensive applications, visualization requires: (1) interactive performance for large data volumes, (2) compatibility with MPI and POSIX ?le system semantics for compatibility with existing infrastructure, and (3) use of existing ?le formats and their stipulated data partitioning rules. VisIO, provides a mechanism for using a non-POSIX distributed ?le system to provide linear scaling of I/O bandwidth. In addition, we introduce a novel scheduling algorithm that helps to co-locate visualization processes on nodes with the requested data. Testing using VisIO integrated into ParaView was conducted using the Hadoop Distributed File System (HDFS) on TACC’s Longhorn cluster. A representative dataset, VPIC, across 128 nodes showed a 64.4% read performance improvement compared to the provided Lustre installation. Also tested, was a dataset representing a global ocean salinity simulation that showed a 51.4% improvement in read performance over Lustre when using our VisIO system. VisIO, provides powerful high-performance I/O services to visualization applications, allowing for interactive performance with ultra-scale, time-series data.
  • Architectural constraints to attain 1 Exaflop/s on three scientific application classes Authors: Abhinav Bhatele (University of Illinois at Urbana-Champaign, USA); Pritish Jetley (University of Illinois at Urbana-Champaign,
    The first Teraflop/s computer, the ASCI Red, became operational in 1997, and it took more than 11 years for a Petaflop/s performance machine, the IBM Roadrunner, to appear on the Top500 list. Efforts have begun to study the hardware and software challenges for building an exascale machine. It is important to understand and meet these challenges in order to attain Exa?op/s performance. This paper presents a feasibility study of three important application classes to formulate the constraints that these classes will impose on the machine architecture for achieving a sustained performance of 1 Exaflop/s. The application classes being considered in this paper are – classical molecular dynamics, cosmological simulations and unstructured grid computations (?nite element solvers). We analyze the problem sizes required for representative algorithms in each class to achieve 1 Exaflop/s and the hardware requirements in terms of the network and memory. Based on the analysis for achieving an Exaflop/s, we also discuss the performance of these algorithms for much smaller problem sizes.