TechTalks from event: IEEE IPDPS 2011

Note 1: Only plenary sessions (keynotes, panels, and best papers) are accessible without requiring log-in. For other talks, you will need to log-in using the email you registered for IPDPS 2011. Note 2: Many of the talks (those without a thumbnail next to the their description below) are yet to be uploaded. Some of them were not recorded because of technical problems. We are working with the corresponding authors to upload the self-recorded versions here. We sincerely thank all authors for their efforts in making their videos available.

Intel Platinum Patron Night

  • Architecting Parallel Software: Design patterns in practice and teaching Authors: Michael Wrinn, Intel
    Design patterns can systematically identify reusable elements in software engineering, and have been particularly effective in codifying practice in object-oriented software. A team of researchers centered at UC Berkeley’s Parallel Computing Laboratory continues to investigate a design pattern approach to parallel software; the effort has matured to the point that an undergraduate course was delivered on the topic in Fall 2010. This talk will briefly describe the pattern language itself, then demonstrate its application in examples from both image processing and game design.
  • Teaching Parallelism Using Games Authors: Ashish Amresh, Intel; Amit Jindal, Intel
    Academic institutions do not have to spend expensive multi-core hardware to support game-based courses to teach parallelism. We will discuss what teaching methodologies educators can use for integrating parallel computing curriculum inside a game engine. We will talk about the full game development process, from game design to game engineering and how parallelism is critical. We will show five game demos that mirror current trends in the industry and how educators can use in these games in the classroom. We will also show the learning outcomes, what parallelism topics are appropriate to teach students at various levels. We will demonstrate how to take games running serially and modify them to run parallel.
  • Starting Your Future Career at Intel Authors: Dani Napier, Intel; Lauren Dankiewicz, Intel
    Intel's Dani Napier will introduce why Intel is a great place to work-- it's challenging, has great benefits and is abundant with rewarding growth opportunities. She will expand on why parallelism is crucial to Intel's growth strategy and give an overview of the various types of jobs in which knowledge of parallel and distributed processing apply at Intel. Finally, Dani will explain the new hire development process and why Intel is the company that will help you become successful in your desired career path. Lauren Dankiewicz will discuss her background from the University of California, Berkeley. She gives an insightful and humorous commentary on the interview process at Intel, drawing similarities to dating. Lauren describes the excitement, the uncertainty, and what it takes to make the right choice! Listen to this fun and engaging real-life clip of how an intern became a full-time employee at Intel.
  • Opening Remarks Authors:
    Intel Platinum Patron Night will be held on Thursday evening, 5:30-8:30pm, in the Kuskokwim Ballroom. This will be an exciting opportunity for IPDPS attendees to network and learn about the Intel Academic Community’s free resources to support parallel computing research and teaching. Intel recruiters will share information about engineering internships and careers for recent college graduates.

25th Year IPDPS Celebration

SESSION 20: Operating Systems and Resource Management

  • Reducing fragmentation on torus-connected supercomputers Authors: Wei Tang (Illinois Institute of Technology, USA); Zhiling Lan (Illinois Institute of Technology, USA); Narayan Desai (Argonne N
    Torus-based networks are prevalent on leadership-class petascale systems, providing a good balance between network cost and performance. The major disadvantage of this network architecture is its susceptibility to fragmentation. Many studies have attempted to reduce resource fragmentation in this architecture. Although the approaches suggested can make good allocation decisions reducing fragmentation at job start time, none of them considers a job’s walltime, which can cause resource fragmentation when neighboring jobs do not complete closely. In this paper, we propose a walltime-aware job allocation strategy, which adjacently packs jobs that ?nish around the same time, in order to minimize resource fragmentation caused by job length, discrepancy. Event-driven simulations using real job traces from a production Blue Gene/P system at Argonne National Laboratory demonstrate that our walltime-aware strategy can effectively reduce system fragmentation and improve overall system performance.
  • Co-Analysis of RAS Log and Job Log on Blue Gene/P Authors: Ziming Zheng (Illinois Institute of Technology, USA); Li Yu (Illinois Institute of Technology, USA); Wei Tang (Illinois Institu
    With the growth of system size and complexity, reliability has become of paramount importance for petascale systems. Reliability, Availability, and Serviceability (RAS) logs have been commonly used for failure analysis. However, analysis based on just the RAS logs has proved to be insuf?cient in understanding failures and system behaviors. To overcome the limitation of this existing methodologies, we analyze the Blue Gene/P RAS logs and the Blue Gene/P job logs in a cooperative manner. From our co-analysis effort, we have identi?ed a dozen important observations about failure characteristics and job interruption characteristics on the Blue Gene/P systems. These observations can signi?cantly facilitate the research in fault resilience of large-scale systems.
  • Decal: Transparent Checkpointing and Process Migration of OpenCL Applications Authors: Hiroyuki Takizawa (Tohoku University, Japan); Kentaro Koyama (Tohoku University, Japan); Katsuto Sato (Tohoku University, Japan
    In this paper, we propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high-performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modi?cation and recompilation of its code. A conventional checkpointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call is forwarded to another process called an API proxy, and the API proxy invokes the API function; two processes, an application process and an API proxy, are launched for an OpenCL application. In this case, as the application process is not an OpenCL process but a standard process, it can be safely checkpointed. While CheCL intercepts all API calls, it records the information necessary for restoring OpenCL objects. The application process does not hold any OpenCL handles, but CheCL handles to keep such information. Those handles are automatically converted to OpenCL handles and then passed to API functions. Upon restart, OpenCL objects are automatically restored based on the recorded information. This paper demonstrates the feasibility of transparent checkpointing of OpenCL programs including MPI applications, and quantitatively evaluates the runtime overheads. It is also discussed that CheCL can enable process migration of OpenCL applications among distinct nodes, and among different kinds of compute devices such as a CPU and a GPU.