IEEE IPDPS 2011
TechTalks from event: IEEE IPDPS 2011
Note 1: Only plenary sessions (keynotes, panels, and best papers) are accessible without logging in. For other talks, you will need to log in using the email address you registered with for IPDPS 2011. Note 2: Many of the talks (those without a thumbnail next to their description below) are yet to be uploaded. Some were not recorded because of technical problems; we are working with the corresponding authors to upload self-recorded versions here. We sincerely thank all authors for their efforts in making their videos available.
Intel Platinum Patron Night
Architecting Parallel Software: Design Patterns in Practice and Teaching
Design patterns can systematically identify reusable elements in software engineering, and have been particularly effective in codifying practice in object-oriented software. A team of researchers centered at UC Berkeley's Parallel Computing Laboratory continues to investigate a design pattern approach to parallel software; the effort has matured to the point that an undergraduate course was delivered on the topic in Fall 2010. This talk will briefly describe the pattern language itself, then demonstrate its application in examples from both image processing and game design.
Teaching Parallelism Using Games
Academic institutions do not have to spend money on expensive multi-core hardware to support game-based courses that teach parallelism. We will discuss teaching methodologies educators can use to integrate a parallel computing curriculum inside a game engine. We will cover the full game development process, from game design to game engineering, and show how parallelism is critical throughout. We will present five game demos that mirror current trends in the industry and explain how educators can use these games in the classroom. We will also discuss the learning outcomes and which parallelism topics are appropriate for students at various levels. Finally, we will demonstrate how to take games running serially and modify them to run in parallel.
Starting Your Future Career at Intel
Intel's Dani Napier will explain why Intel is a great place to work: it's challenging, has great benefits, and abounds with rewarding growth opportunities. She will expand on why parallelism is crucial to Intel's growth strategy and give an overview of the various types of jobs at Intel in which knowledge of parallel and distributed processing applies. Finally, Dani will explain the new-hire development process and why Intel is a company that will help you succeed in your desired career path. Lauren Dankiewicz will discuss her background from the University of California, Berkeley. She gives an insightful and humorous commentary on the interview process at Intel, drawing similarities to dating. Lauren describes the excitement, the uncertainty, and what it takes to make the right choice! Listen to this fun and engaging real-life account of how an intern became a full-time employee at Intel.
Opening Remarks
Intel Platinum Patron Night will be held on Thursday evening, 5:30-8:30pm, in the Kuskokwim Ballroom. This will be an exciting opportunity for IPDPS attendees to network and learn about the Intel Academic Community's free resources to support parallel computing research and teaching. Intel recruiters will share information about engineering internships and careers for recent college graduates.
25th Year IPDPS Celebration
SESSION 17: Parallel Algorithms
A New Data Layout For Set Intersection on GPUs
Set intersection is the core of a variety of problems, e.g. frequent itemset mining and sparse boolean matrix multiplication. It is well-known that large speed gains can, for some computational problems, be obtained by using a graphics processing unit (GPU) as a massively parallel computing device. However, GPUs require highly regular control flow and memory access patterns, and for this reason previous GPU methods for intersecting sets have used a simple bitmap representation. This representation requires excessive space on sparse data sets. In this paper we present a novel data layout, "BatMap", that is particularly well suited for parallel processing, and is compact even for sparse data. Frequent itemset mining is one of the most important applications of set intersection. As a case study on the potential of BatMaps we focus on frequent pair mining, which is a core special case of frequent itemset mining. The main finding is that our method is able to achieve speedups over both Apriori and FP-growth when the number of distinct items is large, and the density of the problem instance is above 0.01. Previous implementations of frequent itemset mining on GPU have not been able to show speedups over the best single-threaded implementations.
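The abstract does not specify the BatMap layout itself, but the baseline it improves on can be sketched: a bitmap representation turns set intersection into a single bitwise AND, which is why it maps so well to regular GPU control flow, while still costing one bit per universe item regardless of how sparse the set is.

```python
# Baseline bitmap set intersection (the representation BatMap improves on).
# A set over a universe of n items is stored as an n-bit mask, so the
# intersection is one bitwise AND -- regular enough for data-parallel
# hardware, but the mask costs n bits even if the set holds only 3 items.

def to_bitmap(items, universe_size):
    """Encode a set of item ids (0 .. universe_size-1) as an integer bitmask."""
    mask = 0
    for i in items:
        assert 0 <= i < universe_size  # the mask conceptually spans n bits
        mask |= 1 << i
    return mask

def intersect_size(mask_a, mask_b):
    """Size of the intersection: popcount of the AND of the two masks."""
    return bin(mask_a & mask_b).count("1")

# Two sparse sets over a universe of 1000 items:
a = to_bitmap({3, 250, 999}, 1000)
b = to_bitmap({3, 512, 999}, 1000)
print(intersect_size(a, b))  # -> 2, yet each mask still spans 1000 bits
```

On sparse data almost all of those bits are zero, which is exactly the space overhead the compact BatMap layout is designed to avoid.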
Partitioning Spatially Located Computations using Rectangles
The ideal distribution of spatially located heterogeneous workloads is an important problem to address in parallel scientific computing. We investigate the problem of partitioning such workloads (represented as a matrix of positive integers) into rectangles, such that the load of the most loaded rectangle (processor) is minimized. Since finding the optimal arbitrary rectangle-based partition is an NP-hard problem, we investigate particular classes of solutions, namely, rectilinear partitions, jagged partitions and hierarchical partitions. We present a new class of solutions called m-way jagged partitions, propose new optimal algorithms for m-way jagged partitions and hierarchical partitions, propose new heuristic algorithms, and provide worst case performance analyses for some existing and new heuristics. Moreover, the algorithms are tested in simulation on a wide set of instances. Results show that two of the algorithms we introduce lead to a much better load balance than the state-of-the-art algorithms.
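The partition classes studied in the paper generalize a classic one-dimensional problem: split an array of row loads into contiguous stripes so that the heaviest stripe is as light as possible. A minimal dynamic-programming sketch of that 1-D analogue (not the paper's 2-D algorithms) looks like this:

```python
# Minimal 1-D analogue of rectilinear partitioning: split the row loads of a
# workload into p contiguous stripes, minimizing the load of the most loaded
# stripe. Exact O(p * n^2) dynamic program over prefix sums; the 2-D
# rectilinear/jagged/hierarchical variants in the paper build on this idea.

def stripe_partition(row_loads, p):
    """Return the minimal achievable bottleneck load over p contiguous stripes."""
    n = len(row_loads)
    prefix = [0]
    for w in row_loads:
        prefix.append(prefix[-1] + w)

    INF = float("inf")
    # best[k][i] = minimal bottleneck for splitting rows [0, i) into k stripes
    best = [[INF] * (n + 1) for _ in range(p + 1)]
    best[0][0] = 0
    for k in range(1, p + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):
                last_stripe = prefix[i] - prefix[j]   # load of rows [j, i)
                best[k][i] = min(best[k][i], max(best[k - 1][j], last_stripe))
    return best[p][n]

# Six rows of work split across 3 processors:
print(stripe_partition([4, 1, 3, 2, 5, 2], 3))  # -> 7 (stripes 4+1 | 3+2 | 5+2)
```

The NP-hardness cited in the abstract arises only in the arbitrary 2-D rectangle case; this 1-D restriction stays polynomial, which is why the paper focuses on structured 2-D classes such as jagged and hierarchical partitions.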
Reduced-Bandwidth Multithreaded Algorithms for Sparse-Matrix Vector Multiplication
On multicore architectures, the ratio of peak memory bandwidth to peak floating-point performance (byte:flop ratio) is decreasing as core counts increase, further limiting the performance of bandwidth limited applications. Multiplying a sparse matrix (as well as its transpose in the unsymmetric case) with a dense vector is the core of sparse iterative methods. In this paper, we present a new multithreaded algorithm for the symmetric case which potentially cuts the bandwidth requirements in half while exposing lots of parallelism in practice. We also give a new data structure transformation, called bitmasked register blocks, which promises significant reductions on bandwidth requirements by reducing the number of indexing elements without introducing additional fill-in zeros. Our work shows how to incorporate this transformation into existing parallel algorithms (both symmetric and unsymmetric) without limiting their parallel scalability. Experimental results indicate that the combined benefits of bitmasked register blocks and the new symmetric algorithm can be as high as a factor of 3.5x in multicore performance over an already scalable parallel approach. We also provide a model that accurately predicts the performance of the new methods, showing that even larger performance gains are expected in future multicore systems as current trends (decreasing byte:flop ratio and larger sparse matrices) continue.
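The source of the factor-of-two bandwidth saving can be sketched independently of the paper's multithreaded scheduling: for a symmetric matrix it suffices to store only the upper triangle and let each stored entry a_ij contribute to both y_i and y_j, so half as much matrix data is streamed from memory. The following serial sketch illustrates that idea only; the paper's thread scheduling and bitmasked register blocking are not reproduced here.

```python
# Serial sketch of symmetric SpMV from the upper triangle only. Each stored
# entry a_ij (with j >= i) is used twice: once for y_i += a_ij * x_j and,
# off the diagonal, once for y_j += a_ij * x_i. This halves the matrix data
# read from memory relative to storing the full symmetric matrix.

def symmetric_spmv(n, rows, cols, vals, x):
    """y = A @ x where (rows, cols, vals) lists only entries with col >= row."""
    y = [0.0] * n
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]
        if i != j:            # off-diagonal entry also acts as a_ji
            y[j] += a * x[i]
    return y

# Upper triangle of the symmetric matrix [[2, 1, 0], [1, 3, 4], [0, 4, 5]]:
rows = [0, 0, 1, 1, 2]
cols = [0, 1, 1, 2, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
print(symmetric_spmv(3, rows, cols, vals, [1.0, 1.0, 1.0]))  # -> [3.0, 8.0, 9.0]
```

Parallelizing this is the hard part: two threads owning different stored entries may both update the same y_j, which is the write conflict the paper's multithreaded algorithm is designed to resolve without sacrificing scalability.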