IEEE IPDPS 2011
TechTalks from event: IEEE IPDPS 2011
Note 1: Only plenary sessions (keynotes, panels, and best papers) are accessible without requiring log-in. For other talks, you will need to log-in using the email you registered for IPDPS 2011. Note 2: Many of the talks (those without a thumbnail next to the their description below) are yet to be uploaded. Some of them were not recorded because of technical problems. We are working with the corresponding authors to upload the self-recorded versions here. We sincerely thank all authors for their efforts in making their videos available.
SESSION 25: Algorithms for Distributed Computing
I/O-optimal Algorithms for Orthogonal Problems for Private-Cache Chip MultiprocessorsThe parallel external memory (PEM) model has been used as a basis for the design and analysis of a wide range of algorithms for private-cache multi-core architectures. As a tool for developing geometric algorithms in this model, a parallel version of the I/O-ef?cient distribution sweeping framework was introduced recently, and a number of algorithms for problems on axis-aligned objects were obtained using this framework. The obtained algorithms were ef?cient but not optimal. In this paper, we improve the framework to obtain algorithms with the optimal I/O complexity of O(sortP(N) + K=PB) for a number of problems on axis-aligned objects; P denotes the number of cores/processors, B denotes the number of elements that ?t in a cache line, N and K denote the sizes of the input and output, respectively, and sortP(N) denotes the I/O complexity of sorting N items using P processors in the PEM model. To obtain the above improvement, we present a new one-dimensional batched range counting algorithm on a sorted list of ranges and points that achieves an I/O complexity of O((N + K)=PB), where K is the sum of the counts of all the ranges. The key to achieving ef?cient load balancing among the processors in this algorithm is a new method to count the output without enumerating it, which might be of independent interest.
A Fast Algorithm for Constructing Inverted Files on Heterogeneous PlatformsGiven a collection of documents residing on a disk, we develop a new strategy for processing these documents and building the inverted ?les extremely fast. Our approach is tailored for a heterogeneous platform consisting of a multicore CPU and a highly multithreaded GPU. Our algorithm is based on a number of novel techniques including: (i) a high-throughput pipelined strategy that produces parallel parsed streams that are consumed at the same rate by parallel indexers; (ii) a hybrid trie and B-tree dictionary data structure in which the trie is represented by a table for fast look-up and each B-tree node contains string caches; (iii) allocation of parsed streams with frequent terms to CPU threads and the rest to GPU threads so as to match the throughput of parsed streams; and (iv) optimized CUDA indexer implementation that ensures coalesced memory accesses and effective use of shared memory. We have performed extensive tests of our algorithm on a single node (two Intel Xeon X5560 Quad-core) with two NVIDIA Tesla C1060 attached to it, and were able to achieve a throughput of more than 262 MB/s on the ClueWeb09 dataset. Similar results were obtained for widely different datasets. The throughput of our algorithm is superior to the best known algorithms reported in the literature even when compared to those run on large clusters.
Graph Partitioning with Natural CutsWe present a novel approach to graph partitioning based on the notion of natural cuts. Our algorithm, called PUNCH, has two phases. The ?rst phase performs a series of minimum-cut computations to identify and contract dense regions of the graph. This reduces the graph size, but preserves its general structure. The second phase uses a combination of greedy and local search heuristics to assemble the ?nal partition. The algorithm performs especially well on road networks, which have an abundance of natural cuts (such as bridges, mountain passes, and ferries). In a few minutes, it obtains the best known partitions for continental-sized networks, signi?cantly improving on previous results.
Reader Activation Scheduling in Multi-Reader RFID Systems: A Study of General CaseRadio frequency identi?cation (RFID) is a technology where a reader device can â€œsenseâ€ the presence of a close by object by reading a tag device attached to the object. To guarantee the coverage quality, multiple RFID readers can be deployed in the given region. In this paper, we consider the problem of activation schedule for readers in a multi-reader environment. In particular, we try to design a schedule for readers to maximize the number of served tags per time-slot while avoiding various interferences. We ?rst develop a centralized algorithm under the assumption that different readers may have different interference and interrogation radius. Next, we propose a novel algorithm which does not need any location information of the readers. Finally, we extend the previous algorithm in distributed manner in order to suit the case where no central entity exists. We conduct extensive simulations to study the performances of our proposed algorithm. And our evaluation results corroborate our theoretical analysis.