September 28, 2021
Due to their highly parallel multi-core architecture, GPUs are increasingly used in a wide range of computationally intensive applications. Compared to CPUs, GPUs can accelerate program execution with higher performance and better energy efficiency. GPGPU computing is therefore useful for high-performance computing applications and in many scientific research fields. To bring further performance improvements, GPU clusters are increasingly adopted....
April 13, 2005
This paper presents a stream-oriented architecture for structuring cluster applications. Clusters that run applications based on this architecture can scale to tens of thousands of nodes with significantly less performance loss and fewer reliability problems. Our architecture exploits the stream nature of the data flow and reduces congestion through load balancing, hides latency behind data pushes, and transparently handles node failures. In our ongoing work, we are developing an i...
February 2, 2017
A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will p...
September 3, 2013
The data to be processed by database systems has grown so large that any conventional, centralized technique is inadequate. At the same time, general-purpose computation on GPUs (GPGPU) has recently drawn attention from the data management community due to its ability to achieve significant speed-ups at a small cost. Efficient skew handling is a well-known problem in parallel queries, independently of the execution environment. In this work, we investiga...
August 13, 2020
Computing is bottlenecked by data. Large amounts of application data overwhelm storage capability, communication capability, and computation capability of the modern machines we design today. As a result, many key applications' performance, efficiency and scalability are bottlenecked by data movement. In this keynote talk, we describe three major shortcomings of modern architectures in terms of 1) dealing with data, 2) taking advantage of the vast amounts of data, and 3) expl...
December 18, 2022
Filters approximately store a set of items while trading off accuracy for space efficiency and can address the limited memory on accelerators, such as GPUs. However, there is a lack of high-performance and feature-rich GPU filters, as most advancements in filter research have focused on CPUs. In this paper, we explore the design space of filters with the goal of developing massively parallel, high-performance, and feature-rich filters for GPUs. We evaluate various filter designs i...
December 14, 2011
With the rapid advances in mobile technology, many mobile devices are capable of capturing high-quality images and video with their embedded camera. This paper investigates techniques for real-time processing of the resulting images, particularly on-device processing using a graphics processing unit. Issues and limitations of image processing on mobile devices are discussed, and the performance of graphics processing units on a range of devices measured through a programmable shade...
September 20, 2007
We discuss an implementation of molecular dynamics (MD) simulations on a graphics processing unit (GPU) in the NVIDIA CUDA language. We tested our code on a modern GPU, the NVIDIA GeForce 8800 GTX. Results for two MD algorithms suitable for short-ranged and long-ranged interactions, and a congruential shift random number generator are presented. The performance of the GPUs is compared to that of their main processor counterparts. We achieve speedups of up to 80, 40 and 150 fold, respe...
August 9, 2013
The number of triangles in a graph is a fundamental metric, used in social network analysis, link classification and recommendation, and more. Driven by these applications and the trend that modern graph datasets are both large and dynamic, we present the design and implementation of a fast and cache-efficient parallel algorithm for estimating the number of triangles in a massive undirected graph whose edges arrive as a stream. It brings together the benefits of streaming alg...
January 26, 2020
Understanding and tuning the performance of extreme-scale parallel computing systems demands a streaming approach due to the computational cost of applying offline algorithms to vast amounts of performance log data. Analyzing large streaming data is challenging because the rate at which data arrives and the limited time available to comprehend it make it difficult for analysts to sufficiently examine the data without missing important changes or patterns. To support streaming data analys...