The Graphics Card as a Streaming Computer

Power Consumption Analysis of Parallel Algorithms on GPUs

September 28, 2021

Frédéric Magoulès, Abal-Kassim Cheik Ahamed, Alban Desmaison, Jean-Christophe Léchenet, François Mayer, ... , Thomas Zhu

Distributed, Parallel, and C...

Mathematical Software

Numerical Analysis

Performance

Numerical Analysis

Due to their highly parallel multi-core architecture, GPUs are increasingly used in a wide range of computationally intensive applications. Compared to CPUs, GPUs can accelerate program execution with higher performance and better energy efficiency. GPGPU computing is therefore useful for high-performance computing applications and in many fields of scientific research. To bring further performance improvements, GPU clusters are increasingly adopted....

A Scalable Stream-Oriented Framework for Cluster Applications

April 13, 2005

Tassos S. Argyros, David R. Cheriton

Distributed, Parallel, and C...

Databases

Networking and Internet Arch...

Operating Systems

Programming Languages

This paper presents a stream-oriented architecture for structuring cluster applications. Clusters that run applications based on this architecture can scale to tens of thousands of nodes with significantly less performance loss and fewer reliability problems. Our architecture exploits the stream nature of the data flow and reduces congestion through load balancing, hides latency behind data pushes, and transparently handles node failures. In our ongoing work, we are developing an i...

Correlating Radio Astronomy Signals with Many-Core Hardware

February 2, 2017

Rob V. van Nieuwpoort, John W. Romein

Instrumentation and Methods ...

A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will p...

Skew Handling in Aggregate Streaming Queries on GPUs

September 3, 2013

Georgios Koutsoumpakis, Iakovos Koutsoumpakis, Anastasios Gounaris

Databases

Distributed, Parallel, and C...

Nowadays, the data to be processed by database systems has grown so large that any conventional, centralized technique is inadequate. At the same time, general-purpose computation on GPUs (GPGPU) has recently drawn attention from the data management community due to its ability to achieve significant speed-ups at a small cost. Efficient skew handling is a well-known problem in parallel queries, independent of the execution environment. In this work, we investiga...

Intelligent Architectures for Intelligent Machines

August 13, 2020

Onur Mutlu

Hardware Architecture

Computing is bottlenecked by data. Large amounts of application data overwhelm storage capability, communication capability, and computation capability of the modern machines we design today. As a result, many key applications' performance, efficiency and scalability are bottlenecked by data movement. In this keynote talk, we describe three major shortcomings of modern architectures in terms of 1) dealing with data, 2) taking advantage of the vast amounts of data, and 3) expl...

High-Performance Filters For GPUs

December 18, 2022

Hunter McCoy, Steven Hofmeyr, ... , Prashant Pandey

Distributed, Parallel, and C...

Data Structures and Algorith...

Filters approximately store a set of items while trading off accuracy for space efficiency and can address the limited memory on accelerators, such as GPUs. However, there is a lack of high-performance and feature-rich GPU filters, as most advancements in filter research have focused on CPUs. In this paper, we explore the design space of filters with the goal of developing massively parallel, high-performance, and feature-rich filters for GPUs. We evaluate various filter designs i...

GPU-based Image Analysis on Mobile Devices

December 14, 2011

Andrew Ensor, Seth Hall

Graphics

Computer Vision and Pattern ...

With the rapid advances in mobile technology, many mobile devices are capable of capturing high-quality images and video with their embedded camera. This paper investigates techniques for real-time processing of the resulting images, particularly on-device processing using a graphics processing unit. Issues and limitations of image processing on mobile devices are discussed, and the performance of graphics processing units on a range of devices is measured through a programmable shade...

Harvesting graphics power for MD simulations

September 20, 2007

J. A. van Meel, A. Arnold, D. Frenkel, ... , R. G. Belleman

Other Condensed Matter

Soft Condensed Matter

We discuss an implementation of molecular dynamics (MD) simulations on a graphics processing unit (GPU) in the NVIDIA CUDA language. We tested our code on a modern GPU, the NVIDIA GeForce 8800 GTX. Results for two MD algorithms suitable for short-ranged and long-ranged interactions, and a congruential shift random number generator are presented. The performance of the GPU is compared to that of its main-processor counterpart. We achieve speedups of up to 80, 40 and 150 fold, respe...

Parallel Triangle Counting in Massive Streaming Graphs

August 9, 2013

Kanat Tangwongsan, A. Pavan, Srikanta Tirthapura

Databases

Distributed, Parallel, and C...

Data Structures and Algorith...

Social and Information Netwo...

The number of triangles in a graph is a fundamental metric, used in social network analysis, link classification and recommendation, and more. Driven by these applications and the trend that modern graph datasets are both large and dynamic, we present the design and implementation of a fast and cache-efficient parallel algorithm for estimating the number of triangles in a massive undirected graph whose edges arrive as a stream. It brings together the benefits of streaming alg...

A Visual Analytics Framework for Reviewing Streaming Performance Data

January 26, 2020

Suraj P. Kesavan, Takanori Fujiwara, Jianping Kelvin Li, Caitlin Ross, Misbah Mubarak, Christopher D. Carothers, ... , Kwan-Liu Ma

Distributed, Parallel, and C...

Human-Computer Interaction

Machine Learning

Performance

Understanding and tuning the performance of extreme-scale parallel computing systems demands a streaming approach due to the computational cost of applying offline algorithms to vast amounts of performance log data. Analyzing large streaming data is challenging because the rate of receiving data and limited time to comprehend data make it difficult for the analysts to sufficiently examine the data without missing important changes or patterns. To support streaming data analys...
