The Graphics Card as a Streaming Computer

Deep Learning and Machine Learning with GPGPU and CUDA: Unlocking the Power of Parallel Computing

October 8, 2024


Ming Li, Ziqian Bi, Tianyang Wang, Yizhu Wen, Qian Niu, Junyu Liu, Benji Peng, Sen Zhang, Xuanhe Pan, Jiawei Xu, Jinlang Wang, Keyu Chen, Caitlyn Heqi Yin, ... , Ming Liu

Distributed, Parallel, and C...

Hardware Architecture

General Purpose Graphics Processing Unit (GPGPU) computing plays a transformative role in deep learning and machine learning by leveraging the computational advantages of parallel processing. Through the power of Compute Unified Device Architecture (CUDA), GPUs enable the efficient execution of complex tasks via massive parallelism. This work explores CPU and GPU architectures, data flow in deep learning, and advanced GPU features, including streams, concurrency, and dynamic ...


Advanced Architectures for Astrophysical Supercomputing

January 13, 2010


Benjamin R. Barsdell, David G. Barnes, Christopher J. Fluke

Instrumentation and Methods ...

Astronomers have come to rely on the increasing performance of computers to reduce, analyze, simulate and visualize their data. In this environment, faster computation can mean more science outcomes or the opening up of new parameter spaces for investigation. If we are to avoid major issues when implementing codes on advanced architectures, it is important that we have a solid understanding of our algorithms. A recent addition to the high-performance computing scene that high...


Sorting with GPUs: A Survey

September 8, 2017


Dmitri I. Arkhipov, Di Wu, ... , Amelia C. Regan

Distributed, Parallel, and C...

Sorting is a fundamental operation in computer science and is a bottleneck in many important fields. Sorting is critical to database applications, online search and indexing, biomedical computing, and many other applications. The explosive growth in computational power and availability of GPU coprocessors has allowed sort operations on GPUs to be done much faster than on any equivalently priced CPU. Current trends in GPU computing show that this explosive growth in GPU capabilit...


Graphics Processing Units and High-Dimensional Optimization

March 16, 2010


Hua Zhou, Kenneth Lange, Marc A. Suchard

Computation

This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and can dramatically accelerate many statistical algorithms. To exploit these devices fully, optimization algorithms should reduce to multiple parallel tasks, each accessing a limited amount of data. These criteria favor EM and MM algorithms that separate parameters and data. T...


Lightning: Scaling the GPU Programming Model Beyond a Single GPU

February 11, 2022


Stijn Heldens, Pieter Hijma, Ben van Werkhoven, ... , Rob V. van Nieuwpoort

Distributed, Parallel, and C...

The GPU programming model is primarily aimed at the development of applications that run on one GPU. However, this limits the scalability of GPU code to the capabilities of a single GPU in terms of compute power and memory capacity. To scale GPU applications further, a great engineering effort is typically required: work and data must be divided over multiple GPUs by hand, possibly in multiple nodes, and data must be manually spilled from GPU memory to higher-level memories. W...


Streaming Computations with Region-Based State on SIMD Architectures

June 12, 2020


Stephen Timcheck, Jeremy Buhler

Distributed, Parallel, and C...

Streaming computations on massive data sets are an attractive candidate for parallelization, particularly when they exhibit independence (and hence data parallelism) between items in the stream. However, some streaming computations are stateful, which disrupts independence and can limit parallelism. In this work, we consider how to extract data parallelism from streaming computations with a common, limited form of statefulness. The stream is assumed to be divided into variabl...


Technical Report: Accelerating Dynamic Graph Analytics on GPUs

September 15, 2017


Mo Sha, Yuchen Li, ... , Kian-Lee Tan

Data Structures and Algorith...

Distributed, Parallel, and C...

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative graphs evolve frequently and one has to perform a rebuild of the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-...


QCD on GPUs: cost effective supercomputing

December 11, 2009


M. A. Clark

High Energy Physics - Lattic...

The exponential growth of floating-point power in graphics processing units (GPUs), together with their low cost, has given rise to an attractive platform upon which to deploy lattice QCD calculations. GPUs are essentially many-core chips (O(100) cores) that are programmed using a massively threaded environment, and so are representative of the future of high performance computing (HPC). The large ratio of raw floating-point operations per second to memory bandwidth that is charac...


Computing trends using graphic processor in high energy physics

June 30, 2011


Mihai Niculescu, Sorin-Ion Zgura

Distributed, Parallel, and C...

Computational Physics

One of the main challenges in High Energy Physics is the fast analysis of large amounts of experimental and simulated data. At LHC-CERN one p-p event is approximately 1 Mb in size. The time taken to analyze the data and obtain fast results depends on high computational power. The main advantage of GPU (Graphics Processing Unit) programming over traditional CPU programming is that graphics cards bring a lot of computing power at a very low price. Today a huge number of applicatio...


GPUs as Storage System Accelerators

February 16, 2012


Samer Al-Kiswany, Abdullah Gharaibeh, Matei Ripeanu

Distributed, Parallel, and C...

Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, peak performance one order of magnitude higher than traditional CPUs. This drop in the cost of computation, like any order-of-magnitude drop in the cost per unit of performance for a class of system components, creates the opportunity to redesign systems and to explore new ways of engineering them that recalibrate the cost-to-performance relation. This project explores the fe...

