HGPU group
@hgpu.bsky.social
High performance computing on graphics processing units (GPU): AMD, Nvidia, Intel, CUDA, OpenCL, OpenGL, HPC
· 10d
VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs
We propose VibeCodeHPC, an automatic tuning system for HPC programs based on multi-agent LLMs for code generation. VibeCodeHPC tunes programs through multi-agent role allocation and iterative promp…
hgpu.org
· 10d
Opal: A Modular Framework for Optimizing Performance using Analytics and LLMs
Large Language Models (LLMs) show promise for automated code optimization but struggle without performance context. This work introduces Opal, a modular framework that connects performance analytic…
hgpu.org
· 10d
Performance and Numerical Aspects of Decompositional Factorizations with FP64 Floating-Point Emulation in INT8
Mixing precisions for performance has been an ongoing trend as the modern hardware accelerators started including new, and mostly lower-precision, data formats. The advantage of using them is a gre…
hgpu.org
· 10d
exa-AMD: An Exascale-Ready Framework for Accelerating the Discovery and Design of Functional Materials
Exascale computing is transforming the field of materials science by enabling simulations of unprecedented scale and complexity. We present exa-AMD, an open-source, high-performance simulation code…
hgpu.org
· 10d
Compile-Time Resource Safety for GPU APIs: A Low-Overhead Typestate Framework
GPU APIs such as OpenCL require correct host-side sequencing of buffer and queue operations; errors in state transitions and synchronization typically only become visible at runtime in C bindings. …
hgpu.org
· 18d
Towards GPU Parallelism Abstractions in Rust: A Case Study with Linear Pipelines
Programming Graphics Processing Units (GPUs) for general-purpose computation remains a daunting task, often requiring specialized knowledge of low-level APIs like CUDA or OpenCL. While Rust has eme…
hgpu.org
· 18d
TRUST: the HPC open-source CFD platform – from CPU to GPU
Since 1993, the CEA has developed TRUST, an open-source CFD software platform designed to address a wide range of thermohydraulic problems. Initially focused on nuclear applications, the platform h…
hgpu.org
· 18d
Cost-Performance Analysis: A Comparative Study of CPU-Based Serverless and GPU-Based Training Architectures
The field of distributed machine learning (ML) faces increasing demands for scalable and cost-effective training solutions, particularly in the context of large, complex models. Serverless computin…
hgpu.org
· 18d
Mojo: MLIR-Based Performance-Portable HPC Science Kernels on GPUs for the Python Ecosystem
We explore the performance and portability of the novel Mojo language for scientific computing workloads on GPUs. As the first language based on the LLVM’s Multi-Level Intermediate Representa…
hgpu.org
· 24d
Evolution of Kernels: Automated RISC-V Kernel Optimization with Large Language Models
Automated kernel design is critical for overcoming software ecosystem barriers in emerging hardware platforms like RISC-V. While large language models (LLMs) have shown promise for automated kernel…
hgpu.org
· 24d
Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization
Recent advances in large language models (LLMs) demonstrate their effectiveness in scaling test-time compute for software engineering tasks. However, these approaches often focus on high-level solu…
hgpu.org
· 24d
Dato: A Task-Based Programming Model for Dataflow Accelerators
Recent deep learning workloads increasingly push computational demand beyond what current memory systems can sustain, with many kernels stalling on data movement rather than computation. While mode…
hgpu.org
· 24d
High Performance GPU Implementation of KNN Algorithm: A Review
With large volumes of complex data generated by different applications, Machine Learning (ML) algorithms alone may not yield significant performance benefits on a single or multi-core CPU. Applying…
hgpu.org
· Sep 14
An HPC Benchmark Survey and Taxonomy for Characterization
The field of High-Performance Computing (HPC) is defined by providing computing devices with highest performance for a variety of demanding scientific users. The tight co-design relationship betwee…
hgpu.org
· Sep 14
Towards Calculating HPC CUDA Kernel Performance on Nvidia GPUs
This thesis aims at providing the ground work to facilitate a performance estimation model for CUDA kernels using a cycle counting model. After a short overview of past GPU performance modeling tec…
hgpu.org
· Sep 14
Combining Performance and Productivity: Accelerating the Network Sensing Graph Challenge with GPUs and Commodity Data Science Software
The HPEC Graph Challenge is a collection of benchmarks representing complex workloads that test the hardware and software components of HPC systems, which traditional benchmarks, such as LINPACK, d…
hgpu.org