Lightnews — Scholar-powered news

HGPU group @hgpu.bsky.social · 3d

Thesis: High-Performance Computing: from Optimization to Automation

#CUDA #HIP #HPC

hgpu.org?p=30292

The digital revolution of our society is driven by major technological advancements, enabled not only by the growing capabilities of computers but also by the evolution of their uses. These develop…

hgpu.org

HGPU group @hgpu.bsky.social · 3d

Interleaved Learning and Exploration: A Self-Adaptive Fuzz Testing Framework for MLIR

#MLIR #OpenCL #Testing #Package

hgpu.org?p=30291

MLIR (Multi-Level Intermediate Representation) has rapidly become a foundational technology for modern compiler frameworks, enabling extensibility across diverse domains. However, ensuring the corr…

HGPU group @hgpu.bsky.social · 3d

ConCuR: Conciseness Makes State-of-the-Art Kernel Generation

#CUDA #CodeGeneration #LLM #DeepLearning #DL #Package

hgpu.org?p=30290

GPU kernel generation by LLMs has recently experienced rapid development, leveraging test-time scaling and reinforcement learning techniques. However, a key challenge for kernel generation is the s…

HGPU group @hgpu.bsky.social · 3d

Accelerating cosmological simulations on GPUs: a portable approach using OpenMP

#OpenMP #HPC #Astrophysics #Package

hgpu.org?p=30289

In this work we present the porting to Graphics Processing Units (GPUs, using OpenMP target directives) and optimization of a key module within the cosmological PINOCCHIO code, a Lagrangian Pertu…

HGPU group @hgpu.bsky.social · 3d

EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models

#CUDA #LLM #AI #DeepLearning #DL #PyTorch

hgpu.org?p=30288

CUDA kernel optimization has become a critical bottleneck for AI performance, as deep learning training and inference efficiency directly depends on highly optimized GPU kernels. Despite the promis…

HGPU group @hgpu.bsky.social · 10d

VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs

#CUDA #OpenMP #OpenACC #HPC #LLM #CodeGeneration #Package

hgpu.org?p=30280

We propose VibeCodeHPC, an automatic tuning system for HPC programs based on multi-agent LLMs for code generation. VibeCodeHPC tunes programs through multi-agent role allocation and iterative promp…

HGPU group @hgpu.bsky.social · 10d

Opal: A Modular Framework for Optimizing Performance using Analytics and LLMs

#CUDA #ROCm #Performance #LLM #AI

hgpu.org?p=30279

Large Language Models (LLMs) show promise for automated code optimization but struggle without performance context. This work introduces Opal, a modular framework that connects performance analytic…

HGPU group @hgpu.bsky.social · 10d

Performance and Numerical Aspects of Decompositional Factorizations with FP64 Floating-Point Emulation in INT8

#NVIDIA #Int8 #FP64 #Factorization

hgpu.org?p=30278

Mixing precisions for performance has been an ongoing trend as modern hardware accelerators have started including new, mostly lower-precision, data formats. The advantage of using them is a gre…

HGPU group @hgpu.bsky.social · 10d

exa-AMD: An Exascale-Ready Framework for Accelerating the Discovery and Design of Functional Materials

#Physics #Chemistry #CondensedMatter #MaterialsScience #Package

hgpu.org?p=30277

Exascale computing is transforming the field of materials science by enabling simulations of unprecedented scale and complexity. We present exa-AMD, an open-source, high-performance simulation code…

HGPU group @hgpu.bsky.social · 10d

Compile-Time Resource Safety for GPU APIs: A Low-Overhead Typestate Framework

#CUDA #Rust #Performance #Security #Package

hgpu.org?p=30276

GPU APIs such as OpenCL require correct host-side sequencing of buffer and queue operations; errors in state transitions and synchronization typically only become visible at runtime in C bindings. …
