Underfox
underfox3.bsky.social
Physicist, Telecom Engineering lover, HPC Enthusiast. Prog Rock/Metal fan.
---
Independent tech analyst focused on semiconductors, patent analysis and emerging technologies.
August 29, 2025 at 8:56 AM
Each query dynamically selects a few informational blocks, as well as mandatory anchors, with causal routing that avoids loop closures. The model is able to allocate computation to relevant histories, preserving identities, actions, and scenes across minutes of content.
August 29, 2025 at 8:56 AM
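A minimal sketch of the routing idea described above, not the authors' implementation. It assumes each block is summarized by the mean of its keys, that relevance is a dot product with the query, and that "causal" means a query may only pick blocks at or before its own position. The names `select_blocks`, `anchor_blocks`, and `top_k` are hypothetical.

```python
import numpy as np

def select_blocks(query, keys, block_size, query_pos, top_k, anchor_blocks):
    """Return the sorted block indices one query attends to (illustrative only)."""
    num_blocks = keys.shape[0] // block_size
    # Assumption: one descriptor per block, obtained by mean-pooling its keys.
    descriptors = keys[: num_blocks * block_size].reshape(
        num_blocks, block_size, -1
    ).mean(axis=1)
    scores = descriptors @ query

    # Causal routing: blocks after the query's own block are never eligible.
    query_block = query_pos // block_size
    allowed = np.arange(num_blocks) <= query_block
    scores = np.where(allowed, scores, -np.inf)

    # Keep the few highest-scoring eligible blocks.
    k = min(top_k, int(allowed.sum()))
    top = np.argsort(-scores)[:k]

    # Mandatory anchors are always included, subject to the same causal rule.
    anchors = [b for b in anchor_blocks if b <= query_block]
    return sorted(set(top.tolist()) | set(anchors))
```

Attention would then be computed only over the tokens in the returned blocks, which is how compute gets concentrated on the relevant parts of a long history.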
This work could pave the way not only for automatic optimizations for ML and science kernels but also for the development of LLM-optimized AMD GPU drivers. Congrats to the authors for this excellent work.
August 29, 2025 at 8:35 AM
SwizzlePerf is the first work to feed rich data from a suite of profilers into the LLM's context, so that the feedback directly reflects cache-locality improvements and improves LLM-driven optimization.
August 29, 2025 at 8:35 AM
This isn't the first time AMD researchers have ventured into AI-powered GPU optimization. The biggest and most important difference is that this work takes hardware-awareness into account.
August 29, 2025 at 8:35 AM
By grouping cooperative blocks into a single XCD, the proposed workflow reduces off-chip traffic and stabilizes residency in the disaggregated caches, reducing the average energy per instruction even in kernels whose execution time is dominated by arithmetic throughput.
August 29, 2025 at 8:35 AM
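To make the grouping idea concrete, here is a minimal sketch of a workgroup-ID remap, not the swizzle that SwizzlePerf itself generates. It assumes the hardware dispatches workgroups to XCDs round-robin (hardware ID `i` lands on XCD `i % num_xcds`) and that the workgroup count divides evenly. The function name is hypothetical, and the default of 8 XCDs is an assumption that matches AMD's MI300X.

```python
def remap_workgroup_id(hw_id, num_workgroups, num_xcds=8):
    """Map a hardware workgroup ID to a logical ID so that consecutive
    logical IDs run on the same XCD (illustrative only)."""
    if num_workgroups % num_xcds != 0:
        raise ValueError("sketch only handles an even split across XCDs")
    per_xcd = num_workgroups // num_xcds
    xcd = hw_id % num_xcds    # XCD chosen by the assumed round-robin dispatch
    slot = hw_id // num_xcds  # position of this workgroup within that XCD
    return xcd * per_xcd + slot
```

A kernel would use the returned logical ID, instead of the raw hardware ID, to decide which tile of the problem it works on. Tiles that share data then sit behind the same XCD's L2 cache rather than being scattered across all of them.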
While the primary focus of the presented work was performance, it is clear that the same remapping will also have pronounced benefits in terms of energy efficiency.
August 29, 2025 at 8:35 AM
The results show that, across a wide range of ML and scientific GPU kernels, SwizzlePerf can achieve up to a 2.1x speedup and a 70% L2 hit rate improvement.
August 29, 2025 at 8:35 AM
This work will be presented at the 58th IEEE/ACM International Symposium on Microarchitecture (MICRO 25), which will be held October 18 - 22, 2025 in Seoul, Korea.
August 29, 2025 at 6:04 AM
OmniSim is able to successfully simulate 11 designs previously unsupported by any HLS tool, achieving up to 35.9x speedup over traditional C/RTL co-simulation, and up to 6.61x speedup over the state-of-the-art yet less capable simulator, LightningSim, on its own benchmark suite.
August 29, 2025 at 6:04 AM
OmniSim carefully orchestrates functionality and performance simulation threads to accurately model hardware-level behavior under arbitrary OS scheduling, achieving near-C simulation speed with near-RTL accuracy for both functionality and performance.
August 29, 2025 at 6:04 AM
The implemented proof of concept is capable of demonstrating softmax computation and invertible logic without the need to create a network of probabilistic devices, offering major scalability advantages.
August 29, 2025 at 5:05 AM
Excerpt from: Y. Wong, G. Zocchi, Spontaneous spiral patterns etched on Germanium, arXiv, 2025

Link: arxiv.org/pdf/2508.16764
August 29, 2025 at 3:59 AM
These findings represent a major step toward lower-power and faster spintronic devices for memory logic applications, creating new possibilities for electrical modulation of spin dynamics and ultrafast spin injection into two-dimensional quantum materials.
August 28, 2025 at 11:31 PM
The experimental results employing direct contacts as well as contacts involving tunnel barriers show efficient gate control, with over 100% enhancement in the demagnetization rate compared to bare Cobalt by modulating the junction resistance.
August 28, 2025 at 11:31 PM
Notably, the experiments in this work also revealed that the topology-aware losses could contribute to improving the geometry of the interpolated data.
August 28, 2025 at 11:19 PM
Given an input sequence of persistence diagrams and a sparse temporal sampling of the corresponding data, the proposed approach inverts the non-keyframe diagrams to produce plausible estimations of the missing data.
August 28, 2025 at 11:19 PM