Lightnews — Scholar-powered news

Tavian Barnes

@tavianator.com

Nah man rent used to actually be lower

July 1, 2025 at 4:37 PM

Tavian Barnes

@tavianator.com

imo the third post is better: tavianator.com/2022/ray_box...

Fast, Branchless Ray/Bounding Box Intersections, Part 3: Boundaries - tavianator.com

tavianator.com

June 27, 2025 at 5:22 PM

Tavian Barnes

@tavianator.com

38 seconds total, if you read the whole line.

Anyway I'm not suggesting that an "actual project" spend time on this, but maybe a build system should?

April 27, 2025 at 5:35 PM

Tavian Barnes

@tavianator.com

Keep going? No, fail fast!

April 14, 2025 at 3:52 PM

Tavian Barnes

@tavianator.com

And the explanation: tavianator.com/2025/shlxpla...

The Alder Lake anomaly, explained - tavianator.com

tavianator.com

January 4, 2025 at 7:08 PM

Tavian Barnes

@tavianator.com

Re-measured it: the prefetch is still a significant improvement, ~11% higher throughput at 8 threads, ~30% higher at 12 threads. I should probably write this up somewhere

January 2, 2025 at 3:36 PM

Tavian Barnes

@tavianator.com

In both cases the counters claim 1 uop, so I don't think so

December 31, 2024 at 10:55 PM

Tavian Barnes

@tavianator.com

It's fast for me in practice on Zen 2 at least. Maybe someday I'll microbenchmark it. But I've also dramatically optimized the MPMC queue recently, so I'm not sure the prefetch still makes a difference.

December 31, 2024 at 10:53 PM

Tavian Barnes

@tavianator.com

I wrote more details here: github.com/andreas-abel...

uops.info Alder Lake-P latency for SHLX · Issue #33 · andreas-abel/nanoBench

https://uops.info/html-instr/SHLX_R64_R64_R64.html#ADL-P lists SHLX as having 3-cycle latency for both operands. This is in contrast to Intel's docs and InstLatx64's measurements. So what gives? I ...

github.com

December 31, 2024 at 8:12 PM

Tavian Barnes

@tavianator.com

Okay new discovery: "MOV R10D, 1" also gives 1c latency. But "MOV R10, 1" gives 3c latency. Something to do with whether the top half of the count register is zeroed and how.

December 31, 2024 at 7:42 PM

Tavian Barnes

@tavianator.com

Oh I found this: stackoverflow.com/a/77705726/5...

Compiler flags of GCC/CLANG to generate "BEXTR" instruction (of IA32's BMI1)

I am looking for compiler flags of GCC/CLANG to generate BEXTR instruction. template <auto uSTART, auto uLENGTH, typename Tunsigned> constexpr Tunsigned bit_extract(Tunsigned uInput) { re...

stackoverflow.com

December 31, 2024 at 7:19 PM

Tavian Barnes

@tavianator.com

Maybe a register renaming issue?

December 31, 2024 at 7:07 PM

Tavian Barnes

@tavianator.com

So I suspect some code alignment issue or something is at fault

December 31, 2024 at 6:50 PM

Tavian Barnes

@tavianator.com

Hmm, something very strange is going on. I can reproduce their benchmark. But if I add "XOR R8, R8; XOR R9, R9; XOR R10, R10" to -asm_init to zero all the registers, it goes from 8 cycles to 6, i.e. 1-cycle latency.

*But*, if I instead use "MOV R10, 0" to zero it out, it's back to 8 cycles!

December 31, 2024 at 6:49 PM

Tavian Barnes

@tavianator.com

Crashpad knows about this but otherwise it doesn't seem documented: github.com/chromium/cra...

mac-arm64: Cope with signal handling quirks · chromium/crashpad@e0d8a0a

On x86_64, it’s impossible for a signal handler to distinguish between SIGBUS caused synchronously by a hardware fault and SIGBUS raised asynchronously by software. This remains true on arm64, and is ...

github.com

December 22, 2024 at 7:25 PM
