Diving deep into software variability spaces, from Airbus to Linux.
@rennesuniv.bsky.social #INSA #IUF @InstUnivFr @Inria #IRISA
Code: github.com/acherm/gptch...
with deeper insights, such as:
* o3 can sometimes synthesize code to play chess, but fails
* o3-high seems a special beast, but it is an unreliable model (an illegal move may occur after 10 moves) and costs $15 per game!
- #Devoxx www.youtube.com/watch?v=bO96...
- the original video www.youtube.com/watch?v=6D1X..., which is longer and has time to (notably) explain my experiments
blog.mathieuacher.com/GPTsChessElo... 2/3
✅ Reproducibility matters—always verify results.
✅ Replicability matters even more.
✅ Depth sensitivity and domain specificities are critical in SE.
✅ MT needs refinement.
Study:
hal.science/hal-04943474v2
(published in the IST journal)
Blog post: blog.mathieuacher.com/Reproducibil...
MT is powerful & could work well for LLM-based chess engines. But for Stockfish, metamorphic relations (MRs) must account for depth & move ordering.
🚨 The original study didn't parameterize metamorphic relations by depth!
Metamorphic testing (MT) needs depth-aware refinement—some violations at low depth have limited interest.
No impact on Stockfish despite alarming claims.
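A minimal sketch of what "parameterizing a metamorphic relation by depth" could look like, assuming the relation is "two equivalent positions get (roughly) the same evaluation". The names `violations_by_depth`, `evaluate` and `tolerance_cp` are hypothetical and not taken from the study; `evaluate` stands for any function that returns an engine score in centipawns for a FEN at a given depth.

```python
from typing import Callable, Dict, Iterable


def violations_by_depth(
    fen_a: str,
    fen_b: str,
    evaluate: Callable[[str, int], int],  # hypothetical: (fen, depth) -> centipawns
    depths: Iterable[int],
    tolerance_cp: int = 50,  # assumed threshold, to be tuned per study
) -> Dict[int, int]:
    """Check the 'same evaluation' relation separately at each depth.

    Returns a dict mapping each violating depth to the absolute
    evaluation gap (in centipawns) observed at that depth.
    """
    violations: Dict[int, int] = {}
    for depth in depths:
        gap = abs(evaluate(fen_a, depth) - evaluate(fen_b, depth))
        if gap > tolerance_cp:
            violations[depth] = gap
    return violations
```

Reporting violations per depth, rather than as a single pass/fail, makes it possible to separate gaps that only appear at shallow depths from those that persist at higher depths.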
We found exactly where this happens in the code. Symmetry can be enforced, but… it adds overhead/complexity.
Stockfish orders legal moves differently depending on board symmetry. This affects search results at some depths.
❌ Not a bug, a feature of how the engine explores positions.
These mirrored positions should have the same evaluation, but at depth=20:
📊 Left: +0.66
📊 Right: -2.17
This is not just a low-depth issue—it rings a bell.
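For readers who want to build such a pair themselves, here is a small sketch of one possible symmetry: a left-right mirror of a FEN string (files a↔h). This is an assumption about the kind of mirroring involved, and `mirror_fen_left_right` is a hypothetical helper, not code from the study. It is only sound when no castling rights remain, since castling is not left-right symmetric.

```python
def mirror_fen_left_right(fen: str) -> str:
    """Mirror a FEN position left-to-right (file a <-> h, b <-> g, ...).

    The side to move is unchanged, so an engine would be expected to
    return the same evaluation for the original and mirrored positions.
    """
    board, side, castling, en_passant, halfmove, fullmove = fen.split()
    if castling != "-":
        raise ValueError("left-right mirroring requires no castling rights")

    # Each rank is a string of piece letters and single-digit empty counts,
    # so reversing the string reverses the files.
    mirrored_board = "/".join(rank[::-1] for rank in board.split("/"))

    # Mirror the file of the en passant target square, if any.
    if en_passant != "-":
        mirrored_file = chr(ord("a") + ord("h") - ord(en_passant[0]))
        en_passant = mirrored_file + en_passant[1]

    return " ".join([mirrored_board, side, castling, en_passant, halfmove, fullmove])
```

Applying the function twice gives back the original FEN, which is a cheap sanity check before feeding both positions to an engine.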