Lightnews — Scholar-powered news

Mathieu Acher

@macher.bsky.social

Chess-loving professor and researcher who champions the integration of software engineering and AI for reproducible science.
Diving deep into software variability spaces, from Airbus to Linux.
@rennesuniv.bsky.social #INSA #IUF @InstUnivFr @Inria #IRISA


Mathieu Acher

@macher.bsky.social

Blog post: blog.mathieuacher.com/GPTReasoning...
Code: github.com/acherm/gptch...
with deeper insights, such as:
* o3 can sometimes synthesize code to play chess, but fails
* o3-high seems a special beast, but it is an unreliable model (illegal moves may occur after 10 moves) and costs $15 per game!

General-Purpose AI in the Endgame: The Chess Limitations of o3/o4-mini

o3 and o4-mini are large language models recently released by OpenAI and augmented with chain-of-thought reinforcement learning, designed to “think before they speak” by generating explicit, multi-st...

blog.mathieuacher.com

June 26, 2025 at 3:31 PM

Mathieu Acher

@macher.bsky.social

One new element in the #Devoxx video concerns this strange behavior of gpt-3.5-turbo-instruct. It remains to be seen whether it can be reproduced ;) It is closely related to another series of experiments where I showed how to win systematically in 4 or 7 moves: blog.mathieuacher.com/ChessWinning... 3/3

May 10, 2025 at 9:34 PM

Mathieu Acher

@macher.bsky.social

The two videos on YouTube:
- #Devoxx www.youtube.com/watch?v=bO96...
- the original video www.youtube.com/watch?v=6D1X..., which is longer and has time to explain (among other things) my experiments
blog.mathieuacher.com/GPTsChessElo... 2/3

www.youtube.com

May 10, 2025 at 9:34 PM

Mathieu Acher

@macher.bsky.social

Blog post: blog.mathieuacher.com/AIScientific...
Video:
www.youtube.com/watch?v=N_3F...

Creative Collaboration with AI: Insights from Hugo Duminil-Copin on Mathematics and Discovery

I recently watched a great interview with the mathematician and Fields medalist (2022) Hugo Duminil-Copin by Science étonnante (aka David Louapre). At some point, there was an interesting discussion on ...

blog.mathieuacher.com

April 17, 2025 at 2:23 PM

Mathieu Acher

@macher.bsky.social

I like the simple examples given throughout the talk that give an intuition of the complexity problems. The kinds of issues mentioned are not necessarily new, but are very well articulated.

April 2, 2025 at 8:56 AM

Mathieu Acher

@macher.bsky.social

Final thoughts?

✅ Reproducibility matters—always verify results.
✅ Replicability matters even more.
✅ Depth sensitivity and domain specificities are critical in SE.
✅ MT needs refinement.
Study:
hal.science/hal-04943474v2
(published at IST journal)
Blog post: blog.mathieuacher.com/Reproducibil...

Re-evaluating Metamorphic Testing of Chess Engines: A Replication Study

Context: This study aims to confirm, replicate and extend the findings of a previous article entitled “Metamorphic Testing of Chess Engines” that reported inconsistencies in the analyses provided by S...

hal.science

March 20, 2025 at 10:41 AM

Mathieu Acher

@macher.bsky.social

A call to refine, not dismiss.

MT is powerful & could work well for LLM-based chess engines. But for Stockfish, MRs must account for depth & move ordering.

March 20, 2025 at 10:41 AM

Mathieu Acher

@macher.bsky.social

Key takeaway?

🚨 The original study didn't parameterize metamorphic relations by depth!
Metamorphic testing (MT) needs depth-aware refinement—some violations at low depth have limited interest.
No impact on Stockfish, despite alarming claims.

March 20, 2025 at 10:41 AM
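
To make "parameterizing a metamorphic relation by depth" concrete, here is a minimal Python sketch. It is not the study's code: the `evaluate` callable, the `tolerance` of 0.5 pawns, and the `min_depth` cutoff of 10 are all illustrative assumptions. The idea is only that a violation is reported together with the depth at which it occurs, and that shallow depths can be filtered out.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical evaluator signature: (fen, depth) -> score in pawns.
# In practice this would wrap a real engine; here it is left abstract.
Evaluator = Callable[[str, int], float]


def find_violations(
    evaluate: Evaluator,
    pairs: Iterable[Tuple[str, str]],
    depths: Iterable[int],
    tolerance: float = 0.5,   # assumed threshold, in pawns
    min_depth: int = 10,      # assumed cutoff: ignore shallower searches
) -> List[Tuple[str, int, float]]:
    """Return (original_fen, depth, gap) for each depth-aware violation.

    Each pair holds a position and its transformed counterpart, which
    the metamorphic relation expects to receive the same evaluation.
    """
    depth_list = list(depths)
    violations = []
    for original, transformed in pairs:
        for depth in depth_list:
            if depth < min_depth:
                continue  # low-depth disagreements are of limited interest
            gap = abs(evaluate(original, depth) - evaluate(transformed, depth))
            if gap > tolerance:
                violations.append((original, depth, gap))
    return violations
```

Passing `min_depth=0` recovers a depth-agnostic check, which makes it easy to compare how many reported violations disappear once depth is taken into account.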

Mathieu Acher

@macher.bsky.social

Can we fix this? Yes!

We found where this happens exactly in the code. Symmetry can be enforced, but… it adds overhead/complexity.

March 20, 2025 at 10:41 AM

Mathieu Acher

@macher.bsky.social

The culprit? Move ordering.

Stockfish orders legal moves differently depending on board symmetry. This affects search results at some depths.

❌ Not a bug, a feature of how the engine explores positions.

March 20, 2025 at 10:41 AM

Mathieu Acher

@macher.bsky.social

🔎 A Chess Mystery

These mirrored positions should have the same evaluation, but at depth=20:
📊 Left: +0.66
📊 Right: -2.17
This is not just a low-depth issue—it rings a bell.

March 20, 2025 at 10:41 AM
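
For readers who want to build such a pair of positions themselves, here is a small Python sketch that mirrors a FEN string left to right (a-file to h-file). Which symmetry the positions in the post actually use is not stated here, so the choice of a file mirror is an assumption, as is the function name `mirror_fen_files`. The sketch rejects positions with castling rights, since those do not carry over under a left-right flip in standard chess.

```python
def mirror_fen_files(fen: str) -> str:
    """Mirror a FEN position left to right (a-file <-> h-file)."""
    placement, side, castling, en_passant, halfmove, fullmove = fen.split()
    if castling != "-":
        # A left-right flip would put kings and rooks on non-standard squares.
        raise ValueError("file mirroring assumes no castling rights")
    # FEN ranks use one character per piece or per run of empty squares
    # (digits 1-8), so reversing each rank string swaps files a..h with h..a.
    mirrored_ranks = [rank[::-1] for rank in placement.split("/")]
    if en_passant != "-":
        file_index = ord(en_passant[0]) - ord("a")
        en_passant = chr(ord("h") - file_index) + en_passant[1]
    return " ".join(
        ["/".join(mirrored_ranks), side, castling, en_passant, halfmove, fullmove]
    )
```

Applying the function twice returns the original FEN, which is a quick sanity check before feeding both positions to an engine at the same depth.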
