Ange
angealbertini.bsky.social
Reverse engineer, file formats expert.
Corkami, CPS2Shock, PoC||GTFO, Sha1tered, Magika...
Security engineer @ Google. He/him.
To check if a file starts with MZ or GIF, just use file/libmagic.
You don't need AI or Magika for that.
TrID has a lot of heuristics, but also a lot of false positives.

Magika is useful in other ways: it covers both binary and source file types, and it's quite fast. But it's not fit for weird files.
November 14, 2025 at 9:54 AM
Magika is a fast file type identifier that covers many file types, both binary formats and source code.
It's not made to detect adversarial attacks.
It's useful for things that classic binary scanning can't do at this speed.

Magika was trained on all file types with enough available samples.
November 14, 2025 at 9:54 AM
Weird files are out of Magika's scope. It just wasn't trained on them.

It's trivial to inject some data into a file and keep it functional (w/ my tool Mitra, for example).
So take a JPG, inject a lot of JavaScript data, and... guess what?

Check it out: github.com/corkami/mitra
GitHub - corkami/mitra: A generator of weird files (binary polyglots, near polyglots, polymocks...)
November 14, 2025 at 9:54 AM
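One common way to stuff arbitrary data into a JPG while keeping it viewable is to put it in COM (comment) segments, which decoders skip. The sketch below assumes that approach; it is not Mitra's implementation, and `inject_comments` is a hypothetical helper. Each COM segment's 2-byte length field counts itself, so a segment holds at most 65,533 payload bytes, and larger payloads are split. Strict JFIF readers may expect the APP0 segment right after SOI, so a real tool might insert the comments later in the file.

```python
# Hypothetical sketch: hide arbitrary data (e.g. JavaScript) in a JPEG
# using COM segments. Not Mitra's code, just an illustration of the idea.

SOI = b"\xff\xd8"          # Start Of Image marker
COM = b"\xff\xfe"          # comment segment marker
MAX_PAYLOAD = 65535 - 2    # the length field includes its own 2 bytes


def inject_comments(jpeg: bytes, payload: bytes) -> bytes:
    """Insert `payload` right after SOI, split over as many COM segments as needed."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    segments = []
    for start in range(0, len(payload), MAX_PAYLOAD):
        chunk = payload[start:start + MAX_PAYLOAD]
        length = (len(chunk) + 2).to_bytes(2, "big")  # big-endian segment length
        segments.append(COM + length + chunk)
    # Assumption: placing the comments directly after SOI is tolerated by the reader.
    return SOI + b"".join(segments) + jpeg[len(SOI):]


if __name__ == "__main__":
    fake_jpeg = SOI + b"\xff\xd9"      # minimal stand-in: SOI followed by EOI
    js = b"alert('hello');" * 10_000   # ~150 KB of JavaScript-looking data
    result = inject_comments(fake_jpeg, js)
    print(result.startswith(SOI), len(result))
```

A file type identifier that samples the beginning and end of this file sees mostly JavaScript at the start, even though an image viewer still decodes it.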
Of course, it's possible to create weird files that will fool Magika and other tools.
Polymocks, polyglots...

I made quite a few; check out my CCC talk from last year:
speakerdeck.com/ange/fearsom...
Fearsome File Formats
Presented at 38C3 in Hamburg on the 28th December 2024. Video recording: https://media.ccc.de/v/38c3-fearsome-file-formats With so many open-sou…
November 14, 2025 at 9:54 AM
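A simplified illustration of one classic polyglot trick, using Python's standard `zipfile` module: ZIP readers locate the archive from its end, so bytes prepended to a ZIP (here a fake JPEG header) are tolerated, while a naive check at offset zero reports a JPEG. The `sniff` and `make_zip` helpers are hypothetical, and this is not how Mitra or the talk's examples are built.

```python
import io
import zipfile


def make_zip(name: str, data: bytes) -> bytes:
    """Build a small in-memory ZIP archive containing one file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, data)
    return buf.getvalue()


def sniff(data: bytes) -> str:
    """Hypothetical naive identifier: only looks at the first bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    return "unknown"


# Fake JPEG header followed by a complete ZIP archive.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16 + b"\xff\xd9"
polyglot = fake_jpeg + make_zip("hello.txt", b"hello from the ZIP side")

print(sniff(polyglot))  # the offset-zero check only sees the JPEG part
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    # zipfile finds the central directory from the end and adjusts for the prefix
    print(zf.read("hello.txt"))
```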
Magika uses the first and last kilobytes of a file.
That way, even if the file is slightly corrupted, its file type might still be properly identified.
Magika returns several file types if needed.
That's one of its advantages, but it's a double-edged sword.
November 14, 2025 at 9:54 AM
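A minimal sketch of extracting fixed-size features from both ends of a file, assuming 1 KiB per end and a padding value of 256 (outside the 0..255 byte range) for short files. `features`, `BLOCK` and `PAD` are hypothetical names; Magika's real input sizes and preprocessing may differ. The demo shows why corruption in the middle of a file leaves these features unchanged.

```python
BLOCK = 1024  # assumption: bytes taken from each end of the file
PAD = 256     # hypothetical padding value, distinct from any real byte


def features(data: bytes, size: int = BLOCK) -> list:
    """Return `size` head values followed by `size` tail values, padded if short."""
    head = list(data[:size])
    tail = list(data[max(len(data) - size, 0):])
    head += [PAD] * (size - len(head))        # pad the head at the end
    tail = [PAD] * (size - len(tail)) + tail  # pad the tail at the front
    return head + tail


if __name__ == "__main__":
    data = b"%PDF-1.7\n" + b"A" * 5000 + b"\n%%EOF\n"
    # Overwrite 100 bytes in the middle: neither end is touched.
    corrupted = data[:2000] + b"\x00" * 100 + data[2100:]
    print(features(data) == features(corrupted))
```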
So the file contents are used to determine the file type.
To check if a file starts with '\x7FELF', 'MZ' or 'GIF', you don't need AI.

But some file formats don't start with a clear 'magic' signature at offset zero.
And what if you also want to tell apart C++, Rust and HTML?
There's no magic for source files.
November 14, 2025 at 9:54 AM
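The offset-zero check mentioned above fits in a few lines. This is a minimal sketch with a hypothetical `identify_by_magic` helper and a tiny signature table (libmagic's real database is far larger and also checks other offsets). Note how source files fall through with no match.

```python
from typing import Optional

# A few well-known signatures found at offset zero.
MAGICS = {
    b"\x7fELF": "ELF executable",
    b"MZ": "DOS/PE executable",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
}


def identify_by_magic(data: bytes) -> Optional[str]:
    """Return a type name if `data` starts with a known signature, else None."""
    for magic, name in MAGICS.items():
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    print(identify_by_magic(b"GIF89a\x01\x00"))
    print(identify_by_magic(b"fn main() {}\n"))  # Rust source: no magic
```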
To identify file types, the worst way is to rely on file extensions:
the extension is stored in the filesystem entry, not in the file contents.
It can be lost, modified, or inconsistent...

Almost all file formats are known under several file extensions:
.JPG/.JPEG, .ZIP/.APK/.DOCX, .EXE/.DLL, .ELF/.SO...
November 14, 2025 at 9:54 AM
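A small sketch of extension-based lookup, to show how little it relies on: only the filename, never the bytes. `EXTENSIONS` and `type_from_extension` are hypothetical, with a mapping taken from the examples above (.APK and .DOCX files are ZIP-based containers).

```python
import os
from typing import Optional

# Several extensions per format, as in the examples above.
EXTENSIONS = {
    ".jpg": "JPEG", ".jpeg": "JPEG",
    ".zip": "ZIP", ".apk": "ZIP", ".docx": "ZIP",
    ".exe": "PE", ".dll": "PE",
    ".elf": "ELF", ".so": "ELF",
}


def type_from_extension(filename: str) -> Optional[str]:
    """Guess a type from the name alone; the file contents are never read."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSIONS.get(ext)


if __name__ == "__main__":
    print(type_from_extension("App.APK"))    # ZIP container
    print(type_from_extension("README"))     # no extension: no answer at all
    print(type_from_extension("libc.so.6"))  # versioned name: '.6' is unknown
```

Renaming a file changes this answer without changing a single byte of content, which is why content-based identifiers exist.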
In the process, I also studied other file type identifiers in depth, most of which I had already contributed to, including LibMagic, TrID...
Check my talk: speakerdeck.com/ange/overvie...
Overview of file type identifiers
Yara, LibMagic (file, binwalk, polyfile), TrID, Yara, Magika, PeID, Pronom, FDD, ShareMime, DiE... How do they work? What are their pros and cons, th…
November 14, 2025 at 9:54 AM
I love the little details:
Drop and run, piezo sound…
November 8, 2025 at 12:43 PM
You can do that with my Mitra tool, using the `--force` parameter (for arbitrary content injection), on ~40 standard formats (which cover many more subformats).
github.com/corkami/mitra
August 28, 2025 at 9:42 AM