You don't need AI or Magika for that.
TrID has a lot of heuristics, but a lot of false positives.
Magika is useful in different ways, across binary and source types, and is quite fast. But it's not fit for weird files.
It's not made to detect adversarial attacks.
It's useful for different things that classic binary scanning can't do at this speed.
Magika was trained on all file types with enough available samples.
It's trivial to inject some data in a file and keep it functional (with my tool Mitra, for example).
So take a JPG, inject a lot of JavaScript data, and... guess what?
Check it out: github.com/corkami/mitra
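To make the JPG + JavaScript point concrete, here is a minimal sketch of the crudest possible injection: appending bytes after the image data. This is not how Mitra works. The helper name `append_after_eoi` and the stand-in bytes are hypothetical, and it assumes the viewer ignores trailing bytes after the end-of-image marker, which many decoders do but none are required to.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker

def append_after_eoi(jpeg: bytes, payload: bytes) -> bytes:
    """Append payload after the JPEG data; the leading bytes are untouched."""
    if not jpeg.startswith(JPEG_SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    return jpeg + payload

# Stand-in bytes for illustration only: SOI ... EOI, not a decodable image.
fake_jpeg = JPEG_SOI + b"\xff\xe0" + b"\x00" * 16 + JPEG_EOI
js = b"console.log('hello');\n" * 1000

mixed = append_after_eoi(fake_jpeg, js)
print(mixed.startswith(JPEG_SOI))  # the file still opens with the JPEG marker
print(len(js) / len(mixed))        # fraction of the file that is JavaScript
```

The file still begins like a JPG, yet almost all of its bytes are JavaScript text, so a content-based classifier and a signature check can reasonably disagree about it.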
Polymocks, polyglots...
I made quite a few - check my CCC talk last year:
speakerdeck.com/ange/fearsom...
That way, if the file is slightly corrupted, the file type might still be properly identified.
Magika returns several file types if needed.
It's one of its advantages, but a double-edged sword.
To check if the file starts with '\x7FELF', 'MZ' or 'GIF', you don't need AI.
But some file formats don't start with a clear 'magic' signature at offset zero.
And what if you also want to tell apart C++, Rust and HTML?
There is no magic signature for source files.
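As a rough illustration of the offset-zero check, a few lines are enough. This is a sketch only: the `MAGICS` table and the `sniff_magic` name are hypothetical and cover just the three signatures mentioned above.

```python
# Hypothetical table: only the three signatures named in the thread.
MAGICS = {
    b"\x7fELF": "ELF",
    b"MZ": "PE/DOS executable",
    b"GIF": "GIF",
}

def sniff_magic(data: bytes):
    """Return a format name if data starts with a known signature, else None."""
    for magic, name in MAGICS.items():
        if data.startswith(magic):
            return name
    return None

print(sniff_magic(b"\x7fELF\x02\x01\x01"))   # ELF
print(sniff_magic(b"GIF89a"))                # GIF
print(sniff_magic(b"#include <vector>\n"))   # None
```

The last call is the interesting one: source text has no fixed leading bytes, so a prefix table has nothing to match against.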
The extension is stored in the filesystem entry, not in the file content.
It can be lost or modified, and it varies...
Almost all file formats are known under several file extensions:
.JPG/.JPEG, .ZIP/.APK/.DOCX, .EXE/.DLL, .ELF/.SO ...
Check my talk: speakerdeck.com/ange/overvie...
Drop and run, piezo sound…
github.com/corkami/mitra