🔹 Unified supervised + unsupervised hashing
🔹 Flexible: works via probing or LoRA
🔹 SOTA hashing in minutes on a single GPU
📄 Paper: arxiv.org/abs/2510.27584
💻 Code: github.com/ilyassmoumma...
Shoutout to my wonderful co-authors Kawtar, Hervé, and Alexis.
CroVCA produces compact codes that transfer efficiently:
✅ A single HashCoder trained on ImageNet-1k works on downstream datasets without retraining (more experiments and ablations in the paper)
CroVCA retrieves correct classes even for fine-grained or ambiguous queries (e.g., indigo bird, grey langur).
✅ Outperforms Hashing-Baseline
✅ Works with only 16 bits and without supervision
Even with just 16 bits, CroVCA preserves class structure.
t-SNE on CIFAR-10 shows clear, separable clusters — almost identical to the original 768-dim embeddings.
Across multiple vision encoders (SimDINOv2, DINOv2, DFN…), CroVCA achieves SOTA unsupervised hashing:
CroVCA trains in just ~5 epochs:
✅ COCO (unsupervised) <2 min
✅ ImageNet100 (supervised) ~3 min
✅ Single GPU
Despite its simplicity, it achieves state-of-the-art retrieval performance.
HashCoder is a lightweight MLP with a final BatchNorm for balanced bits (inspired by OrthoHash). It can be used as:
🔹 Probe on frozen features
🔹 LoRA-based fine-tuning for efficient encoder adaptation
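A minimal sketch of what such a head could look like, assuming PyTorch; the layer sizes, the two-layer depth, and the class name are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

# Hypothetical HashCoder-style head: a small MLP whose final BatchNorm
# centers each bit's pre-binarization logit, encouraging balanced
# (roughly 50/50) bits across the batch.
class HashCoder(nn.Module):
    def __init__(self, in_dim=768, hidden_dim=512, n_bits=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, n_bits),
            nn.BatchNorm1d(n_bits),  # zero-mean logits -> balanced bits
        )

    def forward(self, x):
        # Continuous logits during training; threshold at 0 for binary codes
        return self.net(x)

coder = HashCoder().eval()
feats = torch.randn(8, 768)        # stand-in for frozen encoder features
codes = (coder(feats) > 0).int()   # 8 binary codes of 16 bits each
```

As a probe, only this head is trained on frozen features; for LoRA-based adaptation, low-rank adapters in the encoder would be trained alongside it.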
Can supervised + unsupervised hashing be done in one framework?
CroVCA aligns binary codes across semantically consistent views:
Augmentations → unsupervised
Class-consistent samples → supervised
🧩 One BCE loss + coding-rate regularizer
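The objective above can be sketched as follows, under stated assumptions: one view's bit probabilities are pulled toward the other view's (detached) hard bits via BCE, and a coding-rate-style regularizer (in the MCR² spirit) discourages collapsed codes. Function name, `reg_weight`, and `eps` are illustrative, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def crovca_loss(logits_a, logits_b, reg_weight=0.1, eps=1e-2):
    # BCE alignment: view A's bit probabilities vs view B's hard bits.
    # For unsupervised training the two views are augmentations of one
    # image; for supervised training they are class-consistent samples.
    target = (logits_b.detach() > 0).float()
    align = F.binary_cross_entropy_with_logits(logits_a, target)

    # Coding-rate regularizer: log-det of the soft-code covariance.
    # Maximizing it spreads codes over the hypercube, so subtract it.
    z = torch.tanh(logits_a)          # soft codes in [-1, 1]
    n, d = z.shape
    cov = z.T @ z / n
    rate = 0.5 * torch.logdet(torch.eye(d) + (d / (n * eps)) * cov)
    return align - reg_weight * rate
```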
Foundation models (DINOv3, DFN, SWAG…) produce rich embeddings, but similarity search in high-dimensional spaces is expensive.
Hashing provides fast Hamming-distance search, yet most deep hashing methods are complex, slow, and tied to a single paradigm.
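To illustrate why Hamming-distance search is cheap (a generic NumPy example, not code from the paper): binary codes pack into a few bytes, and ranking a database reduces to XOR plus a popcount instead of high-dimensional float dot products.

```python
import numpy as np

rng = np.random.default_rng(0)
db_codes = rng.integers(0, 2, size=(10_000, 16), dtype=np.uint8)  # database of 16-bit codes
query = rng.integers(0, 2, size=(16,), dtype=np.uint8)

db_packed = np.packbits(db_codes, axis=1)   # (10000, 2) uint8 -- 2 bytes per item
q_packed = np.packbits(query)               # (2,) uint8

xor = np.bitwise_xor(db_packed, q_packed)           # set bits = differing bits
dists = np.unpackbits(xor, axis=1).sum(axis=1)      # Hamming distance per item

top5 = np.argsort(dists)[:5]                # indices of the 5 nearest codes
```

With 16 bits an item occupies 2 bytes instead of 3 KB for a 768-dim float32 embedding, which is the storage/speed gap the thread is pointing at.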