Chris Offner
@chrisoffner3d.bsky.social
2.6K followers 1.2K following 2K posts
Student Researcher @ RAI Institute, MSc CS Student @ ETH Zurich. Visual computing, 3D vision, spatial AI, machine learning, robot perception. 📍 Zurich, Switzerland
Pinned
When vision people do graphics, they call it "image synthesis." When graphics people do vision, they call it "inverse rendering." ;)
Reposted by Chris Offner
#TTT3R: 3D Reconstruction as Test-Time Training
TTT3R offers a simple state update rule to enhance length generalization for #CUT3R — No fine-tuning required!
🔗Page: rover-xingyu.github.io/TTT3R
We rebuilt @taylorswift13’s "22" live at the 2013 Billboard Music Awards - in 3D!
Reposted by Chris Offner
🚀 Europe’s first exascale supercomputer is here!

JUPITER, launched in Germany, is the EU’s most powerful system and fourth fastest worldwide.

100% powered by renewables, it has also ranked first in energy efficiency. It will boost AI, science, and climate research.

Read more - europa.eu/!vcWBqW
[Image: rows of blue-lit server racks in a data center, captioned "JUPITER Supercomputer: Europe enters the exascale supercomputing league", with the European Commission logo.]
Reposted by Chris Offner
There is a lot to hate about the politics of the Silicon Valley right, but they do actually want to build stuff, and I would prefer that the left didn't cede "we should be able to build stuff" to the right.
People often use "smart" when they mean "wise," and I don't think it's too controversial to doubt the wisdom of some tech elites. Other than that, I certainly agree with you.
Reposted by Chris Offner
I can't fathom why the top picture, and not the bottom picture, is the standard diagram for an autoencoder.

The whole idea of an autoencoder is that you complete a round trip and seek cycle consistency—why lay out the network linearly?
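In code terms, the round trip is just encode, decode, compare; a minimal PyTorch sketch of that cycle-consistency view, with made-up layer sizes (784-dim inputs, 32-dim code):

```python
# A minimal sketch of the "round trip" view of an autoencoder: the objective
# is cycle consistency between an input and its reconstruction, regardless of
# whether you draw the network as a line or a loop. Layer sizes are made up.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())  # x -> z
decoder = nn.Sequential(nn.Linear(32, 784))             # z -> x_hat

x = torch.randn(16, 784)                                # a batch of inputs
x_hat = decoder(encoder(x))                             # complete the round trip
cycle_loss = nn.functional.mse_loss(x_hat, x)           # seek cycle consistency
cycle_loss.backward()
```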
Great video on the convergent evolution from hierarchical military command structures to cybernetics to centralized AI coordination across political ideologies:
www.youtube.com/watch?v=mayo...
I'd also welcome a Bayesian framing. I know Andrew Davison's group has done work on Gaussian belief propagation for SLAM factor graphs (gaussianbp.github.io) but other than that and arxiv.org/abs/1703.04977, I'm not aware of much Bayesian (deep) learning in (3D) vision right now.
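For a flavor of what that machinery looks like, here is a toy Gaussian belief propagation sketch on a tiny 1D pose chain, loosely in the spirit of the gaussianbp.github.io factor-graph work but much simpler; the variables, factors, and noise values are all made up for illustration:

```python
# Toy Gaussian belief propagation on a 1D pose chain (information form).
# Not the gaussianbp.github.io implementation; a hand-rolled illustration.
import numpy as np

def odometry_factor(z, sigma):
    # Linear factor z = x_j - x_i + noise, in information form (eta, Lambda).
    J = np.array([[-1.0, 1.0]])                    # Jacobian of x_j - x_i
    return (J.T * z / sigma**2).ravel(), (J.T @ J) / sigma**2

# Three scalar poses: prior x0 ~ N(0, 1) and a measurement x2 ~ N(2.2, 0.5^2).
prior_eta = np.array([0.0, 0.0, 2.2 / 0.5**2])     # information vectors
prior_lam = np.array([1.0, 0.0, 1.0 / 0.5**2])     # precisions

# Two odometry factors, each saying x_{i+1} = x_i + 1.0 with sigma = 0.1.
factors = [(0, 1, *odometry_factor(1.0, 0.1)),
           (1, 2, *odometry_factor(1.0, 0.1))]
msgs = {(k, v): (0.0, 0.0)                         # factor-to-variable messages
        for k, (i, j, _, _) in enumerate(factors) for v in (i, j)}

def belief(v, exclude=None):
    # Variable belief = unary prior times all incoming factor messages.
    eta, lam = prior_eta[v], prior_lam[v]
    for (k, u), (m_eta, m_lam) in msgs.items():
        if u == v and (k, u) != exclude:
            eta, lam = eta + m_eta, lam + m_lam
    return eta, lam

for _ in range(10):                                # exact on a tree once converged
    for k, (i, j, f_eta, f_Lam) in enumerate(factors):
        for (a, b), (s, o) in ((i, j), (0, 1)), ((j, i), (1, 0)):
            # Absorb the sender's belief (minus this factor's own message),
            # then marginalize the sender out to get the message to b.
            eta_a, lam_a = belief(a, exclude=(k, a))
            e, L = f_eta.copy(), f_Lam.copy()
            e[s] += eta_a
            L[s, s] += lam_a
            msgs[(k, b)] = (e[o] - L[o, s] / L[s, s] * e[s],
                            L[o, o] - L[o, s] / L[s, s] * L[s, o])

for v in range(3):
    eta, lam = belief(v)
    print(f"x{v}: mean = {eta / lam:.3f}, std = {lam ** -0.5:.3f}")
```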
Reposted by Chris Offner
In general I think 3D vision would do well to take some inspiration from Bayesians. I guess Bayesian methods have lost their glamour these days, but imo it's a very nice way of thinking that feels somewhat neglected right now.
"It is beautiful. It is elegant. Does it work well in practice? Not really. This is often the caveat we face in research: the things that are beautiful don't work and the things that work are not beautiful." – Daniel Cremers
You follow him. Andrew Davison from Imperial College London.
"As roboticists and computer vision people [outside of big tech], do we have to just wait for the next foundation model?"

I share the frustration. It's disempowering when most recent major progress is downstream of "foundation models" that you don't have the compute or data to train yourself.
Reposted by Chris Offner
Sort of, but DINOv3 also seems to (inadvertently?) point towards the limits of pure scaling.
x.com/chrisoffner3...
Reposted by Chris Offner
The US calculus seems to be:
- The main 21st century story is US v. China.
- The US thus needs to focus on the Pacific.
- They need to peel Russia off of China and make it an ally.
- If this happens at the cost of the Europeans, so be it.
- Europe is useless as an ally and harmless as an adversary.
If you maximize cosine similarity, aren't you left with only a single dimension (i.e. scaling the vector norm) as CosSim-invariant "wiggle room" to encode geometric information that isn't also captured by the language?
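A quick numeric sketch of that claim, with a made-up 8-dim embedding standing in for the language feature:

```python
# If a feature vector is pinned to cosine similarity 1 with a fixed language
# embedding t, the only remaining degree of freedom is its norm. All values
# here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
t = rng.normal(size=8)                     # stand-in "language" embedding
t_hat = t / np.linalg.norm(t)

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Every unit-cosine vector is just a rescaled t: one scalar of wiggle room.
for scale in (0.5, 1.0, 3.0):
    print(scale, round(cos(scale * t_hat, t), 6))   # always 1.0

# Any component orthogonal to t (where extra geometric information would
# have to live) immediately lowers the cosine similarity.
u = np.eye(8)[0] - (np.eye(8)[0] @ t_hat) * t_hat
u /= np.linalg.norm(u)
print(round(cos(t_hat + 0.5 * u, t), 6))            # ~0.894 < 1.0
```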
Yes, but that's an additional training objective beyond merely maximizing cosine similarity. You'd need to introduce something that ensures that pixel features don't just collapse to language semantics, via some auxiliary task, no?
It just seems to me that mapping pixels and language to highly similar internal representations means you'll drop a lot of information that is not (or cannot be) accurately described by language.
If we try to perfectly reconstruct, e.g., a complex 3D mesh from a natural language description, we'll find that the two modalities operate on very different levels of precision and abstraction.
My concern is that language as a modality inherently biases the data towards coarser labels/concepts. You won't perfectly describe per-pixel normals and depth in natural language. Geometry is continuous and "raw"; language is discrete and abstract.
Oh, interesting. I'll check that out!
Yay, DINOv3 is out!

SigLIP (VLMs) and DINO are two competing paradigms for image encoders.

My intuition is that joint vision-language modeling works great for semantic problems but may be too coarse for geometry problems like SfM or SLAM.

Most animals navigate 3D space perfectly without language.
What are the best resources to learn about VLMs? Papers, tutorials, courses, blog posts, whatever is good. I can read the Kimi-VL or GLM tech reports and follow the breadcrumbs but I'd appreciate any and all recommendations towards a useful VLM curriculum! 🙏
The tiny hand of the market.