Niladri Shekhar Dutt
@niladridutt.bsky.social
Research Intern @adobe.com | PhD @ucl.ac.uk | @ellis.eu | ex-Nvidia, Berkeley | Interested in generative modelling in vision and graphics + reasoning (LLMs)

https://niladridutt.com/
🧵10/10 Lastly, huge thanks to my co-advisors Niloy and Duygu!
For more details, check out our paper below:

🌐 Project Website: monetgpt.github.io
📄 Arxiv: arxiv.org/abs/2505.06176
May 27, 2025 at 3:13 PM
🧵9/10 We quantitatively evaluate on the Adobe5k dataset and conduct user studies with expert and novice users. Our evaluations show that MonetGPT outperforms open-source alternatives and performs comparably to Google Photos AutoEnhance (closed-source).
May 27, 2025 at 3:13 PM
🧵8/10 Photo editing is subjective 🎨. Our framework adapts to user preferences through guidance from natural-language tags such as ‘vibrant’ or ‘retro vibe’, producing personalized and stylistically distinct retouching plans from the same input image.
May 27, 2025 at 3:13 PM
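To give a feel for how such guidance could be wired in, here's a minimal sketch in which the tag is simply appended to the planning prompt so the same photo yields different plans. The prompt wording and the `build_planning_prompt` helper are illustrative assumptions, not the actual interface.

```python
def build_planning_prompt(style_tag: str = "") -> str:
    """Assemble a retouching-plan request, optionally conditioned on a
    natural-language style tag such as 'vibrant' or 'retro vibe'.
    (Hypothetical prompt wording, not the paper's actual template.)"""
    base = ("Analyze the attached photo, list its issues, and propose an "
            "ordered sequence of adjustment operations with values.")
    if style_tag:
        base += f" Retouch it in a '{style_tag}' style."
    return base

for tag in ("", "vibrant", "retro vibe"):
    print(build_planning_prompt(tag))
```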
🧵7/10 Our puzzle-based training, with its 'reasoning as a pathway' approach, allows MonetGPT to generate detailed justifications for each edit, delivering truly explainable image retouching.
May 27, 2025 at 3:13 PM
🧵6/10 🧩 Puzzle C builds planning capabilities. The model learns to generate a complete, multi-step retouching plan to enhance a photo, structuring its reasoning as a sequence of discrete issues and solutions for clarity and control.
May 27, 2025 at 3:13 PM
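The 'sequence of discrete issues and solutions' could be serialized along these lines; the field names, operations, and values below are illustrative placeholders rather than the schema used in the paper.

```python
import json

# Hypothetical Puzzle C target: an ordered retouching plan in which each step
# names an issue, the operation addressing it, its value, and a justification.
plan = [
    {"issue": "underexposed foreground",
     "operation": "exposure", "value": 20,
     "justification": "Lifting exposure reveals detail in the shadowed subject."},
    {"issue": "flat, washed-out colors",
     "operation": "saturation", "value": 12,
     "justification": "A moderate saturation boost restores natural vibrance."},
    {"issue": "cool color cast",
     "operation": "temperature", "value": 8,
     "justification": "Slight warming corrects the bluish tint from overcast light."},
]

print(json.dumps(plan, indent=2))
```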
🧵5/10 🧩 Puzzle B imparts aesthetic judgement. By ranking professionally edited photos against altered versions, the MLLM learns to recognize the visual characteristics of an optimally adjusted image for any given operation, building an internal aesthetic model.
May 27, 2025 at 3:13 PM
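A rough sketch of how one ranking instance could be generated under the assumption that the expert value is optimal: perturb it in both directions, shuffle the candidates, and keep the distance-based ordering as the answer key. The offsets and value scale are made up for illustration.

```python
import random

def make_ranking_puzzle(expert_value: float, rng: random.Random) -> dict:
    """Puzzle B (sketch): the expert value is assumed optimal, so candidates
    are ranked by how far they drift from it."""
    offsets = [0, 25, -40, 70]            # hypothetical perturbations; 0 = the expert edit
    candidates = [expert_value + off for off in offsets]
    rng.shuffle(candidates)
    # Answer key: candidate indices ordered from best (closest to expert) to worst.
    ranking = sorted(range(len(candidates)),
                     key=lambda i: abs(candidates[i] - expert_value))
    return {"candidates": candidates, "ranking_best_to_worst": ranking}

print(make_ranking_puzzle(expert_value=15.0, rng=random.Random(3)))
```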
🧵4/10 🧩 Puzzle A builds an understanding of individual operations. The MLLM learns to map visual changes in before/after images to a specific tool and its precise parameter value, effectively learning the semantics of our procedural library.
May 27, 2025 at 3:13 PM
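A minimal sketch of how a Puzzle A example could be assembled, using Pillow's brightness enhancer as a stand-in for one operation from the procedural library; the parameter range and question template are assumptions, not the paper's.

```python
import random
from PIL import Image, ImageEnhance  # pip install Pillow

def make_puzzle_a_example(expert_img: Image.Image, rng: random.Random) -> dict:
    """Puzzle A (sketch): apply one known operation with a known strength,
    then ask which operation and value turned the first image into the second."""
    factor = round(rng.uniform(0.6, 1.4), 2)            # hypothetical parameter range
    altered = ImageEnhance.Brightness(expert_img).enhance(factor)
    question = ("Which single operation was applied to the first image to obtain "
                "the second, and with what value?")
    return {"before": expert_img, "after": altered,
            "question": question,
            "answer": {"operation": "brightness", "value": factor}}

if __name__ == "__main__":
    img = Image.new("RGB", (64, 64), (120, 100, 90))    # placeholder for an expert photo
    print(make_puzzle_a_example(img, random.Random(0))["answer"])
```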
🧵3/10 Our key recipe: MLLMs struggle to predict edit values directly. We solve this by generating rich textual reasoning for each puzzle ✍️. We then fine-tune MonetGPT on this data, creating a 'reasoning pathway' that enables it to regress final adjustment values.
May 27, 2025 at 3:13 PM
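Sketching what such a training sample might look like: free-form reasoning first, then a machine-parsable value block that can be recovered at inference. The prompt/completion layout and field names are assumptions, not the released data format.

```python
import json
import re

def build_sample(issue: str, reasoning: str, adjustments: dict) -> dict:
    """Hypothetical fine-tuning sample: the completion explains the edit
    before committing to concrete adjustment values."""
    completion = (f"Analysis: {reasoning}\n"
                  f"Adjustments: {json.dumps(adjustments)}")
    return {"prompt": f"Describe and fix: {issue}", "completion": completion}

def parse_adjustments(model_output: str) -> dict:
    """At inference, recover the final values that follow the reasoning."""
    match = re.search(r"Adjustments:\s*(\{.*\})", model_output, re.S)
    return json.loads(match.group(1)) if match else {}

sample = build_sample(
    issue="the shadows are crushed and the colors look flat",
    reasoning="Lifting exposure slightly recovers shadow detail; a mild "
              "saturation boost restores color without clipping.",
    adjustments={"exposure": 15, "saturation": 10},
)
print(parse_adjustments(sample["completion"]))
```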
🧵2/10 MLLMs lack the visual understanding to plan edits. 🧠 So, we use expert photos as our ground truth and work backward, procedurally creating puzzles by assuming that any change to an expert edit makes it less optimal.
May 27, 2025 at 3:13 PM
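To make 'work backward from expert edits' concrete, here's a minimal data-generation sketch: perturb one operation of an expert parameter setting and keep the restoring adjustment as ground truth. The operation names, ranges, and perturbation sizes are hypothetical, not the paper's procedural library.

```python
import random

# Hypothetical procedural library: operation name -> valid parameter range.
OPERATIONS = {
    "exposure":    (-100, 100),
    "contrast":    (-100, 100),
    "saturation":  (-100, 100),
    "temperature": (-100, 100),
}

def make_puzzle(expert_params: dict, rng: random.Random) -> dict:
    """Work backward from an expert edit: perturb one operation so the result
    is, by assumption, less optimal than the expert version."""
    op = rng.choice(list(expert_params))
    lo, hi = OPERATIONS[op]
    offset = rng.choice([-1, 1]) * rng.randint(20, 60)   # hypothetical perturbation size
    degraded = dict(expert_params)
    degraded[op] = max(lo, min(hi, expert_params[op] + offset))
    return {
        "degraded_params": degraded,
        "target_operation": op,
        # Ground truth: the adjustment that restores the expert setting.
        "target_value": expert_params[op] - degraded[op],
    }

if __name__ == "__main__":
    expert = {"exposure": 12, "contrast": 25, "saturation": -5, "temperature": 8}
    print(make_puzzle(expert, random.Random(0)))
```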
Amazon came pretty late to India, and we already had homegrown companies like Flipkart, which now competes with Amazon and is valued at $40B.
I think big tech's early access killed homegrown companies in other markets. China and, to an extent, South Korea (Naver) have some great tech companies because of such barriers.
February 27, 2025 at 11:50 AM
Who will tell the Silicon Valley tech bros that it wasn't them alone
February 7, 2025 at 8:34 PM
Haha exactly what I did today!
January 24, 2025 at 5:27 PM
Have a great time in Seattle!
December 2, 2024 at 7:40 PM