Scott Lowe
@scottclowe.bsky.social
Machine learning researcher
There will also be continued interest in methods that allow more controllable manipulation of generated images (e.g. UIP2P arxiv.org/abs/2412.15216, Imagic arxiv.org/abs/2210.09276).

But maybe the pending copyright lawsuits will have major impacts on GenAI.
UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
We propose an unsupervised model for instruction-based image editing that eliminates the need for ground-truth edited images during training. Existing supervised methods depend on datasets containing ...
arxiv.org
January 3, 2025 at 5:40 PM
For GenAI, improvements to generation quality are going to come from better data curation and value functions to drive the model toward high-quality outputs. Standard model training yields outputs representative of the training distribution, but users don't want average; we want the best quality.
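
As a rough illustration of the value-function idea, a best-of-N sampler generates several candidates and keeps the one a learned scorer rates highest (a minimal sketch; generate and score_fn are hypothetical stand-ins for the generative model and a reward/value model):

def best_of_n(generate, score_fn, prompt, n=8):
    # Draw n candidate outputs, score each with the value function,
    # and return the highest-rated candidate.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score_fn(prompt, c))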
January 3, 2025 at 5:39 PM
This is crucial for AGI, and will pose serious safety concerns. Models that are better at thinking outside the box and coming up with creative solutions will have broader implications than the prompter anticipated arxiv.org/abs/2412.04984.
Frontier Models are Capable of In-context Scheming
Frontier models are increasingly trained and deployed as autonomous agents. One safety concern is that AI agents might covertly pursue misaligned goals, hiding their true capabilities and objectives - ...
arxiv.org
January 3, 2025 at 5:38 PM
Humans solve novel problems on-the-fly using System 2 reasoning: AI needs this too. By learning reasoning steps at training time, the model can, at deployment, compose new sequences of reasoning steps, enabling it to extrapolate.
January 3, 2025 at 5:37 PM
Reasoning capabilities are essential for robust performance in key ML products, e.g. full self-driving. The distribution of driving scenarios is long-tailed, so even a model that covers most situations well may face a novel situation outside its training data, yet it must still respond correctly.
January 3, 2025 at 5:37 PM
One way to do this is hierarchical LLMs like the Large Concept Model arxiv.org/abs/2412.08821, Byte Latent Transformer arxiv.org/abs/2412.09871, and Block Transformer arxiv.org/abs/2406.02657.
Large Concept Models: Language Modeling in a Sentence Representation Space
LLMs have revolutionized the field of artificial intelligence and have emerged as the de-facto tool for many tasks. The current established technology of LLMs is to process input and generate output a...
arxiv.org
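
A toy sketch of the shared idea (sizes and structure here are illustrative, not taken from any of these papers): pool each group of consecutive token embeddings into one coarser latent vector, then run the main transformer over the much shorter latent sequence.

import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    # Two-level toy model: a local pooling stage compresses every
    # `patch` tokens into one latent, and a global transformer then
    # operates over the shorter latent sequence. (No causal masking
    # or decoder here; this only shows the pooling idea.)
    def __init__(self, vocab=256, dim=512, patch=4):
        super().__init__()
        self.patch = patch
        self.embed = nn.Embedding(vocab, dim)
        self.pool = nn.Linear(dim * patch, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.global_model = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, tokens):
        # tokens: (batch, seq), with seq divisible by `patch`
        x = self.embed(tokens)
        b, s, d = x.shape
        x = x.reshape(b, s // self.patch, d * self.patch)
        x = self.pool(x)  # one latent per patch of tokens
        return self.global_model(x)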
January 3, 2025 at 5:35 PM
For agentic models, the focus is shifting to System 2-like reasoning. OpenAI's o1/o3 models demonstrate that step-by-step reasoning can improve output quality by leveraging test-time compute. But impressive results on ARC are expensive, hence there will be a focus on improving test-time compute efficiency.
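
One simple way to spend that test-time compute is self-consistency: sample several reasoning chains and majority-vote the final answer (a sketch; sample_chain and extract_answer are hypothetical stand-ins for the model's sampler and an answer parser):

from collections import Counter

def self_consistency(sample_chain, extract_answer, prompt, n=16):
    # Sample n independent reasoning chains and vote over their answers.
    # Accuracy tends to rise with n, but cost scales linearly with it,
    # which is why test-time compute efficiency matters.
    answers = [extract_answer(sample_chain(prompt)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]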
January 3, 2025 at 5:34 PM
But the LLM training corpus is now the majority of worthwhile text humanity has ever written, and can't be meaningfully scaled further. As Ilya Sutskever put it at NeurIPS, "big data is the fossil fuel of AI".

With this in mind, what will be the next stage of AI development?
January 3, 2025 at 5:31 PM
This has some serious AI safety implications. Having an AI model that can classify what is in an image better than a human doesn't pose an existential threat. But when an AI model can perform long-term planning better than a human, "just unplug it" ceases to be a reliable solution.
December 9, 2024 at 5:16 PM
In "System 2 Reasoning Capabilities Are Nigh", I lay out comparisons between human reasoning and reasoning in AI models, and argue that all the components needed to create AI models that can perform human-like reasoning already exist.
arxiv.org/abs/2410.03662
System 2 Reasoning Capabilities Are Nigh
In recent years, machine learning models have made strides towards human-like reasoning capabilities from several directions. In this work, we review the current state of the literature and describe t...
arxiv.org
December 9, 2024 at 5:13 PM
It's very easy to get started with using the dataset. The commands to download it and load it for PyTorch training fit in less than half a tweet:

!pip install bioscan-dataset
from bioscan_dataset import BIOSCAN5M
ds = BIOSCAN5M("~/Datasets/bioscan-5m", download=True)
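
From there, a standard PyTorch DataLoader wraps the dataset for batched training (a sketch; it assumes the dataset has been configured, e.g. with an image transform, so that samples collate into tensors; check the package docs for the exact sample structure):

from torch.utils.data import DataLoader

loader = DataLoader(ds, batch_size=64, shuffle=True, num_workers=4)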
December 9, 2024 at 5:08 PM
The dataset should be useful for a variety of research topics:
- multimodal learning
- fine-grained classification
- hierarchical labelling
- open-world classification/clustering
- semi- and self-supervised learning
December 9, 2024 at 5:00 PM
BIOSCAN-5M is a multimodal dataset for insect biodiversity monitoring. It consists of 5 million insect specimens from around the world, with a high-res microscopy image, DNA barcode, taxonomic labels, size, and geolocation info for each sample.
arxiv.org/abs/2406.127...
BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity
As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, this paper presents the BIOSCAN-5M Insect dataset to the machine learning community and establish several benchmar...
arxiv.org
December 9, 2024 at 4:57 PM