Full text: public-inspection.federalregister.gov/2025-00636.pdf
Just don't trust the LLM to do the math. This is GPT-4o.
United Kingdom.
Singapore, Switzerland and Israel are missing.
TPP is defined as TOPS * bit length (* 2 w/ sparsity). So for example:
- H100: 1,000 TOPS * 16 bit = 16,000 TPP 🚫
- A100: 312 TOPS * 16 bit = 4,992 TPP 🚫
Full details in the CCL: www.bis.doc.gov/index.php/d...
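The TPP math above can be sketched in a few lines. The chip specs and the 4,800 TPP control threshold here are assumptions taken from the thread and the CCL discussion, not authoritative figures:

```python
# Rough TPP calculator following the thread's formula:
# TPP = dense TOPS x operand bit length.
# (Sparse TOPS figures are double the dense ones, so halve them first.)

TPP_THRESHOLD = 4_800  # assumed ECCN 3A090 control threshold

def tpp(dense_tops: float, bit_length: int) -> float:
    """Total Processing Performance: dense TOPS times bit length."""
    return dense_tops * bit_length

# Approximate dense FP16 TOPS figures from the thread.
chips = {
    "H100": (1_000, 16),
    "A100": (312, 16),
}

for name, (tops, bits) in chips.items():
    score = tpp(tops, bits)
    flag = "🚫 controlled" if score > TPP_THRESHOLD else "✅ below threshold"
    print(f"{name}: {score:,.0f} TPP {flag}")
```

Both chips land above 4,800 TPP, which is why they carry the 🚫 above.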
- 70b Model : 70 billion × 6 × 15 trillion = 6.3×10^24 ✅
- 405b Model: 405 billion × 6 × 15 trillion = 3.6×10^25 ✅
So the cutoff is around 1T weights trained on 15T tokens for one epoch.
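The estimates above use the standard "6ND" rule of thumb (training FLOPs ≈ 6 × parameters × tokens). A minimal sketch, assuming the rule's reported 10^26-operation threshold:

```python
# 6ND rule of thumb: training compute ~ 6 x parameter count x token count.
FLOP_THRESHOLD = 1e26  # assumed compute threshold from the rule

def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute for one epoch over `tokens` tokens."""
    return 6 * params * tokens

TOKENS = 15e12  # 15T-token training run, as in the thread
for name, params in [("70B", 70e9), ("405B", 405e9), ("1T", 1e12)]:
    flops = training_flops(params, TOKENS)
    flag = "🚫" if flops >= FLOP_THRESHOLD else "✅"
    print(f"{name} on 15T tokens: {flops:.1e} FLOPs {flag}")
```

A 1T-parameter model on 15T tokens lands at 9×10^25, just under 10^26, which is where the "around 1T weights" cutoff comes from.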
arxiv.org/abs/2404.02905
huggingface.co/spaces/Hugg...
And there is a lot more we can do, e.g. prompt optimization (DSPy/TextGrad), workflow and UI.