lhl
@lhl.bsky.social
Easily distracted, currently building open source AI. Living online since FidoNet
Previously, they did some extensive coverage of the US tariff situation and its impact, some of the best coverage/explanation I saw in any news media: www.youtube.com/watch?v=1W_m...
The Death of Affordable Computing | Tariffs Impact & Investigation
YouTube video by Gamers Nexus
August 18, 2025 at 6:19 AM
One crazy observation: I just used both Shisa V2 405B and ChatGPT 4.5 (whose JA benchmark scores are the best we've tested) to write a Japanese tweet for me, and 4.5 overwhelmingly preferred Shisa V2's tweet: chatgpt.com/share/683e88...
ChatGPT - Shisa V2 405B Chat
June 3, 2025 at 5:37 AM
Perhaps a more interesting side note is that I am still basically illiterate in Japanese, but wrote this presentation with almost no native speaker review/assistance - just many many rounds of LLM assistance (mainly GPT-4.5, but some help from Shisa V2 405B too! 😂) including for final editing.
June 3, 2025 at 5:15 AM
We're still working on a full proper technical report (tracking down references is hard), but we have an Overview Report slide deck I posted in EN/JA here: shisa.ai/posts/shisa-...

It's my first Japanese slide deck and I super embraced the aesthetic!
June 3, 2025 at 5:11 AM
Related to an earlier observation bsky.app/profile/did:... but since both our 70B and 405B Shisa V2 models are *stronger than GPT-4 in Japanese,* it has trouble judging them. Luckily, GPT-4.1 is still able to distinguish them. 😅
June 3, 2025 at 5:08 AM
BTW, you can chat with an FP8 version of Shisa V2 405B online right now. If you don't speak Japanese, you can ask it to translate or even teach you some 😀 chat.shisa.ai
June 3, 2025 at 5:02 AM
Any batching will affect determinism, but so will changes to the kv-cache layout (since they can change the GEMM shapes used, which can lead to bit-level differences), so I don't think it's safe to blanket-claim that outputs will necessarily be deterministic even when running locally at temp=0.
May 17, 2025 at 8:36 AM
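The bit-level differences above come down to floating-point addition not being associative: when batching or kv-cache layout changes the reduction order inside a GEMM, the same mathematical sum can land on a different bit pattern. A minimal sketch of the underlying effect (plain Python floats, not an actual inference engine):

```python
# Floating-point addition is not associative: grouping the same three
# values differently yields different bit-level results.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
print(left == right)  # False

# Summing a longer list forward vs. backward is the same phenomenon at the
# scale of a GEMM reduction: tiny discrepancies that can flip a near-tie
# in the logits even at temperature 0.
vals = [(-1) ** i / (i + 1) for i in range(10_000)]
forward = sum(vals)
backward = sum(reversed(vals))
print(abs(forward - backward))  # small, but not necessarily zero
```

This is why a different batch size or kernel shape, not just sampling temperature, can change which token wins an argmax.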
Each DPO run for the 405B used all 256 H100s at our disposal and took about 3,300 GPU hours. By comparison, a full SFT+DPO on our Shisa V2 70B "only" took about 1,200 H100 hours.
April 28, 2025 at 12:29 PM
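As a back-of-the-envelope check on the numbers in that post (the GPU-hour figures are from the post; the wall-clock conversion is my own arithmetic, assuming all 256 GPUs run for the full job):

```python
# GPU-hours / GPU count = approximate wall-clock hours per run,
# assuming the whole 256-GPU cluster is occupied end to end.
gpus = 256
dpo_405b_gpu_hours = 3300
wall_clock_hours = dpo_405b_gpu_hours / gpus
print(round(wall_clock_hours, 1))  # 12.9
```

So each 405B DPO run works out to roughly half a day of wall-clock time on the full cluster.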