lhl
@lhl.bsky.social
Easily distracted, currently building open source AI. Living online since FidoNet
Previously, they did some extensive coverage of the US tariff situation and its impact, some of the best coverage/explanation I saw in any news media: www.youtube.com/watch?v=1W_m...
The Death of Affordable Computing | Tariffs Impact & Investigation
YouTube video by Gamers Nexus
August 18, 2025 at 6:19 AM
One crazy observation: I just used both Shisa V2 405B and ChatGPT 4.5 (whose JA benchmark scores are the best we've tested) to write a Japanese tweet for me, and 4.5 overwhelmingly preferred Shisa V2's tweet: chatgpt.com/share/683e88...
ChatGPT - Shisa V2 405B Chat
June 3, 2025 at 5:37 AM
Perhaps a more interesting side note is that I am still basically illiterate in Japanese, but wrote this presentation with almost no native speaker review/assistance - just many many rounds of LLM assistance (mainly GPT-4.5, but some help from Shisa V2 405B too! 😂) including for final editing.
June 3, 2025 at 5:15 AM
We're still working on a full proper technical report (tracking down references is hard), but we have an Overview Report slide deck I posted in EN/JA here: shisa.ai/posts/shisa-...

It's my first Japanese slide deck and I super embraced the aesthetic!
June 3, 2025 at 5:11 AM
Related to an earlier observation bsky.app/profile/did:... but since both our 70B and 405B Shisa V2 models are *stronger than GPT-4 in Japanese,* it has trouble judging them. Luckily, GPT-4.1 is still able to distinguish them. 😅
June 3, 2025 at 5:08 AM
BTW, you can chat with an FP8 version of Shisa V2 405B online right now. If you don't speak Japanese, you can ask it to translate or even teach you some 😀 chat.shisa.ai
June 3, 2025 at 5:02 AM
Any batching will affect determinism, but so will changes to the kv-cache layout (since they can change the GEMM shapes used, which can lead to bit-level differences), so I don't think it's safe to blanket-claim that outputs will necessarily be deterministic even when running locally at temp=0.
May 17, 2025 at 8:36 AM
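The bit-level differences above come down to floating-point addition not being associative: when batching or kv-cache layout changes the reduction order inside a GEMM, the same mathematical sum can land on a different bit pattern. A minimal sketch of the underlying effect (plain Python floats, not an actual inference engine):

```python
# Floating-point addition is not associative: grouping the same three
# values differently yields different bit-level results.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
print(left == right)  # False

# Summing a longer list forward vs. backward is the same phenomenon at the
# scale of a GEMM reduction: tiny discrepancies that can flip a near-tie
# in the logits even at temperature 0.
vals = [(-1) ** i / (i + 1) for i in range(10_000)]
forward = sum(vals)
backward = sum(reversed(vals))
print(abs(forward - backward))  # small, but not necessarily zero
```

This is why a different batch size or kernel shape, not just sampling temperature, can change which token wins an argmax.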
Each DPO run for the 405B used all 256 H100s at our disposal and took about 3,300 GPU hours. By comparison, a full SFT+DPO on our Shisa V2 70B "only" took about 1,200 H100 hours.
April 28, 2025 at 12:29 PM
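As a back-of-the-envelope check on the numbers in that post (the GPU-hour figures are from the post; the wall-clock conversion is my own arithmetic, assuming all 256 GPUs run for the full job):

```python
# GPU-hours / GPU count = approximate wall-clock hours per run,
# assuming the whole 256-GPU cluster is occupied end to end.
gpus = 256
dpo_405b_gpu_hours = 3300
wall_clock_hours = dpo_405b_gpu_hours / gpus
print(round(wall_clock_hours, 1))  # 12.9
```

So each 405B DPO run works out to roughly half a day of wall-clock time on the full cluster.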