Lightnews — Scholar-powered news

青龍聖者

@bdsqlsz.bsky.social

Hugging Face has just announced their storage plan.
25$/TB/month
They do not measure public datasets, only private datasets are calculated.
Deleted a 14TB outdated dataset, now it fits perfectly.
huggingface.co/docs/hub/sto...

December 11, 2024 at 6:19 AM

青龍聖者

@bdsqlsz.bsky.social

Generative Photography:Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
generative-photography.github.io/project/

I am glad that I did not delete the ICC information of the images when I organized the dataset before...

December 10, 2024 at 6:19 PM

青龍聖者

@bdsqlsz.bsky.social

TRELLIS on windows, 16G VRAM need Auto install and Download Model.
I spent 2 hours compiling all the libraries needed for Windows and the one-click installation script.
github.com/sdbds/TRELLI...

December 10, 2024 at 6:18 PM

青龍聖者

@bdsqlsz.bsky.social

Pixtral-large just became the current SOTA open source VLM.
Just processed 100K datasets captions with it (insert Danbooru tags)
Only took 24 hours and it's free.😁
I want to complete the captions of all 210K datasets today.

December 4, 2024 at 4:28 PM

青龍聖者

@bdsqlsz.bsky.social

Tencent open source 13B HuanYuanVideo,SOTA GenVideoModel!
page:https://aivideo.hunyuan.tencent.com
code:https://github.com/Tencent/HunyuanVideo

December 3, 2024 at 5:55 AM

Reposted by 青龍聖者

Michael Tschannen

@mtschannen.bsky.social

Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)?

We have been pondering this during summer and developed a new model: JetFormer 🌊🤖

arxiv.org/abs/2411.19722

A thread 👇

1/

December 2, 2024 at 4:41 PM

青龍聖者

@bdsqlsz.bsky.social

Publish a 23M generated dataset from Midjourney Captioned Full Dataset. Thanks to the contribution of a certain third-party data provider who wishes to remain anonymous.
huggingface.co/datasets/dee...

December 2, 2024 at 10:36 AM

青龍聖者

@bdsqlsz.bsky.social

New tagger model is coming! Thank Mr. SmilingWolf！
huggingface.co/spaces/Smili...

November 30, 2024 at 2:07 PM

青龍聖者

@bdsqlsz.bsky.social

Style-Friendly SNR Sampler for Style-Driven Generation
Just change SNR can make better style model!
μ=−6,α = 2
if you use kohya, just set
--weighting_scheme=logit_normal
--logit_mean=-6,
--logit_std=2
arxiv.org/abs/2411.147...

November 27, 2024 at 6:04 AM

Reposted by 青龍聖者

Andi

@andimara.bsky.social

Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and tokens throughputs.

SmolVLM can be fine-tuned on a Google collab and be run on a laptop! Or process millions of documents with a consumer GPU!

November 26, 2024 at 3:57 PM

青龍聖者

@bdsqlsz.bsky.social

Sora was initially amazing, but now it's way behind.
huggingface.co/spaces/PR-Pu...

November 26, 2024 at 6:23 PM

青龍聖者

@bdsqlsz.bsky.social

Stability AI drop ControlNets for Stable Diffusion 3.5 Large
stability.ai/news/sd3-5-l...
Blur、Canny、Depth

November 26, 2024 at 4:22 PM

青龍聖者

@bdsqlsz.bsky.social

Kaze AI, a watermark cleaning tool.
kaze.ai/toolkit/wate...

November 26, 2024 at 1:58 PM

青龍聖者

@bdsqlsz.bsky.social

Qwen2vl-Flux is a SOTA multimodal image generation model that enhances FLUX with Qwen2VL's vision-language understanding capabilities.
huggingface.co/Djrango/Qwen...

November 25, 2024 at 2:06 PM

青龍聖者

@bdsqlsz.bsky.social

OminiControl: Minimal and Universal Control for Diffusion Transformer
Flux 1. model
Code:https://github.com/Yuanshi9815/OminiControl
Demo:https://huggingface.co/spaces/Yuanshi/OminiControl