Lightnews — Scholar-powered news

arXiv cs.SD Sound

@cssd-bot.bsky.social

[2025-12-31 Wed (UTC), no new articles found for csSD Sound]

December 31, 2025 at 6:39 AM

Reposted by arXiv cs.SD Sound

arXiv cs.CL Computation and Language

@cscl-bot.bsky.social

Deepak Babu Piskala: PROFASR-BENCH: A Benchmark for Context-Conditioned ASR in High-Stakes Professional Speech https://arxiv.org/abs/2512.23686 https://arxiv.org/pdf/2512.23686 https://arxiv.org/html/2512.23686

December 30, 2025 at 6:30 AM

Reposted by arXiv cs.SD Sound

arXiv cs.CL Computation and Language

@cscl-bot.bsky.social

Yu-Xiang Lin, Cheng-Han Chiang, Hung-yi Lee: Style Amnesia: Investigating Speaking Style Degradation and Mitigation in Multi-Turn Spoken Language Models https://arxiv.org/abs/2512.23578 https://arxiv.org/pdf/2512.23578 https://arxiv.org/html/2512.23578

December 30, 2025 at 6:30 AM

Reposted by arXiv cs.SD Sound

arXiv eess.AS Audio and Speech Processing

@eessas-bot.bsky.social

Işık, Işık, Işık, Taylan: Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers https://arxiv.org/abs/2512.22564 https://arxiv.org/pdf/2512.22564 https://arxiv.org/html/2512.22564

December 30, 2025 at 6:35 AM

Reposted by arXiv cs.SD Sound

arXiv eess.SP Signal Processing

@eesssp-bot.bsky.social

Hanbeot Park, Yunjeong Cho, Hunhee Kim: EEG-to-Voice Decoding of Spoken and Imagined speech Using Non-Invasive EEG https://arxiv.org/abs/2512.22146 https://arxiv.org/pdf/2512.22146 https://arxiv.org/html/2512.22146

December 30, 2025 at 6:35 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Saifelden M. Ismail: Mobile-Efficient Speech Emotion Recognition Using DistilHuBERT: A Cross-Corpus Validation Study https://arxiv.org/abs/2512.23435 https://arxiv.org/pdf/2512.23435 https://arxiv.org/html/2512.23435

December 30, 2025 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Pierre Mackenzie: Chord Recognition with Deep Learning https://arxiv.org/abs/2512.22621 https://arxiv.org/pdf/2512.22621 https://arxiv.org/html/2512.22621

December 30, 2025 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

HaeChun Chung: AudioGAN: A Compact and Efficient Framework for Real-Time High-Fidelity Text-to-Audio Generation https://arxiv.org/abs/2512.22166 https://arxiv.org/pdf/2512.22166 https://arxiv.org/html/2512.22166

December 30, 2025 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Ni, Yang, Tian, Li, Lyu, Du, Wang, Luo, Zhang: Marco-ASR: A Principled and Metric-Driven Framework for Fine-Tuning Large-Scale ASR Models for Domain Adaptation https://arxiv.org/abs/2512.22165 https://arxiv.org/pdf/2512.22165 https://arxiv.org/html/2512.22165

December 30, 2025 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Jin Sob Kim, Hyun Joon Park, Wooseok Shin, Sung Won Han: A Robust framework for sound event localization and detection on real recordings https://arxiv.org/abs/2512.22156 https://arxiv.org/pdf/2512.22156 https://arxiv.org/html/2512.22156

December 30, 2025 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Jin Sob Kim, Hyun Joon Park, Wooseok Shin, Sung Won Han: Rethinking Leveraging Pre-Trained Multi-Layer Representations for Speaker Verification https://arxiv.org/abs/2512.22148 https://arxiv.org/pdf/2512.22148 https://arxiv.org/html/2512.22148

December 30, 2025 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

[2025-12-30 Tue (UTC), 6 new articles found for csSD Sound]

December 30, 2025 at 6:34 AM

Reposted by arXiv cs.SD Sound

arXiv eess.AS Audio and Speech Processing

@eessas-bot.bsky.social

Ruihao Jing, Cheng Gong, Yu Jiang, Boyu Zhu, Shansong Liu, Chi Zhang, Xiao-Lei Zhang, Xuelong Li: Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models https://arxiv.org/abs/2512.21894 https://arxiv.org/pdf/2512.21894 https://arxiv.org/html/2512.21894

December 29, 2025 at 6:35 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Most. Sharmin Sultana Samu, Md. Rakibul Islam, Md. Zahid Hossain, Md. Kamrozzaman Bhuiyan, Farhad Uz Zaman: Zero-Shot to Zero-Lies: Detecting Bengali Deepfake Audio through Transfer Learning https://arxiv.org/abs/2512.21702 https://arxiv.org/pdf/2512.21702 https://arxiv.org/html/2512.21702

December 29, 2025 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Liuyang Bai, Weiyi Lu, Li Guo: Semantic Codebooks as Effective Priors for Neural Speech Compression https://arxiv.org/abs/2512.21653 https://arxiv.org/pdf/2512.21653 https://arxiv.org/html/2512.21653

December 29, 2025 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

[2025-12-29 Mon (UTC), 2 new articles found for csSD Sound]

December 29, 2025 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

[2025-12-26 Fri (UTC), no new articles found for csSD Sound]

December 26, 2025 at 6:39 AM

Reposted by arXiv cs.SD Sound

arXiv cs.CL Computation and Language

@cscl-bot.bsky.social

Zhongren Dong, Haotian Guo, Weixiang Xu, Huan Zhao, Zixing Zhang: Foundation Model-based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study https://arxiv.org/abs/2512.20948 https://arxiv.org/pdf/2512.20948 https://arxiv.org/html/2512.20948

December 25, 2025 at 6:30 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Wan Ki Wong, Ka Ho To, Chuck-jee Chau, Lucas Wong, Kevin Y. Yip, Irwin King: Towards Practical Automatic Piano Reduction using BERT with Semi-supervised Learning https://arxiv.org/abs/2512.21324 https://arxiv.org/pdf/2512.21324 https://arxiv.org/html/2512.21324

December 25, 2025 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Zhongren Dong, Bin Wang, Jing Han, Haotian Guo, Xiaojun Mo, Yimin Cao, Zixing Zhang: SACodec: Asymmetric Quantization with Semantic Anchoring for Low-Bitrate High-Fidelity Neural Speech Codecs https://arxiv.org/abs/2512.20944 https://arxiv.org/pdf/2512.20944 https://arxiv.org/html/2512.20944

December 25, 2025 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

[2025-12-25 Thu (UTC), 2 new articles found for csSD Sound]

December 25, 2025 at 6:34 AM

Reposted by arXiv cs.SD Sound

arXiv cs.CL Computation and Language

@cscl-bot.bsky.social

Poli, Luthra, Benchekroun, Higuchi, Gleize, Shen, Algayres, Chung, Assran, Pino, Dupoux: SpidR: Learning Fast and Stable Linguistic Units for Spoken Language Models Without Supervision https://arxiv.org/abs/2512.20308 https://arxiv.org/pdf/2512.20308 https://arxiv.org/html/2512.20308

December 24, 2025 at 6:30 AM

Reposted by arXiv cs.SD Sound

arXiv cs.CL Computation and Language

@cscl-bot.bsky.social

Qian Chen, Luyao Cheng, Chong Deng, Xiangang Li, Jiaqing Liu, Chao-Hong Tan, Wen Wang, Junhao Xu, Jieping Ye, Qinglin Zhang, Qiquan Zhang, Jingren Zhou: Fun-Audio-Chat Technical Report https://arxiv.org/abs/2512.20156 https://arxiv.org/pdf/2512.20156 https://arxiv.org/html/2512.20156

December 24, 2025 at 6:29 AM

Reposted by arXiv cs.SD Sound

arXiv eess.AS Audio and Speech Processing

@eessas-bot.bsky.social

Chengwei Liu, Haoyin Yan, Shaofei Xue, Xiaotao Liang, Xiaofu Chen, Bin Gong, Zheng Xue, Gang Song: QuarkAudio Technical Report https://arxiv.org/abs/2512.20151 https://arxiv.org/pdf/2512.20151 https://arxiv.org/html/2512.20151

December 24, 2025 at 6:35 AM

Reposted by arXiv cs.SD Sound

arXiv cs.CV Computer Vision and Pattern Recognition

@cscv-bot.bsky.social

Tian, Du, Zhang, Wang, Lee, Bai, Zhu, Niu, Tang: DDAVS: Disentangled Audio Semantics and Delayed Bidirectional Alignment for Audio-Visual Segmentation https://arxiv.org/abs/2512.20117 https://arxiv.org/pdf/2512.20117 https://arxiv.org/html/2512.20117

December 24, 2025 at 6:30 AM
