Zhaofeng Lin
@zhaofenglin.bsky.social
PhD student @Trinity College Dublin | Multimodal speech recognition
https://chaufanglin.github.io/
Results suggest Auto-AVSR relies more heavily on the audio stream, showing a weaker correlation between MaFI scores and IWERs in AV mode.
In contrast, AVEC makes stronger use of visual information, with a significant negative correlation, especially in noisy conditions.

[7/8] 🧵
April 1, 2025 at 11:18 AM
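
A minimal sketch (not the authors' code) of the kind of MaFI-vs-IWER correlation analysis the post describes. The data layout and the choice of Spearman correlation are illustrative assumptions:

```python
# A minimal sketch (not the authors' code) of a MaFI-vs-IWER correlation
# analysis. The data layout and the choice of Spearman correlation are
# illustrative assumptions.
from scipy.stats import spearmanr

# Hypothetical inputs: one MaFI score and one IWER per test word.
mafi_scores = [0.82, 0.41, 0.67, 0.15, 0.93]  # visual informativeness per word
iwers       = [0.10, 0.35, 0.22, 0.48, 0.05]  # word error rate per item

# A significant negative correlation means visually informative words are
# recognised more reliably, i.e. the model exploits the visual stream.
rho, p_value = spearmanr(mafi_scores, iwers)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.4f}")
```

Under this reading, a strongly negative rho (as reported for AVEC) indicates genuine use of the visual modality, while a weak correlation (as for Auto-AVSR) points to audio dominance.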
Occlusion tests reveal that AVSR models differ in which visual segments they rely on.

Auto-AVSR & AV-RelScore are equally affected by initial & middle occlusions, while AVEC is more impacted by middle occlusion.

Unlike humans, AVSR models do not depend on initial visual cues.

[5/8] 🧵
April 1, 2025 at 11:17 AM
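
To make the occlusion test concrete, here is a minimal sketch that blanks a contiguous window of lip-video frames at the start, middle, or end of an utterance. The (T, H, W, C) array layout, the zeroing policy, and the 30% window are assumptions, not necessarily the paper's exact protocol:

```python
# A minimal sketch of a visual-occlusion test, assuming the lip video is a
# (T, H, W, C) numpy array. Zeroing a contiguous 30% window is an
# illustrative policy, not necessarily the paper's exact protocol.
import numpy as np

def occlude(video: np.ndarray, position: str, fraction: float = 0.3) -> np.ndarray:
    """Blank a contiguous fraction of frames at the start, middle, or end."""
    t = video.shape[0]
    n = int(t * fraction)
    start = {"initial": 0, "middle": (t - n) // 2, "final": t - n}[position]
    out = video.copy()
    out[start:start + n] = 0  # occluded frames carry no visual information
    return out

# e.g. decode occlude(video, "initial") vs occlude(video, "middle") and
# compare WERs to see which segment the model depends on.
```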
First, we revisit *effective SNR gain*, measured as the difference between 0 dB and the SNR at which the AVSR system's WER equals the reference WER of audio-only recognition at 0 dB.

This metric quantifies the benefit of the visual modality in reducing WER compared to the audio-only system. [3/n] 🧵
April 1, 2025 at 11:16 AM
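
A minimal sketch of how effective SNR gain could be computed from measured WER-vs-SNR curves. The curve values below are made-up placeholders, and linear interpolation between test SNRs is an assumption:

```python
# A minimal sketch of computing effective SNR gain from WER-vs-SNR curves
# by linear interpolation. The numbers below are made-up placeholders.
import numpy as np

snrs = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])             # test SNRs (dB)
wer_audio_only = np.array([0.80, 0.55, 0.30, 0.15, 0.08])  # hypothetical
wer_avsr       = np.array([0.50, 0.35, 0.20, 0.10, 0.06])  # hypothetical

# Reference point: the audio-only WER at 0 dB.
wer_ref = np.interp(0.0, snrs, wer_audio_only)

# SNR at which the AVSR curve reaches that same WER. WER falls as SNR
# rises, so interpolate over the reversed (now increasing) WER axis.
snr_match = np.interp(wer_ref, wer_avsr[::-1], snrs[::-1])

# Effective SNR gain: extra noise the AV system tolerates at equal WER.
print(f"effective SNR gain = {0.0 - snr_match:.1f} dB")  # ~3.3 dB here
```

A positive gain means the AV system matches the audio-only 0 dB performance at a lower (noisier) SNR, which is exactly the visual benefit the metric is meant to capture.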