Lightnews — Scholar-powered news

Christoph Minixhofer

@cdminix.bsky.social

100 followers 220 following 120 posts

Post-doc @ University of Edinburgh. Working on Synthetic Speech Evaluation at the moment.
🇳🇴 Oslo 🏴󠁧󠁢󠁳󠁣󠁴󠁿 Edinburgh 🇦🇹 Graz


Christoph Minixhofer

@cdminix.bsky.social

I don't download new HF models often, but when I do, it's during the 0.008% of downtime :(

October 20, 2025 at 9:04 AM

Christoph Minixhofer

@cdminix.bsky.social

It's been a great #interspeech2025!
I presented a TTS-for-ASR paper:
www.isca-archive.org/interspeech_...
And one on prosody reps: www.isca-archive.org/interspeech_...
There were many interesting questions & comments - if you have more and didn't get the chance to ask, feel free to send me a message.

August 21, 2025 at 4:47 PM

Christoph Minixhofer

@cdminix.bsky.social

Thank you to everyone who stopped by, I’m grateful for all the feedback and interesting questions #interspeech2025

August 20, 2025 at 12:42 PM

Christoph Minixhofer

@cdminix.bsky.social

One day until the Q2 ttsdsbenchmark.com update. We’ll see which TTS system tops the leaderboard this time - some new ones have been added that could shake things up.

July 4, 2025 at 6:29 AM

Christoph Minixhofer

@cdminix.bsky.social

This figure motivated a lot of my PhD (or at least nudged me into a direction) -- check out arxiv.org/abs/2110.11479 (Hu et al.) if you haven't come across it before, it really frames the problem of synthetic/real speech distributions well.

Figure showing two overlapping bell curves representing data distributions. The green curve on the left is labeled ‘synthetic data distribution’, and the black curve on the right is labeled ‘true data distribution’. The horizontal axis is divided into four regions: ‘artifacts’ (only covered by the green curve), ‘over-sampled’ (where the synthetic curve is higher than true), ‘under-sampled’ (where the true curve is higher than synthetic), and ‘missing samples’ (only covered by the black curve). Caption: Fig. 1 describes the gap between synthetic and true data distributions partitioned into four regions.

June 30, 2025 at 6:40 PM

Christoph Minixhofer

@cdminix.bsky.social

Spotted a Norwegian flag across the Firth of Forth, didn’t know Norwegians had hytte on this side of the North Sea as well!

Norwegian flag in a sunny and green scene in Scotland with water and a bridge in the background.

June 29, 2025 at 12:42 PM

Christoph Minixhofer

@cdminix.bsky.social

When future archeologists dig up the remains of my thesis in 3,000 years.

May 13, 2025 at 4:56 PM

Christoph Minixhofer

@cdminix.bsky.social

I’m told it is mandatory in Norway to leave the city and go to a hytte in the woods on the weekend, so doing my best.

December 14, 2024 at 1:55 PM

Christoph Minixhofer

@cdminix.bsky.social

Nice, good to know. Do you mean what happens to the reprs after fine-tuning? I'd guess the more different the downstream task the bigger a jump you'd see in the last layer(s). It's already visible in the paper I linked (phone identity and word identity) - although idk why word meaning improved!

Visualization of properties encoded at different W2V2 layers. The curves measure different metrics on different scales; they are shown together only to compare where major peaks and valleys occur. Details in sections 5.2.1 - 5.2.3 in https://arxiv.org/pdf/2107.04734

December 12, 2024 at 7:15 PM

Christoph Minixhofer

@cdminix.bsky.social

Yes, the energy requirements (especially for training) are not transparent enough and a lot of AI use is frivolous. At the moment a ChatGPT query takes about 15x the energy of a g. search. Yet no one is telling me to go to the library and read through conference proceedings to avoid 15+ g. searches.

December 11, 2024 at 10:31 AM

Christoph Minixhofer

@cdminix.bsky.social

So if we look at Google Scholar results for both, it looks like SMOS is on the rise, but it has actually been used at least as long as CMOS for speech synthesis evaluation.
CMOS has a history in evaluation standards, just like MOS. But recently it's all about speech synth.

(7/9)

December 10, 2024 at 9:37 AM

Christoph Minixhofer

@cdminix.bsky.social

Presented my poster on TTSDS, a benchmark for Text-to-Speech at #slt2024 yesterday.

We found that our zero-shot distribution distance (similar to FID across several factors like prosody, speaker, etc.) correlated well with subjective evaluation for TTS systems from 2008 to 2024.

ttsdsbenchmark.com

December 4, 2024 at 6:48 AM
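
For readers unfamiliar with the FID comparison in the post above, here is a minimal sketch of a Fréchet distance between two sets of feature vectors, each summarized as a Gaussian. This is only an illustration of the general idea, not the TTSDS implementation: the function name `frechet_distance`, the choice of NumPy, and the assumption that features arrive as `(n_samples, n_dims)` arrays are all hypothetical.

```python
import numpy as np


def frechet_distance(real: np.ndarray, synth: np.ndarray) -> float:
    """Squared Frechet distance between Gaussians fitted to two feature sets.

    Hypothetical helper for illustration; inputs are (n_samples, n_dims)
    arrays with n_dims >= 2.
    """
    # Fit a Gaussian (mean and covariance) to each feature set.
    mu_r, mu_s = real.mean(axis=0), synth.mean(axis=0)
    cov_r = np.cov(real, rowvar=False)
    cov_s = np.cov(synth, rowvar=False)

    # Tr((cov_r cov_s)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_s, which are real and non-negative in theory.
    # Small negative or imaginary parts are numerical noise, so drop them.
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    trace_sqrt = np.sqrt(np.maximum(eigvals.real, 0.0)).sum()

    mean_term = float(np.sum((mu_r - mu_s) ** 2))
    cov_term = float(np.trace(cov_r) + np.trace(cov_s) - 2.0 * trace_sqrt)
    return mean_term + cov_term
```

A lower value means the synthetic features sit closer to the real ones. A benchmark along the lines described in the post would presumably compute such a distance per factor (prosody, speaker, and so on), but how the factors are extracted and combined is not covered by this sketch.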

Christoph Minixhofer

@cdminix.bsky.social

If people had cheered for Elon at that Dave Chappelle gig ages ago, could we have avoided this entire timeline?

November 21, 2024 at 6:15 AM

Christoph Minixhofer

@cdminix.bsky.social

As part of some ongoing work, I'm releasing what is currently the biggest collection of Docker containers for state-of-the-art #voicecloning #tts systems. github.com/ttsds/datasets
Alongside it, there is also a nice overview of all systems (see below)

November 19, 2024 at 11:19 AM
