Hao Tang
larryniven4.bsky.social
Lecturer at the University of Edinburgh. Member of the Centre for Speech Technology Research (CSTR).
Back then Kaldi didn't exist yet. HTK was still the dominant toolkit. I cannot remember whether HTK had an MLP implementation. I was mostly using a neural network toolkit (I think from ICSI) that just trained 3-layer MLPs. (Do people still have the source code? Somebody should put it on GitHub.)
December 4, 2024 at 11:47 AM
Hybrid (Bourlard and Morgan, 1989) and tandem (Hermansky et al., 2000) approaches were already well established. I think the tandem approach had the upper hand, until it was convincingly displaced in 2012. (But hey, we are using HuBERT features nowadays, which is not very different from a tandem approach.)
December 4, 2024 at 11:47 AM
People were very skeptical, mainly because nobody knew where to even start reproducing the results. (The results were finally reproduced years later in Kaldi.)
December 4, 2024 at 11:47 AM