Describes a way to learn controllable style factors for an end-to-end speech synthesis system in an unsupervised manner.
Compares CTC, RNN-Transducer, and attention-based models for end-to-end speech recognition.
At Google, I am now a member of the team that brought you Tacotron, an end-to-end speech synthesis system that uses neural networks to convert text directly to audio.
We identify and address issues that make deployment difficult for end-to-end speech recognition systems.
This paper describes work done at Baidu's Silicon Valley AI Lab to train end-to-end deep recurrent neural networks for both English and Mandarin speech recognition.
Lasagne is a lightweight library to build and train neural networks in Theano.
Baidu's Deep Speech system does away with the complicated traditional speech recognition pipeline, replacing it instead with a large neural network that is trained in an end-to-end fashion to convert audio into text.
Applies a deep generative sequence model to the analysis of drum patterns.