Deep Learning

Uncovering Latent Style Factors for Expressive Speech Synthesis

Describes an unsupervised method for learning controllable style factors in an end-to-end speech synthesis system.

Exploring Neural Transducers for End-to-End Speech Recognition

Compares CTC, RNN-Transducer, and attention-based models for end-to-end speech recognition.

End-to-End Speech Synthesis

At Google, I am a member of the team that brought you Tacotron, an end-to-end speech synthesis system that uses neural networks to convert text directly to audio.

Reducing Bias in Production Speech Models

We identify and address issues that make deployment difficult for end-to-end speech recognition systems.

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

This paper describes work done at Baidu's Silicon Valley AI Lab to train end-to-end deep recurrent neural networks for both English and Mandarin speech recognition.

Lasagne: First Release

Lasagne is a lightweight library to build and train neural networks in Theano.

Baidu's Deep Speech system does away with the complicated traditional speech recognition pipeline, replacing it with a large neural network trained end-to-end to convert audio into text.

Analyzing Drum Patterns Using Conditional Deep Belief Networks

Applies a deep generative sequence model to the analysis of drum patterns.