Selected Publications

This work adds a prosody encoder to the Tacotron text-to-speech system, enabling it to reproduce the intonation, stress, and rhythm of any spoken utterance.
ICML, 2018
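
For intuition, here is a minimal PyTorch-style sketch of the idea (illustrative only, not the paper's exact architecture; module names and dimensions are assumptions): a reference encoder summarizes a reference mel spectrogram into a fixed-length prosody embedding, which is broadcast over time and concatenated with the text-encoder outputs so the decoder can imitate the reference's prosody.

```python
import torch
import torch.nn as nn

class ReferenceEncoder(nn.Module):
    """Summarizes a reference mel spectrogram into a fixed-length prosody embedding."""
    def __init__(self, n_mels=80, embed_dim=128):
        super().__init__()
        self.rnn = nn.GRU(input_size=n_mels, hidden_size=embed_dim, batch_first=True)

    def forward(self, ref_mels):            # ref_mels: (batch, frames, n_mels)
        _, state = self.rnn(ref_mels)       # state: (1, batch, embed_dim)
        return state.squeeze(0)             # (batch, embed_dim)

# Broadcast the prosody embedding across time and concatenate it with the
# text-encoder outputs so the decoder is conditioned on the reference prosody.
text_encodings = torch.randn(2, 40, 256)              # dummy (batch, text_steps, enc_dim)
prosody = ReferenceEncoder()(torch.randn(2, 300, 80)) # dummy reference spectrogram
conditioned = torch.cat(
    [text_encodings, prosody.unsqueeze(1).expand(-1, text_encodings.size(1), -1)],
    dim=-1,
)                                                     # (batch, text_steps, enc_dim + embed_dim)
```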

This paper describes work done at Baidu’s Silicon Valley AI Lab to train end-to-end deep recurrent neural networks for both English and Mandarin speech recognition.
ICML, 2016

Applies a deep generative sequence model to the analysis of drum patterns.
ISMIR, 2012

OpenMP and CUDA implementations of non-negative matrix factorization (NMF) to speed up drum track extraction.
ISMIR, 2009
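
For context, NMF factors a non-negative spectrogram into spectral templates and their time-varying activations. Below is a minimal NumPy sketch of the standard multiplicative updates (illustrative only, not necessarily the exact variant or cost function used in the paper):

```python
import numpy as np

def nmf(V, n_components=8, n_iters=200, eps=1e-9):
    """Factor a non-negative spectrogram V (freq x time) into templates W and activations H."""
    rng = np.random.default_rng(0)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_time)) + eps
    for _ in range(n_iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative update for activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for templates
    return W, H
```

Each iteration is dominated by dense matrix products, which is what makes the algorithm a good fit for OpenMP and CUDA parallelization.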

Recent Publications

Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron. ICML, 2018.

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis. ICML, 2018.

Uncovering Latent Style Factors for Expressive Speech Synthesis. NIPS ML4Audio Workshop, 2017.

Exploring Neural Transducers for End-to-End Speech Recognition. ASRU, 2017.

Reducing Bias in Production Speech Models. arXiv, 2017.

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. ICML, 2016.

Lasagne: First Release. GitHub, 2015.

LibROSA: Audio and Music Signal Analysis in Python. SciPy, 2015.

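A minimal example of the kind of analysis librosa supports (the audio file path is hypothetical):

```python
import librosa

# Load a clip (path is hypothetical) and resample to librosa's default 22,050 Hz.
y, sr = librosa.load("drum_loop.wav")

# Estimate tempo and beat positions, then compute a log-mel spectrogram.
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
mel = librosa.feature.melspectrogram(y=y, sr=sr)
log_mel = librosa.power_to_db(mel)

print("Estimated tempo (BPM):", tempo)
print("Log-mel shape:", log_mel.shape)
```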

Scalable Multimedia Content Analysis on Parallel Platforms Using Python. TOMCCAP, 2014.

Projects

End-to-End Speech Synthesis

At Google, I am now a member of the team that brought you Tacotron, an end-to-end speech synthesis system that uses neural networks to convert text directly to audio.

End-to-End Speech Recognition

Baidu’s Deep Speech system does away with the complicated traditional speech recognition pipeline, replacing it with a single large neural network trained end-to-end to convert audio into text.
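
Models in this family emit a distribution over characters at every time step and are trained with the CTC loss; at inference time, the simplest way to turn those per-frame outputs into text is greedy CTC decoding (take the argmax at each frame, collapse repeats, drop blanks). A toy sketch, with a made-up alphabet and probability matrix:

```python
import numpy as np

ALPHABET = ["<blank>", " ", "a", "b", "c"]    # toy alphabet; index 0 is the CTC blank

def greedy_ctc_decode(probs):
    """Collapse repeated symbols and drop blanks from per-frame argmax predictions."""
    best_path = probs.argmax(axis=1)          # most likely symbol at each frame
    decoded, prev = [], None
    for idx in best_path:
        if idx != prev and idx != 0:          # skip repeats and the blank symbol
            decoded.append(ALPHABET[idx])
        prev = idx
    return "".join(decoded)

# Toy per-frame probabilities for 6 frames over the 5-symbol alphabet.
frame_probs = np.array([
    [0.1, 0.1, 0.6, 0.1, 0.1],    # "a"
    [0.1, 0.1, 0.6, 0.1, 0.1],    # "a" (repeat, collapsed)
    [0.7, 0.1, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.1, 0.6, 0.1],    # "b"
    [0.1, 0.1, 0.1, 0.1, 0.6],    # "c"
    [0.7, 0.1, 0.1, 0.05, 0.05],  # blank
])
print(greedy_ctc_decode(frame_probs))         # -> "abc"
```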

Open Source Contributions

Some open source projects I have contributed to.

Parallel Computing for Music and Audio

As a member of UC Berkeley’s Par Lab, I worked on a variety of projects focused on improving the computational efficiency of music and audio applications.

Automatic Drum Understanding

Can we teach computers to listen to drum performances the way humans do? (This was the topic of my PhD thesis.)