Selected Publications

This paper describes work done at Baidu’s Silicon Valley AI Lab to train end-to-end deep recurrent neural networks for both English and Mandarin speech recognition.
ICML, 2016

Applies a deep generative sequence model to the analysis of drum patterns.
ISMIR, 2012

OpenMP and CUDA implementations of non-negative matrix factorization (NMF) to speed up drum track extraction.
ISMIR, 2009

Recent Publications


Uncovering Latent Style Factors for Expressive Speech Synthesis. NIPS ML4Audio Workshop, 2017.

Exploring Neural Transducers for End-to-End Speech Recognition. ASRU, 2017.

Reducing Bias in Production Speech Models. arXiv, 2017.

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. ICML, 2016.

Lasagne: First Release. GitHub, 2015.

LibROSA: Audio and Music Signal Analysis in Python. SciPy, 2015.

Scalable Multimedia Content Analysis on Parallel Platforms Using Python. TOMCCAP, 2014.


End-to-End Speech Synthesis

At Google, I am now a member of the team that brought you Tacotron, an end-to-end speech synthesis system that uses neural networks to convert text directly to audio.

End-to-End Speech Recognition

Baidu’s Deep Speech system does away with the complicated traditional speech recognition pipeline, replacing it with a single large neural network trained end-to-end to map audio directly to text.

Open Source Contributions

Some projects I contributed to.

Parallel Computing for Music and Audio

As a member of UC Berkeley’s Par Lab, I did a variety of work focused on improving the computational efficiency of music and audio applications.

Automatic Drum Understanding

Can we teach computers to listen to drum performances the way humans do? (This was my PhD thesis.)
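Non-negative matrix factorization, the technique behind the drum track extraction work above, approximates a magnitude spectrogram V as the product of spectral templates W and per-frame activations H. A minimal numpy sketch of the classic multiplicative-update rules for the Euclidean objective (run on random data here, not real drum audio; this is not the parallel OpenMP/CUDA implementation from the paper):

```python
import numpy as np

# NMF via multiplicative updates, minimizing ||V - W @ H||^2
# with all factors kept non-negative. Random stand-in data.
rng = np.random.default_rng(0)
V = rng.random((64, 200))          # stand-in for a magnitude spectrogram
k = 4                              # number of components (e.g., drum templates)
W = rng.random((64, k)) + 1e-3     # spectral templates
H = rng.random((k, 200)) + 1e-3    # per-frame activations

eps = 1e-9                         # guard against division by zero
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.3f}")
```

Because the updates are purely multiplicative, W and H stay non-negative throughout, which is what makes the learned templates interpretable as individual drum sounds.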