Eric Battenberg

Software Engineer

Google Research

Biography

I joined Google Research in 2017, where I am part of the Sound Understanding team within Machine Perception. I am passionate about the potential for machine perception research to make our interactions with technology more natural and seamless, rather than distracting and addictive.

Previously, I was a Research Scientist at the Baidu Silicon Valley Artificial Intelligence Lab (SVAIL) led by Adam Coates and Andrew Ng. At Baidu, I had the privilege of contributing to Deep Speech 2, a revolutionary end-to-end neural speech recognition system. Before that, I developed algorithms for audio event detection and music mood classification at Gracenote in Emeryville, CA.

I received my PhD in Electrical Engineering and Computer Sciences from UC Berkeley, where I worked on signal processing and machine learning techniques for music and audio applications as a member of the Parallel Computing Laboratory (Par Lab). For my thesis work, I developed a system for machine understanding of drum performances.

At Berkeley, I was advised by David Wessel at the Center for New Music and Audio Technologies (CNMAT) and co-advised by Nelson Morgan at the International Computer Science Institute (ICSI).

Interests

  • Machine Perception
  • Generative Modeling
  • Deep Learning / Neural Networks
  • Speech and Language Understanding
  • Audio Signal Processing
  • Parallel Computing

Education

  • PhD in Electrical Engineering and Computer Sciences, 2012

    University of California, Berkeley

  • MS in Electrical Engineering and Computer Sciences, 2008

    University of California, Berkeley

  • BS in Electrical Engineering, 2005

    University of California, Santa Barbara

Projects

End-to-End Speech Synthesis

At Google, I am now a member of the team that brought you Tacotron, an end-to-end speech synthesis system that uses neural networks to convert text directly to audio.

End-to-End Speech Recognition

Baidu’s Deep Speech system does away with the complicated traditional speech recognition pipeline, replacing it with a large neural network that is trained end-to-end to convert audio into text.

Open Source Contributions

Some open source projects I have contributed to.

Parallel Computing for Music and Audio

As a member of UC Berkeley’s Par Lab, I worked on a variety of projects focused on improving the computational efficiency of music and audio applications.

Automatic Drum Understanding

Can we teach computers to listen to drum performances the way humans do? (This was the focus of my PhD thesis.)