Andrew Song on The Robot Brains Season 2 Episode 14

 

Pieter Abbeel

The five senses: touch, sight, smell, taste and hearing. They are critical to how we perceive the world around us. Without them, we are lost. And as we grow older, they tend to weaken, particularly when it comes to sight and hearing. Today's guest, Andrew Song, is the co-founder and CEO of Whisper AI, and he's on a mission to help give people great hearing, regardless of their age. He is a mathematics and computer science graduate from the University of Waterloo who is pioneering the practical application of modern artificial intelligence in hearing aids. Welcome to the show, Andrew. So great to have you here with us. 

 

Andrew Song

Thanks for having me. It's really exciting to be here. 

 

Pieter Abbeel

Yeah, so nice to have you on. I've actually known about your company for several years now. I remember, in the early days, one of your co-founders, Dwight Crow, would actually stop by OpenAI sometimes, and we would chat about what you all were building. And of course, it's been quite the journey since, because that's three, four years ago. But let's go even further back. Where did you grow up? 

 

Andrew Song

I grew up in, near Toronto, in Canada. In a little steel town called Hamilton. And Hamilton, when I was growing up, I liked to tell Americans it's like the Pittsburgh of Canada. A lot of steel industry. And then, you know, maybe not so much anymore. 

 

Pieter Abbeel

And how growing up there did you get excited about technology? And from there, AI? Hearing aids? What was that like? 

 

Andrew Song

You know, I think growing up where I grew up is almost hard to think about from where I am now because I get to live in the Bay Area. I live in Silicon Valley. I get to see a lot of new technology. But you know, where I grew up, that wasn't really a reality for me. One of the things that I think I was very lucky at a young age to, kind of, have two influences on me that steered me in this direction. One of them is my mother, who worked a lot with computers, and has a degree in physics. And because of what she worked on, we were able to have various computers at home, as part of her work. That I was able to tinker with. And I remember I have all these memories at a very young age when our classroom got a computer that I would be the I.T. person because I was the only person that had ever really worked with a computer in any substantive way. Right? Other than, maybe playing a game, playing solitaire or something. You know, beyond playing solitaire, I was the only one who had ever typed commands into a MS-DOS right? As someone in grade three. And I think as a child, that gave me maybe a lot of confidence. You know, an area of expertise, in so far as any eight year old has expertise. It maybe gave me an area of expertise and made me want to learn more. And, you know, a few years later, one of the big influences for me was actually Bill Gates' book called The Road Ahead. I encourage a lot of people to read it, with the memory that it was written in the mid to late '90s. And in that book. Bill Gates talks a lot about what the future will look like. And you can imagine as an eight, nine, 10 year old, 11 year old, 12 year old reading this, it really sounds like you're reading a science-fiction novel. But that was really motivating for me. That was really a vision of the future that I was excited about. But as I grew older, one of the things I was most impressed by that book and one of the reasons I revisit it, was how accurate it is. 
You know, it's one thing to have predictions about the future that you, you know, you talk about over a coffee or a beer or something with your friends. And then five years, you say, Oh, I was right, you cherry pick the ones that you were right about. It's really another thing, I think, to write a book saying okay, in ten years, this is what's going to happen. Here's how we're going to communicate. Here's how live video streaming is going to work. And then, you know, wake up one day as a university graduate and find that like 90 percent of the book worked out. Bill Gates has a unique view of the world, so he can see that, and so that was really inspiring.

 

Pieter Abbeel

After graduation, you worked for a while at Facebook? 

 

Andrew Song

That's right. Yeah. 

 

Pieter Abbeel

How do you make the jump from your position at Facebook to starting your own company, specifically to want to build hearing aids? What inspired you to do that? 

 

Andrew Song

You know, one of the things I really discovered about myself at Facebook is first, that I really loved working on products that help people communicate or be with other people. Be more social. I think that's one of the things that really connects a lot of the different work I did at Facebook and with hearing aids. But for me and you know, for my co-founders, for Dwight, as you mentioned and our other co-founder, Shlomo, we really see, we really each had our own individual stories about how hearing loss affected our family and affected their quality of life. And I think that's one of the first learnings that anybody who experiences hearing loss, either for themselves or through a family member or friend, someone they love. One of the things they realized is that hearing loss is actually, in some sense, it's very little about hearing, funnily enough. It's, of course, about the sound and the, you know, the decline in the hearing system and all those hearing functions. But hearing loss is, you know, connected. There's a reason hearing loss is connected to a higher feeling of loneliness or a higher feeling of stress or a higher risk of dementia. It's because of those interactions that you're not able to have when you have hearing loss. And I saw that through my own grandfather. He's a really big inspiration for me. I think about him a lot. And when I think of, I think what's really great about Whisper as a company, when you talk to individual members of our team, when you meet employees, everybody sort of has their person in their mind. For some employees, it's themselves. For some people, maybe it's a grandparent or a parent. For some people, it's a friend they knew growing up. And I think that connection is for me where I really knew that this was a great opportunity to build a better product and ultimately a great business. 

 

Pieter Abbeel

Now, I've definitely noticed it myself. I mean, if somebody has a hard time hearing, it's not just, I mean, often in one on one conversations, that kind of still works because you work on it. But in group conversations, especially, they have a really hard time actively participating or even, you know, keeping track of what's going on. Unless the group is really carefully paying attention to it. And some people won’t know. I mean, it's complicated. 

 

Andrew Song

Yes. One of the, if you don't mind me sharing one of the first stories that I still hold onto. I really remember from the early days of the company, you know, we were interviewing lots of people who had experienced hearing loss. People who used hearing aids, people who didn’t. Just trying to understand what that experience was like. And a woman shared this story with us about how hearing loss was affecting her. She was maybe mid-fifties. She had colorful hair, dyed hair, very colorful. I remember she loved having purple hearing aids. And she said that the biggest impact before she got hearing aids and when she knew she had to do something was she was out at dinner with a friend, two or three friends, two to three of her girlfriends and somebody told a joke and she didn't laugh. And she didn't laugh because she couldn't hear the joke. And the other friend laughed. So, you know, you can imagine that awkward situation, how awkward that must make you feel. And how isolating that must make you feel. But then for her, this actually went a step beyond because that friend actually didn't connect her lack of her, her not laughing to a hearing loss issue. She connected it to a personal judgment. So that friend actually became a little bit upset at her. You know, you can imagine if you tell lots of jokes and one friends not laughing, you know, you're going to get a little bit annoyed at them, after a certain point. So then she found out that this friend was upset with these other people that she was having lunch with. And now she has to get now, you know, she's already embarrassed enough. Now her friend is upset at her, she has to go talk to her friend and she has to talk about this hearing loss is a medical condition that she's trying to not admit she has and hide. And you know, we all, especially in the modern era, have different stresses that we all deal with. Imagine that on top of everything else, that you're doing to live life. 
I think that's an incredibly human challenge, a kind of human situation to live with. And that's why I think a lot about how our work is important now. 

 

Pieter Abbeel

Now, you and your co-founders noticed that the existing hearing aids, at the time, aren't good enough to get the full experience of proper hearing. Why did you think it was possible to change that? I mean, presumably there's a big industry that's supposedly trying to make the best possible hearing aids and have a lot of money going into those efforts. Why do you think we can actually do something different here? 

 

Andrew Song

What's funny is I think our starting point was maybe a little bit more humble. Maybe, maybe we didn't think we could. But there were a few glimmers that were maybe worth looking into. And that's maybe where deep learning and neural networks and AI, whatever the term of the day is to describe this body of work that's taken over the world, comes in. That's where you started to see, first in research and then, you know, as we developed it more and more, in reality, what was possible. And I think what really motivated us, you know, in 2016, was the research, kind of, the basic science, the basic academics around how neural network models, machine learning models can be used to improve hearing. Some of the research was published. But when you talked to industry, kind of, the industry insiders, the people who are responsible for making technology decisions in hearing aids, the conversation would sort of be like, Well, that's all very cute, you know, very, very cool. We have a 20 year research arc that's looking at that technology, but that can never work in a hearing aid today. And sometimes they would talk about, maybe, the amount of processing capability that was needed. And there's no way to put more, you know, you're not going to strap a laptop to somebody's head. So that's never going to work. Other people just had a lot of skepticism about the fundamental algorithms. You know, we got into this interesting discussion once about how the height of the sound source might affect the models. Things like, okay, I'm new to this, okay, that's interesting for us to think about and see. But at some point, when enough people tell you things are impossible, but there's a good fundamental result underneath it, you start to get a little bit skeptical. And I think that skepticism eventually turned the tide for us and said, you know, this is important. And there's a reason why people need this. 
And I think that there's actually a really big opportunity here. And then you go figuring out whether you can make it possible. 

 

Pieter Abbeel

How does a regular hearing aid work? What does it do? What is the device that people put in their ears? And then from there, can you help us understand how the Whisper hearing aids work? 

 

Andrew Song

Yeah, a regular hearing aid, not to oversimplify too much, but in many ways a regular hearing aid is kind of like, some people might be familiar with an equalizer. For those people who aren't familiar with an equalizer, one of the ways I like to explain it is if you used Windows Media Player growing up, which you have to be a certain age to have used, I understand. Maybe not all of the listeners have used that, but go take a look on Google. A hearing aid is basically a compressor, so it makes things louder, and it tries to make loud things not too loud. And then there are all of the equalizer dials that you saw. So you know, you could say, I'm listening to classical music, and it would change the dials a little bit, or you could change the dials yourself. And there's sort of that type of process inside of a hearing aid. And then more advanced ones have a little bit more sophisticated noise reduction, or maybe a directional microphone system. But it's fundamentally based off of a low power, low compute availability system, which is running very constrained, kind of, signal processing algorithms. And that's really what's important. And the reason that's the case is we want the devices to be small. People want hearing aids to be discreet. You want a long battery life because you need to wear them all day. You know, if you think about wearing AirPods for five hours, that feels great if you're listening to music, but it doesn't even come close if you're wearing a hearing aid. And you need very, very low latency, because you're getting a direct path of sound from somebody talking to you, you're seeing their mouth move and you're getting amplification all at the same time. And so the latency constraint is a lot more challenging compared to, you know, Bluetooth headphones and watching a video, because there's no direct path and things like that. 
So, you're really constrained to these kinds of classical signal processing algorithms. 
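
As an illustration of the "equalizer plus compressor" description above (not something discussed on the show), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `equalize` and `compress`, the band edges, the gain, the threshold and the ratio. It processes a whole signal at once with an FFT, whereas a real hearing aid works on a low-latency stream with filter banks and attack/release smoothing, so treat it as a picture of the idea rather than how any product works.

```python
import numpy as np

def equalize(signal, sample_rate, band_gains_db):
    """Apply a gain (in dB) to each frequency band, like equalizer dials.

    band_gains_db is a list of (low_hz, high_hz, gain_db) tuples.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    for low_hz, high_hz, gain_db in band_gains_db:
        in_band = (freqs >= low_hz) & (freqs < high_hz)
        spectrum[in_band] *= 10.0 ** (gain_db / 20.0)
    return np.fft.irfft(spectrum, n=len(signal))

def compress(signal, threshold=0.5, ratio=4.0):
    """Static compressor: the part of each sample above the threshold
    is divided by the ratio, so loud things do not get too loud."""
    magnitude = np.abs(signal)
    over = magnitude > threshold
    out = signal.copy()
    out[over] = np.sign(signal[over]) * (
        threshold + (magnitude[over] - threshold) / ratio
    )
    return out

# A quiet 1 kHz tone stands in for incoming sound (hypothetical values).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
quiet_tone = 0.2 * np.sin(2 * np.pi * 1000 * t)

louder = equalize(quiet_tone, sample_rate, [(500, 2000, 12.0)])
limited = compress(louder)
```

The equalizer step raises the tone's peak, and the compressor then pulls back only the portion that exceeds the threshold, which is the "make things louder, but not too loud" behavior described above.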

 

Pieter Abbeel

So Andrew, if I can play this back to you, to make sure I understand, so a hearing aid, the starting point of hearing aid is actually microphones, there’s microphones built in there. Is that right? 

 

Andrew Song

Yes, there's usually one or two microphones per hearing aid. 

 

Pieter Abbeel

And then there's the little speaker in there. So you're still hearing with the hearing aid. It's just that the sound has been transformed. It's been first recorded, then passed on, transformed with different amplification for different frequencies. Is that right? 

 

Andrew Song

That's right. And those amplifications can change depending on the person's specific hearing loss through a process called fitting. And that's where the audiologist, a doctor, may come in. 

 

Pieter Abbeel

Got it. So a doctor would fit how your ear works, which frequencies you're better at hearing, worse than hearing, and then the electronics would, kind of, pre-amplify to make up for that. 

 

Andrew Song

Exactly. Bump it up a little bit at this frequency. Ease it off a little bit at this frequency. Similar to what Apple is doing to your rock music, but with a microphone and with the speaker, not the recorded music, of course. 

 

Pieter Abbeel

So, that seems like, you know, a fairly basic model that's not too hard to understand what's going on. But now you're saying that's also not enough. Why is that not enough? 

 

Andrew Song

The first, most important reason is because the people who wear hearing aids say that's not enough. If you look at some of the biggest challenges with hearing aids, it's following complex conversations with multiple speakers in a noisy environment. And the reason that's a complex situation. The simplified model can't really help is you don't really know what to amplify. If you're forced to make simple decisions like make every sound between 500 hertz and, you know, they say, two kilohertz louder, in a noisy environment because that's where speech is, you know, speech tends to exist or it just makes everything louder. Well you're also going to make a coffee grinder louder in that basic model. And no matter how you do that box drawing, you're going to be a little bit, you're going to be really constrained. And as a result, when we're talking, if there's someone else talking in the background in a restaurant, not only is it going to make it harder for me to hear you, but it's actually going to add to my, again my distress, my dissatisfaction. And that's what you constantly hear from people who use hearing aids. 
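
To make the coffee grinder point concrete (an illustration, not something from the conversation), the sketch below boosts the 500 Hz to 2 kHz band and measures what happens to a stand-in "speech" tone and a stand-in "grinder" tone that both sit inside that band. The tones, their levels and the helper names `boost_band` and `power_db` are all hypothetical. Because the boost is a linear filter, applying it to each source separately gives the same result as applying it to their sum.

```python
import numpy as np

def boost_band(signal, sample_rate, low_hz, high_hz, gain_db):
    """Make everything between low_hz and high_hz louder by gain_db."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[in_band] *= 10.0 ** (gain_db / 20.0)
    return np.fft.irfft(spectrum, n=len(signal))

def power_db(signal):
    """Average power of a signal in decibels."""
    return 10.0 * np.log10(np.mean(signal ** 2))

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
speech = 0.10 * np.sin(2 * np.pi * 1000 * t)   # stand-in for a voice
grinder = 0.05 * np.sin(2 * np.pi * 1500 * t)  # stand-in for in-band noise

boosted_speech = boost_band(speech, sample_rate, 500, 2000, 12.0)
boosted_grinder = boost_band(grinder, sample_rate, 500, 2000, 12.0)

# Speech-to-noise ratio before and after the band boost.
snr_before = power_db(speech) - power_db(grinder)
snr_after = power_db(boosted_speech) - power_db(boosted_grinder)
```

Both sources come out louder by the same amount, so `snr_before` and `snr_after` are equal: the listener gets more volume but no better separation of the voice from the noise.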

 

Pieter Abbeel

So they're hearing all these things amplified that they don't want to hear amplified. They just want to hear a specific person speak or a specific set of people speak, not everything else. And so naive amplification amplifies independent of who is speaking. So I mean, I can think of ways to get around this. But I'm curious, how is the Whisper hearing system getting around this?

 

Andrew Song

By using AI, of course. That's of course, the simple answer. But I think there's two problems that we really solved at a technical level. And then, you know, we can talk about modeling and all the fun stuff around AI. But going back to, you know, when we talked to industry experts, I think there were two things that we wanted to be able to do. The first is we wanted to be able to use deep learning models, in order to make an advancement on this problem. And you know, you can look at a lot of offline experiments. And so we wanted to enable that. Deep learning models, and I think the state of the art in terms of model compression has come a long way. But certainly in 2016, the state of the art in model compression was not very good. And so we thought, Okay, maybe we need a little bit more just to store this thing. It's a lot of numbers. It's a lot of weights sitting there, right? And the second problem is you need a lot more processing power. This kind of simple, constrained system was never really designed for it; no one ever conceptualized something like a deep learning model. And you need to be able to add that computation power in a way that doesn't affect latency. It doesn't create a latency issue. And so that's the problem that we were focused on. And the way we do that is through a little pocket device called the Whisper Brain. And so when you look at the Whisper hearing system, there's the traditional earpieces, which are kind of what you probably think about when you think about a hearing aid in your mind. It looks exactly like that. And by themselves, those are great, you know, premium hearing aids, and have all the traditional technology you would want. But when the Whisper Brain is nearby in your pocket, or on a table somewhere, it's adding that, you know, superpower engine to your hearing aid with the models. 
With all of those capabilities, through a special kind of wireless protocol we designed in order for it to not have a latency issue. And that sort of gives you this idea of strapping a laptop to your head. Of course, in a more practical way: the power of strapping a laptop to your head without, maybe, the downsides of literally strapping a laptop to your head. 

 

Pieter Abbeel

Now, back in 2016, when you started, definitely, I mean, getting deep learning slash neural net models into a hearing aid would have been completely impossible. Even a laptop was a stretch at the time, I would say. And still today, a lot of it is run on bigger machines than laptops. But here you have a special pod that runs these models. So I'm really curious what's in this pod? Is there somehow an Nvidia GPU in there? What do you put in there? 

 

Andrew Song

What we look for is a specialized mobile processor that would be good at handling, you know, multiply accumulates while you're doing a lot of multiply accumulates in parallel. Actually, what the Whisper Brain is, you know, for those of your listeners who haven't seen it, it's palm sized, maybe a little bit smaller than a deck of cards, roughly about between the size of a car key and a deck of cards, maybe. And what I like to tell people is inside there, there's really three things. There's a radio so that the wireless communication can work on radio and then an antenna. There's a huge processor, kind of like a, you know, maybe like a 2017's mid-grade Android phone processor about equivalent, in order to be able to run all of this deep learning. And there's a battery, you know, most of it is actually a battery to power all of this and to keep it alive and to again, get the battery, you need all day battery life to be able to do all this stuff. So from that point, it's actually quite simple. But how you integrate that into the hearing aid is, I think, where we have to spend a lot of time to get it right. 

 

Pieter Abbeel

Now, this pod that you carry around or sits in your pocket, that's largely battery, apparently volume wise. But of course, the magic is in the processing and the low latency radio connection. It's using deep learning, as you've alluded to. Now, you can't just put some, you know, deep learning into a pod and assume it works. You need to somehow train it right? And to have some version of that somehow, there is some input output pattern that you think, you know, you can leverage to get a better, effectively, a better amplifier system than just a frequency amplification and the amplification rate. And so what goes into that? What are you leveraging in terms of data to learn from? 

 

Andrew Song

I would say we're still on this journey. Anyone who works closely, you know, deeply in AI knows that data is the lifeblood of any AI problem. And we certainly believed that very early on. You know, when we started, very naively, we used what every researcher used, which is kind of the publicly available research data sets. And that got us maybe a few months of progress. But one of the things that we learned is that those data sets, the primary one that a lot of people use is The Wall Street Journal. There's a Wall Street Journal 2MIX that a lot of folks use in this problem space. What you find is that that's like a lot of research data sets. It's oversimplified. You know, those were voices recorded in a quiet, anechoic environment, with no echo, no reverberation, laid on top of each other. And the real world, it's just a lot more complicated than that. So very quickly we were building models that showed great performance, but that we knew wouldn't extend into the problem space that we had. And so we had to be a little bit more creative in how we approached data and training. 
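
As an illustration of what "laid on top of each other" typically means for this kind of research data (not a description of Whisper's pipeline), the sketch below overlays two clean recordings at a chosen signal-to-noise ratio. The function name `mix_at_snr` is hypothetical, and random noise arrays stand in for real speech recordings. Note what is missing compared with a real room: no reverberation, no echo, no microphone effects, which is exactly the simplification described above.

```python
import numpy as np

def mix_at_snr(target, interferer, snr_db):
    """Overlay two clean signals so the target is snr_db louder than
    the interferer. Returns (mixture, target, scaled_interferer).

    Inputs are float arrays; the interferer must not be all zeros.
    """
    n = min(len(target), len(interferer))
    target = target[:n]
    interferer = interferer[:n]
    target_power = np.mean(target ** 2)
    interferer_power = np.mean(interferer ** 2)
    # Pick the scale so that target_power / scaled_power == 10^(snr_db / 10).
    scale = np.sqrt(target_power / (interferer_power * 10.0 ** (snr_db / 10.0)))
    scaled = scale * interferer
    return target + scaled, target, scaled

# Stand-ins for two clean, separately recorded voices.
rng = np.random.default_rng(0)
voice_a = rng.standard_normal(16000)
voice_b = rng.standard_normal(16000)

mixture, clean_a, scaled_b = mix_at_snr(voice_a, voice_b, snr_db=5.0)
```

Keeping the clean target alongside the mixture is what makes the data usable for training: the mixture is the model input and the clean signal is the reference it is compared against.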

 

Pieter Abbeel

So I'm trying to imagine this dataset. I mean, I'm familiar with vision data sets and speech recognition datasets, but I imagine you're not doing speech recognition here because you don't want to transcribe into text. You want to do something a little different. What exactly do you want to do? What's the input like and what's the desired output? 

 

Andrew Song

Yeah. So the input is usually audio. Audio can be many different forms. In some models, it's waveforms. In some forms, it's maybe features of audio, like the FFT of the audio. And what's great when you take the FFT of the audio, you almost have a vision problem. You have a vision problem where you can't see the whole, you know, you have to be very careful if you want to use it in a real time system. But you get a little bit closer to a vision problem, which is actually pretty nice. And so that's the input of the problem. You know, for the Wall Street Journal MIX, for example, that data set is literally people reading The Wall Street Journal. It's a very old data set. And so different voices, men, women, older, younger, different inflections. And they still read sentences like, the Fed cut interest rates by five basis points in 2019. It was like that. You know, that's what you're listening to all day, and then you mix that on top of each other. And then for a lot of the early models we did, you're taking the Fourier transform of those and passing in the kind of magnitude data as features into your model. And what you're getting out of that can be very different. And that's where maybe we've innovated the most over time. I think in the most basic construct of the model, what you're really trying to do is get almost like an image mask, you know? So if your input is an FFT of audio, what you're looking to try to get out is an image mask. A highlighting of where the important areas of sound are. Where you want to focus your amplification and where you want to maybe not amplify as much. And you know, you can steer that model. You can steer that problem based on what data you give it, in whatever way makes sense. So our data sets are very speech focused, because most people, when they're in a noisy place, they want to hear the person talking to them, right? But that's not always true. And you know, you can think of exceptions to those cases. 
Or you can imagine if you're bird watching, you know, suddenly you care more about birds than humans, right? All of these different things. I'm very excited about the kind of the longer range of this problem and why there's so much work to do, but that's kind of the basic setup of the problem. 
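
To picture the "image mask" idea described above (an illustration only), the sketch below builds a magnitude spectrogram with a short-time FFT and computes an ideal ratio mask, which is one common textbook choice of training target. The conversation does not say which kind of mask Whisper uses, so that choice, the function names `stft_magnitude` and `ideal_ratio_mask`, the frame sizes and the two pure tones standing in for speech and noise are all assumptions. In training, the mask is computed from the known clean sources; a deployed model would have to predict it from the mixture alone.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: one row per time frame, one column per
    frequency bin. This is the 'image' the model looks at."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Per-bin values between 0 and 1: close to 1 where speech
    dominates, close to 0 where noise dominates."""
    return speech_mag / (speech_mag + noise_mag + eps)

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
speech = np.sin(2 * np.pi * 440 * t)    # stand-in for the talker
noise = np.sin(2 * np.pi * 3000 * t)    # stand-in for background sound

mask = ideal_ratio_mask(stft_magnitude(speech), stft_magnitude(noise))
mixture_mag = stft_magnitude(speech + noise)
enhanced_mag = mask * mixture_mag       # highlight the speech regions
```

Multiplying the mixture's spectrogram by the mask keeps the time-frequency regions where the talker is and turns down the rest, which is the "highlighting" described in the answer above.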

 

Pieter Abbeel

And so the way I'm interpreting this is the data set is very complex because multiple voices are overlaid and other sounds and voices could be overlaid. And then out, you want something clean. You want to hear just Andrew or just Pieter or something like that? Is it fair to say that then, the neural net will learn to split these sounds effectively into each speaker, but then the person listening, do they have some kind of clicker? And they say, Okay, they click through and then they lock on to a specific speaker? Or how does that work? And how does that then work if there's multiple people that alternate speaking? I'm curious. 

 

Andrew Song

Yeah. So the way, you know, the way our models are set up, given our problem space, is that we really want to focus on the kind of the focus speaker, the primary speaker. And primary is ultimately defined by what the person wants to listen to. So usually that tends to be, if you think about it, just take away hearing, it's just how humans think about it. It tends to be the loudest person, right? That's actually a good proxy in a lot of situations. If we're talking, you're closer to me, so at the level of my ears, you're louder than all of the background noise. And if we're at Starbucks and someone yells out a coffee order, you know, so David, they yell really loudly, maybe I don't want to hear that, but that speech is designed to be heard, right? And if you have good hearing, you're going to hear that and your mind is momentarily going to focus on that and then come back to the conversation. Now there's a lot more kind of feature work and data work you do. But as a starting point, you know, for simplifying purposes, you can think about it like that. And because this person, you know, the user of our product, has hearing loss, we still need to be able to make adjustments based on their hearing loss, based on the frequencies, based on a lot of these other factors. These are core algorithms in the earpieces that have nothing to do with the AI, necessarily. So there's a kind of post-processing step that all of the audio goes through. So you can almost think of the model as outputting features that go into a larger audio system that's being adjusted. I think one of the things that is really unique about our problem, that makes it maybe a little bit more complicated than just thinking about noise reduction or removing or clicking through voices, is actually that people want background noise, some level of background noise. They just don't want it to be drowning out the speaker, right? Background noise is really important because it tells you where you are. 
If you're in a Starbucks and you don't hear any of the background noise and it's just a person speaking, you actually get a headache. You start to feel nauseous. Your brain is not happy about that situation. And I know that because we've tried that situation before; it's very nauseating. So, you know, background noise also has a safety component to it too, right? If there's a siren going off, that's background noise in some sense, but you want to know that that's there. And so often we're not just trying to remove noise and click through different speakers. We're actually trying to balance that, given all of the other hearing loss factors that somebody has. That makes the problem maybe even more challenging than, you know, the kind of model research scope that researchers tend to look at. 
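
One simple way to picture "balancing" rather than removing background noise (an illustration, not Whisper's method) is to put a floor under a separation mask so that no time-frequency region is ever attenuated by more than a fixed amount. The function name `soften_mask` and the -12 dB floor are hypothetical values chosen only to show the shape of the idea.

```python
import numpy as np

def soften_mask(mask, background_floor_db=-12.0):
    """Limit how far a 0-to-1 mask can attenuate, so background sound
    stays audible instead of being removed entirely.

    A mask value of 1 (speech) stays at 1; a mask value of 0 (pure
    background) is raised to the floor gain instead of silence.
    """
    floor = 10.0 ** (background_floor_db / 20.0)
    return floor + (1.0 - floor) * np.clip(mask, 0.0, 1.0)

# A toy mask: full background, mixed, full speech.
raw_mask = np.array([0.0, 0.5, 1.0])
balanced_mask = soften_mask(raw_mask)
```

With this kind of floor, a siren or the hum of a cafe is turned down relative to the talker but never disappears, which matches the comfort and safety concerns raised above.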



 

Pieter Abbeel

Now, as a startup, you start from nothing, when you start out and you’ve got to start building, right? And I mean, how do you get started? How do you start trading with people who try it out? I mean, how do you measure progress, their experiences and tie that up back into your own work? 

 

Andrew Song

Yeah, that is something that we've changed a lot over time as we've matured as a company too. You know, to give you a picture of what it was like early on: we were a very small company, mostly building offline models, because we didn't have hearing aid hardware yet. You know, there was a whole other parallel track of development that was doing that. And so we did a lot of offline evaluation. Our models eventually got sophisticated enough that we could record the output of a hearing aid, a premium hearing aid, in the same situation. We could go to a cafe or a park in San Francisco, where we are, and try to record some of that ambient environment, have a conversation, and then bring that result back, play it to the hearing aid, play it through our models, and then do offline evaluation, sometimes with expert listeners, sometimes with non-expert listeners. You know, there are these kind of crowd labeling platforms that we would try. Sometimes we'd do it within ourselves, and do something maybe as simple as a PESQ score, which is kind of a metric that's used in VoIP, or just have people rate them, you know, on a scale of one to five. 
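
As an illustration of the "rate them on a scale of one to five" style of evaluation mentioned above (the ratings and system names below are made up, not real results), listener scores are commonly summarized as a mean opinion score with a standard error for each system being compared. PESQ, by contrast, is an automated estimate of this kind of listening score defined in ITU-T Recommendation P.862; it is not computed here.

```python
import statistics

def mean_opinion_score(ratings):
    """Return (mean, standard_error) for a list of 1-to-5 ratings."""
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    mean = statistics.mean(ratings)
    if len(ratings) > 1:
        std_error = statistics.stdev(ratings) / len(ratings) ** 0.5
    else:
        std_error = 0.0
    return mean, std_error

def compare_systems(ratings_by_system):
    """Summarize each system's ratings, keyed by system name."""
    return {
        name: mean_opinion_score(ratings)
        for name, ratings in ratings_by_system.items()
    }

# Hypothetical listener ratings for the same recorded cafe scene.
summary = compare_systems({
    "premium_hearing_aid": [3, 3, 4, 2, 3],
    "candidate_model": [4, 5, 4, 4, 3],
})
```

Reporting the standard error alongside the mean helps show whether a difference between two systems is larger than the disagreement among listeners.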

 

Pieter Abbeel

That reminds me of some recent work that is maybe not directly applicable yet, but there's some work on learning with human feedback. So rather than unsupervised learning, a human would rate the quality. For example, OpenAI did this for text summarization, where a human would rate the quality of the summarization. And it seems like some of that could at some point make its way into hearing aids too, where a human would rate different processing modalities, based on different weights in the neural network, based on what they're hearing, and over time figure out an even better objective for that specific user, what they want to hear. 

 

Andrew Song

Yes, and that's sort of, that's one of the exciting areas that we're starting to explore now. You know, we've come a long way since that initial step. Now we do. Now we have expert listeners in house. We have a research department that works with people who use hearing aids and who use our hearing aids, and we're able to try different scenarios on the device or offline, in a sound booth. And ultimately, that's where I think a lot of the future of hearing is, is being able to find an individual's path through their auditory system, right? And ultimately program the hearing aid and develop the models so that it supports the needs of that individual person. Because, you know, just like how the brain processes sound on an individual level is also very different. And that's ultimately really the goal of the hearing aid. It's to give the brain the type of information it needs in a way that's easier to process given the compromised auditory system so that you can get all of the information, you know, about the world just like you and I might. 

 

Pieter Abbeel

Now, I'm kind of curious. When people have hearing aids or have trouble hearing, you talk about, on the one hand, equalizers slash amplifiers, and on the other hand, the Whisper hearing system, which does quite complex processing, looking at different sources in the sound and amplifying them differently, giving different emphasis and so forth. Now in the human brain, when I hear something, the sound comes in, and then I have to do all that processing myself. And it makes me wonder, just from a medical point of view, when people have trouble hearing, to what extent is it, kind of, a physical thing? Versus is it a processing thing? Is that well understood? Of course, they're intertwined. Processing is physical.

 

Andrew Song

In 10 years, we'll look back and we'll realize it wasn't well understood. How about that? That's the right way to say it. Hearing loss certainly has a very physical component. You know, hearing is literally micro hairs in your cochlea moving back and forth and responding to different frequencies. And hearing loss is often that physiological process being disrupted. But of course, that disruption can have downstream effects on that sound processing. And so, you know, I think a real vision of a hearing aid is not just about sound and adjusting these sources and adjusting this scope, but almost being a counteracting force to this disruptive physiological issue, so that the processing in your brain can still do what it needs to do. Because ultimately, you know, the brain is probably, in some regards, the most powerful computer that we all have. 

 

Pieter Abbeel

And you want to keep leveraging, of course, the brain as much as possible. Now I'm curious. From my perspective, I'm certainly always leaning towards AI likely being the solution, because it seems like with more data and more compute, over time, things will only get better. But at the same time, as you're building this product, I wonder to what extent there are also other things that you spend a lot of time on. I mean, I could imagine you might have thought about microphone arrays, where maybe you turn your head and, thanks to the array of microphones, by turning your head you're zoning in on a specific sound somewhere. I mean, maybe that's too crazy. Maybe that's not possible with sound. But I'm curious, on the electronic side, aside from the AI, what are some things you thought about hard? 

 

Andrew Song

Yeah, I think everything related to the sound system, the audio system of the product, is super, super important. I'll give you a problem related to sound that probably no one has really thought of on this podcast before, which is microphone sealing. You know, when you have a microphone, it has to get attached to a circuit board. It's an electronic microphone. And it has to be sealed, because if it's not sealed, sound can seep into various areas and you won't get the actual signal properly. And there's a whole fabrication process behind this, and then you have to ask yourself, well, how do you even test that seal? How do you know if that seal is correct? You know, and that has a big impact on the results of the whole sound system, but also the machine learning. Eventually each hearing aid is an input source that's coming into this model. And imagine there's a random bias that's being added to each hearing aid in a way that you can't control, which is manufacturing, right? You want to minimize that bias as much as possible. You want your models to be robust to that, but at some point you just need better inputs. That is, I think, a problem that we spent a very, very long time thinking about in our office, prototyping, and then testing a lot in our manufacturing, in our factory, because we knew it had such a big effect on the AI, right? Of course, how we design our microphone arrays is really, really important because, like you said, they can steer toward where the sound is coming from. It, again, has a really big impact on what the inputs look like. You know, whether they're pointing forward, whether they're pointing sideways, whether one is pointing forward and one is pointing sideways. All of these things. And so those are some of the more, I think, hardware oriented problems. 
I think the other big problem that we spent a lot of time figuring out is around the integration of all of this. You know, it's one thing to get a processor, put a battery on it, run some models and run that on a loop. Okay, that's great. But you need those results to get back to the earpieces and process the sound in under six milliseconds roundtrip from when the sound first hits the microphone. So six milliseconds to do all of that. And just to give a benchmark for many folks, typically the latency on a Bluetooth link to send data in one direction is, at the fastest, maybe 10 milliseconds. So just sending the data forward in one direction, if you use Bluetooth as your protocol, already busts your latency requirements. Never mind getting it back, which you have to do at some point too, but leave that problem for another time. And so a lot of that goes into the design of the communication system and into the hardware design, because you need to be finding low latency ways of getting data from point A to point B within the chips. And then you need it to be really robust. So you have to focus a lot on antennas and those types of things. So, you know, there's like PhDs' worth of work there that we had to ask friends about, and call in favors, and understand in order to make this work. 
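
The latency argument above is simple arithmetic, and can be sketched as a budget check. The six millisecond roundtrip budget and the roughly 10 millisecond one-way Bluetooth figure come from the conversation; the 1.5 millisecond link and 2 millisecond processing time in the second call are hypothetical values chosen only to show a case that fits.

```python
ROUND_TRIP_BUDGET_MS = 6.0   # from the conversation: microphone to processed sound
BLUETOOTH_ONE_WAY_MS = 10.0  # from the conversation: best-case one-way latency

def remaining_budget_ms(link_one_way_ms, processing_ms=0.0):
    """Budget left after sending audio out, processing it, and sending it back."""
    return ROUND_TRIP_BUDGET_MS - (2 * link_one_way_ms + processing_ms)

def fits_budget(link_one_way_ms, processing_ms=0.0):
    """True if the full round trip stays within the budget."""
    return remaining_budget_ms(link_one_way_ms, processing_ms) >= 0

# Bluetooth: a single 10 ms hop already exceeds the 6 ms budget,
# and the round trip overshoots it by 14 ms even with zero processing time.
bluetooth_left = remaining_budget_ms(BLUETOOTH_ONE_WAY_MS)

# Hypothetical low-latency link: 1.5 ms each way plus 2 ms of model time.
custom_left = remaining_budget_ms(1.5, processing_ms=2.0)

print(bluetooth_left, fits_budget(BLUETOOTH_ONE_WAY_MS))
print(custom_left, fits_budget(1.5, processing_ms=2.0))
```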

 

Pieter Abbeel

Well, now what if somebody wants to try out the Whisper hearing system? What should they do? 

 

Andrew Song

Well, it's very easy. They can go to our website. We work with hearing doctors, typically called audiologists, around the country. And they can go to our website and there's a little form they can fill out, and we'll get in contact with them, refer them and help them schedule an appointment with their nearest doctor. And the reason we do that is, first, we want to make sure they have a really good baseline understanding of their hearing, because again, a hearing aid is not a one size fits all device. It's programmed and personalized to an individual person's needs. And then that doctor is going to be able to explain how to use the product. And if, as you use it, you notice certain aspects of the sound that could be adjusted for you, as you get used to it, as you adapt to it, that's what that doctor's there for. We have doctors all over the country now. And so our website is Whisper.ai. 

 

Pieter Abbeel

How affordable is it?

 

Andrew Song

So our hearing aids are a little bit unique in that we offer them on a monthly plan. And one of the reasons we do that is we actually produce software upgrades to the hearing aids. You know, what we'd call new model updates, new model architectures and new weights. And we do that because we get a lot of data. So over the year or so that we've had this hearing aid, we've improved the models and we've improved the sound system of the hearing aid maybe four or five times already in production. And so our hearing aid right now is one hundred and thirty-nine dollars a month. It's a three year term. And that makes it a little bit more affordable for people, so they're not spending thousands of dollars upfront, which is what a typical hearing aid would ask of folks.
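
For reference, the total over the full term follows directly from the two figures quoted above. This is plain arithmetic on the stated monthly price and term, and it assumes no other fees or discounts, which the conversation does not cover.

```python
MONTHLY_PRICE_USD = 139  # quoted monthly price
TERM_MONTHS = 3 * 12     # quoted three year term

# Total paid over the term, assuming no extra fees or discounts.
total_cost_usd = MONTHLY_PRICE_USD * TERM_MONTHS
print(total_cost_usd)
```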
 

Pieter Abbeel

And is it typically covered with health insurance? 


Andrew Song

There are certain health insurances that do cover it. For example, if you're a veteran who gets care through the Veterans Administration, there's often hearing aid coverage there. There are certain employers who have special plans that cover hearing aids. Unfortunately, hearing aid coverage is not as broad as one would hope, given the impact that it has on so many other facets of life. You know, reducing the risk of dementia alone would be worth it. But I think that's an area that I'm personally really passionate about, and I hope to see some change over time. 

 

Pieter Abbeel

Yeah, it seems like some advocacy is needed there. Maybe a foundation that people can donate to, to help other people acquire the right hearing aids themselves.  


Andrew Song

Yeah, for sure. It has such a big impact on people's lives and people's other health outcomes, too. You know, imagine if you're an older person and you have hearing loss and you don't have a hearing aid. If your doctor tells you what you're supposed to do about your health, on some basic level, you can't even hear that. So I think the human argument is maybe easy to understand. But I think the economic argument is also easy to reveal. And one of the hopes I have is that Whisper can maybe help reveal that argument and make some progress there as well. 

 

Pieter Abbeel

Talking about, let's say, older people looking for care, older people are also often looking to talk with their grandchildren on video calls and so forth. And I'm curious, with video calls, maybe things are a little bit easier and you can achieve even higher quality. Is that right? Because you kind of know that the sound is coming from that other device? 


Andrew Song

It's sort of a mixed bag, I think, for people. Certainly, when you know where the source is coming from, you can do a lot more to target the sound, you know, up to just wearing headphones and making them really loud. That's not a great long term solution, but if you're on a video call, that gets you a certain way. But I think the impact of COVID in general has been pretty challenging for people with hearing loss, just because of masks alone. I've gone through the experience where I'm talking with someone and they're speaking at their regular volume. The sound is at the normal volume I’m hearing at, but I can't lip read them. So it's actually harder for me to understand what they're saying. And that's maybe only a small picture of what it's like to have hearing loss. And so, you know, having masks has been a real big challenge for people with hearing loss. 

 

Pieter Abbeel

Is there a favorite customer or patient story that you're willing to share? 


Andrew Song

We get so much great feedback. I think it's one of the most motivating and one of the best parts about working on this problem and working at Whisper. You know, the one that stays with me, that I'll share: I visited a clinic one day and there was an older gentleman and his son who had visited to try some hearing aids. And he had tried lots of hearing aids and ultimately didn't buy any of them over the many years in which he tried them, because he didn't feel like they were helping. You know, hearing aids are not an inexpensive purchase. And I happened to be there that day, but the doctor told him a little bit about Whisper and used the word AI, you know, all of this kind of tech stuff. And this patient was sort of like, you know, I don't care about AI, I don't care about this. Just, can I try the thing? I don't care that it has this. I don't care about anything that you're saying, actually, nothing you're saying matters. And he tried them, and the reaction was instant, in that he noticed what the difference was. His reaction was one thing, but the reaction of his son who brought him was what I really remember, because the son had never seen his father be able to engage in that way. You know, his father is used to operating in a certain context, in a certain world. And so now he can hear, and he thinks, Oh, someone's talking over there? That's interesting, okay. But the rest of us knew that person was talking over there, and his father just was never paying attention to it. And all of a sudden, his father was more engaged. He was kind of, you know, following things around, and was talking at a more normal volume, because you can also hear yourself a little bit better. That's another impact that it has. And his son just started crying in the room, in that instant. 
And that's, I think, a very powerful thing, where you can really see how it's going to make a really meaningful impact on somebody's life. 

 

Pieter Abbeel

What a beautiful story. Now, Andrew, I must ask. I mean, early on, you mentioned your grandfather as some of your inspiration. Does he wear Whisper hearing aids? 


Andrew Song

He does, though he doesn't actually wear the final version. So it's a funny story. My grandfather lives in China, where I was actually originally born, and the last time I visited China, I was actually visiting to go to the manufacturer. I had pre-production hearing aids and gave him a set. And they're not nearly as good as what we have now, funnily enough. But he has them, and he does use them. But then, as your listeners probably know, it hasn't been so easy to visit China in the past 18 months. So since we've launched, I actually haven't been able to go and see him. I'd love to do that, and it's one of the things that I hope I get to do soon in order to give him a pair. 

 

Pieter Abbeel

Now we've talked a lot about hearing. Of course, I mean, it's really, really important. And that's what you spend most of your time on. But I can imagine you're also thinking more broadly at times. And I'm curious, what are some other places in healthcare where you see a very big impact potential for AI? 


Andrew Song

Yeah, healthcare is a huge field, and I certainly don't claim to be an expert in any part of it, but I always get more excited about the very practical impacts that AI can have. You know, the areas where you don't even necessarily have to market it as AI. It's just bringing some good to the world, let's say. And certainly, I think the intersection of computer vision and health is a big area for bringing robustness to the healthcare system. I think, for example, if you're looking at an X-ray or an MRI or some sort of scan and trying to diagnose something, just having AI be able to support a doctor's decision or improve the decision making of a medical team, I think that's amazing work, and it has a huge, measurable effect on the quality of life of so many people today, and reduces errors and all of the other things that our healthcare system is really concerned about. You notice I didn't describe that as a vision where you send pictures and the AI is your doctor and decides all of your healthcare outcomes. Certainly with enough data, I think we can maybe reach that vision or get a lot closer to it. But I think that vision sometimes ignores all of the important steps along the way that bring real, meaningful change to people's day to day lives. I also think there are a lot of healthcare problems like hearing, where personalization matters, and an individual's care journey is going to be very, very different from the average person's care journey, right? And you know, a mathematical way to say that is: given a set of people with the same set of inputs, the correct output has high variance, right? Hearing is one of those things. If you have two people with the same measured hearing loss, the actual hearing aid settings that are best for each person are all over the map. 
You know, the prescription is just a starting point. It's not an ending point. Unlike glasses, let's say. The prescription is the ending point in glasses. And I think for those problems AI is very well suited to help optimize and map that journey through this very complex space, you know, with human cognition in the way and human preferences, all of these things, like you were mentioning earlier. And I think that's going to be a big area where we as a healthcare system will be able to move away from saying, well, we averaged 900 data points, so when you have this measure, when you have this problem, we know on average this is the best result, so we'll recommend this result. We can actually take a lot of those individual factors and give you a much more personalized result. And I think that's going to support healthcare and healthcare outcomes in a way that wasn't possible before. 
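
The "same inputs, high variance output" point can be made concrete with a small sketch. The five people and their best gain settings below are entirely hypothetical; the only claim illustrated is that recommending the average leaves most individuals some distance from their own best setting.

```python
from statistics import mean, pvariance

# Hypothetical best gain settings (in dB) for five people
# who all share the same measured hearing loss.
best_gain_db = {"p1": 10, "p2": 25, "p3": 18, "p4": 30, "p5": 12}

values = list(best_gain_db.values())
average_setting = mean(values)   # the "one size fits all" recommendation
spread = pvariance(values)       # how much the best settings vary

# How far the average recommendation is from each person's best setting.
errors = {person: abs(gain - average_setting) for person, gain in best_gain_db.items()}
mean_abs_error = mean(errors.values())

print(average_setting, spread, mean_abs_error)
```

A personalized approach would aim to drive each of those individual errors toward zero instead of only getting the average right.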

 

Pieter Abbeel

Well, that will be a beautiful future once we get there. Well, Andrew, thanks so much for joining us. This was a really great conversation. Thank you. 


Andrew Song

Yeah, I had a lot of fun. Thanks for having me on. Can't wait to listen to it when it's all done. I really appreciate you inviting me on.