AI That Hears
In the last lesson you explored AI that sees. Now let's use a different sense: hearing! Your ears pick up sound and your brain turns it into meaning instantly. Teaching a computer to do the same thing took engineers many decades of work. Today it works well enough that you can talk to your phone and it will type what you said, usually with only a few mistakes. Let's find out how.
How Sound Becomes Words
Sound travels through the air as invisible waves, like ripples on a pond, but in air. A microphone in your phone or smart speaker captures those waves and turns them into a stream of numbers. That stream of numbers is what the AI works with. The AI has been trained on thousands of hours of human speech: people saying millions of words in many different voices, accents, and speeds, in both quiet and noisy places. When you speak, the AI compares your sound patterns to the patterns it learned from all that speech. It figures out which sounds match which letters and words. The result: the words appear on screen, or the assistant answers you.
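Here is a small sketch of what "turning a wave into numbers" can look like. It is a simplified example, not the code inside a real phone. It pretends the sound is one pure musical note and measures the height of the wave many times each second. The function name `sample_wave` and the settings (a 440 Hz note, 8000 measurements per second) are made up for this example.

```python
import math

def sample_wave(frequency_hz, sample_rate, duration_s):
    """Measure a pure tone `sample_rate` times per second and return the numbers."""
    count = round(sample_rate * duration_s)  # how many measurements to take
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(count)
    ]

# One hundredth of a second of a 440 Hz note, measured 8000 times per second.
samples = sample_wave(frequency_hz=440, sample_rate=8000, duration_s=0.01)

print(len(samples))                          # how many numbers we collected
print([round(s, 2) for s in samples[:5]])    # the first few numbers in the stream
```

Each number says how high or low the wave was at one tiny moment. A real microphone does something similar, but the wave it measures is your voice, which is much more complicated than a single note.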
Speech recognition AI converts sound waves into words by finding patterns. It has heard so many examples of human speech that it can match your voice to the words you are saying — most of the time!
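The idea of "finding patterns" can be shown with a toy example. This sketch is not how real speech recognition is built (real systems use far more numbers and far more complicated models). It only shows the basic idea of comparing what was heard to stored examples and picking the closest one. The words, the numbers, and the function names `distance` and `closest_word` are all invented for this example.

```python
def distance(a, b):
    """Add up how different two lists of numbers are (smaller means more alike)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest_word(heard, examples):
    """Return the word whose stored pattern is most like what was heard."""
    return min(examples, key=lambda word: distance(heard, examples[word]))

# Made-up sound patterns the toy AI has "heard before".
examples = {
    "yes": [0.9, 0.1, 0.8],
    "no": [0.1, 0.9, 0.2],
}

heard = [0.8, 0.2, 0.7]  # a new sound that is a little different from both
print(closest_word(heard, examples))  # prints "yes"
```

The new sound does not match "yes" exactly, but it is much closer to "yes" than to "no", so that is the answer the toy AI picks. If the new sound were halfway between the two patterns, the toy AI could easily guess wrong.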
Voice assistants go one step further. After the AI converts your voice to words, a second AI reads those words and tries to understand what you want. If you say "Play something happy," it figures out that you want music, and that you want a happy-sounding song. That understanding step is a type of AI called natural language processing. "Natural language" means everyday human language, and "processing" means making sense of it. Talk-to-text on a phone is simpler: it just turns your voice into typed words without needing to understand or respond. But even that simpler step uses complex AI working very fast.
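To picture the understanding step, here is a very simple sketch that looks for key words in a sentence. Real natural language processing is much smarter than this and does not just hunt for key words. The function name `understand`, the list of moods, and the label `play_music` are made up for this example.

```python
def understand(text):
    """Guess what the speaker wants by looking for a few key words."""
    words = text.lower().split()
    request = {"action": None, "mood": None}

    if "play" in words:
        request["action"] = "play_music"

    # Check whether the sentence mentions a mood we know about.
    for mood in ("happy", "sad", "calm"):
        if mood in words:
            request["mood"] = mood

    return request

print(understand("Play something happy"))
# prints {'action': 'play_music', 'mood': 'happy'}
```

Notice what this toy version cannot do. If you said "Put on a cheerful tune," it would find no key words at all, even though you mean the same thing. Handling the many different ways people say things is what makes real natural language processing hard.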
Voice AI also powers features that help people with disabilities. Someone who cannot type easily can speak to send a message. Someone who cannot read can hear text read aloud. AI that hears and speaks makes technology more accessible — usable by more people — which is one of its most important contributions.
Voice AI is not perfect. It can struggle with strong accents, background noise, or unusual words. It sometimes mishears things, especially names of people or places that it did not hear often during its training. Researchers are constantly working to make it better and fairer across all the different ways people speak around the world.
With a grown-up's permission, open a talk-to-text app on a phone or tablet. Try speaking clearly and then try mumbling. Try speaking fast and then slowly. See how the AI does! Noticing when and why it makes mistakes is great scientific thinking.
What does a microphone do to sound before AI can process it?
What is natural language processing?
The Telephone Game — With Accents
- Play this game with three or more people.
- Player 1 whispers a sentence into Player 2's ear, something like "The purple elephant ate seven apples."
- Player 2 whispers what they heard to Player 3. Continue until everyone has heard it.
- The last player says the sentence out loud. Compare it to the original.
- Now try again, but Player 1 uses a funny voice or speaks very fast.
- Discuss: how does this relate to why voice AI sometimes mishears words?
- Just like your ears, AI ears struggle more when things are less clear!