Christmas Hackathon: Speech Recognition and Synthesis
Every year, NewsUK hosts a Christmas Hackathon, and this year I decided to take part and explore the possibilities of the speech recognition and speech synthesis Web APIs. I wanted to see how they could be integrated into our projects and what kind of impact they could have.
The Idea
As part of the Newskit team, one of our goals is to keep our documentation website polished and up-to-date. To make it easy for users to find the information they need, we have an efficient text search area. Users can type in keywords and a list of related links will appear, allowing them to easily find what they’re looking for.
But I thought, wouldn’t it be even cooler if you could “ask” the website and be taken directly to the information you need, without having to type and then scroll through a list of options? Of course, creating something great in just two days is a big challenge, but I was determined to build something cool.
The main goal for me was to play with the Speech Recognition and Speech Synthesis Web APIs and see how they could be integrated into our projects.
By using these technologies, we created a solution that allows users to search for information with their voice. This not only makes the experience faster but also more accessible for users with disabilities. In this blog post, I’ll take you through how I tackled this challenge and the results I achieved.
The spark
I first learned about these APIs months ago while watching an accessibility video from Apple (https://developer.apple.com/videos/play/wwdc2022/10153/). They discussed many new techniques and features for enhancing website accessibility, but the idea of using speech recognition and synthesis for web navigation stood out to me.
The Planning
For this project, I was joined by my colleague Aysha. Together, we divided the work into tasks using FigJam so that we could work in parallel and stay focused on our MVP. It’s easy to waste time on non-essential features, but with everything mapped out clearly, we were able to use our limited time effectively.
Newskitta
Below is the Vocal Search component we built.
I’ve also uploaded a quick demo to YouTube: https://www.youtube.com/watch?v=BaJ9pMQx0wo
What it’s made of: We added a new Icon Button to the navigation bar, next to the search input. Clicking it opens a modal where Newskitta (the cute robot) greets you.
The modal’s interface contains a collapsible “How to guide”, a “microphone on/off” indicator to let you know when you are being listened to, a button to start speaking, and a transcript of what you said.
How it works: You click the “How to talk” button, speak your query (examples are in the guide), and Newskitta asks if you are happy with what they captured. If you confirm, they will search and take you straight to the page you were looking for. If not, you will be prompted to try again. The experience is very smooth, and the error handling lets you simply retry when something goes wrong.
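The listen → confirm → navigate flow above can be sketched as a tiny state machine. This is a hypothetical reconstruction for illustration, not our actual code; the state and event names are my assumptions:

```javascript
// Hypothetical state machine for the listen → confirm → navigate flow.
// States and event names are illustrative, not the real implementation.
function nextStep(state, event) {
  switch (state) {
    case 'idle':
      return event === 'start' ? 'listening' : state;
    case 'listening':
      // A transcript arrived; ask the user to confirm it.
      return event === 'result' ? 'confirming' : state;
    case 'confirming':
      if (event === 'yes') return 'navigating'; // search and redirect
      if (event === 'no') return 'listening';   // let the user try again
      return state;
    default:
      return state;
  }
}
```

Keeping the flow in one pure function like this makes the retry path easy to test without touching the microphone at all.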
It was really exciting to see our product come to life and respond to voice input. The Web Speech API has made significant advances in recent years, and it was impressive to see it in action. It was a great experience to work with these technologies and see how they can be used to improve user experience and accessibility.
API Implementation: Implementing and using the APIs was relatively easy. We installed the react-speech-kit NPM package and went through the documentation to understand the parameters we could use, such as starting and stopping listening and controlling Newskitta’s speech. We also used the native Web Speech Synthesis API to personalize the voice, which let us choose between different voices, accents, pitches, and speaking speeds. In addition to the API logic, we also had to write functions to search our website for the correct link. Overall, it was a fun and educational experience: in essence, we built a little intelligence to communicate with!
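To illustrate the voice personalization, here is a minimal sketch using the native Web Speech Synthesis API. The `voice`, `pitch`, and `rate` fields are real `SpeechSynthesisUtterance` properties; the helper names and the `en-GB` default are my assumptions, not our production code:

```javascript
// Pick a voice matching the preferred language, falling back to the first one.
function chooseVoice(voices, preferredLang) {
  return voices.find((v) => v.lang === preferredLang) || voices[0] || null;
}

// Speak `text` with a personalized voice. Returns false where the
// Web Speech Synthesis API is unavailable (e.g. outside a browser).
function speak(text, { lang = 'en-GB', pitch = 1.1, rate = 0.95 } = {}) {
  if (typeof window === 'undefined' || !window.speechSynthesis) return false;
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.voice = chooseVoice(window.speechSynthesis.getVoices(), lang);
  utterance.pitch = pitch; // range 0–2, default 1
  utterance.rate = rate;   // range 0.1–10, default 1
  window.speechSynthesis.speak(utterance);
  return true;
}
```

Note that in some browsers `getVoices()` returns an empty list until the `voiceschanged` event fires, which is worth handling in real code.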
Challenges
Time was a major challenge, as we only had two days to build the experience, make it feel fluid, test it, and prepare a presentation. Being familiar with our website and Newskit made the process easier, but we still had to learn a new API, make it work in React (react-speech-kit wasn’t our first attempt!), write the script to search our JSON properly, give user feedback, and create a nice user experience.
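The search script might look something like the sketch below. Our actual index format isn’t shown in this post, so the `{title, keywords, url}` shape and the scoring rule are assumptions for illustration:

```javascript
// Hypothetical keyword scorer over a site index (array of pages).
// Returns the URL of the best-matching page, or null if nothing matches.
function findBestLink(index, transcript) {
  const words = transcript.toLowerCase().split(/\s+/).filter(Boolean);
  let best = null;
  let bestScore = 0;
  for (const page of index) {
    const haystack = (page.title + ' ' + page.keywords.join(' ')).toLowerCase();
    // Count how many spoken words appear anywhere in the page's metadata.
    const score = words.filter((w) => haystack.includes(w)).length;
    if (score > bestScore) {
      bestScore = score;
      best = page;
    }
  }
  return best ? best.url : null;
}
```

Returning `null` for a zero score is what lets the UI prompt the user to try again instead of redirecting to a bad guess.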
Even though the speech recognition was surprisingly good, it did not work well in every browser. When we tested it in Safari, our sentences were rarely captured correctly. However, third-party services can be used in place of the browser’s native speech API.
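One way to cope with uneven browser support is to feature-detect the recognition constructor and fall back (to typed search, or to a third-party service) when it is missing. A minimal sketch, assuming nothing beyond the standard and webkit-prefixed globals:

```javascript
// Returns the SpeechRecognition constructor if the browser provides one
// (possibly under the webkit prefix), or null so callers can fall back.
function getRecognitionCtor() {
  if (typeof window === 'undefined') return null;
  return window.SpeechRecognition || window.webkitSpeechRecognition || null;
}

// Callers can then do something like:
//   const Ctor = getRecognitionCtor();
//   if (!Ctor) showTypedSearchOnly(); // hypothetical fallback
```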
Finally, if I had more time:
- I would have explored voice confirmation, since currently you have to click yes or no. I attempted to implement it, but it was challenging.
- I would have expanded the script to find anything on the website.
- I would have made Newskitta more clever by suggesting search options in case the user asks for something that is not literally there but is similar.
- I would have explored different Speech API services.
Thanks to the right alignment of many stars, we won the majority of the votes and finished first in the competition!