Over the past decade, podcasts have experienced an explosive surge in popularity. What was once a niche medium has now become a cultural phenomenon, with new podcasts seemingly popping up on every conceivable topic – from advice and celebrity interviews to true crime and UX design. The podcast landscape now boasts over 5 million podcasts and a staggering 70 million individual episodes, all competing for the attention of approximately 464.7 million listeners in 2023.
The podcast surge shows no signs of slowing down. Projections indicate growth rates of 4-5% in North America and Western Europe, with even greater increases of around 7.8% forecasted for Latin America. Podcasts have managed to captivate audiences, particularly in our fast-paced world where traditional book reading is on the decline. Some argue that podcasts have made information more accessible and engaging in the midst of our bustling lives. Regardless, accessibility remains a paramount concern.
While podcasts may appear accessible to people with visual impairments, a significant portion of the population stands to benefit from enhanced accessibility – specifically the Deaf and hard of hearing. In Canada, around 10% of the population is either Deaf or hard of hearing, and an even larger portion of adults experience mild hearing loss. Beyond the physical challenges, these individuals often encounter economic and social barriers, and because hearing loss tends to develop gradually, some are unaware that they have it.
In the United States, the National Association of the Deaf (NAD) has been a longstanding advocate for closed captions on television and visual content. Since 1996, US TV content has been required to feature accurate captions. In Canada, the Canadian Association of the Deaf played a pivotal role in pushing for closed captioning regulations enforced by the Canadian Radio-television and Telecommunications Commission (CRTC). Consequently, Canadian broadcast networks started offering closed captioned programs and live captioning.
The challenge has now extended to other media: streaming TV, social media apps, and most recently, podcasts. Streaming platforms like Netflix initially sidestepped the regulations that traditional TV adheres to, because they are classified separately from broadcast television. It wasn't until 2014 that Netflix committed to implementing captions across its entire content library, and only after a formidable legal battle with the NAD under the Americans with Disabilities Act. Eventually, the two sides reached a resolution.
However, Netflix isn't the sole player grappling with the adoption of closed captions. YouTube, akin to Netflix, operates beyond the purview of the Federal Communications Commission (FCC), leading to a gradual integration of captions. Their strategy has largely involved relying on software to auto-generate captions, an approach necessitated by the vast volume of videos hosted on the platform. The sheer scale makes it practically unfeasible to manually add captions to each video. Thus, YouTube has leaned on a collaborative effort involving crowdsourcing, volunteers, and fervent enthusiasts over the years. However, this approach has not been without its shortcomings. Pranksters uploading false captions have introduced inaccuracies into the system, prompting content creators to increasingly turn to the expertise of paid transcribers or specialized captioning companies.
At present, there are no established regulations for podcast transcriptions. In 2021, the NAD filed a lawsuit against SiriusXM for not providing transcripts, a case that remained unresolved as of August 29, 2022. While exact statistics are scarce, it's estimated that only 1% of podcasts have been transcribed. This leaves countless episodes beyond the reach of the Deaf and hard of hearing. Media giants like Vox and NPR offer transcripts for their prominent podcasts, but surprisingly, many popular podcasts still lack official transcriptions.
Transcriptions offer benefits that extend beyond mere accessibility. They can be repurposed into various formats such as blog content, e-books, guides, infographics, and email newsletters.
Platforms like HappyScribe offer software-generated transcripts, albeit with some errors. Even with the paid version at $12 USD per month for 300 minutes of audio transcription, the AI software guarantees only 85% accuracy. To attain 99% accuracy, you'd need to invest $1.75 or more per minute for a professional human transcription service. Regrettably, this pricing structure renders transcription services financially out of reach for many small content creators.
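To make the price gap concrete, here is a rough back-of-the-envelope calculation using the figures quoted above ($12 per month for 300 minutes of AI transcription, $1.75 per minute for human transcription). The publishing schedule (four 45-minute episodes a month) is a hypothetical example rather than a figure from any provider, and the sketch assumes the show stays within the plan's monthly allowance.

```python
# Figures quoted in the article.
AI_MONTHLY_FEE = 12.00        # USD per month for the AI plan
AI_INCLUDED_MINUTES = 300     # minutes of audio covered by that fee
HUMAN_RATE_PER_MINUTE = 1.75  # USD per minute for human transcription


def monthly_costs(episodes_per_month, minutes_per_episode):
    """Compare monthly transcription costs for a given publishing schedule.

    Assumption: the show fits inside the AI plan's monthly allowance.
    Overage pricing is not covered here, so larger schedules are rejected.
    """
    total_minutes = episodes_per_month * minutes_per_episode
    if total_minutes > AI_INCLUDED_MINUTES:
        raise ValueError("schedule exceeds the plan's included minutes")
    return {
        "total_minutes": total_minutes,
        "ai_plan": AI_MONTHLY_FEE,
        "human": total_minutes * HUMAN_RATE_PER_MINUTE,
    }


# Effective AI rate if the full allowance is used.
ai_rate_per_minute = AI_MONTHLY_FEE / AI_INCLUDED_MINUTES

# Hypothetical schedule: a weekly show with 45-minute episodes.
costs = monthly_costs(episodes_per_month=4, minutes_per_episode=45)

print(f"AI rate at full usage: ${ai_rate_per_minute:.2f} per minute")
print(f"Monthly audio: {costs['total_minutes']} minutes")
print(f"AI plan: ${costs['ai_plan']:.2f}  Human: ${costs['human']:.2f}")
```

For this hypothetical weekly show, the human service comes to $315 a month against $12 for the AI plan, before counting the time spent correcting the machine transcript by hand.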
As of late 2023, podcast transcription can be achieved through Word Online's transcribe feature. While it requires an Office subscription, this option is one of the most budget-friendly choices for AI-generated transcription with speaker diarization (particularly useful for multiple speakers). An alternative route involves uploading the podcast to YouTube as a video and relying on its auto-generated captions feature. This approach demands a few additional steps. In both cases, manual error correction is still necessary, but these methods remain among the most cost-effective means of transcribing audio content.
Automatic speech recognition (ASR) technology is progressing rapidly toward achieving precise audio transcriptions. Leading companies like Google, IBM, and Microsoft have launched various ASR software solutions, though some remain costly. OpenAI, the minds behind ChatGPT, has taken an intriguing route with Whisper – a free, open-source ASR solution. However, utilizing Whisper effectively requires a degree of technical proficiency due to its intricacies.
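For readers curious about what that technical proficiency looks like in practice, below is a minimal sketch of transcribing an episode with the open-source `openai-whisper` Python package. It assumes the package and ffmpeg are installed. The helper names (`format_timestamp`, `format_segments`, `transcribe_episode`) are made up for this illustration. Whisper on its own does not label speakers, so the output is a timestamped transcript without diarization.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments):
    """Turn Whisper-style segments into a readable, timestamped transcript.

    Each segment is expected to be a dict with 'start' and 'text' keys,
    as found in the 'segments' list of Whisper's transcription result.
    """
    lines = []
    for segment in segments:
        stamp = format_timestamp(segment["start"])
        lines.append(f"[{stamp}] {segment['text'].strip()}")
    return "\n".join(lines)


def transcribe_episode(audio_path, model_name="base"):
    """Transcribe an audio file with Whisper and return a timestamped transcript.

    Assumption: `pip install openai-whisper` has been run and ffmpeg is
    available. The import is kept inside the function so the formatting
    helpers above can be used without Whisper installed.
    """
    import whisper

    model = whisper.load_model(model_name)   # downloads the model on first use
    result = model.transcribe(audio_path)    # dict with 'text' and 'segments'
    return format_segments(result["segments"])
```

Calling `transcribe_episode("episode.mp3")` (a hypothetical file name) would return one line per segment. Larger models such as "medium" tend to be more accurate but slower, and the result still needs a human pass for names, jargon, and speaker labels.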
As comprehensive language models continue to evolve, the path toward more accurate and accessible transcriptions becomes clearer. Simultaneously, the urgency of legislative interventions for ensuring accessibility gains prominence. Historical precedents underscore the potential delays in companies embracing these inclusive measures. Consequently, regulations and data privacy provisions should encompass AI software companies. This comprehensive approach will safeguard the security, privacy, and transparency of voice models and data in the AI landscape.
Podcasts must be made accessible. Advocacy efforts have propelled transcription mandates from television to visual content, and it's time for podcasts to follow suit. Accessibility isn't just beneficial for the audience; it aids content creators in generating diverse content, enhancing their online presence. As advancements continue in AI, the hope is for transcription barriers to become a thing of the past.