
Demystifying Accessibility: Video and Audio

Video and audio can be powerful tools for creating engaging and detailed learning materials. Whether you're drawing your students into an immersive narrative to explain a case study or breaking down complex science with animated graphics, there is a lot you can do once you break into this dimension of media.

And as with any media operating in multiple dimensions, accessibility becomes a more complex concern. While technology has improved dramatically at helping create things like closed captions and transcripts, it can never replace human insight and review.

To provide that insight, we need to understand what actually makes video or audio accessible, beyond just the basics. Hopefully, this blog will do just that.

What Media Are We Talking About?


For the scope of this blog post, we’re going to be addressing three types of media: audio-only, video-only, and synchronized media, also known as multimedia.


Audio-only media: This includes things such as podcasts, songs, and interview recordings. They tend to provide information through speech, music, and sound effects, while excluding any visuals, aside from a simple thumbnail.

Video-only media: This includes video with minimal or no sound, such as silent animations. These typically provide information using video, animation, and onscreen text. If any audio is included, it is limited to basic background music that provides no additional information or context.

Synchronized media: This is what most people think about when thinking of video content, which includes a combination of both audio and visual media. It can convey information using all of the methods listed above, and is the most commonly used category (of the three) for instructional materials. 

Assistive Tools


Most technology- and media-savvy people are familiar with the assistive tools used to make audio and video media accessible, but oftentimes, the specific differences between them are a mystery. So before we continue, let’s introduce them all.

  • Captions relay mostly spoken information in text form in real time, conveying information as the user listens to or watches the media.
  • Subtitles relay mostly spoken information translated from a different language. Of note, subtitles and captions as terms are often used interchangeably.
  • Transcripts relay mostly spoken information in text form divorced from the original media, typically as a separate document entirely.
  • Audio descriptions relay important visual information in audio form and can be delivered either synchronously or separately.
  • Visual descriptions relay important visual information in text form and are typically read asynchronously with a screen reader or text-to-speech (TTS) tool.

One might note that these tools are often conveying the same information. As such, media creators can use one to develop the other, saving significant time. However, you should not fall prey to the temptation to skip one of the multiple required tools. The information is the same, but the delivery method is different.

Before we jump into how to create the above tools, I want to take a moment to address who will be using these tools.

It’s easy for people to think that they don’t need to worry about accurate captions or detailed descriptions if they do not have a student with approved accommodation needs in their classrooms, but this couldn’t be further from the truth. Not only is there a case for undiagnosed or undisclosed disabilities, but it's also important to realize that these tools help everyone.

Neurodivergent students may process audio better when paired with captions. Busy students in loud environments might miss details if captions aren’t included. Students might need to engage with your learning materials in a location that prohibits extraneous sounds/audio.

While Deaf and hard-of-hearing (HOH) individuals, as well as visually impaired individuals, obviously deserve equal access to all course materials, they are not the only ones who will benefit from accessible content.

You never know who will need or benefit from these tools, so you should always make a point to provide them.

Captions, Subtitles, and Transcripts, Oh My!


At their core, captions and transcripts are the same thing. They are a text description of the spoken words, tone, and relevant sounds present in a video or audio file. The only real difference is in how they are displayed to the user.

For captions and subtitles, they are typically displayed at the bottom of the screen and can be embedded in the media (called open captions) or toggleable in the media player (called closed captions). Users read the text as they are listening to the video, thus making this a synchronous tool. Transcripts, on the other hand, tend to be presented as a separate text document and are read asynchronously.

Due to their similarity, one can often take their transcript file and turn it into captions relatively easily, and vice versa. But how does one get the text in the first place?
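Since caption and transcript text is the same underlying content, the conversion between the two can even be scripted. As an illustrative sketch (assuming captions in the common WebVTT format, and using hypothetical function and file names), here is a minimal Python example that strips the timing cues out of a caption file to produce a plain-text transcript:

```python
def vtt_to_transcript(vtt_text: str) -> str:
    """Strip the WebVTT header, cue timings, and blank lines,
    leaving a plain-text transcript of the caption text."""
    kept = []
    for line in vtt_text.splitlines():
        line = line.strip()
        if not line or line == "WEBVTT":
            continue  # skip blank lines and the file header
        if "-->" in line:
            continue  # skip timing lines like 00:00:00.000 --> 00:00:03.000
        if line.isdigit():
            continue  # skip numeric cue identifiers
        kept.append(line)
    return " ".join(kept)

sample = """WEBVTT

00:00:00.000 --> 00:00:03.000
Welcome to the course.

00:00:03.000 --> 00:00:07.000
Today we'll cover accessibility basics."""

print(vtt_to_transcript(sample))
# Welcome to the course. Today we'll cover accessibility basics.
```

Going the other direction, from transcript to captions, requires adding timing information back in, which is where auto-alignment tools or manual review come into play.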

 

Creating Captions

Typing up everything that is said in a video/audio file can be a rather time-consuming task. Fortunately, there are tools available that can make this process faster and easier. That being said, it’s important to keep their limitations in mind. As with all digital tools, human insight is a key factor in high-quality captions.

For Lamar faculty and staff, the Instructional Media team suggests using the integrated Yuja captioning features. The platform's automatic captioning has improved significantly over the last few years, and captions can be generated automatically, allowing for quick review. You can also request help with captions from the Media team.

In addition to that, I’d be remiss to not mention other auto-captioning tools. Most media platforms have built-in auto-captioning features, and there are standalone captioning tools as well.

All that being said, any captions that are automatically generated must always be checked for errors. Here are some common elements that often result in inaccuracy:

  • Industry specific terms
  • Scientific names and terms
  • Speech interspersed with another language
  • Speech with a heavy accent
  • Fast-paced or quiet speech
  • Low audio quality 

It should also be noted that professional human captioning services are an option, typically charging around a dollar per minute. This option is good for long recordings or ones with multiple elements from the list above.


Descriptions


And finally, there is the matter of descriptions. As explained earlier, descriptions relay key visual or non-spoken information in either audio or text format. They can be integrated alongside captions, taking up space in the breaks between dialogue, or provided asynchronously in a separate document.

Audio Description

These are most useful in videos where complex or important visuals are shown alongside spoken dialogue. Videos can be paused to give the audio descriptions more time to explain these visual elements, but they should not detract from the general viewing experience.

Prepare your videos for audio description by:

  • Integrating descriptions in your original dialogue where possible.
  • Noting what visual information is most important to the intended message of your video content.
  • Leaving moments of silence between dialogue where needed.

Yuja offers audio description functionality. Ask our studio team if you need guidance on using this feature.

Visual Description

These are often used for media that does not include dialogue, or where the dialogue should not or cannot be interrupted. They function similarly to image descriptions, going into more detail than what is typically available with audio descriptions.

Keep these questions in mind while writing visual descriptions:

  • What sort of image is it and what is depicted in it?
  • Why is it relevant and have I already described it elsewhere?
  • What do I want viewers to take away from it?

It can be helpful to imagine that you are trying to describe the clip to a friend or colleague over the phone. We will also go over writing visual descriptions in a later blog post about Alt Text and Image Descriptions.

Conclusion


With all of that said, it’s important to remember that these accessibility tools aren’t only for compliance; they make our course content clearer and more effective for every learner.

As we continue striving toward excellence, keeping accessibility at the forefront helps us create high quality learning experiences we can truly be proud of, and reflects our commitment to student success.

Meet the Author

Ray Seiden, M.S., is our Faculty Success Designer and is responsible for designing, developing, and delivering professional development training modules aimed at enhancing faculty teaching excellence. They use their experience in Instructional Design and Graphic Design, alongside their passion for accessibility and andragogy, to create user friendly training materials and robust support programs for faculty.

Do you have a topic you want to write about in a blog post? Pending review, the CITL may host it here!

Email us your topic to start the process!