Education & Accessibility

Building accessible meetings for deaf and hard-of-hearing participants

Real-time captions are a legal requirement and a practical one. Here is what actually works for DHH participants in multilingual meetings, where auto-captions from Zoom and Teams fall apart.

MangoFinch Team · April 25, 2026 · 6 min read

One in eight Americans over the age of 12 has hearing loss in both ears. That is roughly 30 million people. Prevalence rises steeply with age, so the rate among working-age adults is lower, but in a 50-person company three or four employees are still likely to be navigating meetings with some degree of hearing difficulty, and most of them have never told HR.

This is not an edge case. It is a baseline reality that meeting software mostly ignores.

What the law actually requires

The ADA does not use the word "captions." It requires effective communication, which courts have interpreted to include real-time text access for deaf and hard-of-hearing employees and customers. Section 508 of the Rehabilitation Act is more specific for federal agencies: electronic communications must be accessible, and that means captions or transcripts for audio content.

WCAG 2.1, the web accessibility standard that most companies reference in their compliance policies, requires captions for both pre-recorded content (Success Criterion 1.2.2, Level A) and live content (Success Criterion 1.2.4, Level AA), but it does not set a numeric accuracy threshold for either. The 99% figure often quoted for pre-recorded captions comes from captioning industry guidelines, not from WCAG itself. For live content there is even less guidance, but research from Gallaudet University shows that caption accuracy below 96% causes DHH readers to lose the thread of a conversation. Below 90%, they report the captions are actively misleading.

That 96% number matters. Hold on to it.

Where auto-captions actually land

We tested four platforms in January 2026 with a 45-minute multilingual meeting recording — three speakers rotating between English, Spanish, and Mandarin.

Zoom's auto-captions hit 91% word accuracy on the English segments. Not bad. On the Spanish segments, accuracy dropped to 79%. The Mandarin was not captioned at all because Zoom requires you to select a single language before the meeting starts. For a DHH participant who needed to follow the full conversation, two-thirds of the meeting was either garbled or missing.

Microsoft Teams performed slightly better on English (93%) but had the same single-language limitation. Google Meet offers real-time translation captions as of late 2025, but only for specific language pairs, and the latency runs 4-6 seconds, which makes the captions arrive after the visual cue of someone else starting to speak.

The pattern is consistent: these tools were built for monolingual English meetings and extended to other languages as an afterthought. For a DHH participant in a multilingual meeting, the captions are unreliable at exactly the moments they matter most — when someone switches languages.

Why multilingual captions are harder for DHH users specifically

A hearing participant in a multilingual meeting has fallback information. They can hear tone, pacing, emphasis. They can tell when someone is asking a question versus making a statement, even in a language they do not speak. They catch names and numbers from the audio even when the captions fail.

A DHH participant has none of that. The captions are not supplementary. They are the entire signal. When the captions read "the team should focus on the princess of the project" (an actual mistranscription of "the principal of the project" from one of our test recordings), a hearing person catches the error from context. A DHH person has no way to know that is wrong.

This is why accuracy thresholds that seem acceptable for hearing users — 85%, 90% — are functionally broken for DHH participants. Every error in the caption stream is an error in comprehension, with no secondary channel to correct it.

How per-language transcription changes the equation

MangoFinch runs a speech engine with per-segment language detection. Instead of committing to one language for the entire meeting, the engine evaluates each phrase independently and routes it to the appropriate language model.

For DHH accessibility, this matters in a specific way: the English captions are always generated by the English model, even when surrounding speech is in another language. The tool does not try to force Spanish audio through an English recognizer, which is where most of the worst caption errors come from.

In our testing, English caption accuracy in multilingual meetings runs 95.2% on MangoFinch versus 91% on Zoom and 93% on Teams. That 2 to 4 percentage point gap sounds small. For a 60-minute meeting with roughly 9,000 words of English speech, it is the difference between about 430 errors on MangoFinch and about 630 on Teams or 810 on Zoom. For a DHH participant reading every word, 200 to 380 fewer errors is the difference between following the meeting and guessing at it.
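The error counts come from a single multiplication: words spoken times the fraction of words the captions get wrong. The sketch below assumes errors scale linearly with word count; `expected_errors` is a hypothetical helper, and the 9,000-word figure is the rough one-hour estimate used above.

```python
def expected_errors(words: int, accuracy: float) -> int:
    """Word errors expected at a given word accuracy (0.0-1.0), assuming linear scaling."""
    return round(words * (1 - accuracy))


ENGLISH_WORDS = 9_000  # rough estimate for one hour of English speech

for platform, accuracy in [("Zoom", 0.91), ("Teams", 0.93), ("MangoFinch", 0.952)]:
    print(f"{platform}: ~{expected_errors(ENGLISH_WORDS, accuracy)} errors")
# Zoom: ~810 errors
# Teams: ~630 errors
# MangoFinch: ~432 errors
```

Because the word count is the same for every platform, the gap in errors is just the accuracy gap multiplied by 9,000, which is why a few percentage points turn into hundreds of misread words over an hour.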

The non-English segments get transcribed accurately in their own language and translated inline. A DHH participant who reads English sees the English translation appear below each non-English segment within 1.4 seconds. They do not miss the Spanish sidebar or the Mandarin clarification — they get it in text, in their language, fast enough to stay in the conversation.
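The per-segment routing described in this section can be sketched as follows. This is a minimal illustration, assuming the engine already splits speech into short phrases and exposes a language detector, per-language recognizers, and a translator as plain functions. `Segment`, `Caption`, `caption_segment`, and the `demo_*` stand-ins are hypothetical names, not MangoFinch's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Segment:
    """One phrase of speech (hypothetical unit, e.g. audio between two pauses)."""
    speaker: str
    audio: bytes


@dataclass
class Caption:
    speaker: str
    language: str                      # detected for this segment only, e.g. "en", "es"
    text: str                          # transcript in the language actually spoken
    translation: Optional[str] = None  # shown below the original for other-language readers


def caption_segment(
    seg: Segment,
    detect_language: Callable[[bytes], str],
    recognizers: Dict[str, Callable[[bytes], str]],
    translate: Callable[[str, str, str], str],
    reader_language: str,
) -> Caption:
    # 1. Detect the language of this phrase alone; there is no meeting-wide setting.
    lang = detect_language(seg.audio)
    # 2. Route the audio to that language's recognizer, so Spanish audio is never
    #    forced through the English model. An unsupported language raises KeyError.
    text = recognizers[lang](seg.audio)
    # 3. Translate only segments that are not already in the reader's language.
    translation = translate(text, lang, reader_language) if lang != reader_language else None
    return Caption(seg.speaker, lang, text, translation)


# --- Toy stand-ins so the sketch runs; real detection and recognition are models. ---
def demo_detect(audio: bytes) -> str:
    return audio.split(b":", 1)[0].decode()  # fake "audio" looks like b"es:hola"


def demo_recognize(audio: bytes) -> str:
    return audio.split(b":", 1)[1].decode()


DEMO_RECOGNIZERS = {"en": demo_recognize, "es": demo_recognize}


def demo_translate(text: str, src: str, dst: str) -> str:
    phrasebook = {("hola", "es", "en"): "hello"}
    return phrasebook.get((text, src, dst), f"[{src}->{dst}] {text}")


if __name__ == "__main__":
    for seg in [Segment("Ana", b"es:hola"), Segment("Ben", b"en:thanks, Ana")]:
        print(caption_segment(seg, demo_detect, DEMO_RECOGNIZERS, demo_translate, "en"))
```

Deciding the language per segment means a detection mistake affects one phrase rather than the whole meeting. The usual trade-off is that very short phrases give a detector little audio to work with.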

Font size and contrast are not optional

Accurate captions that nobody can read are not accessible. This sounds obvious, but most meeting platforms default to caption styling that fails basic readability tests.

We spent three weeks on caption rendering during our beta, working with four DHH testers. The findings were specific:

**Font size.** Our initial default was 16px, which matches most web body text. Two of our testers immediately asked for larger text. We moved the default to 20px and added a range from 16px to 32px. The most-used setting among our DHH beta testers is 24px.

**Contrast.** White text on a semi-transparent dark background tested better than every other combination. We use #FFFFFF on rgba(0, 0, 0, 0.82), which produces a contrast ratio of 15.3:1 — well above WCAG's 4.5:1 minimum. One tester pointed out that colored backgrounds (the blue tint that Teams uses, the gray that Zoom uses) cause eye strain after 30 minutes of continuous reading. We made the background pure black with adjustable opacity.

**Line spacing.** Default line height of 1.5 was too tight for continuous reading. We moved to 1.7. Small change, large impact on readability during hour-long meetings.

**Speaker labels.** Every caption line is prefixed with the speaker name in a different weight (semibold name, regular text). This was the single most-requested feature from our DHH testers. Without it, rapid speaker changes turned the caption stream into an undifferentiated wall of text.
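The rendering choices above can be expressed as a small settings object. This is a minimal Python sketch, assuming captions are rendered as HTML with inline CSS; `CaptionStyle`, `caption_html`, and the helper functions are hypothetical names, not MangoFinch's code. The contrast check uses the WCAG 2.x relative-luminance and contrast-ratio formulas. Because the caption background is semi-transparent, the actual ratio depends on the video behind it: over pure black it is 21:1, and in the worst case, pure white behind the box, a simple sRGB-space blend gives roughly 13.6:1, still far above the 4.5:1 minimum.

```python
import html
from dataclasses import dataclass, replace
from typing import Tuple

RGB = Tuple[float, float, float]


def _linearize(channel: float) -> float:
    """sRGB channel (0-255) to linear light, as in the WCAG relative-luminance formula."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a: RGB, b: RGB) -> float:
    lighter, darker = sorted((relative_luminance(a), relative_luminance(b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def composite(color: RGB, alpha: float, behind: RGB) -> RGB:
    """Blend a semi-transparent color over what is behind it (plain sRGB-space blend)."""
    return tuple(alpha * c + (1 - alpha) * k for c, k in zip(color, behind))


@dataclass(frozen=True)
class CaptionStyle:
    font_px: int = 20          # default after beta feedback; users may pick 16-32px
    line_height: float = 1.7
    bg_alpha: float = 0.82     # pure black background with adjustable opacity

    def with_font_px(self, px: int) -> "CaptionStyle":
        # Clamp user choices to the supported 16-32px range.
        return replace(self, font_px=max(16, min(32, px)))

    def css(self) -> str:
        return (f"font-size: {self.font_px}px; line-height: {self.line_height}; "
                f"color: #FFFFFF; background: rgba(0, 0, 0, {self.bg_alpha});")

    def worst_case_contrast(self) -> float:
        # White text is hardest to read when pure white video sits behind the box,
        # because that makes the blended background as light as it can get.
        bg = composite((0, 0, 0), self.bg_alpha, (255, 255, 255))
        return contrast_ratio((255, 255, 255), bg)


def caption_html(speaker: str, text: str) -> str:
    """Speaker name in semibold (600), caption text in regular weight."""
    return (f'<span style="font-weight: 600">{html.escape(speaker)}:</span> '
            f"{html.escape(text)}")


if __name__ == "__main__":
    style = CaptionStyle().with_font_px(24)
    print(style.css())
    print(round(style.worst_case_contrast(), 1))
    print(caption_html("Mei", "Let's move to the budget."))
```

Making the style immutable and clamping in `with_font_px` keeps any user-chosen size inside the tested range; a lower `bg_alpha` would lower the worst-case contrast, so the same check could gate the opacity slider.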

What our beta testers told us

We had six DHH participants in our beta program. Three specific pieces of feedback shaped how we think about this feature.

A profoundly deaf project manager at a Berlin-based fintech told us that before MangoFinch, she would ask a colleague to type summaries of the German-language portions of meetings into a shared doc in real time. This worked, but it meant a second person was occupied full-time during every meeting. With MangoFinch, she follows the German segments independently through the English translations. She described it as the first time she had attended a multilingual meeting without a human intermediary.

A software engineer in Toronto with moderate hearing loss said the speaker labels mattered more to him than the accuracy improvements. His previous tool (Otter) was reasonably accurate but did not label speakers inline in real time, so he spent cognitive energy figuring out who was talking instead of processing what they said.

A sales director in Singapore who uses hearing aids said the adjustable font size was what made it usable for full-day meeting blocks. Her previous tool's captions were readable for 30 minutes but caused fatigue after that. At 28px with our high-contrast background, she reported comfortable reading for over two hours.

The business case that nobody makes

Companies spend considerable effort on diversity and inclusion programs. They build ramps, install accessible bathrooms, provide screen readers. But meeting accessibility — the thing that affects whether DHH employees can do their actual jobs — often comes down to whatever auto-caption feature shipped with the video conferencing tool IT selected for other reasons.

The math is straightforward. Hearing loss affects 15% of the global adult population (WHO, 2024). In a 200-person company, approximately 30 employees have some degree of hearing loss. Of those, research from the National Institute on Deafness and Other Communication Disorders (NIDCD) suggests about 8-10 will have hearing loss significant enough to affect their ability to follow group conversations.

Those 8-10 people attend meetings every day. If the captions are unreliable, they are partially excluded from every meeting. They miss action items, misunderstand decisions, ask for clarifications that slow the team down. The cost is not theoretical — it shows up as duplicated work, missed deadlines, and the quiet attrition of talented people who get tired of fighting the tooling.

A single enterprise license for MangoFinch costs less per month than one hour of a professional CART (Communication Access Realtime Translation) provider's time. CART providers charge $150-250 per hour and need to be scheduled in advance. MangoFinch is always on, covers every meeting, and handles multilingual content that most CART providers cannot.

What we are building next

Our current caption system handles the basics well. The next features on our accessibility roadmap are specific to DHH users:

**Keyword highlighting.** The ability to set personal keywords (your name, your project names, action words like "deadline" and "assigned to") that get highlighted in the caption stream. Two of our beta testers asked for this independently.

**Caption history with search.** A scrollable caption log alongside the live stream, so DHH participants can scroll back to re-read something they missed without losing their place in the live captions. This ships in Q3.

**Customizable speaker colors.** Different colors for different speakers, configurable per user. This tested well in prototype, but we need to ensure the color combinations meet WCAG contrast requirements against our dark background.
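Keyword highlighting and searchable caption history have not shipped yet, so the following is only a sketch of one way they could work, assuming captions arrive as (speaker, text) lines. `CaptionLine`, `CaptionHistory`, and `highlight` are hypothetical names, and the `**` markers stand in for whatever visual highlight the UI would apply.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaptionLine:
    speaker: str
    text: str


def highlight(text: str, keywords: List[str], mark: str = "**") -> str:
    """Wrap case-insensitive, whole-word keyword matches in `mark`.

    Keywords are assumed to start and end with letters or digits, because the
    pattern relies on word boundaries.
    """
    # Drop empty keywords, then try longer ones first so "assigned to" is matched
    # as a phrase before any shorter keyword that overlaps it.
    ordered = sorted((k for k in keywords if k), key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b", re.IGNORECASE)
    # m.group(0) keeps the speaker's original casing inside the highlight.
    return pattern.sub(lambda m: f"{mark}{m.group(0)}{mark}", text)


@dataclass
class CaptionHistory:
    """Append-only caption log: the live pane reads the tail, scroll-back reads by index."""
    lines: List[CaptionLine] = field(default_factory=list)

    def append(self, line: CaptionLine) -> None:
        self.lines.append(line)

    def live_tail(self, n: int) -> List[CaptionLine]:
        # Newest n lines for the live pane, independent of any scroll-back position.
        return self.lines[-n:] if n > 0 else []

    def search(self, query: str) -> List[int]:
        """Indices of lines whose speaker or text contains `query` (case-insensitive)."""
        q = query.casefold()
        return [i for i, line in enumerate(self.lines)
                if q in line.text.casefold() or q in line.speaker.casefold()]


if __name__ == "__main__":
    history = CaptionHistory()
    history.append(CaptionLine("Priya", "Let's review the budget."))
    history.append(CaptionLine("Tomás", "The deadline moves to Friday, assigned to Mei."))
    for i in history.search("deadline"):
        line = history.lines[i]
        print(f"{line.speaker}: {highlight(line.text, ['deadline', 'assigned to', 'Mei'])}")
```

Keeping the log append-only and having the live pane always read the tail is what lets a reader scroll back without losing their place: a scroll-back position is just an index into the log, and new captions never move it.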

We are building these because our DHH beta testers told us exactly what they need. The captions work. Now we are making them work better for the people who depend on them most.
