

Customer Stories

Multilingual transcription for healthcare: compliance, accuracy, and patient safety

21% of US patients speak a language other than English at home. Medical miscommunication causes real harm. Here is where transcription fits in, what HIPAA requires, and what we are building toward.

MangoFinch Team · May 16, 2026 · 7 min read

A 67-year-old Spanish-speaking woman told her doctor she was "intoxicada." The interpreter translated it as "intoxicated." The ER team treated her for substance abuse. She was actually describing food poisoning. The misdiagnosis delayed proper treatment by 9 hours.

This case, documented in a 2001 study from the Journal of General Internal Medicine, gets cited a lot in medical interpretation training. It should also be cited in every conversation about medical transcription technology. Language is not a convenience in healthcare. It is infrastructure. When it breaks, patients get hurt.

The numbers behind the language gap

About 68 million people in the United States speak a language other than English at home. That is roughly 21% of the population, according to the 2023 American Community Survey. In states like California, Texas, and Florida, it is closer to 35%.

These patients interact with a healthcare system built almost entirely in English. Electronic health records are in English. Clinical documentation is in English. Discharge instructions, consent forms, lab reports — English.

The FDA's Adverse Event Reporting System has been analyzed multiple times for language-related incidents. A 2019 analysis found that approximately 10% of reported adverse events had documentation suggesting a language barrier contributed to the error. That is not a rounding error. That is tens of thousands of events per year where someone might not have been harmed if the communication had been clearer.

The Joint Commission reviewed 832 adverse events and found that communication failures were the root cause in 59% of them. Not all of those were language barriers — many were handoff failures, missing documentation, unclear orders. But when you layer limited English proficiency on top of an already fragile communication chain, the failure rate compounds.

How healthcare handles language today

Hospitals and clinics use three main approaches for language access.

**In-person interpreters** are the gold standard. A trained medical interpreter sits in the room, understands medical terminology in both languages, and can pick up on nonverbal cues. The problem: there are not enough of them. The Bureau of Labor Statistics estimates about 52,000 interpreters and translators working in the US, across all industries. Healthcare competes with courts, schools, and government agencies for the same pool. Many hospitals report wait times of 30 minutes to 2 hours for an in-person interpreter in less common languages like Burmese, Somali, or Haitian Creole.

**Phone and video interpretation services** are more available. Companies like Language Line and AMN Healthcare offer on-demand interpretation in 200+ languages. Response times average 30-60 seconds for Spanish, longer for less common languages. The cost is $1.50-3.00 per minute. A typical 20-minute patient encounter with interpretation costs $30-60 in interpretation fees alone. For a community health center seeing 80 patients with limited English proficiency (LEP) per day, that is $2,400-4,800 daily in interpretation costs.
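The cost figures for phone and video interpretation follow from simple multiplication. The short Python sketch below reproduces them so you can swap in your own rates and volumes. The class and function names are made up for illustration; the inputs are the per-minute rate, visit length, and daily patient count quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRange:
    """A low/high dollar range."""
    low: float
    high: float


def encounter_cost(rate_low: float, rate_high: float, minutes: int) -> CostRange:
    """Interpretation fee range for one encounter of the given length."""
    return CostRange(rate_low * minutes, rate_high * minutes)


def daily_cost(per_encounter: CostRange, encounters_per_day: int) -> CostRange:
    """Scale a per-encounter range to a full day of encounters."""
    return CostRange(
        per_encounter.low * encounters_per_day,
        per_encounter.high * encounters_per_day,
    )


# Figures from the text: $1.50-3.00 per minute, 20-minute visits, 80 patients per day.
per_visit = encounter_cost(1.50, 3.00, 20)
per_day = daily_cost(per_visit, 80)

print(f"Per encounter: ${per_visit.low:,.0f}-{per_visit.high:,.0f}")  # $30-60
print(f"Per day:       ${per_day.low:,.0f}-{per_day.high:,.0f}")      # $2,400-4,800
```

Changing the visit length or daily volume shows how quickly the bill moves for a clinic with a different patient mix.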

**Bilingual staff** fill the gaps informally. A Spanish-speaking nurse might translate for a doctor who does not speak Spanish. This is common and problematic — the nurse is not trained in medical interpretation, may not know terminology in both languages, and is now doing two jobs simultaneously. The Agency for Healthcare Research and Quality has flagged this as a patient safety risk repeatedly.

None of these solutions create a written record of the non-English communication. The doctor's notes are in English. The patient's actual words — the ones that contain the clinical information — usually go unrecorded.

Where transcription fits

Medical transcription and medical interpretation are different things, and the distinction matters.

An **interpreter** is a real-time communication bridge. They enable a conversation between people who do not share a language. They understand context, can ask for clarification, and adapt to the clinical situation.

A **transcription tool** converts speech to text. It records what was said. It does not participate in the conversation or make clinical judgments.

MangoFinch is a transcription tool, not an interpreter. We want to be very clear about that boundary because confusing the two could be dangerous.

What transcription can do for healthcare is create documentation that currently does not exist. When a Spanish-speaking patient describes symptoms during a telehealth visit, MangoFinch can transcribe those words in Spanish and provide an English translation. The physician gets the translation in real time. The medical record gets a searchable, time-stamped transcript in both languages.

This matters for several specific scenarios:

**Telehealth visits** are the most natural fit. The audio stream already exists. There is no hardware to install. The transcription runs alongside the video call and captures everything. For the 38% of LEP patients who reported difficulty accessing interpreter services for telehealth in a 2023 JAMA study, real-time transcription plus translation could fill the gap while proper interpretation is being arranged.

**Multidisciplinary case conferences** are another strong use case. A care team discussing a patient might include a physician who took the history in English, a social worker who spoke with the family in Mandarin, and a physical therapist who communicated with the patient in Vietnamese. The conference happens in English. The original patient communications, and the nuances in them, are not in the record. Bilingual transcripts of those conversations would put them there.

**Clinical documentation** is the daily grind that burns out healthcare workers. Physicians spend an average of 2 hours on documentation for every 1 hour of patient care, according to a 2023 AMA study. When language barriers are involved, that ratio gets worse because the doctor is simultaneously translating, interpreting context, and writing notes.

The HIPAA question

Anyone building technology that touches patient health information has to deal with HIPAA. This is not optional and not simple.

HIPAA's Privacy Rule governs who can access protected health information (PHI). The Security Rule governs how electronic PHI (ePHI) must be protected. For a transcription tool processing real-time audio from patient encounters, both rules apply directly.

The key requirements:

**Business Associate Agreement (BAA).** Any third-party service that processes, stores, or transmits PHI on behalf of a healthcare provider must sign a BAA. This means the transcription provider, the speech-to-text API, the translation API, and the cloud hosting provider all need BAAs in place. Major cloud speech and translation providers offer HIPAA-eligible environments with a BAA. The chain of BAAs has to be unbroken.

**Encryption.** ePHI must be encrypted in transit (TLS 1.2 minimum) and at rest (AES-256 is standard). Audio streams, transcription text, translation outputs, and stored records all need encryption. This is table stakes for any modern SaaS, but healthcare requires documentation proving it.

**Access controls.** Role-based access, audit logging, automatic session timeouts, unique user identification. Every access to PHI must be logged with who, when, and what.

**Data retention and disposal.** Healthcare organizations have specific retention requirements (typically 6-10 years depending on the state and record type). The transcription tool needs configurable retention policies and certified data destruction processes.

**Minimum necessary standard.** Only the minimum amount of PHI needed for the specific purpose should be accessed. A transcription tool should not store more patient data than what appears in the audio.
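Two of the requirements above translate fairly directly into code: a TLS 1.2 floor for data in transit, and an audit trail that records who accessed what and when. The sketch below uses only the Python standard library. It is a minimal illustration, not a description of MangoFinch's implementation: `AuditEntry`, `AuditLog`, and the sample identifiers are hypothetical, and a real audit log would need durable, tamper-evident storage rather than an in-memory list.

```python
import ssl
from dataclasses import dataclass
from datetime import datetime, timezone


def make_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


@dataclass(frozen=True)
class AuditEntry:
    """One access event: who did what to which record, and when (UTC)."""
    user_id: str
    action: str
    resource: str
    at: datetime


class AuditLog:
    """Append-only, in-memory audit trail (illustrative only)."""

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def record(self, user_id: str, action: str, resource: str) -> AuditEntry:
        entry = AuditEntry(user_id, action, resource, datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def entries_for(self, resource: str) -> list[AuditEntry]:
        """All recorded accesses to one resource, oldest first."""
        return [e for e in self._entries if e.resource == resource]


# Hypothetical usage: log a read and an export of one transcript.
log = AuditLog()
log.record("dr.lee", "read", "transcript/123")
log.record("dr.lee", "export", "transcript/123")
log.record("nurse.kim", "read", "transcript/456")

for e in log.entries_for("transcript/123"):
    print(e.at.isoformat(), e.user_id, e.action, e.resource)
```

Unique user IDs matter here: a shared login would make the "who" column meaningless.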

Where MangoFinch stands today

We want to be honest about this: MangoFinch is not HIPAA-compliant today.

We have the encryption. We have access controls and audit logging. Our infrastructure runs on encrypted channels end to end.

What we do not have yet is the full compliance package: a completed SOC 2 Type II audit, BAAs with our speech and translation providers' HIPAA-eligible tiers, and formal policies for breach notification, data retention, and disposal. We also have not engaged a third-party security firm to validate our controls, which is standard practice before offering a BAA to healthcare customers.

This is not a weekend project. SOC 2 audits take 3-6 months. Building a HIPAA-eligible deployment that is architecturally separate from our standard product takes engineering time. We are planning for this, with a target of Q4 2026 for a healthcare-specific offering.

In the meantime, we are not marketing to healthcare providers or accepting PHI. We think it is irresponsible to put a "HIPAA-compliant" badge on a product that has not completed the full validation process. We have seen other transcription startups do this. It is reckless.

The accuracy bar is higher in healthcare

General meeting transcription can tolerate some errors. If the transcript says "Q3 revenue was $4.2 million" instead of "$4.3 million," someone will catch it in review.

Medical transcription errors can cause direct harm. "Hypertension" versus "hypotension" is a one-word difference with opposite clinical implications. "Take 1.0 mg" versus "take 10 mg" is a decimal point. "Left" versus "right" determines which side of the body gets operated on.

Our current word error rate across all languages is in the single digits for general transcription. For medical terminology specifically, we have not yet benchmarked against clinical datasets because we have not built the medical-specific language models. Standard speech-to-text models perform 15-25% worse on medical terminology compared to general vocabulary, based on published industry benchmarks.
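For readers who have not worked with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below is a plain textbook implementation, not our benchmarking pipeline, and the example sentences are invented. It also shows why a low WER alone is not reassuring in a clinical setting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one wrong word out of six.
reference = "patient has a history of hypertension"
hypothesis = "patient has a history of hypotension"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

One substitution in six words is a modest error rate on paper, yet the transcript now states the opposite clinical condition. That is why domain-specific benchmarking has to look at which words are wrong, not only how many.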

Getting that number down requires fine-tuning on medical audio datasets, which requires partnerships with healthcare organizations, which requires HIPAA compliance. It is a chicken-and-egg problem that we are working through methodically.

What the roadmap looks like

We are building toward healthcare readiness in stages.

**Stage 1 (now through Q2 2026):** Architecture review. We are documenting every data flow, every third-party integration, every storage location. This is the foundation for the SOC 2 audit.

**Stage 2 (Q3 2026):** SOC 2 Type II audit engagement. We will work with an independent auditor to validate our security controls over a 6-month observation period.

**Stage 3 (Q4 2026):** HIPAA-eligible deployment. Architecturally isolated environment with BAAs from all sub-processors. Healthcare-specific data retention policies. Breach notification procedures.

**Stage 4 (Q1 2027):** Medical vocabulary optimization. Partnership with a clinical NLP team to fine-tune our transcription pipeline on medical audio. Domain-specific accuracy benchmarking.

This timeline might slip. Compliance projects always take longer than planned. But we would rather ship late and correct than ship early and unsafe.

The gap between what exists and what is needed

Healthcare needs multilingual transcription more than almost any other industry. The patient population is linguistically diverse. The stakes are life and death. The existing solutions — human interpreters, phone lines, bilingual staff — are expensive, inconsistent, and leave no searchable record.

Technology can help, but only if it meets the compliance requirements and the accuracy bar. We are not there yet. We are building toward it with our eyes open about what "there" means.

The interpretation-transcription workflow

One thing we have learned from conversations with healthcare administrators is that transcription does not replace interpretation. It supplements it. The practical workflow looks like this:

A Spanish-speaking patient arrives at a clinic. The front desk requests an interpreter. While waiting (the average is 34 minutes for non-Spanish languages, per a 2023 survey of 200 community health centers), the intake nurse uses a translation app for basic questions. The interpreter arrives. The clinical conversation happens through the interpreter. After the visit, the physician writes notes in English from memory and from any notes taken during the interpreted session.

The gap is in that last step. The physician is reconstructing a conversation from memory. They are translating back from what the interpreter said, filtering through their own understanding of what the patient meant. Clinical details get lost in this reconstruction. A 2020 study in Patient Education and Counseling found that physicians recalled only 60-70% of the clinical information exchanged during interpreted encounters, compared to 85-90% in same-language encounters.

A transcription tool running during the encounter would capture the original Spanish, the interpreter's English translation, and the physician's responses. The physician's post-visit documentation could reference exact quotes instead of reconstructions. The transcript becomes a safety net, not a replacement for the interpreter.

For telehealth specifically, the audio is already digital. There is no additional hardware. The transcription runs silently alongside the video call. The physician gets a timestamped, bilingual record of everything said. If they need to check whether the patient said "dolor de cabeza" or "dolor de pecho" — headache versus chest pain — the answer is in the transcript.
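To make the "check the transcript" step concrete, here is a minimal sketch of what a timestamped, bilingual record could look like and how a phrase lookup over it might work. The `Segment` fields, the helper names, and the sample dialogue are all assumptions for illustration, not MangoFinch's actual schema or output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One utterance in the transcript."""
    start_s: float        # seconds from the start of the call
    speaker: str
    language: str         # language tag such as "es" or "en"
    original: str         # words as spoken
    translation_en: str   # English rendering (same as original for English)


def find_phrase(segments: list[Segment], phrase: str) -> list[Segment]:
    """Case-insensitive search over both the original and the translation."""
    needle = phrase.casefold()
    return [
        s for s in segments
        if needle in s.original.casefold() or needle in s.translation_en.casefold()
    ]


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS for display next to a quote."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Invented sample data.
transcript = [
    Segment(72.4, "patient", "es",
            "Tengo dolor de pecho desde ayer.",
            "I have had chest pain since yesterday."),
    Segment(80.1, "physician", "en",
            "Does the pain spread to your arm?",
            "Does the pain spread to your arm?"),
]

for hit in find_phrase(transcript, "dolor de pecho"):
    print(f"[{format_timestamp(hit.start_s)}] {hit.speaker}: {hit.original}")
```

Keeping the original wording next to the translation is the point of the design: the physician can verify the patient's exact phrase instead of trusting a single rendering of it.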

What we are hearing from the field

We have had conversations with 14 healthcare organizations in the last three months. Not sales conversations — research conversations about what they actually need from multilingual technology.

Three consistent themes:

First, they want transcription for documentation support, not for direct patient communication. No one is asking us to replace interpreters. They want a record of what was said so the documentation is more complete.

Second, they need configurable data retention. Some states require 7 years for medical records. Some specialties require longer. A transcription tool that deletes records after 90 days does not work for healthcare.

Third, the accuracy bar for medical terminology is non-negotiable. An 85% general accuracy rate that drops to 70% on medical terms is not usable. They would rather have no transcript than an inaccurate one that could be entered into the medical record and relied upon for clinical decisions.
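The second theme, configurable retention, is the easiest of the three to sketch. The example below assumes a default retention period with per-record-type overrides. The names, the 7-year default, and the 10-year override are illustrative values only, not legal guidance or a shipped feature.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


def add_years(start: date, years: int) -> date:
    """Add calendar years, mapping Feb 29 to Feb 28 in non-leap target years."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


@dataclass
class RetentionPolicy:
    """Retention period in years, with optional overrides per record type."""
    default_years: int = 7
    overrides: dict[str, int] = field(default_factory=dict)

    def years_for(self, record_type: str) -> int:
        return self.overrides.get(record_type, self.default_years)

    def disposal_date(self, created: date, record_type: str) -> date:
        """Earliest date on which the record may be destroyed."""
        return add_years(created, self.years_for(record_type))

    def may_dispose(self, created: date, record_type: str, today: date) -> bool:
        return today >= self.disposal_date(created, record_type)


# Hypothetical configuration: 7 years by default, 10 for one record type.
policy = RetentionPolicy(default_years=7, overrides={"pediatric": 10})
visit = date(2026, 5, 16)

print(policy.disposal_date(visit, "general"))    # 2033-05-16
print(policy.disposal_date(visit, "pediatric"))  # 2036-05-16
# A fixed 90-day deletion window would fire far too early:
print(policy.may_dispose(visit, "general", visit + timedelta(days=90)))  # False
```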

These conversations are shaping our healthcare roadmap. We are not building what we think healthcare needs. We are building what healthcare practitioners are telling us they need.

If you work in healthcare and want to follow our progress toward a HIPAA-eligible offering, reach out at healthcare@mangofinch.com. We want to build this with clinical partners who can tell us what we are getting wrong before it matters.
