What we learned from 10,000 multilingual meeting minutes
Data from MangoFinch beta users reveals how multilingual teams actually communicate — and it's not what most people assume.
We hit 10,000 transcribed meeting minutes in our beta last month. That's a lot of audio — roughly 167 hours of multilingual conversations across 47 teams in 12 countries.
I spent a week pulling the data apart. Some of what we found confirmed our assumptions. Some of it surprised us. And a few patterns completely changed how we think about multilingual transcription.
Here's what the numbers actually say.
The average meeting uses 2.3 languages
Not two. Not three. 2.3. That decimal point matters because it tells you something specific: most multilingual meetings have a primary language and a secondary language, with occasional dips into a third.
The distribution breaks down like this:
- 31% of meetings used exactly 2 languages
- 28% used 3 languages
- 22% were monolingual (but used MangoFinch for the real-time translation feature)
- 14% used 4 languages
- 5% used 5 or more
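If you want to reproduce this kind of breakdown, it comes down to counting distinct detected languages per transcript. Here's a minimal Python sketch with invented data; the real numbers come from the full beta dataset:

```python
from collections import Counter

# Invented per-meeting data: the set of languages detected in each transcript.
meetings = [
    {"en", "es"},
    {"en", "zh", "ja"},
    {"pt"},
    {"en", "ko"},
]

counts = Counter(len(langs) for langs in meetings)
total = len(meetings)

for n_langs in sorted(counts):
    print(f"{n_langs} language(s): {counts[n_langs] / total:.0%} of meetings")

print(f"Average languages per meeting: {sum(len(m) for m in meetings) / total:.1f}")
```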
The record holder was a supply chain coordination call between a team in Shenzhen, their logistics partner in Rotterdam, a supplier in Istanbul, and headquarters in São Paulo. Six languages in 45 minutes: Mandarin, Dutch, English, Turkish, Portuguese, and a brief stretch of Spanish when the São Paulo PM talked to a colleague in Buenos Aires who joined late.
That transcript would have been completely unusable in any other tool. In MangoFinch, it read like a meeting — messy, human, and coherent.
73% of code-switches happen at topic boundaries
This was the finding that changed how I think about language switching.
We tagged every language switch in our dataset and mapped them against the conversational structure. Nearly three-quarters of switches happened when the topic changed — not randomly mid-thought.
A team discussing quarterly targets in English would switch to Korean when the conversation moved to a specific client relationship managed by the Seoul office. A design review conducted in French would shift to English when the discussion turned to the React component library. A sales pipeline review in Portuguese would briefly go Japanese when discussing the Tokyo market.
The switches tracked with expertise domains. People gravitate toward the language they encoded their knowledge in.
The remaining 27% of mid-topic switches fell into predictable categories:
- 11% were emotional register switches (frustration, excitement, emphasis)
- 9% were precision switches (reaching for an untranslatable concept)
- 4% were rapport switches (two speakers sharing a minority language)
- 3% were genuinely accidental (false starts, word-finding pauses)
Only that last 3% resembled what most people imagine when they think of code-switching as "confused" or "messy." The other 97% was structured and purposeful.
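The boundary-alignment check itself is simple. Here's a stripped-down sketch of the idea; the 10-second tolerance and the timestamps are illustrative, not the values from our tagging pipeline:

```python
def share_at_boundaries(switches, boundaries, tolerance=10.0):
    """Fraction of language switches within `tolerance` seconds of a topic boundary."""
    if not switches:
        return 0.0
    aligned = sum(
        1 for t in switches
        if any(abs(t - b) <= tolerance for b in boundaries)
    )
    return aligned / len(switches)

switches = [62.5, 310.0, 905.2, 1220.8]    # seconds at which the language changed
boundaries = [60.0, 300.0, 600.0, 900.0]   # seconds at which the agenda topic changed
print(f"{share_at_boundaries(switches, boundaries):.0%} of switches at topic boundaries")
```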
The most common language pairs
We ranked every language pair by frequency of co-occurrence in meetings:
1. **English–Spanish** (18% of multilingual meetings)
2. **English–Mandarin** (14%)
3. **English–Japanese** (11%)
4. **English–Portuguese** (9%)
5. **English–French** (8%)
6. **English–Korean** (7%)
7. **English–Hindi** (6%)
8. **Spanish–Portuguese** (5%)
9. **English–German** (4%)
10. **Mandarin–Japanese** (3%)
English shows up as one half of the pair in 77% of multilingual meetings. That's not surprising — it functions as the lingua franca for most global business. But what's telling is that it's rarely the *only* language. English is the bridge, but people keep one foot on their home shore.
The Spanish–Portuguese pair at number eight was interesting. Those meetings tended to involve Latin American teams where participants could understand each other's languages without translation, switching between them fluidly. The Mandarin–Japanese pair showed up in East Asian business contexts where both languages share written characters and certain business vocabulary.
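Mechanically, this ranking is just pair counting over each meeting's set of detected languages. A rough sketch with made-up sample data:

```python
from collections import Counter
from itertools import combinations

# Invented per-meeting language sets; the real ranking runs over every meeting in the beta.
meetings = [
    {"en", "es"},
    {"en", "zh"},
    {"en", "ja", "ko"},
    {"es", "pt"},
    {"pt"},
]

pair_counts = Counter()
for langs in meetings:
    for pair in combinations(sorted(langs), 2):   # monolingual meetings yield no pairs
        pair_counts[pair] += 1

multilingual = sum(1 for langs in meetings if len(langs) >= 2)
for (a, b), n in pair_counts.most_common(10):
    print(f"{a}–{b}: {n / multilingual:.0%} of multilingual meetings")
```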
The "comfort zone" pattern
This was my favorite finding. We noticed a consistent pattern where speakers would start a complex argument in their L2 (usually English), hit a wall, and revert to their L1 to finish the thought.
We called it the "comfort zone" pattern because it maps to cognitive load. When the argument is simple — reporting numbers, sharing status updates, reading from slides — people stay in L2 comfortably. When the argument gets complex — defending a technical decision, explaining a nuanced tradeoff, pushing back on a proposal — the cognitive cost of constructing the argument *and* translating it simultaneously becomes too high.
The data showed a clear threshold. Once a speaker's utterance exceeded roughly 45 seconds of continuous complex argumentation in their L2, the probability of a switch to L1 jumped from 12% to 68%. The brain hits a wall, and the language system takes the path of least resistance.
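The threshold analysis itself is nothing exotic: split continuous L2 utterances at a duration cutoff and compare switch rates on either side. A toy version (the records below are invented, not our dataset):

```python
# Each record is one continuous L2 utterance:
# (duration_in_seconds, did_the_speaker_switch_to_L1_afterwards)
utterances = [
    (20, False), (30, False), (38, True), (50, True),
    (55, True), (62, True), (70, False), (25, False),
]

THRESHOLD = 45  # seconds of continuous complex L2 argumentation

def switch_rate(records):
    return sum(1 for _, switched in records if switched) / len(records)

short_runs = [u for u in utterances if u[0] < THRESHOLD]
long_runs = [u for u in utterances if u[0] >= THRESHOLD]

print(f"Switch rate under {THRESHOLD}s: {switch_rate(short_runs):.0%}")
print(f"Switch rate over {THRESHOLD}s:  {switch_rate(long_runs):.0%}")
```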
This has real implications. If your meeting tool can't handle that switch, you lose the most substantive part of what your team member was trying to say — because the moment they switch is precisely the moment they're getting to the hard part.
Peak switching: first five and last ten minutes
We graphed language switching frequency across meeting duration and found a clear U-shape.
**First five minutes: warm-up switching.** Meetings start with greetings, small talk, and settling in. This is where people are most relaxed and most likely to use their L1 for informal conversation before the meeting "officially" starts. We saw switching rates 2.4x higher than the meeting average in this window.
**Middle section: stabilization.** Once the agenda kicks in, meetings tend to settle into a primary language, with switches following the topic-boundary pattern described above. Switching frequency drops to its lowest point around the 15-20 minute mark.
**Last ten minutes: wind-down switching.** As meetings wrap up, switching frequency climbs again — not quite to opening levels, but 1.8x the middle-section average. This phase includes action item assignment (often in the language of the person responsible), casual sign-offs, and side conversations that spill out of the formal structure.
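The U-shape falls out of a simple bucketing exercise: normalize each switch by where it lands in the meeting, then count per bucket. A sketch with hypothetical inputs:

```python
def switching_profile(meetings, n_buckets=10):
    """meetings: list of (duration_in_seconds, [switch_timestamps_in_seconds])."""
    buckets = [0] * n_buckets
    for duration, switches in meetings:
        for t in switches:
            idx = min(int(t / duration * n_buckets), n_buckets - 1)
            buckets[idx] += 1
    total = sum(buckets)
    return [b / total for b in buckets] if total else buckets

# Two invented meetings; real U-shaped data puts most weight in the first and last buckets.
print(switching_profile([
    (1800, [30, 90, 400, 1700, 1750]),
    (3600, [60, 120, 1500, 3400, 3550]),
]))
```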
The practical takeaway: if your transcription tool has a "warm-up" period where it's calibrating to the meeting language, it's going to miss the most linguistically diverse portion of the call. We designed MangoFinch to start language detection from the first audio frame.
Which languages have the hardest acoustic overlap
Not all language pairs are equally easy to distinguish in audio. Some combinations give our detection model significantly more trouble than others.
**Hardest pairs for acoustic detection:**
1. **Spanish and Portuguese** — 89% phonological overlap in connected speech. When a Brazilian speaker talks fast, the acoustic signature is remarkably close to that of certain Spanish dialects. Our detection accuracy drops about 6 percentage points for this pair compared to our average.
2. **Hindi and Urdu** — Spoken registers are nearly identical in casual conversation. The differences are primarily in formal vocabulary and script, neither of which helps with audio detection. We rely heavily on specific vocabulary markers rather than acoustic features.
3. **Mandarin and Cantonese** — Tonal systems overlap significantly, and code-switching between them is extremely common in southern Chinese business contexts. Short utterances (under 3 seconds) are the hardest to classify.
4. **Norwegian, Swedish, and Danish** — Scandinavian meetings with multiple Nordic languages present a genuine detection challenge. We currently group these more broadly and rely on contextual cues.
**Easiest pairs for acoustic detection:**
1. **Japanese and English** — Completely different phonological systems. Detection accuracy is very high.
2. **Arabic and any Romance language** — Pharyngeal consonants in Arabic are acoustically distinct from anything in French, Spanish, or Portuguese. Detection accuracy is very high.
3. **Korean and English** — Syllable structure differences make these very distinguishable. Detection accuracy is very high.
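The way we surface confusable combinations is essentially a confusion count over short utterances with human-verified labels. A simplified sketch (the labels below are invented):

```python
from collections import Counter

# Each entry pairs the verified language with the detector's guess for one short utterance.
labeled_utterances = [
    ("es", "pt"), ("es", "es"), ("pt", "es"), ("pt", "pt"),
    ("ja", "ja"), ("en", "en"), ("hi", "ur"), ("ur", "hi"),
]

confusions = Counter(
    tuple(sorted((truth, guess)))
    for truth, guess in labeled_utterances
    if truth != guess
)

for (a, b), n in confusions.most_common():
    print(f"{a} / {b}: {n} confusions")
```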
Average transcript accuracy by language tier
We grouped our supported languages into three tiers based on transcription accuracy across the beta:
**Tier 1 (highest accuracy):** English, Spanish, French, German, Portuguese, Japanese, Mandarin Chinese. These have the most training data in our speech engine and deliver the most consistent results across the beta.
**Tier 2 (strong accuracy):** Korean, Hindi, Arabic, Italian, Dutch, Russian, Turkish. Strong performance but with more variation depending on accent, speaking speed, and audio quality.
**Tier 3 (moderate accuracy):** Vietnamese, Thai, Indonesian, Polish, Czech, and other languages with smaller representation in training data. Usable for getting the gist of a conversation, but not yet reliable enough for verbatim transcription.
The accuracy drops at language switch points by an average of 4 percentage points across all tiers. That boundary between languages — the last syllable of one language and the first syllable of the next — is where the model has to make a hard decision with minimal context. We've narrowed that switch-point drop by 8 points since our first beta build, and closing it further is our primary accuracy focus for Q2.
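Measuring that boundary penalty is straightforward once every transcribed word carries a correctness flag and its distance to the nearest switch point. A toy version with exaggerated, invented numbers:

```python
# (word_transcribed_correctly, seconds_to_nearest_language_switch), all invented values
words = [
    (True, 30.0), (True, 12.0), (False, 1.2), (True, 0.8),
    (False, 0.5), (True, 25.0), (True, 3.5), (True, 40.0),
]

WINDOW = 2.0  # how close to a switch point counts as "at the boundary"

def accuracy(sample):
    return sum(1 for ok, _ in sample if ok) / len(sample)

near = [w for w in words if w[1] <= WINDOW]
far = [w for w in words if w[1] > WINDOW]

print(f"Accuracy near switches: {accuracy(near):.0%}")
print(f"Accuracy elsewhere:     {accuracy(far):.0%}")
```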
Real patterns from real teams (anonymized)
A few examples from the beta that illustrate how multilingual teams actually use MangoFinch:
**The "bilingual buffer" pattern.** A consulting team in Singapore has one team member who's fluent in both English and Mandarin. She naturally becomes the bridge — listening to the Mandarin discussion, then summarizing key points in English for team members who don't speak Mandarin, and vice versa. Before MangoFinch, the English-only speakers had to trust her summary completely. Now they can read the translated transcript in real time and ask follow-up questions directly. She told us: "I used to spend half the meeting translating instead of contributing."
**The "L1 sidebar" pattern.** A product team split across Berlin and Bangalore uses English as their meeting language. But when two Bangalore engineers need to quickly hash out a technical detail, they drop into Hindi for 30-60 seconds, resolve it, and then one of them summarizes back in English. With MangoFinch, the Berlin team members can follow along with the translated sidebar in real time instead of sitting idle. Meeting time for their weekly sync dropped from 55 minutes to 40 minutes.
**The "emotional L1" pattern.** A startup founder originally from Brazil runs her all-hands in English for her mixed team. But when she gets genuinely passionate about a milestone or a challenge, she slips into Portuguese. Her team has learned that when she switches to Portuguese, it means she really cares about what she's saying. MangoFinch catches those moments instead of leaving a gap in the transcript.
**The "domain expert" pattern.** A pharmaceutical company's regulatory team discusses FDA submissions in English but switches to German when talking about EU-specific regulations because half the team did their regulatory training in Germany. The regulatory vocabulary lives in German in their heads, and forcing English adds translation overhead to already complex subject matter.
What surprised us most
Three things we didn't expect:
**Meeting culture varies more than meeting language.** Two teams speaking the same two languages (English and Japanese) can have completely different switching patterns based on company culture. A flat startup had roughly equal language time. A traditional corporate team used Japanese for 80% of the meeting, with English only for technical terms.
**Audio quality matters more than accent.** We expected accent variation to be our biggest accuracy challenge. It wasn't. The difference between a good microphone in a quiet room and a laptop mic in a coffee shop dwarfed the difference between any two accents of the same language. A non-native English speaker with a quality headset was transcribed more accurately than a native speaker on a laptop mic in an open office.
**People edit their transcripts less than we expected.** Our beta includes a transcript editor, and we assumed heavy editing. Average edits per meeting: 3.2 corrections. Most users treated the transcripts as "good enough" reference documents rather than verbatim records. The value wasn't perfection — it was having a searchable, multilingual record that captured the meeting's content across all languages used.
What we're building next
The 10,000-minute dataset is now feeding directly into our development roadmap.
We're adding speaker-attributed language profiles — once MangoFinch learns that a particular speaker tends to use English for status updates and Korean for technical details, it can bias its detection model accordingly. Early tests show a 3 percentage point accuracy improvement at switch boundaries.
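Conceptually, the speaker profile acts as a prior that gets blended with the acoustic scores for each utterance. A minimal sketch with made-up weights:

```python
def biased_language_scores(acoustic_scores, speaker_prior, alpha=0.2):
    """Blend the detector's acoustic scores with this speaker's historical language mix."""
    langs = set(acoustic_scores) | set(speaker_prior)
    return {
        lang: (1 - alpha) * acoustic_scores.get(lang, 0.0) + alpha * speaker_prior.get(lang, 0.0)
        for lang in langs
    }

acoustic = {"en": 0.48, "ko": 0.52}   # an ambiguous short utterance
prior = {"en": 0.30, "ko": 0.70}      # this speaker: status updates in English, technical detail in Korean
scores = biased_language_scores(acoustic, prior)
print(max(scores, key=scores.get))    # the prior tips the call toward "ko"
```

In a production system you'd cap how much weight the prior gets, so a speaker's history never overrides clear acoustic evidence.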
We're also building meeting analytics that surface the language distribution for each call — not as a judgment, but as a tool for inclusive meeting design. If your transcript shows that one team member spoke for 2 minutes in a 60-minute meeting, and their only contributions were in their L2, that's a signal worth paying attention to.
The data keeps confirming the thesis we started with: multilingual meetings aren't broken. They're sophisticated, patterned, and purposeful. The tools just needed to catch up.
Try MangoFinch free
Real-time transcription and translation for multilingual teams. No credit card required.
Start a free meeting