Case studies
Consumer AI / voice assistant · 2024

Speech-in, speech-out - a GPT-4 Turbo assistant in 51 languages

Client
Botinfo ("AI Chat with Botinfo")
Duration
6 weeks
Status
Shipped
Stack
OpenAI GPT-4 Turbo · Native platform ASR · Google Text-to-Speech · Prompt engineering layer

What we solved

Typing a question into a chat box is the friction most voice-first apps never actually remove. Botinfo closes the loop: the user taps the mic, speaks, and hears the answer spoken back - in any of 51 languages, with a choice of male or female voice. Under the hood it is three separate engineering problems - speech in, language model in the middle, speech out - stitched end to end, not one product.

The system at a glance

A mobile app (iOS + Android) captures audio, passes it to native platform ASR, feeds the cleaned transcript through a prompt-engineering middleware, calls the OpenAI GPT-4 Turbo API, and pipes the answer into Google Text-to-Speech for spoken playback. The OpenAI key lives on a FastAPI + Postgres backend proxy, not in the app bundle. Freemium subscriptions gate the volume: 3-day free trial, then weekly, monthly, or yearly plans via StoreKit + Play Billing.
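That request path can be sketched as one thin orchestration function. The stage functions here (`rewrite_prompt`, `ask_llm`, `synthesize`) are hypothetical stand-ins for the real middleware, the OpenAI call behind the proxy, and Google TTS - a sketch of the wiring, not the production code:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VoiceReply:
    text: str
    audio: bytes

def answer_spoken_question(
    transcript: str,
    rewrite_prompt: Callable[[str], list],  # prompt-engineering middleware
    ask_llm: Callable[[list], str],         # GPT-4 Turbo behind the FastAPI proxy
    synthesize: Callable[[str], bytes],     # Google Text-to-Speech
) -> VoiceReply:
    """One request through the pipeline: transcript in, text + audio out."""
    messages = rewrite_prompt(transcript)   # inject the matching system prompt
    reply_text = ask_llm(messages)
    return VoiceReply(text=reply_text, audio=synthesize(reply_text))
```

Because each stage is an injected callable, any one of them can be swapped server-side without the others noticing - the same property the sections below lean on.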

What the user experiences

  • Open the app. The home screen prompts with example topics: Social, Professional, Health, Academic, Creative writing, Household.
  • Tap the mic. The screen confirms: “We are listening to your question. Tap again to submit question.”
  • Speak naturally in any of 51 languages. Pick male or female TTS voice.
  • The app transcribes in the background, rewrites the prompt to stay in-scope, calls GPT-4 Turbo.
  • The reply streams back and plays as natural-sounding speech through the device speaker.
  • Ask anything from casual chat to professional tasks - the system adapts through the prompt layer.

How we built the pieces

Voice in - native platform ASR, no extra SDK

Instead of bundling a third-party speech SDK and its gigabyte of locale models, we use the device’s native speech-recognition APIs (Apple Speech + Android SpeechRecognizer). 51-language support comes with them. Battery and install-size both benefit.

The middle - prompt engineering, not raw GPT

Botinfo does not pass raw transcripts to GPT-4 Turbo. The prompt-engineering layer classifies intent (question vs task vs creative-writing vs household-how-to) and injects a matching system prompt before the model call. That is the difference between “generic paragraph” and “useful answer for the domain the user actually asked about.”
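A minimal sketch of that routing idea: classify the transcript, then prepend the matching system prompt before the model call. The keyword matcher and the prompt texts below are illustrative stand-ins, not Botinfo's actual classifier or prompts:

```python
# Illustrative system prompts - the real ones live in versioned middleware config.
SYSTEM_PROMPTS = {
    "creative": "You are a creative-writing assistant. Match the user's tone.",
    "household": "You give practical, step-by-step household advice.",
    "task": "You complete professional tasks concisely and accurately.",
    "question": "You answer questions directly, noting caveats where relevant.",
}

def classify_intent(transcript: str) -> str:
    """Toy keyword classifier standing in for the real intent model."""
    t = transcript.lower()
    if any(w in t for w in ("write", "poem", "story")):
        return "creative"
    if any(w in t for w in ("clean", "fix", "cook")):
        return "household"
    if any(w in t for w in ("draft", "summarize", "email")):
        return "task"
    return "question"

def build_messages(transcript: str) -> list:
    """Inject the matching system prompt ahead of the user's transcript."""
    intent = classify_intent(transcript)
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[intent]},
        {"role": "user", "content": transcript},
    ]
```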

Voice out - Google TTS

The reply text is piped into Google Text-to-Speech with a configurable voice (male / female). Google's pronunciation handling copes with proper nouns, numbers, and the long tail of edge cases a phone-native TTS can fumble.
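The male/female toggle maps directly onto the `ssmlGender` field of Google Cloud TTS. A sketch of the JSON body the backend might build for the `text:synthesize` endpoint (field names follow the Cloud TTS REST API; the encoding choice is an assumption):

```python
def tts_request(text: str, language_code: str, gender: str) -> dict:
    """Build a JSON body for the Google Cloud TTS `text:synthesize` call."""
    if gender not in ("male", "female"):
        raise ValueError("gender must be 'male' or 'female'")
    return {
        "input": {"text": text},
        "voice": {
            "languageCode": language_code,   # e.g. "en-US", one of the 51
            "ssmlGender": gender.upper(),    # MALE / FEMALE
        },
        "audioConfig": {"audioEncoding": "MP3"},
    }
```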

The API-key problem - FastAPI proxy, always

The wrong answer is shipping the OpenAI key in the mobile bundle. The right answer is a FastAPI + Postgres proxy: the app authenticates the device to our server, our server holds the OpenAI key, rate-limits per-user, and the key never lives in APKs, IPAs, or web bundles where anyone can decompile it. This is the fence between a weekend project and a product that accepts subscription payment.
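The per-user rate limiting on that proxy can be sketched as a token bucket keyed by user ID - a simplified stand-in for the production limiter, with illustrative numbers:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-user token bucket: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = defaultdict(lambda: capacity)  # each user starts full
        self.updated = {}

    def allow(self, user_id: str) -> bool:
        """Spend one token for this user if available; refill based on elapsed time."""
        now = self.clock()
        last = self.updated.get(user_id, now)
        self.tokens[user_id] = min(
            self.capacity, self.tokens[user_id] + (now - last) * self.rate
        )
        self.updated[user_id] = now
        if self.tokens[user_id] >= 1:
            self.tokens[user_id] -= 1
            return True
        return False
```

Injecting the clock keeps the limiter testable without sleeping; in the proxy, a rejected `allow` would become an HTTP 429 before the OpenAI call is ever made.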

Subscriptions - tiered freemium

Three-day free trial. Paid plans at $5–7/week, $20/month, or $179.99/year via StoreKit + Play Billing. Webhooks tell the backend in real time whether a user is paid, so rate limits flip without the app restarting.
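That "flip without restarting" behaviour comes down to server-side entitlement state updated by the webhook. A sketch with hypothetical event names and illustrative limits - the real StoreKit and Play Billing payloads differ per store:

```python
# Illustrative daily request limits per tier - real numbers live in backend config.
TIER_LIMITS = {"trial": 20, "paid": 500, "expired": 0}

class Entitlements:
    def __init__(self):
        self.tier = {}  # user_id -> tier; unknown users default to trial

    def handle_webhook(self, event: dict) -> None:
        """Flip a user's tier the moment the store tells us - no app restart."""
        user, kind = event["user_id"], event["type"]
        if kind in ("subscribed", "renewed"):
            self.tier[user] = "paid"
        elif kind in ("expired", "refunded"):
            self.tier[user] = "expired"

    def daily_limit(self, user_id: str) -> int:
        return TIER_LIMITS[self.tier.get(user_id, "trial")]
```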

Why Botinfo survives when OpenAI changes the rules

Three things keep Botinfo from dying the day OpenAI ships voice mode or changes pricing:

  1. The API key is server-held - we can swap providers without shipping an app update.
  2. The prompt layer is ours - we can rewrite it for GPT-4o, Claude, or a self-hosted model without retraining users.
  3. Voice in and voice out are separate - if Google TTS gets expensive, we swap for on-device TTS without touching the rest.
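All three points reduce to the same server-side pattern: a thin provider interface the app never sees. A sketch with a hypothetical adapter protocol - the `EchoProvider` is a test stand-in, not a real backend:

```python
from typing import Protocol

class ChatProvider(Protocol):
    def complete(self, messages: list) -> str: ...

class OpenAIProvider:
    """Would wrap the OpenAI chat call behind the proxy (stubbed here)."""
    def complete(self, messages: list) -> str:
        raise NotImplementedError("call the OpenAI API here")

class EchoProvider:
    """Trivial stand-in used for tests and local development."""
    def complete(self, messages: list) -> str:
        return messages[-1]["content"]

def answer(messages: list, provider: ChatProvider) -> str:
    # The mobile app never knows which provider ran;
    # swapping GPT-4 Turbo for GPT-4o, Claude, or a self-hosted
    # model is a server-side change, not an app update.
    return provider.complete(messages)
```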

Results

  • Shipped to App Store (v1.16.0, Jan 2024) and Google Play.
  • 51-language support via native platform ASR.
  • Male/female TTS voice options.
  • Subscription tiers (weekly / monthly / yearly) live via StoreKit + Play Billing.
  • No OpenAI key in the app bundle; every model call authenticated at the FastAPI proxy.

What an engineering team should take from this

If you are building any voice-first LLM product, three things are worth copying from Botinfo:

  1. Treat the API key as a server secret. If you remember nothing else, remember this.
  2. Prompt middleware is a first-class component. Version it, log it, swap models behind it.
  3. Use the platform’s native ASR before you reach for a third-party SDK. 51 languages come free; bundle size stays small.

Tech stack

  • Mobile: Flutter (iOS + Android, single codebase - inferred)
  • AI: OpenAI GPT-4 Turbo API
  • Voice in: Native platform ASR (Apple Speech / Android SpeechRecognizer)
  • Voice out: Google Text-to-Speech (male / female voices)
  • Middleware: prompt-engineering layer routing intent to matching system prompts
  • Backend: FastAPI + Postgres (proxy holding the OpenAI key, session history, rate limiting)
  • Subscriptions: StoreKit + Google Play Billing

Screens

Three Botinfo phone mockups - Listening, Ask Anything, and orbit animation during GPT-4 Turbo response

Reference architecture

The stack, in one pass.

Named pieces, how they connect, and why each one earned its spot.

  • 01 · OpenAI GPT-4 Turbo

    natural-language understanding and answer generation across 51 languages

  • 02 · Native platform ASR

    on-device speech recognition - no third-party ASR SDK in the bundle

  • 03 · Google Text-to-Speech

    natural-sounding spoken output in male or female voice options

  • 04 · Prompt engineering layer

    raw transcripts are noisy; the middle layer rewrites before calling GPT-4 Turbo

  • 05 · Flutter

    one codebase for iOS + Android

  • 06 · FastAPI + Postgres

    proxy server holding the OpenAI key, rate-limiting, session history

Full stack

Every piece, named.

  • OpenAI GPT-4 Turbo
  • Native platform ASR
  • Google Text-to-Speech
  • Prompt engineering layer
  • Flutter
  • FastAPI + Postgres
  • StoreKit + Google Play Billing

The team on the call

Named engineers, not a pool.

You speak to the person who’ll review the architecture. No account-manager layer. No offshore switcheroo.

Founder & Lead Engineer

Sameer Donga

Shipping Flutter, FastAPI, and AI systems since 2019. Reviews the architecture on every engagement.

Start a similar build

You have the reference. Now the project.

Tell us the shape of your version. We come back with a written architecture and a fixed quote.