Building Stezio: Guiding Remote Stethoscope Exams with Gemini Live and Google Cloud Run

I created this project as part of the #GeminiLiveAgentChallenge, and honestly, it is so cool.

When I started thinking about telemedicine, one problem kept bothering me. Doctors can see you and talk to you over video, but they cannot actually examine you. A normal webcam is fine for conversation, but it cannot replace something like listening to a patient’s heart or lungs.

Even if a patient owns a digital stethoscope, another problem appears: most people do not know the correct anatomical spots to place it. Auscultation points like the aortic and pulmonic valve areas are very specific. Without guidance, people tend to move the stethoscope around randomly and end up recording noise that doctors cannot use.

That problem led me to build Stezio, an AR-guided AI copilot that acts like a doctor sitting next to you, guiding you step by step through a basic physical exam.

The brains: Gemini Flash Native Audio

For something like this to work, the AI has to respond almost instantly. A traditional pipeline that transcribes speech, generates text, and then converts the text back to audio would not work: imagine dragging a stethoscope across your chest and hearing the AI say “stop” three seconds later. By then you have already moved past the correct spot.

So I used Gemini 2.5 Flash Native Audio through the Live API. Instead of generating text and converting it to speech, the model processes the incoming audio and context natively and streams spoken responses back in real time.

On the frontend, I added a computer vision layer using MediaPipe. It tracks the LED on the stethoscope as the user moves it across their chest. Those coordinates get converted into small JSON packets that represent the stethoscope’s position.
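To make this concrete, here is a minimal sketch of what such a position packet could look like. The field names and packet shape are my own illustrative assumptions, not Stezio's actual format; the idea is just to normalize the tracked pixel coordinates to the video frame before streaming.

```typescript
// Hypothetical packet format for the stethoscope's tracked position.
// MediaPipe (assumed) reports the LED's pixel coordinates each frame;
// we normalize them to 0..1 so the model is independent of camera resolution.
interface StethoPacket {
  type: "stetho_position";
  x: number;        // 0..1, left to right across the video frame
  y: number;        // 0..1, top to bottom
  timestampMs: number;
}

function toPacket(
  pixelX: number,
  pixelY: number,
  frameWidth: number,
  frameHeight: number,
  now: number = Date.now()
): StethoPacket {
  return {
    type: "stetho_position",
    // Clamp in case the tracker briefly reports a point outside the frame.
    x: Math.min(1, Math.max(0, pixelX / frameWidth)),
    y: Math.min(1, Math.max(0, pixelY / frameHeight)),
    timestampMs: now,
  };
}

// Example: LED detected at the center of a 640x480 frame.
const packet = toPacket(320, 240, 640, 480, 1000);
console.log(JSON.stringify(packet));
```

Normalizing early keeps the downstream prompt logic simple: the model only ever reasons about relative positions, regardless of the user's webcam resolution.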

That spatial data is streamed into Gemini Live. Because the model knows where the user’s hand is, it can respond naturally with instructions like “move it slightly down… perfect, hold still.” The feedback comes back almost instantly, which makes the experience feel much more natural.
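One practical detail worth noting: streaming a packet on every camera frame would flood the session with near-identical data. A simple gate that only forwards a position when the stethoscope has actually moved (and not too recently) keeps the stream useful. This is a hedged sketch of that idea; the class name and thresholds are illustrative assumptions, not Stezio's implementation.

```typescript
// Gate spatial updates: only stream a packet when the stethoscope has moved
// more than a minimum distance AND a minimum interval has elapsed.
// Thresholds are illustrative, not tuned values from the real app.
class MovementGate {
  private lastX = NaN;
  private lastY = NaN;
  private lastSentMs = -Infinity;

  constructor(
    private minDistance = 0.02,   // normalized units (~2% of the frame)
    private minIntervalMs = 100   // at most ~10 packets per second
  ) {}

  /** Returns true if this position should be streamed upstream. */
  shouldSend(x: number, y: number, nowMs: number): boolean {
    const moved = Math.hypot(x - this.lastX, y - this.lastY);
    const stale = nowMs - this.lastSentMs >= this.minIntervalMs;
    // First call: `moved` is NaN because there is no previous point; send it.
    if (Number.isNaN(moved) || (moved >= this.minDistance && stale)) {
      this.lastX = x;
      this.lastY = y;
      this.lastSentMs = nowMs;
      return true;
    }
    return false;
  }
}
```

A few packets per second is plenty for "move it slightly down" style guidance, and it keeps the model's context from filling up with redundant coordinates.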

The backbone: Google Cloud Run

Connecting a web app directly to a real-time multimodal model turned out to be one of the harder parts of the project. The app relies on continuous bidirectional streaming over WebSockets.

My frontend is built with Next.js and hosted on Vercel. The problem is that Vercel's serverless and edge functions enforce execution time limits and terminate long-running connections, which makes them a poor fit for persistent WebSocket streams.

To solve this, I built a dedicated Node.js WebSocket proxy server and deployed it on Google Cloud Run.

The backend is containerized with Docker and deployed through Google Cloud Build and Google Artifact Registry. A simple script builds the container and pushes it to Cloud Run automatically.
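A deploy script in that spirit might look like the following. The project ID, region, repository, and service names here are placeholders I made up for illustration, not Stezio's actual configuration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values -- substitute your own project, region, and names.
PROJECT_ID="my-gcp-project"
REGION="us-central1"
IMAGE="${REGION}-docker.pkg.dev/${PROJECT_ID}/stezio/ws-proxy:latest"

# Build the container with Cloud Build and push it to Artifact Registry.
gcloud builds submit --tag "${IMAGE}"

# Deploy to Cloud Run. The request timeout matters for WebSockets:
# Cloud Run closes the connection when the timeout is reached, so a
# generous value keeps exam sessions from being cut off mid-stream.
gcloud run deploy stezio-ws-proxy \
  --image "${IMAGE}" \
  --region "${REGION}" \
  --allow-unauthenticated \
  --timeout 3600
```

Cloud Run supports WebSockets out of the box, but treating the timeout as part of the deploy configuration (rather than the default) is what makes long-lived streaming sessions reliable.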

This proxy server sits between the client and Vertex AI. It receives live microphone audio and spatial data from the browser, packages everything together, and forwards it to the Gemini model in real time.
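The "packages everything together" step boils down to multiplexing two client streams: binary frames carrying microphone audio, and text frames carrying position JSON. Here is a hedged sketch of how the proxy might tag each incoming frame before forwarding it upstream. The wire format and names are my own assumptions for illustration, not the actual Vertex AI Live protocol or Stezio's code.

```typescript
// Tag the two kinds of client frames so the upstream handler can route them.
// (In the real proxy, `data` would arrive from a WebSocket `message` event.)
type UpstreamMessage =
  | { kind: "audio"; mimeType: string; base64Data: string }
  | { kind: "spatial"; payload: unknown };

function frameClientMessage(data: Buffer | string): UpstreamMessage {
  if (typeof data === "string") {
    // Text frames from the browser carry stethoscope-position JSON.
    return { kind: "spatial", payload: JSON.parse(data) };
  }
  // Binary frames carry raw PCM microphone audio (sample rate assumed).
  return {
    kind: "audio",
    mimeType: "audio/pcm;rate=16000",
    base64Data: data.toString("base64"),
  };
}
```

Keeping this framing logic as a pure function also makes it easy to unit-test separately from the WebSocket plumbing.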

The result

By combining on-device computer vision, real-time AI reasoning, and cloud routing, Stezio turns a complicated medical step into something simple. The user just sits in front of their laptop while the AI guides their hand until the stethoscope reaches the correct spot.

Building this for the #GeminiLiveAgentChallenge made me think differently about how spatial data and AI can work together. Telemedicine does not have to stop at video calls. With the right tools, patients can perform guided exams at home while doctors still receive useful clinical data.

If you want to see the code, feel free to check out the repository. You can also try the live demo at app.stezio.com.

I also created this piece of content for the purposes of entering the #GeminiLiveAgentChallenge.

You can see my presentation and the live demo here 😄

#GeminiLiveAgentChallenge
