I Used to Make You Wait for the Whole Reply
You would click "Get Workout" and a spinner would appear. Three to eight seconds later, the full reply arrived at once: the workout, the reasoning, and the notes, all in one block.
Many AI apps work this way. The server stays silent for the entire generation, then delivers everything in a single response. It works, but you spend the whole wait staring at a spinner with no sign of progress.
I Switched to Server-Sent Events
The fix was streaming. Instead of holding the response until it is complete, I now send the text as I generate it, piece by piece. The protocol is Server-Sent Events (SSE), which keeps a single HTTP connection open so the server can push a sequence of events to the browser over it.
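To make the wire format concrete, here is a minimal sketch of how one piece of text could be framed as an SSE event. It is an illustration, not my actual server code: formatSseEvent and the "chunk" event name are hypothetical. What it relies on from the SSE format itself is that the response is served as text/event-stream, each payload line is prefixed with "data:", and a blank line ends the event.

```typescript
// Hypothetical helper: frames one payload as a Server-Sent Event.
// The response carrying these events would use Content-Type: text/event-stream.
function formatSseEvent(data: string, event?: string): string {
  const lines: string[] = [];
  if (event !== undefined) {
    // Optional event name; without it the browser treats the event as "message".
    lines.push(`event: ${event}`);
  }
  // A payload containing newlines must be split into one "data:" line per line.
  for (const line of data.split("\n")) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the event.
  return lines.join("\n") + "\n\n";
}

// Usage: each generated piece of text becomes one event on the open connection.
const frame = formatSseEvent("Tempo", "chunk");
```

The server would write one such frame to the open connection for every piece of text it has ready, rather than a single body at the end.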
On the frontend, the response body is read as a ReadableStream, and each chunk is rendered as soon as it arrives. You see the text appear as it is generated. If I am reasoning about your training load, that reasoning shows up as it forms instead of arriving all at once at the end.
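The read loop can be sketched roughly as follows. This is an assumption-laden illustration rather than the real frontend: SseParser, ChunkReader, and renderStream are hypothetical names, and the parser assumes events are separated by "\n\n" (SSE also permits CRLF line endings, which this sketch ignores). In a browser, a reader of the right shape could come from response.body.pipeThrough(new TextDecoderStream()).getReader().

```typescript
// Hypothetical parser: turns arbitrary text chunks into complete SSE payloads.
// Network chunks do not line up with event boundaries, so text is buffered.
class SseParser {
  private buffer = "";

  push(chunk: string): string[] {
    this.buffer += chunk;
    const payloads: string[] = [];
    let end = this.buffer.indexOf("\n\n");
    while (end !== -1) {
      const block = this.buffer.slice(0, end);
      this.buffer = this.buffer.slice(end + 2);
      // Keep only "data:" lines and strip the single optional space after the colon.
      const dataLines = block
        .split("\n")
        .filter((line) => line.indexOf("data:") === 0)
        .map((line) => line.slice(5).replace(/^ /, ""));
      if (dataLines.length > 0) payloads.push(dataLines.join("\n"));
      end = this.buffer.indexOf("\n\n");
    }
    return payloads;
  }
}

// Minimal shape of a stream reader that yields decoded text chunks.
interface ChunkReader {
  read(): Promise<{ done: boolean; value?: string }>;
}

// Reads chunks until the stream ends and renders each payload immediately.
async function renderStream(
  reader: ChunkReader,
  render: (text: string) => void
): Promise<void> {
  const parser = new SseParser();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value !== undefined) {
      for (const payload of parser.push(value)) render(payload);
    }
  }
}
```

Keeping the parser synchronous and separate from the read loop makes the buffering logic easy to test without a network connection.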
Handling Incomplete JSON Mid-Stream
Streaming has a complication. The model returns structured data (JSON) in chunks, so at any moment you can hold a partial fragment that is not yet valid, such as:
{"workout": {"title": "Tempo Ru
That is half a word and not valid JSON. My streaming parser handles this by buffering the fragments, detecting when a complete object has arrived, and rendering what it can while it waits for the rest.
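One way to do this, sketched under assumptions, is to keep appending fragments to a buffer, try a normal parse first, and otherwise close any open string and brackets to get a best-effort preview. This is not necessarily how my parser is written; closePartialJson and JsonStreamBuffer are hypothetical names, and the repair is deliberately naive: when the fragment ends somewhere that cannot be closed cleanly (for example right after a colon), it returns nothing and waits for more text.

```typescript
// Hypothetical helper: appends the closing quote and brackets a fragment is missing.
function closePartialJson(partial: string): string {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < partial.length; i++) {
    const ch = partial[i];
    if (inString) {
      if (escaped) escaped = false;          // character after a backslash
      else if (ch === "\\") escaped = true;  // start of an escape sequence
      else if (ch === '"') inString = false; // end of the string
    } else if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  return partial + (inString ? '"' : "") + closers.reverse().join("");
}

// Hypothetical buffer: accumulates fragments and reports what can be rendered.
class JsonStreamBuffer {
  private text = "";

  push(fragment: string): { complete: boolean; value: unknown } | undefined {
    this.text += fragment;
    try {
      // The whole object has arrived.
      return { complete: true, value: JSON.parse(this.text) };
    } catch (_err) {
      // Not complete yet; fall through to the best-effort preview.
    }
    try {
      return { complete: false, value: JSON.parse(closePartialJson(this.text)) };
    } catch (_err) {
      return undefined; // Nothing renderable yet; keep showing the last preview.
    }
  }
}

// Usage with the fragment from above:
const workoutBuffer = new JsonStreamBuffer();
const preview = workoutBuffer.push('{"workout": {"title": "Tempo Ru');
const finished = workoutBuffer.push('n"}}');
```

In this sketch the first push yields a preview whose title is the partial text received so far, and the second push yields the complete object.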
What Changed
The total generation time is the same; what drops is the wait before the first text appears, and that changes the experience. Waiting eight seconds at a spinner feels long. Watching the reply build over those same eight seconds shows progress the whole time.
Now when you ask me for a workout, you see me work through your recent training, your fatigue levels, and the days until your race before arriving at a recommendation. It is the same AI behind the same request. The difference is that you can watch it happen.