Improved Streaming Support and Async Stream Parser

February 27, 2025


We’ve made significant improvements to our streaming functionality with two key updates:

Stream Fixes

We’ve resolved several issues with stream handling across LLM providers, making streaming more reliable and consistent. These fixes address edge cases and improve compatibility with provider-specific streaming implementations, including:

  • Better handling of stream interruptions and reconnections
  • Improved error handling for streaming responses
  • Enhanced compatibility with different LLM provider streaming formats
  • Fixed timing calculations for streamed responses

New Streaming Methods

The HeliconeManualLogger class now includes enhanced methods for working with streams:

  • logStream: Logs a streaming operation with full control over stream handling (see the Together AI example below)
  • logSingleStream: Simplified method for logging a single ReadableStream (see the sketch after this list)
  • logSingleRequest: Logs a single request with a response body (see the sketch after this list)
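
For the two single-call methods, here is a minimal sketch: each takes the request payload plus either a ReadableStream or the serialized response body, with optional extra Helicone headers. The placeholder names (providerStream, completion) and the exact shape of the headers argument are assumptions for illustration, not the definitive SDK signatures.

import { HeliconeManualLogger } from "@helicone/helpers";

const logger = new HeliconeManualLogger({ apiKey: process.env.HELICONE_API_KEY! });

// Hypothetical request payload sent to your LLM provider
const requestBody = {
  model: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
  messages: [{ role: "user", content: "Hello" }],
};

// logSingleStream: log a single ReadableStream of the provider's streamed response
// (providerStream is whatever ReadableStream your provider SDK exposes)
await logger.logSingleStream({ ...requestBody, stream: true }, providerStream, {
  "Helicone-User-Id": "123", // optional additional Helicone headers (shape assumed)
});

// logSingleRequest: log a non-streaming call together with its full response body
await logger.logSingleRequest(requestBody, JSON.stringify(completion), {
  "Helicone-User-Id": "123",
});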

Example Usage with Together AI

import Together from "together-ai";
import { HeliconeManualLogger } from "@helicone/helpers";

// Together AI client
const together = new Together({ apiKey: process.env.TOGETHER_API_KEY });

// Initialize the Helicone manual logger with default properties
const helicone = new HeliconeManualLogger({
  apiKey: process.env.HELICONE_API_KEY!,
  loggingEndpoint: "https://api.worker.helicone.ai/oai/v1/log",
  headers: {
    "Helicone-Property-Environment": "production",
  },
});

export async function POST(request: Request) {
  const { question } = await request.json();

  const body = {
    model: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages: [{ role: "user", content: question }],
    stream: true,
  } as Together.Chat.CompletionCreateParamsStreaming & { stream: true };

  const response = await together.chat.completions.create(body);
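  // Split the response stream: one branch for the client, one for Helicone logging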
  const [stream1, stream2] = response.tee();
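  // Log in the background (not awaited) so the response below starts streaming immediately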
  helicone.logStream(
    body,
    async (resultRecorder) => {
      resultRecorder.attachStream(stream2.toReadableStream());
    },
    {
      "Helicone-User-Id": "123",
    }
  );

  return new Response(stream1.toReadableStream());
}
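
Note the pattern here: response.tee() splits the provider stream into two branches, so the client receives tokens as they arrive while Helicone consumes an identical copy for logging, and the logStream call is intentionally not awaited so it never delays the streamed response.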

These improvements make working with streamed LLM responses more reliable and efficient, especially for applications that require real-time output.