OpenTelemetry GenAI Semantic Conventions


Standardize traces for LLMs, tools, and RAG so observability survives model swaps, vendor changes, and “agent sprawl.”

We’re watching observability become a first-class feature of AI engineering. Not because dashboards are cool—but because LLM systems are inherently distributed:

  • one call to a model,
  • another to embeddings,
  • a retrieval hop to a vector DB,
  • a tool call to an internal service,
  • a final synthesis step…

If we instrument each piece with different naming conventions (or worse: vendor-specific schemas), we get “telemetry soup.” The OpenTelemetry GenAI semantic conventions give teams a common vocabulary for spans and attributes, so traces remain meaningful even when we swap models/providers or reorganize agent workflows. (And yes, your future self will thank you.)

OpenTelemetry’s GenAI semantic conventions define well-known operation names like chat, embeddings, execute_tool, invoke_agent, etc., and they’re currently marked Development stability—so you should expect iteration, but you can still implement them today with guardrails.

Semantic conventions vs “vendor observability”

Let’s compare the two camps:

1) Vendor/platform conventions

Pros:

  • Fast time-to-value
  • Deep UI features (prompt diffing, eval dashboards, cost breakdowns)

Cons:

  • Harder portability: traces can become vendor-shaped
  • Switching tools later often means “re-instrument everything”

2) OpenTelemetry semantic conventions (the “portable standard”)

Pros:

  • Tool/vendor agnostic: emit OTLP once, choose backends later
  • Easier cross-team alignment (platform engineering loves this)
  • Better story for compliance and long-term maintainability

Cons:

  • You may need to build/compose some higher-level views yourself
  • GenAI conventions are still evolving (Development stability)

Also worth noting: there are adjacent/open efforts like OpenInference (popular in the LLM observability community) that define their own semantic attributes. You can map between schemas, but the strategic win is choosing a “source of truth” early and being consistent.

The GenAI span model in practice

OpenTelemetry’s GenAI conventions standardize things like:

  • Span naming: recommended patterns like {gen_ai.operation.name} {gen_ai.request.model} for certain operations.
  • Operation names: chat, embeddings, execute_tool, invoke_agent, create_agent, generate_content, etc.
  • Provider identification: well-known gen_ai.provider.name values such as openai, anthropic, azure.ai.openai, aws.bedrock, etc.
  • Sensitive content guidance: some attributes (like message/tool payloads) may contain sensitive data; instrumentations should allow filtering/truncation and avoid emitting huge payloads by default.

That last bullet is the difference between “helpful traces” and “we accidentally logged customer secrets.”
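
On the first bullet, the naming rule is simple enough to capture in a helper (the model and tool names below are illustrative, not mandated by the spec):

def genai_span_name(operation: str, target: str) -> str:
    """Build a GenAI span name such as 'chat gpt-4.1-mini' or 'execute_tool get_weather'."""
    return f"{operation} {target}"

genai_span_name("chat", "gpt-4.1-mini")          # -> "chat gpt-4.1-mini"
genai_span_name("embeddings", "text-embedding-3-small")
genai_span_name("execute_tool", "get_weather")   # tool spans use the tool name, not the model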

A working implementation

Below is a practical implementation that:

  1. configures OpenTelemetry tracing,
  2. creates a GenAI span around an LLM request,
  3. works with OpenAI Responses API,
  4. optionally works with LM Studio as an OpenAI-compatible local server.

Step 0: Install dependencies

pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc openai

Step 1: Configure OpenTelemetry + OTLP exporter

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({
    "service.name": "genai-demo",
    "service.version": "0.1.0",
})

provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

# Export to an OTLP endpoint (Collector / vendor / gateway)
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

tracer = trace.get_tracer("genai-demo")

Tip: In real deployments, prefer standard OpenTelemetry env vars (e.g., OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT) instead of hardcoding.
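
If you go the env-var route, the setup shrinks to something like this (a sketch assuming OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_ENDPOINT are set in the deployment environment):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# The SDK's default Resource reads OTEL_SERVICE_NAME / OTEL_RESOURCE_ATTRIBUTES,
# and the no-arg exporter reads OTEL_EXPORTER_OTLP_ENDPOINT (plus headers, protocol, etc.).
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("genai-demo")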

Step 2: Instrument an OpenAI Responses API call (correct API shape)

OpenAI’s Responses API uses client.responses.create(...).

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def generate_answer(user_text: str) -> str:
    model = "gpt-4.1-mini"  # pick what you actually use

    with tracer.start_as_current_span(f"chat {model}") as span:
        # GenAI semantic attributes (keep them small & safe)
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.provider.name", "openai")
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("gen_ai.output.type", "text")

        # ⚠️ Avoid logging raw prompts by default (privacy/security)
        # If you must, consider redaction/truncation + explicit opt-in.

        resp = client.responses.create(
            model=model,
            input=user_text,
        )

        # Responses API returns content in a structured form; a common helper is output_text.
        # Keep your code aligned to the official docs for your SDK version.
        text = getattr(resp, "output_text", None)
        if text is None:
            # Fallback: handle structured output if output_text isn't available
            text = str(resp)

        return text
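
Once the basics work, it's worth enriching the span with response-side attributes (actual model served, token usage) and an explicit error status, since cost and error-rate views depend on them. A sketch of the same function with those additions (the usage field names assume the current OpenAI Python SDK; check them against your SDK version):

from opentelemetry.trace import StatusCode

def generate_answer_traced(user_text: str) -> str:
    model = "gpt-4.1-mini"

    with tracer.start_as_current_span(f"chat {model}") as span:
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.provider.name", "openai")
        span.set_attribute("gen_ai.request.model", model)

        try:
            resp = client.responses.create(model=model, input=user_text)
        except Exception as exc:
            # Mark the span failed so error-rate dashboards pick it up.
            span.set_attribute("error.type", type(exc).__qualname__)
            span.set_status(StatusCode.ERROR, str(exc))
            raise

        # Response-side GenAI attributes: the model that actually served the
        # request plus token usage for cost tracking.
        span.set_attribute("gen_ai.response.model", resp.model)
        usage = getattr(resp, "usage", None)
        if usage is not None:
            span.set_attribute("gen_ai.usage.input_tokens", usage.input_tokens)
            span.set_attribute("gen_ai.usage.output_tokens", usage.output_tokens)

        return getattr(resp, "output_text", str(resp))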

Step 3: Swap OpenAI for LM Studio locally (same instrumentation)

LM Studio’s API server is designed to be OpenAI-compatible and documents endpoints like /v1/chat/completions, /v1/embeddings, and /v1/responses.

from openai import OpenAI

# LM Studio default is often http://localhost:1234/v1
client = OpenAI(
    api_key="lm-studio",  # LM Studio typically doesn’t require a real key
    base_url="http://localhost:1234/v1",
)

def local_answer(user_text: str) -> str:
    model = "your-local-model-name"

    with tracer.start_as_current_span(f"chat {model}") as span:
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.provider.name", "openai")  # schema-wise it's OpenAI-compatible
        span.set_attribute("gen_ai.request.model", model)

        resp = client.responses.create(model=model, input=user_text)
        return getattr(resp, "output_text", str(resp))
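
To try it end to end (assuming LM Studio is running with a model loaded), call the function and flush the provider so the batch processor ships the spans before the process exits:

if __name__ == "__main__":
    print(local_answer("Summarize the OpenTelemetry GenAI conventions in one sentence."))
    provider.force_flush()  # push any pending spans out of the batch processor
    provider.shutdown()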

Real-world implementation tips

Don’t record raw prompts by default

OpenTelemetry’s GenAI spec explicitly warns that some attributes may contain sensitive information and recommends opt-in + filtering/truncation approaches.
Practical pattern (a code sketch follows the list):

  • default: record metadata only (model, operation, durations, errors)
  • debug mode (explicit): record trimmed/redacted message content
  • production: use allowlists and regex-based scrubbing for secrets
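
A minimal sketch of that pattern, with a hypothetical opt-in flag, regex, and attribute name you would adapt to your own stack (none of these names come from the spec):

import os
import re

# Hypothetical opt-in flag: content capture stays off unless explicitly enabled.
CAPTURE_GENAI_CONTENT = os.getenv("CAPTURE_GENAI_CONTENT", "false").lower() == "true"

# Placeholder pattern: extend with whatever secret shapes matter in your stack.
_SECRET_RE = re.compile(r"(sk-[A-Za-z0-9_\-]{8,}|Bearer\s+\S+)")

def scrub(text: str, limit: int = 500) -> str:
    """Redact obvious secrets, then truncate, before anything touches a span."""
    return _SECRET_RE.sub("[REDACTED]", text)[:limit]

def maybe_record_prompt(span, prompt: str) -> None:
    # Default path records nothing; metadata-only spans are still useful.
    if CAPTURE_GENAI_CONTENT:
        # Attribute name is illustrative: align it with the spec version you pin.
        span.set_attribute("gen_ai.input.messages", scrub(prompt))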

Use separate spans for the parts you’ll actually debug

A useful trace breakdown looks like:

  • invoke_agent <agent-name>
    • chat <model>
    • embeddings <embed-model>
    • execute_tool <tool-name>
    • chat <model> (final synthesis)

Those operation names are standardized in the GenAI conventions list.
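
In code, that breakdown is just nested spans. A sketch with illustrative agent, tool, and model names:

def answer_with_agent(question: str) -> str:
    with tracer.start_as_current_span("invoke_agent research-assistant") as agent_span:
        agent_span.set_attribute("gen_ai.operation.name", "invoke_agent")
        agent_span.set_attribute("gen_ai.agent.name", "research-assistant")

        with tracer.start_as_current_span("execute_tool search_docs") as tool_span:
            tool_span.set_attribute("gen_ai.operation.name", "execute_tool")
            tool_span.set_attribute("gen_ai.tool.name", "search_docs")
            context = "...retrieved documents..."  # real tool call goes here

        with tracer.start_as_current_span("chat gpt-4.1-mini") as chat_span:
            chat_span.set_attribute("gen_ai.operation.name", "chat")
            chat_span.set_attribute("gen_ai.request.model", "gpt-4.1-mini")
            # final synthesis call goes here
            return f"Answer to {question!r} based on: {context}"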

Keep attribute payloads small

Even though you can record full tool definitions and message arrays, the spec warns these payloads can be large and shouldn't be captured by default.
Engineers love observability… right up until the collector bill arrives.
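
Beyond not emitting payloads in the first place, you can cap attribute sizes at the SDK level. A sketch using the Python SDK's span limits (verify the exact parameter name against your SDK version; the 2 KiB cap is an arbitrary example):

from opentelemetry.sdk.trace import SpanLimits, TracerProvider

# Truncate any attribute value beyond 2048 characters at the SDK boundary.
# The OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT env var is the declarative equivalent.
provider = TracerProvider(span_limits=SpanLimits(max_attribute_length=2048))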

Quick comparisons you’ll get asked in leadership meetings

“How is this different from OpenInference?”

OpenInference defines an LLM-oriented attribute model used by parts of the community and tooling ecosystem. You can use it, but if your platform strategy is “OpenTelemetry everywhere,” the GenAI semantic conventions reduce fragmentation across services and languages.

“Does this lock us into OpenAI?”

No—gen_ai.provider.name includes many providers (OpenAI, Anthropic, Bedrock, Azure OpenAI, etc.), and the operation naming stays consistent across them.
That’s the entire point: swap providers without rewriting your observability story.

Key takeaways

  • GenAI semantic conventions give your traces a shared grammar: chat, embeddings, execute_tool, invoke_agent, etc.
  • The conventions are Development stability, so implement with versioned mappings and expect updates.
  • Instrumentation should treat prompts/tool payloads as sensitive and large—opt-in, redact, truncate.
  • You can instrument OpenAI Responses cleanly using the official API shape (responses.create).
  • You can run the same approach locally with LM Studio’s OpenAI-compatible API server.

Cohorte Team
December 15, 2025.