Architecture/June 26, 2026/6 min read

Two Bugs Hiding in Our OpenTelemetry Pipeline: CORS Preflights and the NUL Byte That Killed a Whole Batch

Two silent production bugs in an OTLP-based observability pipeline — one blocking all browser telemetry clients, one dropping entire log batches — exposed how quickly boundary gaps become blind spots. Here's the root cause, the fix, and the repro for each.

.NET ASP.NET Core CORS Observability OpenTelemetry PostgreSQL

The Setup and the Silence

Our observability stack is straightforward on paper: clients ship OTLP/HTTP to a gateway on port 4318, the gateway validates, routes, and forwards to an ingestion service, and the ingestion service bulk-inserts into PostgreSQL. The .NET stack is OpenTelemetry SDK 1.16.0 with OpenTelemetry.Exporter.OpenTelemetryProtocol 1.16.0, ASP.NET Core 9 for both the gateway and ingestion service, and Npgsql 9.x.

For months everything looked healthy. Then two separate client types started appearing — a browser-side SPA shipping traces via @opentelemetry/sdk-trace-web, and a native application that occasionally embedded NUL bytes in its log messages. Both failed silently. No alerts fired. No errors surfaced in the dashboard. Just missing data.

That's the worst kind of observability bug: the pipeline that's supposed to tell you things are broken is itself broken, and it's not telling you.

Background: The Gateway Layer and Why It Exists

In a multi-tenant OTLP deployment, you don't expose the ingestion service directly. The gateway handles CORS (browser clients need it), routes /v1/traces, /v1/metrics, /v1/logs, and /v1/profiles to the right backend, and separates anonymous ingest from credentialed API endpoints like dashboards or the SignalR hub used for live tail. The ingestion service sees only internal traffic; it doesn't need to think about CORS.

This separation is clean architecturally — but it creates two distinct boundary layers where bad assumptions can silently swallow traffic.

Bug #1: The CORS Preflight That Blocked Browser Clients

Symptom

Browser clients on a different origin (https://app.example.com) sent zero traces to the gateway. The Network tab showed a 204 No Content response to the OPTIONS preflight, which looks like success, but no Access-Control-Allow-Origin header was present. The browser treated it as a denied preflight and never sent the actual POST /v1/traces.

Root Cause

The gateway had a blanket CORS policy configured like this:

builder.Services.AddCors(options =>
{
    options.AddDefaultPolicy(policy =>
    {
        policy
            .AllowAnyOrigin()
            .AllowAnyMethod()
            .AllowAnyHeader()
            .AllowCredentials(); // <-- the problem
    });
});

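The CORS specification explicitly prohibits returning Access-Control-Allow-Origin: * when Access-Control-Allow-Credentials: true is set, and browsers reject a wildcard origin on any credentialed request. ASP.NET Core enforces this: when it detects the combination of AllowAnyOrigin() and AllowCredentials(), it suppresses the Access-Control-Allow-Origin header entirely rather than producing an invalid response. (Current ASP.NET Core versions can also throw an InvalidOperationException when such a policy is built, so the exact failure mode depends on the version and on how the policy is constructed.) In our case the preflight returned 204 with no CORS headers, which is indistinguishable from success to anything that isn't a browser.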
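To make the rule concrete, here is a minimal sketch of the origin check a browser applies to a CORS response. It is a simplified model written for illustration, not browser or ASP.NET Core code; the PreflightCheck class and its parameters are hypothetical names, and the caller is assumed to pass a case-insensitive header dictionary.

```csharp
using System;
using System.Collections.Generic;

public static class PreflightCheck
{
    // Simplified model of the browser-side origin check.
    // headers: response headers (assumed case-insensitive keys)
    // origin: the serialized origin of the requesting page
    // withCredentials: true when the request is sent in credentialed mode
    public static bool OriginAllowed(
        IReadOnlyDictionary<string, string> headers, string origin, bool withCredentials)
    {
        // No Access-Control-Allow-Origin header: denied, whatever the status code says.
        if (!headers.TryGetValue("Access-Control-Allow-Origin", out var allowOrigin))
            return false;

        // Anonymous mode: a wildcard or an exact origin match is enough.
        if (!withCredentials)
            return allowOrigin == "*" || allowOrigin == origin;

        // Credentialed mode: the wildcard is not accepted, the origin must match
        // exactly, and Access-Control-Allow-Credentials must be "true".
        return allowOrigin == origin
            && headers.TryGetValue("Access-Control-Allow-Credentials", out var creds)
            && creds == "true";
    }
}
```

A 204 response with no Access-Control-Allow-Origin header fails this check in either mode, which matches the symptom: the status code looks fine, but the browser never proceeds to the POST.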

The policy was originally written for the SignalR hub, which does need credentials. It was copy-pasted to the OTLP gateway routes without realising the credentialed wildcard trap.

OTLP/HTTP exports are POSTs with Content-Type: application/x-protobuf (or application/json for the JSON encoding). Neither is a CORS-safelisted content type, so a cross-origin browser client always triggers a preflight. There's no way around it.
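The content-type part of that rule can be sketched in a few lines. This is a deliberately narrow illustration: it looks only at Content-Type and ignores the other preflight triggers (method, custom headers). The PreflightTrigger class is a hypothetical name; the three safelisted values are the ones defined by the Fetch standard.

```csharp
using System;
using System.Collections.Generic;

public static class PreflightTrigger
{
    // The only Content-Type values a browser will send cross-origin without a preflight.
    private static readonly HashSet<string> SafelistedContentTypes =
        new(StringComparer.OrdinalIgnoreCase)
        {
            "application/x-www-form-urlencoded",
            "multipart/form-data",
            "text/plain",
        };

    // Returns true when the Content-Type alone is enough to force a preflight.
    public static bool ContentTypeForcesPreflight(string contentType)
    {
        // Compare the MIME type only; parameters such as "; charset=utf-8" are ignored.
        var essence = contentType.Split(';')[0].Trim();
        return !SafelistedContentTypes.Contains(essence);
    }
}
```

Both OTLP/HTTP encodings fall outside the safelist, so switching the exporter from protobuf to JSON would not avoid the preflight.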

The Fix

The fix is a small piece of terminal middleware that intercepts OPTIONS requests on /v1/* before the CORS middleware runs, and responds directly with the correct anonymous CORS headers. Browser-sourced OTLP doesn't need credentials; it just needs Allow-Origin: *.

// In Program.cs, registered BEFORE app.UseCors() and app.UseAuthentication()
app.Use(async (context, next) =>
{
    if (!context.Request.Path.StartsWithSegments("/v1"))
    {
        await next(context);
        return;
    }

    var response = context.Response;

    if (HttpMethods.IsOptions(context.Request.Method))
    {
        // Preflight: answer directly with anonymous CORS headers and short-circuit,
        // so the credentialed policy never sees the request.
        response.Headers["Access-Control-Allow-Origin"] = "*";

        // Echo back whatever headers the browser asked permission to send
        var requestedHeaders = context.Request.Headers["Access-Control-Request-Headers"].ToString();
        if (!string.IsNullOrEmpty(requestedHeaders))
        {
            response.Headers["Access-Control-Allow-Headers"] = requestedHeaders;
        }

        response.Headers["Access-Control-Allow-Methods"] = "POST, OPTIONS";
        response.Headers["Access-Control-Max-Age"] = "86400"; // 24 h; browsers may cap this lower
        response.StatusCode = StatusCodes.Status204NoContent;
        return;
    }

    // Actual request: the browser also checks Access-Control-Allow-Origin on the
    // POST response. Only stamp the header if nothing downstream has already set one
    // (this keeps 4xx/5xx responses that pass through readable from the browser).
    response.OnStarting(() =>
    {
        if (!response.Headers.ContainsKey("Access-Control-Allow-Origin"))
        {
            response.Headers["Access-Control-Allow-Origin"] = "*";
        }
        return Task.CompletedTask;
    });

    await next(context);
});

// Named credentialed policy (explicit origins), intended for /api and /hubs only
app.UseCors("CredentialedPolicy"); // SignalR, dashboard API

The /api and /hubs routes keep their named credentialed CORS policy with explicit origins. The OTLP routes never touch that policy. Two different auth boundaries, two different policies — that's the actual lesson.
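The named policy itself is not shown above, so here is a minimal sketch of one way to register it and attach it per route group. The origin, the /api/health endpoint, and the use of MapGroup with RequireCors are assumptions for illustration; the real dashboard API and SignalR hub mappings would take their place.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddCors(options =>
{
    // Credentialed policy: explicit origins only, never a wildcard.
    options.AddPolicy("CredentialedPolicy", policy => policy
        .WithOrigins("https://app.example.com") // assumed dashboard origin
        .AllowAnyHeader()
        .AllowAnyMethod()
        .AllowCredentials());
});

var app = builder.Build();

app.UseRouting();

// No policy name here: CORS headers are applied only to endpoints that opt in.
app.UseCors();

// Hypothetical credentialed route group standing in for the dashboard API.
app.MapGroup("/api")
   .RequireCors("CredentialedPolicy")
   .MapGet("/health", () => Results.Ok());

app.Run();
```

Attaching the policy with RequireCors keeps the decision next to the routes that need credentials, so a new OTLP path cannot inherit it by accident.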

Bug #2: The NUL Byte That Killed an Entire Batch

Symptom

Intermittent HTTP 500 responses from the ingestion service, always in bursts. Checking Postgres logs revealed ERROR: invalid byte sequence for encoding "UTF8": 0x00 — SQLSTATE 22021. A native application was serializing log messages that occasionally contained a NUL byte (0x00) in the body, likely from a C-style string boundary in the serialization layer.

Because the ingestion service performs a bulk INSERT for the whole batch, one poisoned record caused the entire statement to be rolled back. Standard SQL transaction semantics: no partial success. From the upstream client's perspective it received a 500 and, depending on retry configuration, either dropped the batch or retried the whole thing — including the bad record — in a loop.

Root Cause

PostgreSQL's text type cannot store the NUL byte (\u0000), whatever the database encoding (UTF-8 in our case), even though NUL is a valid Unicode code point and is legal in UTF-8 itself. Every other control character, U+0001 through U+001F, stores without complaint. It is only 0x00 that PostgreSQL refuses. And because the batch is inserted as a single statement, that one rejected value fails every row in it.

The affected fields in an OTLP log record are wider than you'd expect: log body, attribute keys, attribute values, nested kvlist keys and values, and severityText. All of them can carry arbitrary user-provided strings.

The Fix

A focused sanitiser that strips only NUL bytes, applied at the ingestion boundary before the bulk insert is constructed:

public static class OtlpSanitizer
{
    /// <summary>
    /// Strips NUL (0x00) from a string. PostgreSQL text columns reject NUL;
    /// all other Unicode control characters are stored without error.
    /// Returns the original reference if no NUL is present (zero allocation);
    /// a null input is normalised to string.Empty.
    /// </summary>
    public static string StripNul(string? value)
    {
        if (value is null) return string.Empty;
        if (!value.Contains('\0')) return value; // fast path, no allocation
        return value.Replace("\0", string.Empty);
    }

    /// <summary>
    /// Applies StripNul to all string fields in a decoded OTLP log record
    /// before it is handed to the bulk-insert command builder.
    /// </summary>
    public static void SanitizeLogRecord(LogRecord record)
    {
        record.Body = StripNul(record.Body);
        record.SeverityText = StripNul(record.SeverityText);

        foreach (var attr in record.Attributes)
        {
            attr.Key = StripNul(attr.Key);
            if (attr.Value?.StringValue is not null)
                attr.Value.StringValue = StripNul(attr.Value.StringValue);

            // Sanitise one level of kvlist nesting (keys and string values).
            // Deeper nesting would need a recursive walk.
            if (attr.Value?.KvlistValue is not null)
            {
                foreach (var kv in attr.Value.KvlistValue.Values)
                {
                    kv.Key = StripNul(kv.Key);
                    if (kv.Value?.StringValue is not null)
                        kv.Value.StringValue = StripNul(kv.Value.StringValue);
                }
            }
        }
    }
}

The Contains('\0') fast path means the vast majority of records — those without NUL — incur no allocation whatsoever. Only poisoned records pay the Replace cost.

SanitizeLogRecord is called once per record immediately after protobuf deserialization, before the batch command is built. The sanitized batch then inserts cleanly; if a caller's data had a NUL, they see a 200 and their record is stored with the NUL stripped, not dropped entirely.
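The sanitiser above walks a single level of kvlist nesting. OTLP values can nest kvlists and arrays to any depth, so a recursive walk is one way to cover the rest. The sketch below uses hypothetical stand-in types (AnyValueNode, KeyValueNode), not the ingestion service's actual model or the generated protobuf classes.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a decoded OTLP value: at most one member is expected to be set.
public sealed class AnyValueNode
{
    public string? StringValue { get; set; }
    public List<AnyValueNode>? ArrayValue { get; set; }
    public List<KeyValueNode>? KvlistValue { get; set; }
}

// Hypothetical stand-in for a decoded OTLP key/value pair.
public sealed class KeyValueNode
{
    public string Key { get; set; } = string.Empty;
    public AnyValueNode? Value { get; set; }
}

public static class RecursiveSanitizer
{
    // Same idea as StripNul above: return the original string when it has no NUL.
    public static string StripNul(string value) =>
        value.Contains('\0') ? value.Replace("\0", string.Empty) : value;

    // Strips NUL from every string reachable from the given value, at any depth.
    public static void Sanitize(AnyValueNode? value)
    {
        if (value is null) return;

        if (value.StringValue is not null)
            value.StringValue = StripNul(value.StringValue);

        if (value.ArrayValue is not null)
        {
            foreach (var item in value.ArrayValue)
                Sanitize(item);
        }

        if (value.KvlistValue is not null)
        {
            foreach (var pair in value.KvlistValue)
            {
                pair.Key = StripNul(pair.Key);
                Sanitize(pair.Value);
            }
        }
    }
}
```

Recursion depth here follows the input, so a service that accepts untrusted payloads may want to cap nesting before walking it.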

Verifying the Fixes

Both fixes are surgical: two files changed, zero dependency additions, clean build.

Repro for Bug #1 — CORS preflight:

curl -i -X OPTIONS https://gateway.example.com/v1/logs \
  -H "Origin: https://app.example.com" \
  -H "Access-Control-Request-Method: POST" \
  -H "Access-Control-Request-Headers: Content-Type"

Before the fix: 204 No Content, no Access-Control-Allow-Origin header. After: 204 No Content with Access-Control-Allow-Origin: *, Access-Control-Allow-Methods: POST, OPTIONS, Access-Control-Max-Age: 86400.

Repro for Bug #2 — NUL byte batch:

Construct a minimal OTLP log export request in Protobuf (or use a test client) with one log record whose body is "hello\x00world". Before the fix: HTTP 500, entire batch dropped. After: HTTP 200, record stored as "helloworld" — NUL stripped, rest of batch intact.
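Before reaching for a full integration test, the string-level behaviour can be checked in isolation. The sketch below copies StripNul so that it is self-contained; the StripNulChecks class and its Expect helper are hypothetical names, not part of the ingestion service.

```csharp
using System;

public static class StripNulChecks
{
    // Copy of the article's StripNul, repeated here so the check stands alone.
    public static string StripNul(string? value)
    {
        if (value is null) return string.Empty;
        if (!value.Contains('\0')) return value;
        return value.Replace("\0", string.Empty);
    }

    // Throws on the first failed expectation; returns normally if all pass.
    public static void Run()
    {
        Expect(StripNul("hello\0world") == "helloworld", "NUL should be removed");

        var clean = "no nul here";
        Expect(ReferenceEquals(StripNul(clean), clean), "clean input should be returned as-is");

        Expect(StripNul(null) == string.Empty, "null should become an empty string");
        Expect(StripNul("a\u0001b") == "a\u0001b", "other control characters should survive");
    }

    private static void Expect(bool condition, string message)
    {
        if (!condition) throw new InvalidOperationException(message);
    }
}
```

Calling StripNulChecks.Run() from a test method covers the four cases the fix relies on: removal, the no-allocation fast path, null handling, and leaving other control characters alone.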

Both can be exercised in integration tests against a local Postgres instance without any mocking.

Takeaways

These two bugs share an underlying pattern. Both were invisible during normal operation. Both were triggered by client types that sat just outside the original design assumptions: a browser SPA and a native app with C-style strings. And both were possible only because a boundary layer (the gateway, the ingestion service) wasn't doing defensive handling.

The CORS bug is a policy-layering trap: credentialed CORS and wildcard origins are mutually exclusive by specification. If your gateway serves both browser-facing OTLP and credentialed API endpoints, those must be separate named policies applied to separate route groups. Cargo-culting a single default policy will silently kill one of them.

The NUL bug is a data-hygiene trap: PostgreSQL's text type has one additional constraint beyond UTF-8 validity, and it's not one you'll hit in typical application traffic. When you ingest telemetry from diverse, loosely controlled clients — native apps, embedded systems, anything that serializes raw memory regions — you will eventually see it. The fix is cheap; the cost of not having it is a silent full-batch drop every time it triggers.

If you're running an OTLP gateway in .NET, check two things right now: fire an OPTIONS /v1/logs from a cross-origin curl and confirm you get Access-Control-Allow-Origin: * back; and send a log record with a NUL in the body and confirm you get a 200, not a 500. If either of those fails, you have the same bugs we did.
