TypeScript Strategies for Auditability: Logging and Provenance for LLM Outputs
2026-02-11

Build auditable TypeScript systems that log prompts, model versions, and user interactions with typed provenance metadata for LLMs (Gemini/Claude).

When LLM outputs become evidence, TypeScript must deliver auditable provenance

You ship features powered by Gemini or Claude, but regulators, customers, and internal compliance teams ask: which prompt produced that answer, which model version, and who saw it? In 2026, with major vendors partnering and models deployed across clouds and endpoints, auditability is no longer optional. This guide shows how to build auditable TypeScript systems that log prompts, model versions, and user interactions with typed provenance metadata — from developer workflows and tsconfig guards to runtime logging, observability, and CI policies.

Why auditability for LLMs matters in 2026

The AI ecosystem changed fast between 2024 and 2026. Apple’s move to use Google's Gemini in system assistants, and Anthropic’s desktop "Cowork" preview, are examples of vendor collaborations and endpoint proliferation that complicate provenance. For teams, this means:

  • Models can be proxied, fine-tuned, or wrapped — tracking the exact model & provider matters.
  • LLM outputs are being used for decisioning and must be traceable for compliance and incident response.
  • Privacy and data minimization demand typed schemas that support redaction and hashing.

What this article covers

  • Designing typed provenance metadata in TypeScript
  • Safe prompt logging patterns (redaction, hashing, and HMAC)
  • Integrating with observability (OpenTelemetry, structured logs) and SIEM
  • Build-time and CI strategies (tsconfig, ESLint, tests) that make audits reproducible
  • Advanced strategies: tamper-evidence (signatures, Merkle trees) and data retention

1) Design a minimal, typed provenance model

Start with a single source of truth: a TypeScript interface that every service uses to create audit events. Keep it small and extendable. Use runtime validation with zod or io-ts to avoid schema drift between front-end, backend, and ingest pipelines.

// Provenance.ts
import { z } from "zod";

export const ModelInfoSchema = z.object({
  provider: z.string(),            // "google", "anthropic", "openai"
  model: z.string(),               // "gemini-pro", "claude-2.1"
  modelVersion: z.string().optional(),
  containerId: z.string().optional(), // if running in a managed container
});

export type ModelInfo = z.infer<typeof ModelInfoSchema>;

export const ProvenanceSchema = z.object({
  eventId: z.string(),             // UUID
  timestamp: z.string(),           // ISO timestamp
  userId: z.string().optional(),
  sessionId: z.string().optional(),
  requestId: z.string().optional(),

  // The prompt should be audited carefully — see redaction & hashing below
  promptHash: z.string(),
  promptRedacted: z.string().optional(),

  model: ModelInfoSchema,
  apiVersion: z.string().optional(), // provider API version
  toolchain: z.object({             // SDK, client library metadata
    name: z.string(),
    version: z.string().optional(),
  }).optional(),

  // Response metadata
  responseHash: z.string().optional(),
  responseSize: z.number().optional(),

  // Chain-of-custody fields
  createdBy: z.string(),            // service/component
  signature: z.string().optional(), // optional digital signature
});

export type Provenance = z.infer<typeof ProvenanceSchema>;

Strong typing prevents accidental schema changes. The schema includes both hashed content and an optional redacted stored prompt to support compliance and debugging while protecting PII.

2) Log prompts with privacy in mind: redaction, hashing, and HMAC

Raw prompts can contain PII. For auditability, capture enough to reproduce and explain outputs while complying with privacy laws. Use three complementary artifacts:

  1. Prompt hash — deterministic hash (SHA-256) of the canonical prompt text.
  2. Redacted prompt — a version with detected PII removed or replaced.
  3. HMAC — keyed-hash to later prove the prompt existed without revealing it (privacy-preserving verification).

import crypto from "crypto";

export function sha256Hex(input: string): string {
  return crypto.createHash("sha256").update(input, "utf8").digest("hex");
}

export function hmacHex(key: string, input: string): string {
  return crypto.createHmac("sha256", key).update(input, "utf8").digest("hex");
}

Store the hash and HMAC in your provenance record. Keep HMAC keys in a KMS (AWS KMS, GCP KMS) and rotate them regularly. That lets auditors verify ownership without exposing raw prompts.
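Redaction itself can be sketched as simple pattern replacement applied before hashing and persistence. The patterns below (email address, US SSN) are illustrative only; a production system should run a dedicated PII-detection step rather than relying on a few regexes:

```typescript
// Minimal redaction sketch. The regexes are illustrative, not a complete
// PII detector — use a dedicated PII-detection service before persisting.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const SSN_RE = /\b\d{3}-\d{2}-\d{4}\b/g;

export function redactPrompt(prompt: string): string {
  return prompt
    .replace(EMAIL_RE, "[EMAIL]")   // replace emails with a stable token
    .replace(SSN_RE, "[SSN]");      // replace SSN-shaped numbers
}
```

Redact first, then hash the redacted text separately from the canonical prompt if you want a debuggable artifact that never contains raw PII.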

3) Use structured JSON logs and OpenTelemetry traces

Throwing strings into a log file won’t cut it. Use structured JSON logs, enriched traces, and consistent field names so downstream systems (ELK, Datadog, Splunk, Honeycomb) can index and alert on provenance events.

import pino from "pino";
import type { Provenance } from "./Provenance";

const logger = pino({
  level: process.env.LOG_LEVEL || "info",
  base: { service: "llm-service" },
  timestamp: pino.stdTimeFunctions.isoTime,
});

export function logProvenance(p: Provenance) {
  logger.info({ provenance: p }, "llm.provenance");
}

Combine logs with OpenTelemetry spans to record latency and model request/response lifecycle. Use semantic attribute names like llm.model, llm.prompt.hash, and llm.response.size.
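To keep traces and logs consistent, centralize the attribute names and derive them from the provenance record. The constant names and the `toSpanAttributes` helper below are conventions assumed for illustration, not an official OpenTelemetry semantic convention:

```typescript
// Centralized attribute names prevent drift between tracing and logging
// code. These names are an in-house convention, not an OTel standard.
export const LlmAttr = {
  PROVIDER: "llm.provider",
  MODEL: "llm.model",
  PROMPT_HASH: "llm.prompt.hash",
  RESPONSE_SIZE: "llm.response.size",
} as const;

// Build a flat attribute record suitable for span.setAttributes(...) or a
// structured log line.
export function toSpanAttributes(p: {
  model: { provider: string; model: string };
  promptHash: string;
  responseSize?: number;
}): Record<string, string | number> {
  const attrs: Record<string, string | number> = {
    [LlmAttr.PROVIDER]: p.model.provider,
    [LlmAttr.MODEL]: p.model.model,
    [LlmAttr.PROMPT_HASH]: p.promptHash,
  };
  if (p.responseSize !== undefined) {
    attrs[LlmAttr.RESPONSE_SIZE] = p.responseSize;
  }
  return attrs;
}
```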

4) Model versioning: track provider, model, and release metadata

Vendor ecosystems are mixing models: Apple using Gemini, Anthropic's desktop agents, and hosted fine-tunes. Your provenance model must capture:

  • provider: google, anthropic, openai, etc.
  • model: gemini-pro, claude-2.1, etc.
  • modelVersion: semantic or provider-specific revision
  • apiVersion: provider API release (important when behavior changes)

Include these fields in every call to the model client. If you proxy multiple providers, also log the proxy version so you can reconstruct behavior later. If you need to run spot or local experiments (or a local LLM lab), ensure the same provenance schema is produced by those endpoints.
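One way to guarantee these fields appear on every call is a thin wrapper around the model client. The `LlmClient` interface and field names below are assumptions for illustration, not a real SDK surface:

```typescript
import crypto from "crypto";

// Wrapper sketch: every LLM call yields both the response and a provenance
// stub with model metadata filled in at the call site. LlmClient is a
// hypothetical interface for illustration.
interface LlmClient {
  complete(prompt: string): Promise<string>;
}

export interface CallModelInfo {
  provider: string;
  model: string;
  modelVersion?: string;
}

export function buildProvenance(prompt: string, model: CallModelInfo, createdBy: string) {
  return {
    eventId: crypto.randomUUID(),
    timestamp: new Date().toISOString(),
    promptHash: crypto.createHash("sha256").update(prompt, "utf8").digest("hex"),
    model,
    createdBy,
  };
}

export async function callWithProvenance(
  client: LlmClient,
  prompt: string,
  model: CallModelInfo,
  createdBy: string
) {
  const provenance = buildProvenance(prompt, model, createdBy);
  const response = await client.complete(prompt);
  return { response, provenance };
}
```

Because the wrapper is the only path to the client, missing model metadata becomes a compile error rather than a gap discovered during an audit.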

5) Architect an append-only event store for raw provenance

For legal audits, append-only stores (immutable event logs) are far superior to ad-hoc DB updates. Options:

  • Cloud object stores (S3 with object lock) and index metadata in a search engine
  • Event stores (EventStoreDB, Kafka + compacted topics)
  • Write-once tables in your data warehouse (BigQuery, Snowflake) with partition-based retention

Always persist the provenance JSON as-is and keep a smaller, indexable row for fast querying. Make the raw blob immutable and sign it with the service key for tamper-evidence. Integrate the append-only store with your document lifecycle tooling so retention and access controls stay consistent across records.
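The blob/index split can be sketched with content-addressed keys, where the blob's own hash doubles as a tamper check; the key layout and field names below are illustrative:

```typescript
import crypto from "crypto";

// Derive an immutable blob key plus a small index row from a serialized
// provenance event. Content-addressed keys make silent edits detectable:
// re-hashing the blob must reproduce its key.
export function toStorageRecords(provenanceJson: string): {
  blobKey: string;
  indexRow: { blobKey: string; blobSha256: string; bytes: number };
} {
  const blobSha256 = crypto
    .createHash("sha256")
    .update(provenanceJson, "utf8")
    .digest("hex");
  const blobKey = `provenance/${blobSha256}.json`;
  return {
    blobKey,
    indexRow: {
      blobKey,
      blobSha256,
      bytes: Buffer.byteLength(provenanceJson, "utf8"),
    },
  };
}
```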

6) Tamper-evidence: signatures and Merkle roots

For high-assurance systems, add digital signatures or Merkle trees over batches of provenance events. Signatures let auditors verify the log was produced by your service and not altered. A Merkle root is useful if you need compact proofs for many events.

import crypto from "crypto";

// Sign serialized provenance with a service private key (RSA-SHA256)
export function signProvenance(serialized: string, privateKeyPem: string): string {
  const signer = crypto.createSign("RSA-SHA256");
  signer.update(serialized);
  return signer.sign(privateKeyPem, "base64");
}

Store signatures alongside each event or periodically publish Merkle roots to an external ledger (or a public S3 object) so auditors can verify immutability. If you need on-chain proofs or compact reconciliation, consider systems that support compact on-chain reconciliation for Merkle roots and receipts.

7) Developer tooling & CI: tsconfig, linters, and pipeline checks

You need build-time guarantees that your provenance schema won’t accidentally change. Use these practical rules:

tsconfig: strict by default

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "strict": true,
    "noImplicitAny": true,
    "exactOptionalPropertyTypes": true,
    "forceConsistentCasingInFileNames": true,
    "skipLibCheck": true,
    "declaration": true
  }
}

exactOptionalPropertyTypes and strict prevent accidental optional fields that break auditors' expectations.

ESLint + typescript-eslint

Enforce runtime schema validation and prohibit any unchecked casts:

module.exports = {
  parser: "@typescript-eslint/parser",
  plugins: ["@typescript-eslint"],
  extends: [
    "eslint:recommended",
    "plugin:@typescript-eslint/recommended",
    "plugin:import/errors",
    "plugin:import/warnings"
  ],
  rules: {
    "@typescript-eslint/no-explicit-any": "error",
    "@typescript-eslint/consistent-type-imports": "error",
  }
};

For secure development defaults and library hardening, follow general security best practices around dependency updates and build pipelines.

CI checks

In CI (GitHub Actions / GitLab CI) run these steps before merging:

  • Type check: tsc --noEmit
  • Lint: eslint
  • Schema tests: run unit tests that validate serialization with zod
  • Integration smoke test against a test model (recording a few provenance events)

# .github/workflows/ci.yml (excerpt)
- name: Type check
  run: npm run build --if-present && tsc --noEmit

- name: Lint
  run: npm run lint

- name: Run provenance contract tests
  run: npm run test:provenance

8) Test strategies: contract tests, property-based tests, and replay

Tests are your first line of defense. Focus on:

  • Contract tests that validate the ProvenanceSchema for every event generated by your service
  • Property-based tests (fast-check) to ensure hashing and redaction are stable across inputs
  • Replay tests that deserialize stored provenance events and re-run verification (HMAC, signatures)

import { it, expect } from "vitest";
import { ProvenanceSchema } from "./Provenance";
import { makeSampleProvenance } from "./fixtures"; // hypothetical test fixture factory

it("produces valid provenance", () => {
  const p = makeSampleProvenance();
  const parsed = ProvenanceSchema.safeParse(p);
  expect(parsed.success).toBe(true);
});

9) Observability & alerting: what to watch for

Configure alerts around provenance integrity, not just latency or errors. Examples:

  • Missing model metadata in provenance events — indicates a bug in instrumentation
  • HMAC verification failures when re-checking stored promptHash values — potential tampering
  • Spike in redacted prompts or PII detections — may indicate data ingestion issues or a UX regression

Use queryable dashboards for counts by model and provider, and make it easy for compliance teams to export event subsets for audits. For teams working at the edge or streaming events into analytics, route provenance events into low-latency analytics pipelines so integrity alerts fire quickly.

10) Compliance & retention: policies you can automate

Work with legal and privacy to define retention windows for raw prompts vs. hashed records. Common pattern:

  • Raw, redacted prompts: 30–90 days (subject to policy and region)
  • Hashed + HMAC + signatures: retained longer (1–7 years) for auditability
  • Access controls: role-based gates and audited exports

Automate retention with lifecycle rules (S3 Object Lock + lifecycle), and ensure deletion is auditable. For regulated industries, consider holding hashed evidence out-of-band (separate project/account) to minimize blast radius.
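Assuming an S3-backed store, a lifecycle configuration along these lines automates the split retention policy; the prefixes and day counts are illustrative and should come from your legal/privacy policy:

```json
{
  "Rules": [
    {
      "ID": "expire-redacted-prompts",
      "Filter": { "Prefix": "prompts/redacted/" },
      "Status": "Enabled",
      "Expiration": { "Days": 90 }
    },
    {
      "ID": "retain-hashed-evidence",
      "Filter": { "Prefix": "provenance/" },
      "Status": "Enabled",
      "Transitions": [{ "Days": 365, "StorageClass": "GLACIER" }]
    }
  ]
}
```

Pair the lifecycle rules with S3 access logging so every deletion and export is itself an auditable event.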

Advanced strategies: federated provenance and cross-provider traceability

With vendor partnerships like Apple-Gemini and Anthropic’s desktop agents, provenance may cross trust boundaries. Design for federated proofs:

  • Attach provider-signed metadata when providers expose signed call receipts
  • Use a shared minimal provenance vocabulary so events from different providers can be correlated
  • Publish periodic manifests of model versions used by your service, and snapshot them into your audit store
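A manifest snapshot becomes canonical if entries are sorted before serialization, so identical model sets always produce the same digest to sign or anchor. The shape below is illustrative:

```typescript
import crypto from "crypto";

// A periodic snapshot of every model version the service routes to,
// hashed so the manifest itself can be signed or anchored externally.
interface ModelManifestEntry {
  provider: string;
  model: string;
  modelVersion?: string;
}

export function buildManifest(entries: ModelManifestEntry[], asOf: string) {
  // Sort into a canonical order so equal sets serialize (and hash) identically.
  const sorted = [...entries].sort((a, b) =>
    `${a.provider}/${a.model}/${a.modelVersion ?? ""}`.localeCompare(
      `${b.provider}/${b.model}/${b.modelVersion ?? ""}`
    )
  );
  const body = JSON.stringify({ asOf, models: sorted });
  const digest = crypto.createHash("sha256").update(body, "utf8").digest("hex");
  return { body, digest };
}
```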

Real-world checklist (practical steps you can implement this week)

  1. Define a ProvenanceSchema (zod) and export types used by every service.
  2. Instrument every LLM call to produce a provenance event with model metadata, promptHash, and createdBy.
  3. Implement prompt redaction and HMAC using KMS; store keys centrally and rotate them.
  4. Switch to structured JSON logs and add OpenTelemetry spans for model calls.
  5. Add a CI contract test that fails the build if the ProvenanceSchema changes unexpectedly.
  6. Persist raw events in an append-only store and index minimal rows for quick queries.
  7. Configure alerts for missing metadata, HMAC verification failures, and spikes in PII redaction.

Case study: small payments startup

A mid-stage fintech in 2025 integrated Gemini via a third-party SDK. After a user dispute, the compliance team needed to identify which prompt produced a charge-reversal decision. The company implemented:

  • Typed provenance (zod) and structured logs
  • HMACed prompt hashes in S3 (with KMS) and short-lived redacted prompts for debugging
  • CI tests that validated schema compatibility on every PR

Outcome: 24-hour audit response, reproducible evidence, and a reduction in incident resolution time. The team also faced stricter regulator questions in 2026; the provenance system made compliance meetings far less painful.

Common pitfalls and how to avoid them

  • Logging raw prompts everywhere — mitigate with redaction and HMAC.
  • Relying only on client-side metadata — ensure server-side authoritative provenance is generated.
  • Ignoring schema evolution — avoid silent type changes by enforcing schema tests in CI.
  • Storing signatures without key management — use KMS and vault workflows and rotate keys; audit key access.
"Auditability starts with typed provenance and ends with disciplined pipelines." — engineering principle

Takeaways: a concise checklist for TypeScript teams (2026-ready)

  • Type everything: zod/io-ts + TS types are your foundation.
  • Log smart: structured JSON, promptHash + redacted prompt, model metadata.
  • Protect privacy: HMAC, KMS, and redaction before persistence.
  • Build defensively: strict tsconfig, ESLint rules, CI contract tests.
  • Verify integrity: signatures or Merkle roots for tamper-evidence.
  • Automate retention: lifecycle rules and auditable deletions.

Call to action

Start by adding a small ProvenanceSchema and one logging call to your LLM pipeline this week. Run a CI contract test to prevent accidental schema changes, and push your first signed, append-only event to an immutable store. Then scan your repository for every place that calls an LLM and apply the provenance wrapper in a single pull request.

For hands-on adopters: fork a small example that implements the patterns in this article (typed schema, redaction, HMAC, signature) and run the sample pipeline in a sandbox. If you need guidance on secure key management or compliance automation, pair this with your security team — provenance is a cross-functional responsibility.

The 2026 reality is clear: vendors will keep partnering, and models will run everywhere. Make auditability a first-class citizen in your TypeScript codebase now — your future self (and auditors) will thank you.
