Logs in. Answers out.
Every layer is inspectable.
Twenty detectors run on every log line. Each layer is independently observable — inspect candidates, suppression decisions, and the evidence behind any fire.
Anything that speaks HTTP.
Send via FluentBit, Vector, Promtail, the OpenTelemetry collector, syslog forwarders, CloudWatch subscription filters, or a curl script. No SDKs, no agents required. Typically searchable within seconds of POST.
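For the "curl script" path, a minimal sketch in Python's standard library looks something like this. The endpoint URL, the bearer-token header, and the event field names are placeholders invented for illustration; the real ingest URL and auth scheme are not stated here.

```python
import json
import urllib.request

# Hypothetical endpoint and token: placeholders, not real product values.
INGEST_URL = "https://logs.example.invalid/ingest"
API_TOKEN = "YOUR_TOKEN"

def build_log_request(event: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying one JSON log event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        INGEST_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",  # assumed auth scheme
        },
    )

req = build_log_request(
    {"level": "error", "service": "checkout", "message": "payment timeout"}
)
# To actually send it: urllib.request.urlopen(req, timeout=5)
```

Any forwarder that can issue the same kind of POST would work the same way; nothing in the sketch depends on an SDK.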
Field extraction without a schema.
Severity, service, hostname, and trace IDs are extracted automatically. Custom fields are stored as-is — no schema to declare, no cardinality tax. JSON is parsed; raw text is kept searchable.
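One way to picture schema-free extraction is the sketch below. The alias table and the severity regex are assumptions made for this example, not the product's actual extraction rules.

```python
import json
import re

# Assumed aliases for the well-known fields; illustrative only.
ALIASES = {
    "severity": ("severity", "level", "lvl"),
    "service": ("service", "app"),
    "hostname": ("hostname", "host"),
    "trace_id": ("trace_id", "traceId"),
}
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARN(?:ING)?|ERROR|FATAL)\b", re.IGNORECASE)

def extract_fields(line: str) -> dict:
    """Return well-known fields plus custom fields, always keeping the raw text."""
    record = {"raw": line, "custom": {}}
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, dict):
        used_keys = set()
        for field, names in ALIASES.items():
            for name in names:
                if name in parsed:
                    record[field] = parsed[name]
                    used_keys.add(name)
                    break
        # Custom fields are stored as-is, with no declared schema.
        record["custom"] = {k: v for k, v in parsed.items() if k not in used_keys}
    else:
        # Raw text: fall back to a severity keyword if one is present.
        match = LEVEL_RE.search(line)
        if match:
            record["severity"] = match.group(1).upper()
    return record
```

A JSON line such as `{"level": "error", "host": "web-1", "cart_size": 3}` yields severity and hostname as first-class fields, with `cart_size` carried along untouched.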
Template clustering.
Every line is hashed to a stable template — variable values (UUIDs, IPs, timestamps) are abstracted out, leaving the structural shape. The same error across a thousand unique strings collapses to one pattern fingerprint.
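The idea can be sketched as mask-then-hash. The regex patterns, placeholder tokens, and the choice of a truncated SHA-1 below are assumptions for illustration; the actual templating algorithm is not described here.

```python
import hashlib
import re

# Order matters: UUIDs, timestamps, and IPs are masked before bare numbers.
MASKS = [
    (re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
                r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"), "<UUID>"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"
                r"(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?"), "<TS>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def to_template(line: str) -> str:
    """Replace variable values with placeholder tokens, keeping the structure."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line

def fingerprint(line: str) -> str:
    """Hash the template so identical shapes share one stable ID."""
    return hashlib.sha1(to_template(line).encode("utf-8")).hexdigest()[:12]
```

Two lines that differ only in timestamp, UUID, IP, and a numeric value reduce to the same template, for example `<TS> user <UUID> from <IP> timed out after <NUM> s`, and therefore to the same fingerprint.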
Per-service rolling distributions.
Each service gets a rolling distribution for every hour of the week, so seasonal patterns are learned. A 3am spike on a quiet weekend service isn't compared to peak business hours on a noisy production API. Statistical detectors score deviations against this baseline.
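A rough sketch of an hour-of-week baseline follows. The window size, the minimum-history rule, and the use of a plain z-score are all assumptions chosen to keep the example short; the real scoring method is not specified here.

```python
from collections import defaultdict, deque
from datetime import datetime
from statistics import mean, pstdev
from typing import Optional

WINDOW = 8  # assumed: keep the last 8 observations per bucket

def hour_of_week(ts: datetime) -> int:
    """0 = Monday 00:00 ... 167 = Sunday 23:00."""
    return ts.weekday() * 24 + ts.hour

class Baseline:
    def __init__(self) -> None:
        # One rolling window per (service, hour-of-week) bucket.
        self.buckets = defaultdict(lambda: deque(maxlen=WINDOW))

    def observe(self, service: str, ts: datetime, count: float) -> None:
        self.buckets[(service, hour_of_week(ts))].append(count)

    def zscore(self, service: str, ts: datetime, count: float) -> Optional[float]:
        history = self.buckets.get((service, hour_of_week(ts)))
        if not history or len(history) < 3:
            return None  # not enough history in this bucket to judge
        sigma = pstdev(history)
        if sigma == 0:
            sigma = 1.0  # avoid dividing by zero on perfectly flat history
        return (count - mean(history)) / sigma
```

Because the bucket key includes the hour of the week, a Saturday 03:00 count is only ever compared with earlier Saturday 03:00 counts for the same service.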
Twenty detectors run in parallel.
Statistical detectors (new error, silence, volume anomaly, golden signals, outlier) consume baselines. Domain rule packs (Kubernetes, AWS, serverless, database, dependency, web, security, search, infrastructure) match deterministic patterns. Each detector emits independent candidate alerts with confidence scores.
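As a sketch of what "independent candidate alerts with confidence scores" could look like, the example below defines one deterministic rule and one statistical detector behind a shared interface. The `Candidate` shape, the thresholds, the confidence formulas, and the precomputed `volume_z` field are all hypothetical. The loop runs detectors one after another for simplicity; since each detector only reads the event, they could equally be dispatched in parallel.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Candidate:
    detector: str
    service: str
    confidence: float  # 0.0 to 1.0
    evidence: str

Detector = Callable[[dict], Optional[Candidate]]

K8S_RE = re.compile(r"OOMKilled|CrashLoopBackOff")

def kubernetes_rule(event: dict) -> Optional[Candidate]:
    """Deterministic rule: fixed confidence when a known pattern matches."""
    if K8S_RE.search(event["message"]):
        return Candidate("kubernetes", event["service"], 0.9, event["message"])
    return None

def volume_anomaly(event: dict) -> Optional[Candidate]:
    """Statistical detector: confidence grows with the baseline z-score."""
    z = event.get("volume_z", 0.0)  # assumed to be precomputed upstream
    if z < 3.0:  # assumed threshold
        return None
    return Candidate("volume_anomaly", event["service"], min(1.0, z / 10.0), f"z={z:.1f}")

def run_detectors(event: dict, detectors: List[Detector]) -> List[Candidate]:
    """Each detector sees the same event and emits at most one candidate."""
    results = (detect(event) for detect in detectors)
    return [c for c in results if c is not None]
```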
Fingerprint dedup + ack-aware quieting.
Candidates that share a fingerprint are folded into a single alert with a fire count. Repeat fires that go unacknowledged are progressively quieted, so the pager isn't asked twice for the same thing. Alerts that are acknowledged but not yet resolved suppress duplicates until they're closed.
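The folding and suppression behavior can be sketched as a small state machine. The power-of-two notification schedule is an assumed quieting policy picked for illustration; how the real system learns to quiet repeat fires is not described here.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    fingerprint: str
    fire_count: int = 0
    acknowledged: bool = False
    resolved: bool = False

class Deduper:
    def __init__(self) -> None:
        self.alerts = {}  # fingerprint -> Alert

    def fire(self, fingerprint: str) -> bool:
        """Record a fire; return True if it should notify a human."""
        alert = self.alerts.get(fingerprint)
        if alert is None or alert.resolved:
            # First fire, or the previous alert was closed: start fresh.
            alert = self.alerts[fingerprint] = Alert(fingerprint)
        alert.fire_count += 1
        if alert.acknowledged:
            return False  # acknowledged and still open: suppress duplicates
        # Assumed quieting policy: notify on fires 1, 2, 4, 8, ...
        n = alert.fire_count
        return (n & (n - 1)) == 0

    def ack(self, fingerprint: str) -> None:
        self.alerts[fingerprint].acknowledged = True

    def resolve(self, fingerprint: str) -> None:
        self.alerts[fingerprint].resolved = True
```

Under this policy, five unacknowledged fires of the same fingerprint notify on the first, second, and fourth only, while the fire count still records all five.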
Cascade graph across services.
When two or more detectors fire on related services within a time window, they're folded into one incident. The dependency graph between services is inferred from log content — "X called Y and got a timeout" is a directed edge. Cascades present as one page, not five.
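A toy version of edge inference and incident folding is sketched below. The regex only recognizes the exact phrasing of the example sentence above, and the 300-second window and greedy single-pass grouping are assumptions; real logs and the real grouping logic will be more varied.

```python
import re
from collections import defaultdict

# Assumed phrasing, modeled on "X called Y and got a timeout".
CALL_RE = re.compile(r"(?P<src>[\w-]+) called (?P<dst>[\w-]+) and got a (?P<err>\w+)")

def infer_edges(lines):
    """Return the set of directed (caller, callee) edges found in log text."""
    return {(m["src"], m["dst"]) for line in lines if (m := CALL_RE.search(line))}

def group_incidents(fires, edges, window_s=300):
    """Fold fires on related services within window_s seconds into incidents.

    fires: list of (timestamp_seconds, service) tuples.
    """
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)  # relatedness is treated as undirected here
    incidents = []  # each incident: {"services": set, "last_ts": number}
    for ts, service in sorted(fires):
        for inc in incidents:
            related = service in inc["services"] or neighbors[service] & inc["services"]
            if related and ts - inc["last_ts"] <= window_s:
                inc["services"].add(service)
                inc["last_ts"] = ts
                break
        else:
            # No open incident is related: start a new one.
            incidents.append({"services": {service}, "last_ts": ts})
    return incidents
```

With edges checkout → payments → db, fires on db, payments, and checkout a few seconds apart land in one incident, while an unrelated service firing in the same window gets its own.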
Root-cause scoring + blast radius.
Each candidate cause gets a readable score — origin vs. victim classification, failure type (TIMEOUT / OOM / AUTH / CONFIG / CONNECTION / CRASH), and a recency-weighted evidence count. Blast radius (affected services, users, endpoints) and "what changed" (recent deploys, config diffs) are computed in parallel.
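The three ingredients of the score can be sketched separately. The keyword patterns, the `UNKNOWN` fallback label, the exponential half-life weighting, and the "a failing callee makes you a victim" rule are all assumptions for illustration, not the product's actual scoring.

```python
import re

# First match wins; keyword lists are illustrative only.
FAILURE_PATTERNS = [
    ("OOM", re.compile(r"out of memory|OOMKilled", re.I)),
    ("TIMEOUT", re.compile(r"timed? ?out|deadline exceeded", re.I)),
    ("AUTH", re.compile(r"unauthorized|forbidden", re.I)),
    ("CONNECTION", re.compile(r"connection (?:refused|reset)", re.I)),
    ("CONFIG", re.compile(r"invalid config|missing env", re.I)),
    ("CRASH", re.compile(r"panic|segfault|exited with code", re.I)),
]

def classify(message: str) -> str:
    """Map a log message to a failure type, or UNKNOWN if nothing matches."""
    for label, pattern in FAILURE_PATTERNS:
        if pattern.search(message):
            return label
    return "UNKNOWN"

def evidence_score(ages_s, half_life_s=600.0):
    """Recency-weighted count: a line half_life_s old counts half as much."""
    return sum(0.5 ** (age / half_life_s) for age in ages_s)

def role(service, failing, edges):
    """A failing service is a victim if something it calls is also failing."""
    callees = {dst for src, dst in edges if src == service}
    return "victim" if callees & failing else "origin"
```

Three evidence lines aged 0, 10, and 20 minutes score 1 + 0.5 + 0.25 = 1.75 with the assumed 10-minute half-life, so fresh evidence dominates without older lines being discarded outright.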
LLM writeup, cited evidence.
The final layer drafts a 2–3 sentence root-cause hypothesis from the diagnosis. Every claim links to the specific log lines that produced it. The output is what arrives in Slack and PagerDuty: what happened, probable cause, what to check first. AI is included on every tier, including the trial.
Want to see all nine in action?
The live demo runs the full pipeline on a synthetic log stream. Click any alert to see the candidates, the suppressed dupes, the cited evidence — every layer leaves an audit trail.