epok
LOG MONITORING · ANOMALY DETECTION · ROOT CAUSE
What broke. What changed. What's new. What went silent.

The root cause. In English.

Other log tools store your data and wait for you to ask the right question. Epok watches your logs and tells you when something is wrong — what broke, why, and which customers are affected.

Every alert tells you "3 Enterprise · 12 Pro · 47 Free affected", so on-call can decide "wake up now" vs. "wait till morning" from the notification alone. Every AI claim is cited to the log line that produced it.

Replaces the detection + alerting layer of Datadog / Splunk / CloudWatch. $500/mo flat, 14 days free, no card.

Live demo data · open in tab ↗
Screenshot: Epok investigation panel with What Changed (deploy correlation + 5 signals), AI Probable Cause with cited evidence, Blast Radius across 4 services, and Cascade Timeline. Callouts: AI root cause · cited; Blast radius · worsening.

Synthetic data, real detectors. Try it →

20
Detectors available

Statistical + nine domain rule packs · every tier

94%
Pageable precision

Loghub HDFS replay, 2M lines · reproducible

118 ms
LiveTail p95

Ingest to render · p99 124 ms · SLO 500 ms

See full benchmark methodology →
DETECTORS

The six that catch most incidents. Every detector runs on every tier — starting with the 14-day trial.

Statistical detection ships on every tier, including the trial. AI root cause analysis included on every tier — capped on Team, larger budget on Growth.

epok.detectors · 6 of 20 shown · selected for frequency · see all 20 →
new_error

New Error Detection

Catches errors that have never appeared in your 7‑day baseline.

payment-service: "FATAL: connection pool exhausted" — first seen in 7d
COLD‑START · baselines from day 7 · rule pack for known signatures
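
A minimal sketch of the first-seen idea, assuming each error has already been reduced to a stable fingerprint string. The `last_seen` store and the function name are hypothetical, chosen for illustration; this is not Epok's implementation.

```python
from datetime import datetime, timedelta

BASELINE_WINDOW = timedelta(days=7)  # the 7-day baseline described above


def is_new_error(fingerprint, now, last_seen):
    """Return True if this fingerprint has not occurred within the baseline window.

    last_seen is a hypothetical dict: fingerprint -> datetime of the latest occurrence.
    """
    previous = last_seen.get(fingerprint)
    last_seen[fingerprint] = now  # record this occurrence for future checks
    return previous is None or now - previous > BASELINE_WINDOW


now = datetime(2024, 1, 10, 12, 0)
seen = {"pool-exhausted": now - timedelta(days=8)}
print(is_new_error("pool-exhausted", now, seen))  # last seen 8 days ago, outside the window
```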
silence

Silence Detection

Catches services that stop logging when they normally log every N seconds. The most dangerous failure mode: no errors, just absence.

worker-billing went silent — last log 6m ago (normally every 30s)
COLD‑START · baselines from day 7
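
One way to picture silence detection is a simple gap check against the service's usual logging interval. The `tolerance` multiplier below is an assumed parameter, not a documented Epok setting.

```python
def is_silent(seconds_since_last_log, typical_interval_s, tolerance=5.0):
    """Flag a service whose current gap is far longer than its usual gap between logs.

    tolerance is a hypothetical multiplier: how many normal intervals may pass
    before the silence counts as anomalous.
    """
    return seconds_since_last_log > typical_interval_s * tolerance


# The example above: last log 6 minutes ago, normally every 30 seconds.
print(is_silent(6 * 60, 30))
```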
volume_anomaly

Volume Anomaly

Detects spikes, drops, and flatlines in log volume vs daily and weekly baselines per service.

api: 12,400 lines/min vs 3,200 baseline (× 3.9, p99: 4.2σ)
COLD‑START · baselines from day 7 · backfilled at connect
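
A rough sketch of a volume check against a baseline, using a plain z-score. The threshold and the choice of sample standard deviation are assumptions for illustration; the page does not say which statistics Epok actually uses.

```python
from statistics import mean, stdev


def volume_anomaly(current, baseline_samples, z_threshold=3.0):
    """Compare the current lines/min to baseline samples for the same service.

    Returns the ratio to the baseline mean, the z-score, and whether the
    absolute z-score crosses the (assumed) threshold. Drops give negative z.
    """
    mu = mean(baseline_samples)
    sigma = stdev(baseline_samples)
    if sigma == 0:
        z = 0.0 if current == mu else float("inf")
    else:
        z = (current - mu) / sigma
    ratio = current / mu if mu else float("inf")
    return {"ratio": ratio, "z": z, "anomalous": abs(z) >= z_threshold}


result = volume_anomaly(12_400, [3_000, 3_200, 3_400])
print(result["ratio"], result["anomalous"])
```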
pattern_cluster

Pattern Clustering

Groups errors with similar templates so many variants of the same problem cluster into one alert.

pat_db_pool grew 12× — 84 fingerprints folded into 1 alert
COLD‑START · active immediately
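
Template clustering can be sketched as masking the variable parts of each message and counting what remains. The two masking rules below are a deliberately small, assumed set; a production clusterer would need many more.

```python
import re
from collections import Counter

# Hypothetical masks: long hex ids first, then any remaining digit runs.
_MASKS = [
    (re.compile(r"\b[0-9a-f]{8,}\b", re.IGNORECASE), "<hex>"),
    (re.compile(r"\d+"), "<num>"),
]


def template(message):
    """Replace variable tokens so variants of one error share a template."""
    for pattern, token in _MASKS:
        message = pattern.sub(token, message)
    return message


def cluster(messages):
    """Count messages per template; each template would become one alert."""
    return Counter(template(m) for m in messages)


logs = [
    "pool exhausted after 30s on conn 12",
    "pool exhausted after 45s on conn 7",
    "request deadbeef01 failed",
]
print(cluster(logs))
```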
kubernetes

Kubernetes Detection

70+ rules for OOMKilled, CrashLoopBackOff, ImagePullBackOff, FailedScheduling, and more.

billing-7c4b OOMKilled (3rd restart in 4m)
COLD‑START · rule pack · fires from minute one
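
A rule pack can be thought of as a table of known signatures matched against each line, which is why it needs no baseline. The four entries below use the Kubernetes status names listed above; the table layout and descriptions are illustrative, not Epok's rule format.

```python
# Hypothetical rule table: signature -> short explanation.
K8S_RULES = {
    "OOMKilled": "container was killed after exceeding its memory limit",
    "CrashLoopBackOff": "container keeps crashing and restarts are being backed off",
    "ImagePullBackOff": "image could not be pulled and retries are being backed off",
    "FailedScheduling": "scheduler could not place the pod on any node",
}


def match_k8s_rules(line):
    """Return (rule, explanation) for every known signature found in the line."""
    return [(name, K8S_RULES[name]) for name in K8S_RULES if name in line]


print(match_k8s_rules("billing-7c4b OOMKilled (3rd restart in 4m)"))
```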
dependency

Dependency Detection

Upstream service failures, circuit breaker trips, retry exhaustion, and cascading failures between services.

3 services blame postgres-primary (api, worker, ingest) — cascade in 8s
COLD‑START · rule pack · fires from minute one
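
The "3 services blame postgres-primary" signal can be sketched as grouping error lines by the upstream they name. The input shape, (service, upstream) pairs already extracted from logs, and the `min_services` cutoff are assumptions.

```python
from collections import defaultdict


def blamed_upstreams(errors, min_services=2):
    """Group (service, upstream) pairs and keep upstreams blamed by several services."""
    blamers = defaultdict(set)
    for service, upstream in errors:
        blamers[upstream].add(service)
    return {
        upstream: sorted(services)
        for upstream, services in blamers.items()
        if len(services) >= min_services
    }


pairs = [
    ("api", "postgres-primary"),
    ("worker", "postgres-primary"),
    ("ingest", "postgres-primary"),
    ("api", "redis-cache"),
]
print(blamed_upstreams(pairs))
```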
See all 20 detectors →
+ 14 more across error intelligence, AWS, serverless, web, security, search, infrastructure, SLO + custom
LIVE DEMO

See it on data. No signup.

A 5-service example app generates a continuous synthetic log stream into a public Epok tenant. Anomaly detection, root cause analysis, and pattern clustering all run on it live — what you see is Epok working on real-shape data, not a marketing video.

app.getepok.dev/demo · live tenant · click to open · read‑only · no account needed
Screenshot: Epok overview dashboard, live demo tenant with incidents, alerts, and service health
5 services
Synthetic but real-shape traffic
Example incidents
Firing every few minutes
Live dashboards
Overview, detectors, alerts
Read-only
You can't break it · no account needed
Open live demo → · Start trial instead

The demo runs the same product as the trial. Same UI, same detectors.

THE INTELLIGENCE LAYER

Detection is table stakes. What happens next is the product.

Datadog and Splunk catch signals too. Epok closes the loop — customer impact, plain-English search, evidence-cited postmortems, and your own runbooks matched into every alert.

customer_impact

Customer impact, on every alert

Paste a roster (id, name, tier). Every alert scans the incident window for customer_id and joins to the roster. The notification body carries the rollup — your VP of CS reads the same alert as your on-call.

Affected: 3 Enterprise · 12 Pro · 47 Free
First-in-market — no major vendor ships this today
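
The roster join described above can be sketched in a few lines, assuming the roster rows carry `id`, `name`, and `tier` as stated and that `customer_id` values have already been pulled from the incident window. Function names are hypothetical.

```python
from collections import Counter


def impact_rollup(roster, customer_ids):
    """Count distinct affected customers per tier.

    roster: list of dicts with "id", "name", "tier".
    customer_ids: ids found in logs during the incident window (may repeat).
    """
    tier_by_id = {row["id"]: row["tier"] for row in roster}
    affected = {cid for cid in customer_ids if cid in tier_by_id}
    return Counter(tier_by_id[cid] for cid in affected)


def format_rollup(counts, order=("Enterprise", "Pro", "Free")):
    """Render the rollup in the notification style shown above."""
    return "Affected: " + " · ".join(f"{counts[t]} {t}" for t in order if counts[t])


roster = [
    {"id": "c1", "name": "Acme", "tier": "Enterprise"},
    {"id": "c2", "name": "Birch", "tier": "Pro"},
    {"id": "c3", "name": "Cedar", "tier": "Pro"},
]
print(format_rollup(impact_rollup(roster, ["c1", "c2", "c2", "c3", "unknown"])))
```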
ask_epok

Plain English → LogsQL

Type "why is checkout slow in the last hour." Claude translates it to a LogsQL query. A 42-test safety validator forces a time range and a result limit and enforces a pipe allowlist before the query runs. The query is ground truth; the AI explanation sits beside it.

query + results + explanation, side by side
p95 2.3s · 100% syntactically valid queries on the 8-question eval
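
To illustrate what "forces time + limit and enforces a pipe allowlist" could mean in practice, here is a simplified sketch. It assumes a LogsQL-style syntax with a `_time:` filter and `| limit N` pipes; the allowlist, defaults, and function name are hypothetical, and the naive split on `|` ignores quoted strings, which a real validator would have to handle.

```python
ALLOWED_PIPES = {"fields", "sort", "stats", "limit"}  # hypothetical allowlist
DEFAULT_TIME = "_time:1h"  # assumed default time filter
MAX_LIMIT = 1000  # assumed default row cap


def make_safe(query):
    """Reject disallowed pipes, then add a time filter and a limit if missing."""
    head, *pipes = [part.strip() for part in query.split("|")]
    for pipe in pipes:
        name = pipe.split()[0] if pipe else ""
        if name not in ALLOWED_PIPES:
            raise ValueError(f"pipe not allowed: {name!r}")
    if "_time:" not in head:
        head = f"{DEFAULT_TIME} {head}".strip()
    if not any(pipe.split()[0] == "limit" for pipe in pipes):
        pipes.append(f"limit {MAX_LIMIT}")
    return " | ".join([head, *pipes])


print(make_safe("service:checkout error"))
```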
postmortem

Postmortem draft, the moment it resolves

When the incident closes, a draft appears. It pulls the triggering signal, the cited evidence chain, the matched playbook, and the customer-impact rollup into one editable document. You edit; you don't author from a blank page.

trigger + evidence + playbook + impact, pre-assembled
Shipped cycle 17 · evidence is cited, not invented
playbook_match

Your runbooks, matched — not authored

Bulk-import from Confluence, Notion, GitHub, or Markdown. A citation engine matches the right runbook by affected service, symptom, and incident history. The specific steps land inside Slack, email, PagerDuty, and the deep RCA — not a link to a wiki.

payment-pool exhaustion → 3 matched runbook steps inlined in PagerDuty
Citation engine validates every step · 24 bundled seed playbooks
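
Runbook matching by service and symptom can be sketched as a simple score. This leaves out incident history, which the description above also mentions, and the weights and field names are assumptions rather than Epok's citation engine.

```python
def match_playbooks(playbooks, service, symptoms):
    """Rank playbooks: 2 points for a service match, 1 per shared symptom."""
    scored = []
    for pb in playbooks:
        overlap = len(set(symptoms) & set(pb["symptoms"]))
        score = (2 if service in pb["services"] else 0) + overlap
        if score > 0:
            scored.append((score, pb["title"]))
    scored.sort(key=lambda item: item[0], reverse=True)  # stable, highest first
    return [title for _, title in scored]


playbooks = [
    {"title": "Payment pool exhaustion", "services": ["payment-service"], "symptoms": ["pool-exhausted"]},
    {"title": "Generic DB timeout", "services": [], "symptoms": ["timeout", "pool-exhausted"]},
    {"title": "CDN cache purge", "services": ["web"], "symptoms": ["stale-assets"]},
]
print(match_playbooks(playbooks, "payment-service", ["pool-exhausted"]))
```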
WORKFLOWS

When you'd actually use it.

Two moments that matter most — the deploy check and the incident response.

01 · AFTER EVERY DEPLOY

"Did my deploy break anything?"

You deployed ten minutes ago. Open Epok. If the New Issues feed is empty, you're good. If it's not, you know exactly what broke and when.

New issues feed · Every error, warning, or fatal your system has never thrown before, surfaced within minutes. Grouped by meaning, so you see one entry per root cause instead of fifty variants of the same failure.
Deploy correlation · "Appeared 4 min after deploy v2.4.1." Epok connects new errors to the deploy that caused them.
Pattern trends · Each error pattern gets a sparkline. Growing? Stable? One-off? You see the trajectory without writing a single query.
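
Deploy correlation, as in "Appeared 4 min after deploy v2.4.1", can be sketched as finding the most recent deploy shortly before an error's first occurrence. The 30-minute window and the input shape are assumptions.

```python
from datetime import datetime, timedelta


def correlate_deploy(first_seen, deploys, window=timedelta(minutes=30)):
    """Return (version, minutes_after) for the latest deploy within `window`
    before first_seen, or None if no deploy is close enough.

    deploys: list of (version, deployed_at) tuples.
    """
    candidates = [
        (deployed_at, version)
        for version, deployed_at in deploys
        if timedelta(0) <= first_seen - deployed_at <= window
    ]
    if not candidates:
        return None
    deployed_at, version = max(candidates)
    minutes_after = int((first_seen - deployed_at).total_seconds() // 60)
    return version, minutes_after


deploys = [("v2.4.0", datetime(2024, 1, 10, 9, 0)), ("v2.4.1", datetime(2024, 1, 10, 12, 0))]
print(correlate_deploy(datetime(2024, 1, 10, 12, 4), deploys))
```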
02 · DURING AN INCIDENT

"Where is the fire?"

PagerDuty is screaming. You need to know what broke, when it started, and which services are affected. You don't need to write fifteen queries to piece it together first.

Root cause analysis · Classifies errors by failure type (timeout, OOM, auth, config, connection, runtime crash) and traces causality across services. "3 services blame database-primary, and database-primary has OOM errors."
What changed · Compares the incident window against your baseline: volume shifts, new errors, recent deploys, service changes. One view, no diffing.
Blast radius · Which services. How many users. Which endpoints. Full impact scope in seconds.
Cascade timeline · "Database went silent → API got connection refused 3s later → Frontend 502s 5s after that." Origin identified automatically.
AI incident summary · Every Slack and PagerDuty notification includes what happened, probable root cause, and what to check first. On-call knows where to start before opening a laptop.
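
The cascade timeline above can be illustrated by ordering each service's first anomaly and treating the earliest as the origin. The input shape is assumed, and a real system would also need to check that the services actually depend on each other.

```python
def cascade_timeline(first_anomalies):
    """Order services by first anomaly time and report offsets from the origin.

    first_anomalies: dict of service -> (timestamp_seconds, description).
    Returns a list of (service, seconds_after_origin, description).
    """
    ordered = sorted(first_anomalies.items(), key=lambda item: item[1][0])
    origin_time = ordered[0][1][0]
    return [(service, t - origin_time, what) for service, (t, what) in ordered]


timeline = cascade_timeline({
    "frontend": (1008, "502s"),
    "database": (1000, "went silent"),
    "api": (1003, "connection refused"),
})
for service, offset, what in timeline:
    print(f"+{offset}s {service}: {what}")
```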
HOW IT WORKS

Five minutes from curl to root cause.

STEP 01

Point your logs at Epok

Five minutes from a curl command to first lines.

Add a URL and an API key to whatever ships your logs. FluentBit, Vector, Promtail, a curl script. Anything that speaks HTTP works. Already on another tool? Dual-ship your logs during the trial — same shipper config, one extra output.

Loki · OTLP · Elasticsearch bulk · syslog · FluentBit · Fluentd · CloudWatch · JSON
No agents to install · no SDKs to add · run alongside your current tool
Searchable within seconds of POST
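
For teams shipping from application code rather than a log shipper, the same HTTP call might look like the sketch below. It reuses the endpoint and bearer-token header from the curl example on this page; the `Content-Type` header and the helper name are assumptions, and the request is only built here, not sent.

```python
import json
import urllib.request

EPOK_URL = "https://in.epok.dev/v1/logs"  # endpoint from the curl example on this page


def build_request(api_key, record):
    """Build (but do not send) a POST carrying one JSON log record."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        EPOK_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; the curl example omits it
        },
    )


req = build_request("YOUR_KEY", {"service": "api", "level": "info", "msg": "hello"})
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```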
STEP 02

Alerts on the first line, baselines on the first week

Nine rule packs active immediately.

Nine domain rule packs — Kubernetes, AWS, serverless, database, dependency, web, security, search, infrastructure — alert on the first matching line. Statistical detectors learn your seasonal normal over seven days. A seven-day historical backfill runs at connect so the first hour isn't a silent one.

Rule packs active on first matching line
Statistical detectors trained from a 7‑day backfill
Full seasonal baselines by day seven
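
One common way to model a "seasonal normal" is a baseline per hour of the week, which seven days of history is just enough to fill. The sketch below assumes that approach; the page does not state how Epok builds its baselines.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hour_of_week_baseline(samples):
    """Average volume per (weekday, hour) bucket.

    samples: iterable of (datetime, lines_per_min) from the backfilled history.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {bucket: mean(values) for bucket, values in buckets.items()}


history = [
    (datetime(2024, 1, 1, 9, 0), 100),   # Monday 09:00
    (datetime(2024, 1, 1, 9, 30), 200),  # Monday 09:30
    (datetime(2024, 1, 6, 3, 0), 20),    # Saturday 03:00
]
print(hour_of_week_baseline(history))
```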
STEP 03

One incident, one page, with the root cause attached

Notifications carry the answer, not just the alert.

Slack, PagerDuty, email, or webhook fires when something breaks. Every alert includes root cause context — what happened, what caused it, what to check first. Resolve notifications tell you when it's fixed.

Slack · PagerDuty · webhook · email
AI root cause in notifications (paid tiers)
Incidents group related alerts — one page, not five
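
"One page, not five" can be pictured as folding alerts that arrive close together into a single incident. The time-gap rule and the 5-minute default below are assumptions; real grouping would likely also consider which services are related.

```python
def group_alerts(alerts, gap_s=300):
    """Group alerts into incidents when each follows the previous within gap_s.

    alerts: list of (timestamp_seconds, name). Returns a list of incidents,
    each a list of alerts in time order.
    """
    incidents = []
    for ts, name in sorted(alerts):
        if incidents and ts - incidents[-1][-1][0] <= gap_s:
            incidents[-1].append((ts, name))
        else:
            incidents.append([(ts, name)])
    return incidents


alerts = [(0, "silence"), (3, "new_error"), (8, "volume_anomaly"), (4000, "kubernetes")]
print(len(group_alerts(alerts)))
```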
PRICING

Predictable pricing. Built for production teams.

1.5 TB/month logs on Datadog: $3,750–6,000/mo. On Epok Team: $500/mo flat. 14-day trial to prove it on your own logs — no card.

Predictable, not punishing

Flat monthly pricing for included volume. No per-host fees. No overage surprises.

Trial that proves the product

Fourteen days with every feature unlocked. No credit card. Convert when it's solved a real incident.

Designed for cost, not capture

No log filtering required to control cost. No cardinality upcharges. No premium-retention tier.

Trial
$0 · 14 days
Up to 1.5 TB
Full retention
No credit card
Start trial →
Team
$500/mo flat
or $5,400/yr — save $600
1.5 TB/month
30‑day retention
10 users · AI RCA included
Start Team →
Growth
$1,800/mo flat
or $19,440/yr — save $2,160
5 TB/month
30‑day retention
Unlimited users · SSO · priority support
Start Growth →
Enterprise — PrivateLink, SAML, custom retention, dedicated support, SLA. From $5,000/mo. Talk to us →
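
The annual prices above work out to a 10% discount on twelve monthly payments, which is derived from the listed numbers rather than stated on the page. A quick check:

```python
def annual_price(monthly_usd, discount_pct=10):
    """Twelve months minus a percentage discount, in whole dollars."""
    return monthly_usd * 12 * (100 - discount_pct) // 100


for tier, monthly in (("Team", 500), ("Growth", 1800)):
    yearly = annual_price(monthly)
    print(tier, yearly, "saves", monthly * 12 - yearly)
```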
HOW WE COMPARE

Other tools meter ingest, indexing, queries, hosts, and users separately. Epok charges one number per TB and ships the detection on top.

Datadog · see the breakdown →
Splunk · see the breakdown →
Grafana / Loki · see the breakdown →
Elastic · see the breakdown →
NO CREDIT CARD · 30 SECONDS

Point your logs at Epok.
See the answer.

One HTTP endpoint. One API key. Five minutes to root cause alerts in Slack.

$ curl -X POST https://in.epok.dev/v1/logs \
    -H "Authorization: Bearer $EPOK_KEY" \
    -d '{"service":"api","level":"info","msg":"hello"}'

# {"ok":true}
Start trial · Open live demo