Know what broke
before your users do.

Send your logs. Get root cause analysis, anomaly detection, and incident intelligence — automatically. No dashboards to build. No alert rules to write. No agents to install.

Free forever. No credit card. First alert in under 5 minutes.

Built for teams running 5–50 services on AWS, GCP, or Kubernetes

app.getepok.dev/alerts
3 firing · 1 resolved
user-service
compose-post-service
home-timeline-service
post-storage-service
nginx-thrift
social-graph-service
text-service
unique-id-service
media-service
user-service: went silent · silence

No logs from user-service since 14:31:47. Was emitting ~40/min before.

user-service · +4 alerts
3m · Investigate
Pattern surge: TSocket::open() connect() Connection refused · pattern_cluster

Pattern rate grew from 0/min to 38.8/min (new). Template mentions user-service:9090.

compose-post-service · home-timeline-service · +2 alerts
3m · Investigate
compose-post-service: volume anomaly · log_rate

Error volume 12.3x above hourly baseline. Driven by connection refused errors.

compose-post-service · +1 alert
4m · Investigate
home-timeline-redis: reconnection · new_error · resolved

"LOADING Redis is loading the dataset in memory" — 6 occurrences, subsided.

home-timeline-redis
11m

Cascade failure detected and root cause identified. No rules configured.

5 MIN
SETUP TO FIRST ALERT
17
DETECTORS — ZERO CONFIG
~200ms
SEARCH LATENCY
< 60s
TIME TO DETECT
100%
OF TIERS GET FULL RCA

How It Works

01

Point your logs at Epok

Add a URL and an API key to whatever ships your logs. Fluent Bit, Vector, Promtail, a curl script. Anything that speaks HTTP works. Takes about five minutes.

  • Loki, OTLP, Elasticsearch bulk, syslog, raw JSON
  • No agents to install, no SDKs to add
  • Logs appear in real time
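Any shipper that can make an HTTP POST will do. A minimal sketch in Python; note the endpoint path, auth header, and payload fields below are assumptions for illustration, so use the values shown in your Epok dashboard:

```python
import json
from urllib import request

# Hypothetical values: copy the real ingest URL and API key from your
# Epok dashboard. The path and header shown here are assumptions.
EPOK_URL = "https://ingest.getepok.dev/v1/logs"
EPOK_KEY = "your-api-key"

def build_log_request(service: str, level: str, message: str) -> request.Request:
    """Build (but don't send) an HTTP POST shipping one JSON log line."""
    body = json.dumps({"service": service, "level": level, "message": message})
    return request.Request(
        EPOK_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {EPOK_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_log_request(
    "user-service", "error",
    "TSocket::open() connect(): Connection refused",
)
# request.urlopen(req)  # uncomment to actually ship the log
```

In Fluent Bit or Vector, the equivalent is their HTTP output pointed at the same URL with the same auth header.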
02

Epok figures out what normal looks like

New errors, silence, and high error rates are caught from minute one. Baselines build continuously — thresholds automatically tighten as Epok learns your patterns. Full seasonal detection by day seven.

  • New errors + silence detection from minute one
  • Error rate anomalies from minute one (population baselines)
  • Volume anomaly detection within one hour
  • Full seasonal baselines by day seven
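The core idea behind a statistical baseline fits in a few lines. This toy z-score version illustrates the concept only; it is not Epok's actual detector, which layers seasonality and self-tuning on top:

```python
from collections import deque
from statistics import mean, stdev

class VolumeBaseline:
    """Toy rolling baseline: flag a minute whose log count deviates
    strongly from the recent window. Illustrative only; Epok's real
    detectors add seasonality and self-tuning on top of ideas like this."""

    def __init__(self, window: int = 60, z_threshold: float = 4.0):
        self.counts = deque(maxlen=window)  # per-minute log counts
        self.z_threshold = z_threshold

    def observe(self, count_per_min: int) -> bool:
        """Record one minute's count; return True if it looks anomalous."""
        anomalous = False
        if len(self.counts) >= 10:  # need some history before judging
            mu, sigma = mean(self.counts), stdev(self.counts)
            if sigma > 0 and abs(count_per_min - mu) / sigma > self.z_threshold:
                anomalous = True
        self.counts.append(count_per_min)
        return anomalous
```

A service idling at ~40 logs/min that suddenly emits 500 in a minute trips the threshold immediately; the same 500 would be normal for a service whose window already contains similar values.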
03

Get alerted before your users notice

Slack, PagerDuty, email, or webhook fires when something breaks. Every alert includes the AI-generated root cause summary — what happened, what caused it, what to check first. Resolve notifications tell you when it's fixed.

  • Slack / PagerDuty / webhook / email: fires and resolves automatically
  • AI root cause summary in every notification
  • Incidents group related alerts — one page, not five
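On the receiving end, a webhook is just an HTTP POST to your endpoint. A minimal sketch of a receiver; the payload fields `severity`, `service`, and `summary` are hypothetical stand-ins, not Epok's documented schema:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class EpokWebhook(BaseHTTPRequestHandler):
    """Receive a hypothetical Epok alert webhook and print its summary.
    The payload fields used here are assumptions for illustration."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        alert = json.loads(self.rfile.read(length))
        print(f"[{alert.get('severity')}] {alert.get('service')}: "
              f"{alert.get('summary')}")
        self.send_response(204)  # acknowledge with no body
        self.end_headers()

# HTTPServer(("", 8080), EpokWebhook).serve_forever()  # run to listen
```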

What engineers actually need from logs.

Not more dashboards. Not more queries. Answers. Every feature works from the moment your first log arrives — nothing to configure.

app.getepok.dev/new-errors
Last 30 min · 4 patterns
CONNECTION · First seen: never before

TSocket::open() connect() <Host: user-service Port: 9090>: Connection refused

compose-post-service · home-timeline-service · social-graph-service
843 hits
CONNECTION · First seen: never before

Failed to read user timeline: TTransportException: open() failed: connect()

home-timeline-service
289 hits
TIMEOUT · First seen: never before

upstream request timeout: exceeded 5000ms waiting for response from user-service

nginx-thrift
67 hits
RESOURCE · Resurfaced · last seen 3 days ago

LOADING Redis is loading the dataset in memory

home-timeline-redis
6 hits
AFTER EVERY DEPLOY

"Did my deploy break anything?"

You deployed ten minutes ago. Open Epok. If the New Issues feed is empty, you're good. If it's not, you know exactly what broke and when.

NEW ISSUES FEED

Every error, warning, or fatal your system has never thrown before, surfaced the moment it appears. Grouped by meaning — one entry per root cause, not 50 variants of the same failure.

DEPLOY CORRELATION

"Appeared 4 min after deploy v2.4.1." Epok connects new errors to the deploy that caused them. Fastest path from "something broke" to "this commit broke it."

PATTERN TRENDS

Each error pattern gets a sparkline. Growing? Stable? One-off? You see the trajectory without writing a single query.

Alerts / Incident #8461
Resolve All · Snooze
PROBABLE CAUSE

user-service stopped emitting logs at 14:31:47. 3 downstream services started throwing connection refused errors within 6 seconds.

5 alerts · 4 services affected · confidence: high

Root Cause Ranking

#1 · user-service · SILENCE · ORIGIN
92%
#2 · compose-post-service · CONNECTION · DOWNSTREAM
61%
#3 · home-timeline-service · TIMEOUT · DOWNSTREAM
43%

Cascade Timeline

14:31:47 · user-service: last log received
14:31:50 · compose-post-service: TSocket::open() connect() <Host: user-service Port: 9090>: Conn...
14:31:51 · home-timeline-service: Failed to read timeline: TTransportException: open() failed
14:31:53 · compose-post-service: error volume 843/5min (baseline: 12/5min)
14:32:01 · social-graph-service: TSocket::open() connect() <Host: user-service Port: 9090>: Conn...
DURING AN INCIDENT

"Where is the fire?"

PagerDuty is screaming. You need to know what broke, when it started, and which services are affected. Not write 15 queries to piece it together.

ROOT CAUSE ANALYSIS

Classifies errors into 8 categories (timeout, OOM, auth, config, etc.) and traces causality across services. "3 services blame database-primary, and database-primary has OOM errors." Probable root cause with transparent scoring — you see why each candidate ranked where it did.

WHAT CHANGED

Compares the incident window against your baseline: volume shifts, new errors, recent deploys, service changes. One view, no diffing.

BLAST RADIUS

Which services are affected? How many users? Which endpoints? Full impact scope in seconds.

CASCADE TIMELINE

See exactly how a failure propagated across services. "Database went silent → API got connection refused 3s later → Frontend 502s 5s after that." Origin identified automatically.
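Ordering anomalies by first occurrence is the intuition behind origin detection. A toy version of that idea (Epok's real ranking weighs far more than arrival order):

```python
def cascade_origin(events):
    """Toy cascade attribution: sort anomalies by timestamp; the earliest
    is the candidate origin and everything later is downstream.
    (Illustration only; real causality ranking uses more signals.)"""
    ordered = sorted(events, key=lambda e: e["ts"])
    return ordered[0]["service"], [e["service"] for e in ordered[1:]]

# Events loosely based on the cascade timeline above
events = [
    {"ts": "14:31:50", "service": "compose-post-service"},
    {"ts": "14:31:47", "service": "user-service"},
    {"ts": "14:31:51", "service": "home-timeline-service"},
]
origin, downstream = cascade_origin(events)
```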

AI: INCIDENT SUMMARY

Every Slack and PagerDuty notification includes an LLM-generated summary: what happened, probable root cause, and what to check first. On-call engineers know where to start before they open a laptop.

WHILE YOU SLEEP

"What happened overnight?"

You can't watch logs 24/7. Epok can. It learns what normal looks like and tells you when something isn't.

VOLUME ANOMALIES

Learns normal log volume per service, per hour, per day of week. When a spike hits at 3am, you get one alert with the messages driving it. Not a storm of threshold violations that teach you to mute your phone.

SILENCE DETECTION

A service that logs every 30 seconds goes quiet for 5 minutes. That's the most dangerous kind of failure — no errors, just absence. Epok catches it.
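The mechanics are simple to sketch. This toy version takes the typical logging interval as a parameter, whereas Epok learns it per service:

```python
class SilenceDetector:
    """Toy silence check: a service that normally logs every `interval_s`
    seconds and has been quiet for `factor` times that long is flagged.
    (Illustration only; Epok learns the interval automatically.)"""

    def __init__(self, interval_s: float, factor: float = 10.0):
        self.interval_s = interval_s
        self.factor = factor
        self.last_seen = {}  # service -> last log timestamp (epoch seconds)

    def observe(self, service: str, ts: float) -> None:
        self.last_seen[service] = ts

    def is_silent(self, service: str, now: float) -> bool:
        last = self.last_seen.get(service)
        return last is not None and (now - last) > self.factor * self.interval_s
```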

DOMAIN-SPECIFIC DETECTORS

Kubernetes (70+ rules), database, web, security, serverless, dependency, infrastructure, and AWS. Built on statistical baselines, not static thresholds. Detects what matters in your stack — no rules to write.

THRESHOLD RULES + SLO MONITORING

Custom alert rules on any query with duration guard and cooldown. SLO error budget tracking with burn rate prediction — get warned before your SLO breaches, not after.
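Burn-rate math in brief: at a 99.9% SLO the error budget is 0.1%, so a sustained 1% error rate burns budget 10x faster than allowed and empties a 30-day budget in about 72 hours. As a sketch:

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """Burn rate: observed error rate divided by the allowed error rate.
    1.0 means the budget is being spent exactly as fast as permitted."""
    return error_rate / (1.0 - slo_target)

def hours_to_exhaustion(burn: float, window_hours: float = 30 * 24,
                        budget_left: float = 1.0) -> float:
    """Hours until the remaining budget fraction is gone at this burn rate."""
    return window_hours * budget_left / burn
```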

app.getepok.dev/explore
1,247 results
level:error OR level:warn | service:compose-post*
14:32:08.417 · error · compose-post · TSocket::open() connect() <Host: user-service Port: 9090>: Connection refused
14:32:08.003 · error · compose-post · Failed to compose post for user_id=481: TTransportException: open() failed
14:32:07.891 · warn · nginx-thrift · upstream timeout (5000ms) POST /api/v1/compose-post 504 client=10.42.0.1
14:32:07.244 · info · unique-id · generated id=7493820174839201 type=post
14:32:06.118 · error · home-timeline · Failed to read timeline for user 219: TSocket: Connection refused (user-service:9090)
14:32:05.972 · info · text-service · processed text len=142 mentions=2 urls=0
WHEN YOU NEED TO DIG DEEPER

"Why did this break?"

You know something is wrong. Now you need to understand why. Epok traces the path from symptom to root cause — across services, automatically.

ROOT CAUSE RANKING

Ranked list of probable causes with category (TIMEOUT, OOM, CONFIG), origin vs. victim classification, and transparent scoring. Not a black box.

AI: ROOT CAUSE HYPOTHESIS

An LLM reads the full causal context and proposes a 2–3 sentence hypothesis explaining the chain of causation. Cites specific evidence from your logs, not generic advice.

INCIDENT SIMILARITY

"This looks like the incident 3 days ago." Matches new incidents against history. When a match is found, you see what fixed it last time.

ALWAYS IMPROVING

"Why is this alert so accurate?"

Epok learns from your feedback. Thresholds auto-adjust. Noisy alerts get suppressed. Changepoints are recognized — not false-alarm storms. The longer you use it, the sharper it gets.

SELF-TUNING THRESHOLDS

Marking alerts as "not helpful" tightens thresholds. Consistent signals get reinforced. No manual tuning — sensitivity adjusts every 6 hours.
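The feedback loop can be sketched as a bounded adjustment. This is illustrative only; the step sizes and the 6-hour schedule are Epok internals:

```python
class SelfTuningThreshold:
    """Toy feedback loop: 'not helpful' feedback raises the z-score
    threshold (fewer alerts); helpful alerts lower it slightly.
    Bounds keep the detector from going deaf or hair-trigger.
    (A sketch of the idea, not Epok's actual tuning logic.)"""

    def __init__(self, z: float = 3.0, step: float = 0.25,
                 lo: float = 2.0, hi: float = 6.0):
        self.z, self.step, self.lo, self.hi = z, step, lo, hi

    def feedback(self, helpful: bool) -> float:
        self.z += -self.step if helpful else self.step
        self.z = min(self.hi, max(self.lo, self.z))  # clamp to bounds
        return self.z
```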

CHANGEPOINT DETECTION

When traffic permanently shifts after a deploy, Epok recognizes it as a new baseline — not a prolonged anomaly. No false-positive storms.

SEMANTIC ERROR CLUSTERING

"connection refused to database" and "cannot connect to postgres: ECONNREFUSED" are the same problem. Errors clustered by meaning, not exact text.

You've Tried These. Here's the Difference.

Other tools store your logs and wait for you to ask questions. Epok watches your logs, tells you what broke, and explains why — no existing observability stack required.

CloudWatch

Slow queries. No anomaly detection. Costs scale with every scan.

Epok →

~200ms search. Automatic detection. Flat monthly price.

Full breakdown

Datadog

Per-GB ingest + cardinality tax + per-host fees. Surprise bills are the #1 complaint.

Epok →

Flat pricing. No cardinality fees. Root cause analysis included at every tier.

Alternatives compared

New Relic

Full platform complexity when you just need log intelligence. Manual sensitivity tuning for anomaly detection.

Epok →

Every detector self-tunes. No per-host fees. Same intelligence, fraction of the cost.

Full comparison

Grafana + Loki

You build everything: dashboards, alert rules, PromQL, recording rules. A full-time job that decays.

Epok →

Detection from day one — nothing to build. Self-tuning thresholds that improve from your feedback.

Why dashboards aren't enough

ELK / OpenSearch

JVM tuning, shard management, index lifecycle policies. You need an ops team for your logging stack.

Epok →

Send logs via the Elasticsearch bulk API you already use. Get intelligence back automatically.

Your logs are your data.

We know you are sending production data. Here is how we handle it.

Encrypted in Transit

TLS 1.3 for all connections. Your logs never travel over an unencrypted channel.

GDPR Compliant

Built for data privacy from day one. Your logs are stored securely and handled in full compliance with GDPR.

Automatic Data Deletion

Logs are permanently deleted after your retention period. No archives, no surprises.

API Key Isolation

Each tenant is fully isolated. API keys scope access to ingest, read, or both.

Stop grepping logs at 3am.
Let Epok find the root cause.

Five minutes to connect. Every detector runs from minute one. Your first Slack alert arrives before you finish reading the docs.

No credit card. No sales call. Setup takes 5 minutes.