Know what broke
before your users do.

Send your logs. Get root cause analysis, anomaly detection, and incident intelligence — automatically. No dashboards to build. No alert rules to write. No agents to install.

Free forever. No credit card. First alert in under 5 minutes.

Built for teams running 5–50 services on AWS, GCP, or Kubernetes

app.getepok.dev/alerts
3 firing · 1 resolved
user-service
compose-post-service
home-timeline-service
post-storage-service
nginx-thrift
social-graph-service
text-service
unique-id-service
media-service
user-service: went silent · silence

No logs from user-service since 14:31:47. Was emitting ~40/min before.

user-service · +4 alerts
3m · Investigate
Pattern surge: TSocket::open() connect() Connection refused · pattern_cluster

Pattern rate grew from 0/min to 38.8/min (new). Template mentions user-service:9090.

compose-post-service · home-timeline-service · +2 alerts
3m · Investigate
compose-post-service: volume anomaly · log_rate

Error volume 12.3x above hourly baseline. Driven by connection refused errors.

compose-post-service · +1 alert
4m · Investigate
home-timeline-redis: reconnection · new_error · resolved

"LOADING Redis is loading the dataset in memory" — 6 occurrences, subsided.

home-timeline-redis
11m

Cascade failure detected and root cause identified. No rules configured.

5 MIN
SETUP TO FIRST ALERT
17
DETECTORS — ZERO CONFIG
~200ms
SEARCH LATENCY
< 60s
TIME TO DETECT
100%
OF TIERS GET FULL RCA

How It Works

01

Point your logs at Epok

Add a URL and an API key to whatever ships your logs. Fluent Bit, Vector, Promtail, a curl script. Anything that speaks HTTP works. Takes about five minutes.

  • Loki, OTLP, Elasticsearch bulk, syslog, raw JSON
  • No agents to install, no SDKs to add
  • Logs appear in real time
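Any shipper that can make an HTTP POST will do. A minimal sketch in Python; note the endpoint path, auth header, and payload fields below are assumptions for illustration, so use the values shown in your Epok dashboard:

```python
import json
from urllib import request

# Hypothetical values: copy the real ingest URL and API key from your
# Epok dashboard. The path and header shown here are assumptions.
EPOK_URL = "https://ingest.getepok.dev/v1/logs"
EPOK_KEY = "your-api-key"

def build_log_request(service: str, level: str, message: str) -> request.Request:
    """Build (but don't send) an HTTP POST shipping one JSON log line."""
    body = json.dumps({"service": service, "level": level, "message": message})
    return request.Request(
        EPOK_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {EPOK_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_log_request(
    "user-service", "error",
    "TSocket::open() connect(): Connection refused",
)
# request.urlopen(req)  # uncomment to actually ship the log
```

In Fluent Bit or Vector, the equivalent is their HTTP output pointed at the same URL with the same auth header.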
02

Epok figures out what normal looks like

New errors, silence, and high error rates are caught from minute one. Baselines build continuously — thresholds automatically tighten as Epok learns your patterns. Full seasonal detection by day seven.

  • New errors + silence detection from minute one
  • Error rate anomalies from minute one (population baselines)
  • Volume anomaly detection within one hour
  • Full seasonal baselines by day seven
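The core idea behind a statistical baseline fits in a few lines. This toy z-score version illustrates the concept only; it is not Epok's actual detector, which layers seasonality and self-tuning on top:

```python
from collections import deque
from statistics import mean, stdev

class VolumeBaseline:
    """Toy rolling baseline: flag a minute whose log count deviates
    strongly from the recent window. Illustrative only; Epok's real
    detectors add seasonality and self-tuning on top of ideas like this."""

    def __init__(self, window: int = 60, z_threshold: float = 4.0):
        self.counts = deque(maxlen=window)  # per-minute log counts
        self.z_threshold = z_threshold

    def observe(self, count_per_min: int) -> bool:
        """Record one minute's count; return True if it looks anomalous."""
        anomalous = False
        if len(self.counts) >= 10:  # need some history before judging
            mu, sigma = mean(self.counts), stdev(self.counts)
            if sigma > 0 and abs(count_per_min - mu) / sigma > self.z_threshold:
                anomalous = True
        self.counts.append(count_per_min)
        return anomalous
```

A service idling at ~40 logs/min that suddenly emits 500 in a minute trips the threshold immediately; the same 500 would be normal for a service whose window already contains similar values.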
03

Get alerted before your users notice

Slack, PagerDuty, email, or webhook fires when something breaks. Every alert includes the AI-generated root cause summary — what happened, what caused it, what to check first. Resolve notifications tell you when it's fixed.

  • Slack / PagerDuty / webhook / email: fires and resolves automatically
  • AI root cause summary in every notification
  • Incidents group related alerts — one page, not five
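On the receiving end, a webhook is just an HTTP POST to your endpoint. A minimal sketch of a receiver; the payload fields `severity`, `service`, and `summary` are hypothetical stand-ins, not Epok's documented schema:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class EpokWebhook(BaseHTTPRequestHandler):
    """Receive a hypothetical Epok alert webhook and print its summary.
    The payload fields used here are assumptions for illustration."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        alert = json.loads(self.rfile.read(length))
        print(f"[{alert.get('severity')}] {alert.get('service')}: "
              f"{alert.get('summary')}")
        self.send_response(204)  # acknowledge with no body
        self.end_headers()

# HTTPServer(("", 8080), EpokWebhook).serve_forever()  # run to listen
```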

What engineers actually need from logs.

Not more dashboards. Not more queries. Answers. Every feature works from the moment your first log arrives — nothing to configure.

app.getepok.dev/new-errors
Last 30 min · 4 patterns
CONNECTION · First seen: never before

TSocket::open() connect() <Host: user-service Port: 9090>: Connection refused

compose-post-service · home-timeline-service · social-graph-service
843 hits
CONNECTION · First seen: never before

Failed to read user timeline: TTransportException: open() failed: connect()

home-timeline-service
289 hits
TIMEOUT · First seen: never before

upstream request timeout: exceeded 5000ms waiting for response from user-service

nginx-thrift
67 hits
RESOURCE · Resurfaced · last seen 3 days ago

LOADING Redis is loading the dataset in memory

home-timeline-redis
6 hits
AFTER EVERY DEPLOY

"Did my deploy break anything?"

You deployed ten minutes ago. Open Epok. If the New Issues feed is empty, you're good. If it's not, you know exactly what broke and when.

NEW ISSUES FEED

Every error, warning, or fatal your system has never thrown before, surfaced the moment it appears. Grouped by meaning — one entry per root cause, not 50 variants of the same failure.

DEPLOY CORRELATION

"Appeared 4 min after deploy v2.4.1." Epok connects new errors to the deploy that caused them. Fastest path from "something broke" to "this commit broke it."

PATTERN TRENDS

Each error pattern gets a sparkline. Growing? Stable? One-off? You see the trajectory without writing a single query.

Alerts / Incident #8461
Resolve All · Snooze
PROBABLE CAUSE

user-service stopped emitting logs at 14:31:47. 3 downstream services started throwing connection refused errors within 6 seconds.

5 alerts · 4 services affected · confidence: high

Root Cause Ranking

#1 · user-service · SILENCE · ORIGIN
92%
#2 · compose-post-service · CONNECTION · DOWNSTREAM
61%
#3 · home-timeline-service · TIMEOUT · DOWNSTREAM
43%

Cascade Timeline

14:31:47 · user-service: last log received
14:31:50 · compose-post-service: TSocket::open() connect() <Host: user-service Port: 9090>: Conn...
14:31:51 · home-timeline-service: Failed to read timeline: TTransportException: open() failed
14:31:53 · compose-post-service: error volume 843/5min (baseline: 12/5min)
14:32:01 · social-graph-service: TSocket::open() connect() <Host: user-service Port: 9090>: Conn...
DURING AN INCIDENT

"Where is the fire?"

PagerDuty is screaming. You need to know what broke, when it started, and which services are affected. Not write 15 queries to piece it together.

ROOT CAUSE ANALYSIS

Classifies errors into 8 categories (timeout, OOM, auth, config, etc.) and traces causality across services. "3 services blame database-primary, and database-primary has OOM errors." Probable root cause with transparent scoring — you see why each candidate ranked where it did.

WHAT CHANGED

Compares the incident window against your baseline: volume shifts, new errors, recent deploys, service changes. One view, no diffing.

BLAST RADIUS

Which services are affected? How many users? Which endpoints? Full impact scope in seconds.

CASCADE TIMELINE

See exactly how a failure propagated across services. "Database went silent → API got connection refused 3s later → Frontend 502s 5s after that." Origin identified automatically.
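Ordering anomalies by first occurrence is the intuition behind origin detection. A toy version of that idea (Epok's real ranking weighs far more than arrival order):

```python
def cascade_origin(events):
    """Toy cascade attribution: sort anomalies by timestamp; the earliest
    is the candidate origin and everything later is downstream.
    (Illustration only; real causality ranking uses more signals.)"""
    ordered = sorted(events, key=lambda e: e["ts"])
    return ordered[0]["service"], [e["service"] for e in ordered[1:]]

# Events loosely based on the cascade timeline above
events = [
    {"ts": "14:31:50", "service": "compose-post-service"},
    {"ts": "14:31:47", "service": "user-service"},
    {"ts": "14:31:51", "service": "home-timeline-service"},
]
origin, downstream = cascade_origin(events)
```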

AI: INCIDENT SUMMARY

Every Slack and PagerDuty notification includes an LLM-generated summary: what happened, probable root cause, and what to check first. On-call engineers know where to start before they open a laptop.

WHILE YOU SLEEP

"What happened overnight?"

You can't watch logs 24/7. Epok can. It learns what normal looks like and tells you when something isn't.

VOLUME ANOMALIES

Learns normal log volume per service, per hour, per day of week. When a spike hits at 3am, you get one alert with the messages driving it. Not a storm of threshold violations that teach you to mute your phone.

SILENCE DETECTION

A service that logs every 30 seconds goes quiet for 5 minutes. That's the most dangerous kind of failure — no errors, just absence. Epok catches it.
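The mechanics are simple to sketch. This toy version takes the typical logging interval as a parameter, whereas Epok learns it per service:

```python
class SilenceDetector:
    """Toy silence check: a service that normally logs every `interval_s`
    seconds and has been quiet for `factor` times that long is flagged.
    (Illustration only; Epok learns the interval automatically.)"""

    def __init__(self, interval_s: float, factor: float = 10.0):
        self.interval_s = interval_s
        self.factor = factor
        self.last_seen = {}  # service -> last log timestamp (epoch seconds)

    def observe(self, service: str, ts: float) -> None:
        self.last_seen[service] = ts

    def is_silent(self, service: str, now: float) -> bool:
        last = self.last_seen.get(service)
        return last is not None and (now - last) > self.factor * self.interval_s
```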

DOMAIN-SPECIFIC DETECTORS

Kubernetes (70+ rules), database, web, security, serverless, dependency, infrastructure, and AWS. Built on statistical baselines, not static thresholds. Detects what matters in your stack — no rules to write.

THRESHOLD RULES + SLO MONITORING

Custom alert rules on any query with duration guard and cooldown. SLO error budget tracking with burn rate prediction — get warned before your SLO breaches, not after.
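Burn-rate math in brief: at a 99.9% SLO the error budget is 0.1%, so a sustained 1% error rate burns budget 10x faster than allowed and empties a 30-day budget in about 72 hours. As a sketch:

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """Burn rate: observed error rate divided by the allowed error rate.
    1.0 means the budget is being spent exactly as fast as permitted."""
    return error_rate / (1.0 - slo_target)

def hours_to_exhaustion(burn: float, window_hours: float = 30 * 24,
                        budget_left: float = 1.0) -> float:
    """Hours until the remaining budget fraction is gone at this burn rate."""
    return window_hours * budget_left / burn
```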

app.getepok.dev/explore
1,247 results
level:error OR level:warn | service:compose-post*
14:32:08.417 · error · compose-post · TSocket::open() connect() <Host: user-service Port: 9090>: Connection refused
14:32:08.003 · error · compose-post · Failed to compose post for user_id=481: TTransportException: open() failed
14:32:07.891 · warn · nginx-thrift · upstream timeout (5000ms) POST /api/v1/compose-post 504 client=10.42.0.1
14:32:07.244 · info · unique-id · generated id=7493820174839201 type=post
14:32:06.118 · error · home-timeline · Failed to read timeline for user 219: TSocket: Connection refused (user-service:9090)
14:32:05.972 · info · text-service · processed text len=142 mentions=2 urls=0
WHEN YOU NEED TO DIG DEEPER

"Why did this break?"

You know something is wrong. Now you need to understand why. Epok traces the path from symptom to root cause — across services, automatically.

ROOT CAUSE RANKING

Ranked list of probable causes with category (TIMEOUT, OOM, CONFIG), origin vs. victim classification, and transparent scoring. Not a black box.

AI: ROOT CAUSE HYPOTHESIS

An LLM reads the full causal context and proposes a 2–3 sentence hypothesis explaining the chain of causation. Cites specific evidence from your logs, not generic advice.

INCIDENT SIMILARITY

"This looks like the incident 3 days ago." Matches new incidents against history. When a match is found, you see what fixed it last time.

ALWAYS IMPROVING

"Why is this alert so accurate?"

Epok learns from your feedback. Thresholds auto-adjust. Noisy alerts get suppressed. Changepoints are recognized — not false-alarm storms. The longer you use it, the sharper it gets.

SELF-TUNING THRESHOLDS

Marking alerts as "not helpful" tightens thresholds. Consistent signals get reinforced. No manual tuning — sensitivity adjusts every 6 hours.
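The feedback loop can be sketched as a bounded adjustment. This is illustrative only; the step sizes and the 6-hour schedule are Epok internals:

```python
class SelfTuningThreshold:
    """Toy feedback loop: 'not helpful' feedback raises the z-score
    threshold (fewer alerts); helpful alerts lower it slightly.
    Bounds keep the detector from going deaf or hair-trigger.
    (A sketch of the idea, not Epok's actual tuning logic.)"""

    def __init__(self, z: float = 3.0, step: float = 0.25,
                 lo: float = 2.0, hi: float = 6.0):
        self.z, self.step, self.lo, self.hi = z, step, lo, hi

    def feedback(self, helpful: bool) -> float:
        self.z += -self.step if helpful else self.step
        self.z = min(self.hi, max(self.lo, self.z))  # clamp to bounds
        return self.z
```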

CHANGEPOINT DETECTION

When traffic permanently shifts after a deploy, Epok recognizes it as a new baseline — not a prolonged anomaly. No false-positive storms.

SEMANTIC ERROR CLUSTERING

"connection refused to database" and "cannot connect to postgres: ECONNREFUSED" are the same problem. Errors clustered by meaning, not exact text.

You've Tried These. Here's the Difference.

Other tools store your logs and wait for you to ask questions. Epok watches your logs, tells you what broke, and explains why — no existing observability stack required.

CloudWatch

Slow queries. No anomaly detection. Costs scale with every scan.

Epok →

~200ms search. Automatic detection. Flat monthly price.

Full breakdown

Datadog

Per-GB ingest + cardinality tax + per-host fees. Surprise bills are the #1 complaint.

Epok →

Flat pricing. No cardinality fees. Root cause analysis included at every tier.

Alternatives compared

New Relic

Full platform complexity when you just need log intelligence. Manual sensitivity tuning for anomaly detection.

Epok →

Every detector self-tunes. No per-host fees. Same intelligence, fraction of the cost.

Full comparison

Grafana + Loki

You build everything: dashboards, alert rules, PromQL, recording rules. A full-time job that decays.

Epok →

Detection from day one — nothing to build. Self-tuning thresholds that improve from your feedback.

Why dashboards aren't enough

ELK / OpenSearch

JVM tuning, shard management, index lifecycle policies. You need an ops team for your logging stack.

Epok →

Send logs via the Elasticsearch bulk API you already use. Get intelligence back automatically.

Your logs are your data.

We know you are sending production data. Here is how we handle it.

Encrypted in Transit

TLS 1.3 for all connections. Your logs never travel over an unencrypted channel.

GDPR Compliant

Built for data privacy from day one. Your logs are stored securely and handled in full compliance with GDPR.

Automatic Data Deletion

Logs are permanently deleted after your retention period. No archives, no surprises.

API Key Isolation

Each tenant is fully isolated. API keys scope access to ingest, read, or both.

Stop grepping logs at 3am.
Let Epok find the root cause.

Five minutes to connect. Every detector runs from minute one. Your first Slack alert arrives before you finish reading the docs.

No credit card. No sales call. Setup takes 5 minutes.