Know what broke
before your users do.
Send your logs. Get root cause analysis, anomaly detection, and incident intelligence — automatically. No dashboards to build. No alert rules to write. No agents to install.
Free forever. No credit card. First alert in under 5 minutes.
Built for teams running 5–50 services on AWS, GCP, or Kubernetes
No logs from user-service since 14:31:47. Was emitting ~40/min before.
Pattern rate grew from 0/min to 38.8/min (new). Template mentions user-service:9090.
Error volume 12.3x above hourly baseline. Driven by connection refused errors.
"LOADING Redis is loading the dataset in memory" — 6 occurrences, subsided.
Cascade failure detected and root cause identified. No rules configured.
How It Works
Point your logs at Epok
Add a URL and an API key to whatever ships your logs. FluentBit, Vector, Promtail, a curl script. Anything that speaks HTTP works. Takes about five minutes.
- Loki, OTLP, Elasticsearch bulk, syslog, raw JSON
- No agents to install, no SDKs to add
- Logs appear in real time
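For example, shipping logs with FluentBit takes one HTTP output block. A minimal sketch, assuming a hypothetical ingest host (`ingest.epok.example`), path (`/v1/logs`), and bearer-token header; your onboarding page has the real values:

```ini
[OUTPUT]
    # Generic HTTP output: any shipper that speaks HTTP works the same way.
    Name        http
    Match       *
    Host        ingest.epok.example
    Port        443
    URI         /v1/logs
    Format      json
    tls         On
    Header      Authorization Bearer ${EPOK_API_KEY}
```

The same idea applies to Vector, Promtail, or a curl loop: point the shipper's HTTP sink at the ingest URL and attach the API key.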
Epok figures out what normal looks like
New errors, silence, and high error rates are caught from minute one. Baselines build continuously — thresholds automatically tighten as Epok learns your patterns. Full seasonal detection by day seven.
- New errors + silence detection from minute one
- Error rate anomalies from minute one (population baselines)
- Volume anomaly detection within one hour
- Full seasonal baselines by day seven
Get alerted before your users notice
Slack, PagerDuty, email, or webhook fires when something breaks. Every alert includes the AI-generated root cause summary — what happened, what caused it, what to check first. Resolve notifications tell you when it's fixed.
- Slack / PagerDuty / webhook / email: fires and resolves automatically
- AI root cause summary in every notification
- Incidents group related alerts — one page, not five
What engineers actually need from logs.
Not more dashboards. Not more queries. Answers. Every feature works from the moment your first log arrives — nothing to configure.
TSocket::open() connect() <Host: user-service Port: 9090>: Connection refused
Failed to read user timeline: TTransportException: open() failed: connect()
upstream request timeout: exceeded 5000ms waiting for response from user-service
LOADING Redis is loading the dataset in memory
"Did my deploy break anything?"
You deployed ten minutes ago. Open Epok. If the New Issues feed is empty, you're good. If it's not, you know exactly what broke and when.
NEW ISSUES FEED
Every error, warning, or fatal your system has never thrown before, surfaced the moment it appears. Grouped by meaning — one entry per root cause, not 50 variants of the same failure.
DEPLOY CORRELATION
"Appeared 4 min after deploy v2.4.1." Epok connects new errors to the deploy that caused them. Fastest path from "something broke" to "this commit broke it."
PATTERN TRENDS
Each error pattern gets a sparkline. Growing? Stable? One-off? You see the trajectory without writing a single query.
user-service stopped emitting logs at 14:31:47. 3 downstream services started throwing connection refused errors within 6 seconds.
5 alerts · 4 services affected · confidence: high
Root Cause Ranking
Cascade Timeline
"Where is the fire?"
PagerDuty is screaming. You need to know what broke, when it started, and which services are affected. Not write 15 queries to piece it together.
ROOT CAUSE ANALYSIS
Classifies errors into 8 categories (timeout, OOM, auth, config, etc.) and traces causality across services. "3 services blame database-primary, and database-primary has OOM errors." Probable root cause with transparent scoring — you see why each candidate ranked where it did.
WHAT CHANGED
Compares the incident window against your baseline: volume shifts, new errors, recent deploys, service changes. One view, no diffing.
BLAST RADIUS
Which services are affected? How many users? Which endpoints? Full impact scope in seconds.
CASCADE TIMELINE
See exactly how a failure propagated across services. "Database went silent → API got connection refused 3s later → Frontend 502s 5s after that." Origin identified automatically.
AI: INCIDENT SUMMARY
Every Slack and PagerDuty notification includes an LLM-generated summary: what happened, probable root cause, and what to check first. On-call engineers know where to start before they open a laptop.
"What happened overnight?"
You can't watch logs 24/7. Epok can. It learns what normal looks like and tells you when something isn't.
VOLUME ANOMALIES
Learns normal log volume per service, per hour, per day of week. When a spike hits at 3am, you get one alert with the messages driving it. Not a storm of threshold violations that teach you to mute your phone.
SILENCE DETECTION
A service that logs every 30 seconds goes quiet for 5 minutes. That's the most dangerous kind of failure: no errors, just absence. Epok catches it.
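The underlying idea fits in a few lines. This is a toy illustration of gap-based silence detection, not Epok's actual detector, which learns each service's interval statistically:

```python
def is_silent(last_log_ts: float, typical_interval_s: float,
              now: float, factor: float = 8.0) -> bool:
    """Toy gap check: flag a service whose silence exceeds
    `factor` times its typical logging interval.
    (Illustrative only; the real detector learns per-service baselines.)"""
    return (now - last_log_ts) > factor * typical_interval_s

# A service that logs every ~30s and has been quiet for 5 minutes is flagged.
print(is_silent(last_log_ts=0.0, typical_interval_s=30.0, now=300.0))  # True
```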
DOMAIN-SPECIFIC DETECTORS
Kubernetes (70+ rules), database, web, security, serverless, dependency, infrastructure, and AWS. Built on statistical baselines, not static thresholds. Detects what matters in your stack — no rules to write.
THRESHOLD RULES + SLO MONITORING
Custom alert rules on any query with duration guard and cooldown. SLO error budget tracking with burn rate prediction — get warned before your SLO breaches, not after.
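Burn-rate prediction follows the standard SRE error-budget formulation. A minimal sketch with illustrative numbers, not Epok's internal model:

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How fast the error budget is burning: 1.0 means the budget
    lasts exactly the SLO window; above 1.0 means early breach."""
    budget_rate = 1.0 - slo_target      # allowed error fraction
    return error_rate / budget_rate

def hours_to_breach(budget_remaining: float, rate: float,
                    window_hours: float = 30 * 24) -> float:
    """Hours until exhaustion at the current burn rate, given the
    fraction of budget still unspent over a 30-day window."""
    return budget_remaining * window_hours / rate

# 99.9% SLO while serving 0.5% errors: burning budget 5x too fast.
rate = burn_rate(error_rate=0.005, slo_target=0.999)
eta = hours_to_breach(budget_remaining=0.6, rate=rate)
print(round(rate, 6), round(eta, 6))  # 5.0 86.4
```

A burn rate of 5 with 60% of the budget left means roughly 3.6 days to breach, which is exactly the kind of early warning the SLO monitor raises.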
"Why did this break?"
You know something is wrong. Now you need to understand why. Epok traces the path from symptom to root cause — across services, automatically.
ROOT CAUSE RANKING
Ranked list of probable causes with category (TIMEOUT, OOM, CONFIG), origin vs. victim classification, and transparent scoring. Not a black box.
AI: ROOT CAUSE HYPOTHESIS
An LLM reads the full causal context and proposes a 2-3 sentence hypothesis explaining the chain of causation. Cites specific evidence from your logs, not generic advice.
INCIDENT SIMILARITY
"This looks like the incident 3 days ago." Matches new incidents against history. When a match is found, you see what fixed it last time.
"Why is this alert so accurate?"
Epok learns from your feedback. Thresholds auto-adjust. Noisy alerts get suppressed. Changepoints are recognized — not false-alarm storms. The longer you use it, the sharper it gets.
SELF-TUNING THRESHOLDS
Marking an alert 'not helpful' raises its threshold. Consistent signals get reinforced. No manual tuning — sensitivity adjusts every 6 hours.
CHANGEPOINT DETECTION
When traffic permanently shifts after a deploy, Epok recognizes it as a new baseline — not a prolonged anomaly. No false-positive storms.
SEMANTIC ERROR CLUSTERING
"connection refused to database" and "cannot connect to postgres: ECONNREFUSED" are the same problem. Errors clustered by meaning, not exact text.
You've Tried These. Here's the Difference.
Other tools store your logs and wait for you to ask questions. Epok watches your logs, tells you what broke, and explains why — no existing observability stack required.
CloudWatch
Slow queries. No anomaly detection. Costs scale with every scan.
~200ms search. Automatic detection. Flat monthly price.
Datadog
Per-GB ingest + cardinality tax + per-host fees. Surprise bills are the #1 complaint.
Flat pricing. No cardinality fees. Root cause analysis included at every tier.
New Relic
Full platform complexity when you just need log intelligence. Manual sensitivity tuning for anomaly detection.
Every detector self-tunes. No per-host fees. Same intelligence, fraction of the cost.
Grafana + Loki
You build everything: dashboards, alert rules, PromQL, recording rules. A full-time job that decays.
Detection from day one — nothing to build. Self-tuning thresholds that improve from your feedback.
ELK / OpenSearch
JVM tuning, shard management, index lifecycle policies. You need an ops team for your logging stack.
Send logs via the Elasticsearch bulk API you already use. Get intelligence back automatically.
Your logs are your data.
We know you are sending production data. Here is how we handle it.
Encrypted in Transit
All connections use TLS 1.3; data in transit is always encrypted.
GDPR Compliant
Built for data privacy from day one. Your logs are stored securely and handled in full compliance with GDPR.
Automatic Data Deletion
Logs are permanently deleted after your retention period. No archives, no surprises.
API Key Isolation
Each tenant is fully isolated. API keys scope access to ingest, read, or both.
Stop grepping logs at 3am.
Let Epok find the root cause.
Five minutes to connect. Every detector runs from minute one. Your first Slack alert arrives before you finish reading the docs.
No credit card. No sales call. Setup takes 5 minutes.