Documentation
Updated May 31, 2026
Get started with Epok in under 5 minutes. Send your first logs and let the intelligence engine do the rest.
The sandbox is read-only, logged in as a demo tenant with pre-seeded logs. No sign-up required.
Quick Start
Send your first log entry. Replace YOUR_API_KEY with your key from Settings — see Authentication below for header formats.
curl -X POST https://ingest.getepok.dev/insert/elasticsearch/_bulk \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '
{"create":{}}
{"_msg":"Application started","level":"info","service":"api","_time":"2026-02-21T00:00:00Z"}
'
That's it. Your logs appear in real time. Anomaly detection activates automatically.
Want to see it working first? The sandbox tenant has live logs, real alerts, and working detectors.
Open sandbox → Explore
Quickstart wizard
Two questions → the exact snippet to paste.
Authentication
Epok uses API keys for log ingestion. You'll get a default API key when you sign up. Find it in Settings.
Include your API key in every request using any of these methods:
Authorization: Bearer epk_your_api_key
or
Authorization: Basic base64(epk_your_api_key:x)
or
X-API-Key: epk_your_api_key
Basic Auth is used by Loki-native shippers (FluentBit, Promtail, Grafana Alloy). Set the username to your API key and the password to any value.
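The three header forms above can be built from a single key. A minimal sketch, assuming a helper named `auth_headers` (hypothetical, not part of any Epok SDK) and the placeholder key from the examples:

```python
import base64


def auth_headers(api_key: str, scheme: str = "bearer") -> dict[str, str]:
    """Return one of the three documented header forms for an API key."""
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "basic":
        # Username is the API key; the password can be any value ("x" here).
        token = base64.b64encode(f"{api_key}:x".encode()).decode()
        return {"Authorization": f"Basic {token}"}
    if scheme == "x-api-key":
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown scheme: {scheme}")


print(auth_headers("epk_your_api_key", "basic"))
```

Most shippers build the Basic header for you from a username and password, so the manual encoding is only needed in hand-written clients.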
Quickstart by platform
Pick your stack — each guide is a 5-minute end-to-end setup with copy-paste-ready config, the exact verification steps, and the gotchas we've seen on real deployments.
Not on this list? The protocol table below works for any shipper that speaks Elasticsearch Bulk, Loki Push, OTLP HTTP, FluentBit, Fluentd, Syslog, or plain JSON.
Supported Integrations
Epok accepts logs from any source. Pick the integration that fits your stack.
| Protocol | Endpoint | Use With |
|---|---|---|
| Elasticsearch Bulk | /_bulk | curl, Logstash, Vector, Filebeat |
| Loki Push | /loki/api/v1/push | FluentBit, Promtail, Grafana Alloy, any Loki client |
| OTLP HTTP | /v1/logs | OpenTelemetry Collector, any OTEL SDK |
| FluentBit Native | /api/v1/fluent | FluentBit (with http output, alternative to Loki) |
| Fluentd | /api/v1/fluentd | Fluentd (out_http plugin) |
| Syslog (HTTP) | /api/v1/syslog | rsyslog, syslog-ng (via omhttp) |
| CloudWatch | /api/v1/cloudwatch | AWS Lambda subscription filter |
| GCP Cloud Logging | /api/v1/ingest | Cloud Function via Pub/Sub sink |
| Generic JSON | /api/v1/ingest | Any HTTP client, custom apps |
Configuration Examples
Copy-paste configs for every supported shipper. Replace YOUR_API_KEY with your key.
curl · POST /insert/elasticsearch/_bulk
The fastest way to test. Send a log line from your terminal.
curl -X POST https://ingest.getepok.dev/insert/elasticsearch/_bulk \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '
{"create":{}}
{"_msg":"Application started successfully","level":"info","service":"api","_time":"2026-02-21T00:00:00Z"}
{"create":{}}
{"_msg":"GET /api/users 200 42ms","level":"info","service":"api","status_code":200,"duration_ms":42}
'
FluentBit · POST /loki/api/v1/push
Lightweight log shipper. Ideal for Docker, Kubernetes, and edge devices. Uses native Loki output with Basic Auth.
# /etc/fluent-bit/fluent-bit.conf
[INPUT]
    Name tail
    Path /var/log/app/*.log
    Tag app

[OUTPUT]
    Name loki
    Match *
    Host ingest.getepok.dev
    Port 443
    TLS On
    HTTP_User YOUR_API_KEY
    HTTP_Passwd x
    Labels job=fluentbit, host=my-server
    drop_single_key on
Vector · POST /loki/api/v1/push
High-performance observability pipeline. Native Loki sink — same protocol as Promtail/Alloy.
# vector.toml
[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]
[sinks.epok]
type = "loki"
inputs = ["app_logs"]
endpoint = "https://ingest.getepok.dev"
[sinks.epok.encoding]
codec = "json"
[sinks.epok.auth]
strategy = "bearer"
token = "YOUR_API_KEY"
[sinks.epok.labels]
service = "{{ service }}"
host = "{{ host }}"
Promtail / Grafana Alloy · POST /loki/api/v1/push
If you already run Promtail or Grafana Alloy, point them at Epok. Native Loki protocol support.
# promtail-config.yml
clients:
  - url: https://ingest.getepok.dev/loki/api/v1/push
    basic_auth:
      username: YOUR_API_KEY
      password: x
scrape_configs:
  - job_name: app
    static_configs:
      - targets: [localhost]
        labels:
          app: api
          __path__: /var/log/app/*.log
Python · POST /loki/api/v1/push
Send logs directly from your application code.
import time, httpx

resp = httpx.post(
    "https://ingest.getepok.dev/loki/api/v1/push",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "streams": [{
            "stream": {"app": "myapp", "env": "production"},
            "values": [[
                # Loki timestamps are nanoseconds since the epoch, sent as a string.
                str(time.time_ns()),
                "User signup completed for user_id=4821"
            ]]
        }]
    }
)
resp.raise_for_status()  # surface 4xx/5xx responses instead of ignoring them
Node.js · POST /insert/elasticsearch/_bulk
Send logs directly from a Node service. Uses built-in fetch — no dependencies.
// No npm install needed: fetch is built in on Node 18+.
// The top-level await below requires an ES module (.mjs, or "type": "module" in package.json).
const API_KEY = process.env.EPOK_API_KEY;

async function log(entries) {
  // Bulk format: an action line, then the document, one JSON object per line.
  const body = entries
    .flatMap((e) => [
      JSON.stringify({ create: {} }),
      JSON.stringify({ _time: new Date().toISOString(), ...e }),
    ])
    .join("\n") + "\n";
  const r = await fetch(
    "https://ingest.getepok.dev/insert/elasticsearch/_bulk",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        "Content-Type": "application/x-ndjson",
      },
      body,
    },
  );
  if (!r.ok) throw new Error(`epok ingest failed: ${r.status}`);
}

await log([
  { level: "info", service: "api", _msg: "signup ok user_id=4821" },
  { level: "error", service: "api", _msg: "payment failed order_id=9012" },
]);
Go · POST /insert/elasticsearch/_bulk
Stdlib-only Go client. Batch entries, use http.Client with timeout.
package epok

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
    "time"
)

// Entry is one log record in the field layout used by the examples above.
type Entry struct {
    Time    string `json:"_time"`
    Msg     string `json:"_msg"`
    Level   string `json:"level"`
    Service string `json:"service"`
}

// A shared client with a timeout, so a stalled request cannot hang the caller.
var client = &http.Client{Timeout: 30 * time.Second}

// Send posts all entries as a single Elasticsearch Bulk (NDJSON) request.
func Send(entries []Entry) error {
    var buf bytes.Buffer
    for _, e := range entries {
        // Each document is preceded by its action line.
        buf.WriteString(`{"create":{}}` + "\n")
        // Encode appends the trailing newline that NDJSON requires.
        if err := json.NewEncoder(&buf).Encode(e); err != nil {
            return err
        }
    }
    req, err := http.NewRequest("POST",
        "https://ingest.getepok.dev/insert/elasticsearch/_bulk", &buf)
    if err != nil {
        return err
    }
    req.Header.Set("Authorization", "Bearer "+os.Getenv("EPOK_API_KEY"))
    req.Header.Set("Content-Type", "application/x-ndjson")
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 300 {
        return fmt.Errorf("epok ingest failed: %s", resp.Status)
    }
    return nil
}
OpenTelemetry (OTLP) · POST /v1/logs
Native OTLP HTTP support. Works with any OpenTelemetry SDK or Collector. Use the otlphttp exporter (not otlp/gRPC).
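# otel-collector-config.yml
# The otlp receiver is declared here so the pipeline below is complete;
# swap in whichever receivers your Collector already uses.
receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  otlphttp:
    endpoint: https://ingest.getepok.dev
    headers:
      Authorization: "Bearer YOUR_API_KEY"
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp]
Loki API · POST /loki/api/v1/push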
Direct Loki push API. Works with any Loki-compatible client.
curl -X POST https://ingest.getepok.dev/loki/api/v1/push \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"streams": [{
"stream": {"app": "api", "env": "production"},
"values": [
["1771632000000000000", "Application started successfully"],
["1771632001000000000", "GET /api/users 200 42ms"]
]
}]
}'
Fluentd · POST /api/v1/fluentd
Native Fluentd HTTP output. Tag-based service routing.
# /etc/fluentd/fluent.conf
<source>
  @type tail
  path /var/log/app/*.log
  tag app.logs
  # in_tail needs a <parse> section; "none" forwards each line unparsed.
  <parse>
    @type none
  </parse>
</source>
<match app.**>
  @type http
  endpoint https://ingest.getepok.dev/api/v1/fluentd
  headers {"Authorization": "Bearer YOUR_API_KEY"}
  json_array false
  <format>
    @type json
  </format>
</match>
Logstash · POST /insert/elasticsearch/_bulk
Drop-in Elasticsearch output. Point your existing Logstash pipeline at Epok and flip API keys.
# /etc/logstash/conf.d/epok.conf
output {
  elasticsearch {
    hosts => ["https://ingest.getepok.dev"]
    index => "logs"
    user => "${EPOK_API_KEY}"
    password => "x"
    ssl_enabled => true
    http_compression => true
  }
}
Filebeat · POST /insert/elasticsearch/_bulk
Elastic's lightweight shipper. Uses the native Elasticsearch output — no plugin install.
# filebeat.yml
filebeat.inputs:
  - type: filestream
    id: app-logs
    paths:
      - /var/log/app/*.log
output.elasticsearch:
  hosts: ["https://ingest.getepok.dev"]
  api_key: "${EPOK_API_KEY}"
  compression_level: 3
  bulk_max_size: 1000
rsyslog · POST /api/v1/syslog
Built-in on most Linux distros. Use the omhttp module to ship RFC 5424 frames directly over HTTPS.
# /etc/rsyslog.d/50-epok.conf
module(load="omhttp")
action(
    type="omhttp"
    server="ingest.getepok.dev"
    serverport="443"
    usehttps="on"
    restpath="api/v1/syslog"
    httpcontenttype="text/plain"
    httpheaders=["Authorization: Bearer YOUR_API_KEY"]
    template="RSYSLOG_SyslogProtocol23Format"
    action.resumeRetryCount="-1"
    queue.type="LinkedList"
    queue.size="50000"
)
Splunk (migration) · via Vector shim
Epok doesn't emulate the Splunk HEC wire protocol; point your existing HEC-bound Heavy Forwarder or Vector pipeline at Epok instead. Vector's `splunk_hec_logs` source pairs cleanly with its `elasticsearch` sink.
# vector.toml — drop-in Splunk HEC shim
[sources.splunk_in]
type = "splunk_hec_logs"
address = "0.0.0.0:8088"
token = "YOUR_INTERNAL_HEC_TOKEN"
[sinks.epok]
type = "elasticsearch"
inputs = ["splunk_in"]
endpoint = "https://ingest.getepok.dev"
bulk.index = "logs"
[sinks.epok.auth]
strategy = "basic"
user = "YOUR_API_KEY"
password = "x"
Syslog (native HTTP) · POST /api/v1/syslog
Native syslog HTTP endpoint. Accepts RFC 5424 and RFC 3164 frames as the request body. Simplest path for any tool that can POST — no relay needed.
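For tools that build frames themselves, the RFC 5424 header can be assembled by hand. A minimal sketch: the PRI value is `facility * 8 + severity`, so facility 16 (local0) with severity 6 (informational) gives 134. The function name `rfc5424_frame` is hypothetical, and MSGID and structured data are left as `-`:

```python
def rfc5424_frame(facility: int, severity: int, timestamp: str,
                  host: str, app: str, procid: int, msg: str) -> str:
    """Build one RFC 5424 frame with empty MSGID and structured data."""
    pri = facility * 8 + severity  # priority value that goes inside <...>
    return f"<{pri}>1 {timestamp} {host} {app} {procid} - - {msg}"


frames = [
    rfc5424_frame(16, 6, "2026-04-08T12:34:56Z", "host01", "myapp", 1234,
                  "User signup failed for user_id=4821"),
    rfc5424_frame(16, 3, "2026-04-08T12:34:57Z", "host01", "myapp", 1234,
                  "Database connection refused"),
]

# Newline-delimited frames form the body of a single batched POST.
batch_body = "\n".join(frames) + "\n"
print(batch_body)
```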
# Send a single syslog frame directly with curl
curl -X POST https://ingest.getepok.dev/api/v1/syslog \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: text/plain' \
--data-binary '<134>1 2026-04-08T12:34:56Z host01 myapp 1234 - - User signup failed for user_id=4821'
# Or batch many frames in one POST (newline-delimited)
curl -X POST https://ingest.getepok.dev/api/v1/syslog \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: text/plain' \
--data-binary @syslog-batch.txt
Syslog (via FluentBit relay) · UDP/TCP 514 → HTTP
For network appliances (Cisco, Fortinet, Palo Alto) and legacy systems that can only send raw UDP/TCP syslog. Use FluentBit as a local relay to receive syslog and forward over HTTP.
# /etc/fluent-bit/syslog-relay.conf
[INPUT]
    Name syslog
    Listen 0.0.0.0
    Port 514
    Mode udp

[OUTPUT]
    Name loki
    Match *
    Host ingest.getepok.dev
    Port 443
    TLS On
    HTTP_User YOUR_API_KEY
    HTTP_Passwd x
    Labels job=syslog-relay
AWS CloudWatch · POST /api/v1/cloudwatch
Forward CloudWatch Logs via subscription filter. Native gzip decompression.
# Create a Lambda subscription filter that POSTs to Epok.
# CloudWatch → Lambda → Epok
import base64, urllib3

EPOK_URL = "https://ingest.getepok.dev/api/v1/cloudwatch"
API_KEY = "YOUR_API_KEY"
http = urllib3.PoolManager()

def handler(event, context):
    # The subscription payload is base64-encoded gzip. Decode the base64 and
    # forward the gzip bytes unchanged; Epok decompresses them.
    compressed = base64.b64decode(event["awslogs"]["data"])
    resp = http.request("POST", EPOK_URL,
        body=compressed,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Encoding": "gzip",
            "Content-Type": "application/json"
        })
    # Raise on failure so the invocation shows up as a Lambda error.
    if resp.status >= 300:
        raise RuntimeError(f"epok ingest failed: {resp.status}")
GCP Cloud Logging · POST /api/v1/ingest
Forward Google Cloud Logging via Pub/Sub sink and a Cloud Function.
# GCP Cloud Logging → Pub/Sub → Cloud Function → Epok
# 1. Create a log sink that routes to a Pub/Sub topic
# 2. Deploy this Cloud Function as a Pub/Sub subscriber
import base64, json, requests

EPOK_URL = "https://ingest.getepok.dev/api/v1/ingest"
API_KEY = "YOUR_API_KEY"

def handle_log(event, context):
    # Pub/Sub delivers the log entry as base64-encoded JSON in event["data"].
    data = json.loads(base64.b64decode(event["data"]))
    entry = {
        "_msg": data.get("textPayload", json.dumps(data.get("jsonPayload", {}))),
        "level": data.get("severity", "info").lower(),
        "service": data.get("resource", {}).get("type", "gcp"),
        "_time": data.get("timestamp"),
    }
    resp = requests.post(EPOK_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=[entry],
        timeout=10)
    resp.raise_for_status()  # fail the invocation on 4xx/5xx
Generic JSON · POST /api/v1/ingest
Simplest format for custom applications. Send a JSON array or newline-delimited JSON.
curl -X POST https://ingest.getepok.dev/api/v1/ingest \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '[
{"_msg": "User signed up", "level": "info", "service": "auth", "user_id": 4821},
{"_msg": "Payment processed", "level": "info", "service": "billing", "amount": 29.99}
]'
What Happens Next
Once your logs start flowing, Epok's intelligence engine activates automatically. No configuration needed.
Search and live tail work immediately
As soon as your first log arrives, you can search it and stream it live. No indexing delay.
New errors are detected from the first log
Epok fingerprints every error-level log message. When a never-before-seen error appears, it shows up in the New Errors feed within 5 minutes.
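Epok's fingerprinting algorithm is not published. As an illustration of the general idea only, one common approach masks variable tokens (numbers, hex values, UUIDs) to get a template and then hashes it, so that two messages differing only in an ID share a fingerprint. All names below are hypothetical:

```python
import hashlib
import re

_UUID = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I)
_HEX = re.compile(r"\b0x[0-9a-f]+\b", re.I)
_NUM = re.compile(r"\d+")


def template(msg: str) -> str:
    """Mask variable tokens so similar messages collapse to one template."""
    msg = _UUID.sub("<uuid>", msg)
    msg = _HEX.sub("<hex>", msg)
    return _NUM.sub("<n>", msg)


def fingerprint(service: str, msg: str) -> str:
    """Stable short hash of the service name plus the message template."""
    return hashlib.sha256(f"{service}\x00{template(msg)}".encode()).hexdigest()[:16]


seen: set[str] = set()


def is_new_error(service: str, msg: str) -> bool:
    """True only the first time a given fingerprint is observed."""
    fp = fingerprint(service, msg)
    if fp in seen:
        return False
    seen.add(fp)
    return True


print(template("payment failed order_id=9012"))
```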
Silence detection activates within 1 hour
Epok learns each service's expected log cadence. If a service that was sending logs every 30 seconds goes quiet for 5 minutes, you'll get an alert.
Volume baselines build over 7 days
Log rate anomaly detection learns your normal patterns per service, per hour, per day of week. Early detection is active from day one with wider thresholds. Full precision by day seven.
Detectors
Epok ships 20 detectors that run automatically on every connected log stream. All detectors are included on every tier, including the 14-day trial — there is no detection-quality gate.
Each entry below is paired with an example alert as it would appear in the product. Thresholds, baseline windows, and tuning details are intentionally not published — those are implementation choices that change as we learn from your feedback.
Statistical (5)
log_rate
Detects spikes, drops, and flatlines in log volume vs daily and weekly baselines per service.
EXAMPLE ALERT
api-gateway log volume dropped 87% from typical Wednesday-2am baseline
silence
Catches services that stop logging when they normally log every N seconds. The most dangerous failure mode: no errors, just absence.
EXAMPLE ALERT
auth-service silent for 5 minutes (typical baseline 30 logs/min)
isolation_forest
Multi-dimensional outliers in log feature space. Catches subtle anomalies that single-axis thresholds miss.
EXAMPLE ALERT
api-gateway request cluster outlier: status=200 + latency=12s + body=8MB (1 in 50,000 vs baseline)
error_rate
Per-service error percentage anomalies vs baseline, with sustained-elevation guards so a single noisy minute doesn't fire and slow ramps still get caught.
EXAMPLE ALERT
api-gateway: 4.2% error rate vs 0.3% baseline (14x normal)
recurring_pattern
Identifies log patterns that recur on a schedule (daily batch jobs, hourly cron runs, weekly reports) and flags when one fails to fire on its expected cadence.
EXAMPLE ALERT
nightly-backup: expected at 02:00 UTC, last seen 9 hours ago (3 missed runs)
Error Intelligence (2)
new_error
Catches errors that have never appeared in your 7-day baseline. On connect, the baseline seeds from your last 7 days of historical logs: push history for day-1 alerts, or wait a week for organic warm-up.
EXAMPLE ALERT
FATAL: connection pool exhausted (first seen 2 min ago in payment-service)
pattern_cluster
Groups errors with similar templates so many variants of the same problem cluster into one alert. Surfaces brand-new clusters as they appear.
EXAMPLE ALERT
1,247 instances of new error pattern in 5 min: 'Deadlock detected on table users, retrying transaction...'
Domain-Specific (9)
k8s_intelligence
70+ rules for OOMKilled, CrashLoopBackOff, ImagePullBackOff, FailedScheduling, evictions, probe failures, and other Kubernetes failure modes.
EXAMPLE ALERT
pod payment-service-7f4b OOMKilled (3rd restart in 10 min)
aws_intelligence
Patterns for RDS, S3, DynamoDB, ECS, EKS, IAM, KMS, Lambda, and 20+ other AWS services. Catches throttling, capacity events, IAM denials, and service-specific failure modes.
EXAMPLE ALERT
rds-primary: storage autoscaling triggered 3rd time in 2h (300 GB → 350 GB)
serverless_intelligence
Lambda timeouts, cold starts, throttling, init failures, runtime crashes, and concurrency limits across functions.
EXAMPLE ALERT
lambda payment-handler: 12 timeouts in 5 min, all hitting the 30s limit
database_intelligence
Connection pool exhaustion, deadlocks, slow queries, replication lag, schema migration errors, and transaction aborts across Postgres, MySQL, and MongoDB.
EXAMPLE ALERT
postgres-primary: 8 deadlocks on the orders table in 90 seconds
dependency_intelligence
Upstream service failures, circuit breaker trips, retry exhaustion, and cascading failures between services.
EXAMPLE ALERT
notification-service: 60% of outbound calls to email-relay returning 502
web_intelligence
4xx and 5xx surges, slow endpoints, TLS handshake failures, gateway timeouts, and load balancer health events.
EXAMPLE ALERT
nginx-edge: 5xx rate jumped from 0.1% to 4.3% in 60 seconds
security_intelligence
Brute-force authentication attempts, anomalous auth failures, privilege escalations, and suspicious access patterns from your auth and audit logs.
EXAMPLE ALERT
23 failed SSH auth attempts to bastion-1 from a single IP in 90 seconds
search_intelligence
Slow queries, query failures, index issues, and scoring anomalies in Elasticsearch / OpenSearch / Solr.
EXAMPLE ALERT
elasticsearch-primary: query latency p99 above 8 seconds for 4 consecutive minutes
infrastructure_intelligence
Disk pressure, memory pressure, CPU steal, swap activity, kernel errors, and other host-level signals from system logs.
EXAMPLE ALERT
host worker-12: disk usage 94% on /var, growing 1.2 GB/hour
SLO & Performance (2)
golden_signal
Latency, saturation, and per-service error rate. Three of the four SRE golden signals; traffic is covered by Volume Anomaly.
EXAMPLE ALERT
checkout-service p99 latency: 4.2s vs baseline 380ms
slo_monitor
Error budget tracking with burn rate prediction. Get warned before the SLO breaches, not after.
EXAMPLE ALERT
checkout-service SLO: 14-day error budget at 87% burn, projected to exhaust in 36h
User-Defined (2)
threshold_rule
Custom alert conditions on any query, with cooldown and duration guards. Use when you need a hand-tuned alert beside automatic detection.
EXAMPLE ALERT
user-defined: error_rate > 5% for 3 minutes (checkout-service)
composite_rule
Multi-condition alert rules combining several signals. Use when no single threshold captures the failure mode.
EXAMPLE ALERT
user-defined: (latency p99 > 2s AND error_rate spike) for 5 min on payment-service
Alert Management
Epok handles alert deduplication, grouping, escalation, and lifecycle automatically. Try in sandbox →
Deduplication
If the same anomaly (same detector, same service, same type) fires again while an incident is still open, Epok updates the existing alert with new evidence instead of sending another notification. Suppression windows stretch automatically for persistent issues so you never get paged about the same thing twice.
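The deduplication rule can be pictured as a lookup keyed on (detector, service, type). A minimal sketch of that idea, not Epok's implementation; the `Alert` class and `ingest_anomaly` function are hypothetical, and suppression windows are left out:

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    detector: str
    service: str
    anomaly_type: str
    evidence: list[str] = field(default_factory=list)
    fire_count: int = 1


# Open alerts, keyed by the triple that defines "the same anomaly".
open_alerts: dict[tuple[str, str, str], Alert] = {}


def ingest_anomaly(detector: str, service: str, anomaly_type: str,
                   evidence: str) -> bool:
    """Return True if a notification should go out (a new alert was opened)."""
    key = (detector, service, anomaly_type)
    existing = open_alerts.get(key)
    if existing is not None:
        # Same anomaly while the alert is open: attach evidence, stay quiet.
        existing.evidence.append(evidence)
        existing.fire_count += 1
        return False
    open_alerts[key] = Alert(detector, service, anomaly_type, [evidence])
    return True


print(ingest_anomaly("error_rate", "api-gateway", "spike", "4.2% vs 0.3%"))
print(ingest_anomaly("error_rate", "api-gateway", "spike", "5.1% vs 0.3%"))
```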
Severity escalation
Alerts that keep re-firing automatically escalate in severity. An INFO that refuses to resolve becomes a WARNING; a WARNING that persists becomes CRITICAL. Persistent problems get the attention they deserve without manual intervention.
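The escalation ladder can be sketched as a step function over the re-fire count. The number of re-fires per step is not documented, so `ESCALATE_EVERY` below is a made-up value for illustration only:

```python
SEVERITIES = ["info", "warning", "critical"]
ESCALATE_EVERY = 3  # hypothetical: re-fires needed to climb one level


def escalated(initial: str, refires: int) -> str:
    """Severity after `refires` repeat firings, capped at critical."""
    steps = refires // ESCALATE_EVERY
    index = min(SEVERITIES.index(initial) + steps, len(SEVERITIES) - 1)
    return SEVERITIES[index]


for count in (0, 3, 6):
    print(count, escalated("info", count))
```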
Incident grouping
Multiple alerts from the same tenant within a short window are grouped into a single incident. Epok correlates related alerts across services so a cascade of failures produces one coherent incident instead of fifteen disconnected pages.
Auto-resolve
Alerts automatically resolve when the detector stops producing anomalies for that service. You can also manually resolve alerts from the dashboard with an optional note for the timeline.
Snooze and mute
Snooze an alert for a set duration during maintenance windows. Mute specific services or detector types to suppress known noisy patterns. Feedback from snooze/mute actions trains the self-tuning system.
Analysis Tools
When an alert fires, Epok automatically runs analysis to help you understand what happened, why, and what to do next. Deterministic analysis runs on every tier; AI-assisted explanations are included starting with the 14-day trial (capped by daily AI budget). Open an incident in the sandbox →
Root Cause Ranking
All tiers
Ranks potential root causes by scoring error patterns, causal language signals, timing correlation, and cross-service propagation. Outputs a ranked list of hypotheses.
Error categorization
All tiers
Classifies errors by failure type: connection, timeout, resource exhaustion, auth, configuration, data schema, rate limit, runtime crash, and more. Categories drive different investigation paths and RCA scoring.
What Changed (9 methods)
All tiers
Compares the anomaly window against a baseline period across 9 dimensions: new error patterns, volume shifts, field distribution changes, new log streams, disappeared streams, latency changes, status code shifts, new field values, and pattern frequency changes.
Blast Radius
All tiers
Determines which services, endpoints, and users are affected by an incident. Shows the scope of impact to help you prioritize response.
Cascade Timeline
All tiers
Reconstructs the sequence of failures across services. Shows which service failed first and how the failure propagated through dependencies.
Dimension Lift
All tiers
Identifies which field values are disproportionately represented in the anomaly. If 90% of errors come from region=us-east-1, Dimension Lift surfaces that automatically. AI-generated plain-language explanations are included starting with the trial.
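Epok's scoring is not published, but "disproportionately represented" is usually measured as lift: a value's share of the anomaly window divided by its share of the baseline. A minimal sketch under that assumption, with a hypothetical `dimension_lift` function and add-one smoothing so values unseen in the baseline do not divide by zero:

```python
from collections import Counter


def dimension_lift(anomaly: list[dict], baseline: list[dict],
                   field: str) -> list[tuple[str, float]]:
    """Rank values of `field` by anomaly share divided by baseline share."""
    a = Counter(e[field] for e in anomaly if field in e)
    b = Counter(e[field] for e in baseline if field in e)
    a_total, b_total = sum(a.values()), sum(b.values())
    vocab = len(set(a) | set(b))  # distinct values, used for smoothing
    lifts = {}
    for value, count in a.items():
        a_share = count / a_total
        b_share = (b[value] + 1) / (b_total + vocab)  # add-one smoothing
        lifts[value] = a_share / b_share
    return sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)


anomaly = [{"region": "us-east-1"}] * 9 + [{"region": "eu-west-1"}]
baseline = [{"region": "us-east-1"}] * 5 + [{"region": "eu-west-1"}] * 5
print(dimension_lift(anomaly, baseline, "region"))
```

A lift well above 1 means the value is over-represented in the anomaly; a lift below 1 means it is under-represented.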
Cross-service error matching
All tiers
Matches related errors across different services. When your API returns 500s and your database logs connection timeouts at the same time, Epok links them.
Service dependency graph
All tiers
Infers service-to-service dependencies from log patterns and error propagation. Visualizes which services depend on what.
Deploy correlation
All tiers
Detects recent deploys from log patterns (version strings, restart markers, config changes) and correlates anomalies with deploy timing.
AI incident narrative
Trial+
Plain-language summary of what happened, what's affected, and suggested next steps. Inlined into Slack alerts and the investigation view.
AI root-cause hypothesis
Trial+
LLM-assisted explanation on top of the deterministic RCA ranking. Turns signals into a readable theory of the incident.
Deep RCA
Trial+
On-demand, slower analysis that pulls more context (baseline comparison, correlated events, pattern history) and produces a longer write-up.
Dimension Lift explanation
Trial+
Natural-language explanation of why a dimension spiked: the shift, its scale, and whether it's the most likely cause.
Noise scoring
Trial+
LLM-scored noise rating on every alert to auto-tune suppression over time. Reduces alert fatigue without manual rule edits.
Natural-language query
Trial+
Ask "show me 5xx spikes from checkout in the last hour" and Epok translates to LogsQL. Scoped to your tenant.
AI Features
Every detector and deterministic analysis tool runs on every tier. AI-powered explanations sit on top: they turn detector output into readable prose, explain dimension shifts, auto-tune alert noise, and translate English into LogsQL. AI runs against your logs only at your tenant's request; your data is never used for model training.
| Feature | Tier | What it does |
|---|---|---|
| Incident narrative | Trial+ | Plain-language summary of what happened, what's affected, and suggested next steps. Inlined into Slack alerts and the investigation view. |
| Root-cause hypothesis | Trial+ | LLM-assisted explanation on top of deterministic RCA ranking. Turns signals into a readable theory of the incident. |
| Suggested actions | Trial+ | Actionable next steps tailored to the incident — "restart pod X", "check migration 0042", "rate-limit caller Y". |
| Title rewrite | Trial+ | Converts detector-generated alert titles into human-readable summaries for alerts list and notification channels. |
| Deep RCA | Trial+ | On-demand slower analysis pulling more context (baseline comparison, correlated events, pattern history) to produce a longer write-up. |
| Dimension Lift explanation | Trial+ | Natural-language explanation of why a dimension spiked — the shift, its scale, and whether it's the most likely cause. |
| Noise scoring | Trial+ | LLM-scored noise rating on every alert to auto-tune suppression over time. Reduces alert fatigue without manual rule edits. |
| Natural-language query | Trial+ | Type "show me 5xx spikes from checkout in the last hour" and Epok translates it to LogsQL. Scoped to your tenant. |
Daily AI credits
1 credit = 1 AI action. Trial: 200/day. Team: 500/day. Growth: larger budget. Credits reset at 00:00 UTC. Alert narratives are generated eagerly; on-demand features (Deep RCA, NL query) consume credits per invocation.
Data privacy
Epok sends only the minimum necessary context (log samples, detector evidence, service names) to the AI provider. Payloads are not retained by the provider and are not used for model training. Your logs stay on Epok's servers; the LLM never gets bulk access.
Notifications
Configure where Epok sends alerts. Trial and Team include channels; Growth and Enterprise are unlimited.
Slack
Incoming webhook integration. Alerts include severity, affected service, description, and a link to the investigation view. On Team tier and above, AI-generated incident narratives are included inline.
Add a Slack incoming webhook URL in Settings > Notification Channels.
PagerDuty
Native Events API v2 integration. Alerts map to PagerDuty incidents with severity, dedup key, and custom details. Resolved alerts auto-resolve in PagerDuty.
Add your PagerDuty integration key (Events API v2) in Settings > Notification Channels.
Webhook
Send alert JSON to any HTTP endpoint. Use this to integrate with OpsGenie, Microsoft Teams, Discord, or custom systems.
Add a webhook URL in Settings > Notification Channels. Epok sends a POST with the alert payload as JSON.
Email
Email notifications for alerts. Includes a summary with links to the dashboard for investigation.
Add email addresses in Settings > Notification Channels.
Team Management
Epok supports team collaboration with role-based access control.
Roles
Three roles: Owner (full access, can manage billing and delete tenant), Admin (manage members, API keys, settings), and Member (view alerts, search logs, investigate incidents).
Inviting team members
Owners and admins can create invite links in Settings. New members sign in with Google and are automatically added to your tenant with the role you specify.
Tier limits
| Tier | Daily ingest | Retention | Users | API keys | Services |
|---|---|---|---|---|---|
| Trial | 107 GB | 14 days | 3 | 2 | 10,000 |
| Team | 50 GB | 30 days | 10 | 5 | 10,000 |
| Growth | 167 GB | 30 days | Unlimited | 20 | Unlimited |
| Custom | 1024 GB | 365 days | Unlimited | Unlimited | Unlimited |
All tiers include every intelligence detector. See pricing for full feature comparison.
Configuration
Epok works with zero configuration out of the box. All settings below are optional and can be adjusted in the dashboard.
Detection sensitivity
Volume anomaly detection calibrates itself to each service's normal traffic pattern and flags spikes, drops, and flatlines. You can adjust sensitivity per service if a stream is genuinely bursty by design, but the defaults work without tuning for almost every workload.
Threshold + composite rules
Custom rules for hard constraints. Threshold rules fire when a LogsQL query crosses a number; composite rules fire when multiple signals are simultaneously active. Full reference below →
Trial: 5 threshold + 5 composite. Team: 20 + 5. Growth: unlimited.
SLO monitoring
Define Service Level Objectives with error budget tracking. Epok monitors burn rate and predicts when your SLO will breach. Trial: 5 SLOs. Team: 5. Growth: unlimited.
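How Epok predicts a breach is not documented. The standard SRE arithmetic gives a feel for it: burn rate is the observed error rate divided by the error rate the SLO allows, and at a constant burn rate the remaining budget lasts `remaining * window / burn_rate`. A minimal sketch with hypothetical function names and a simple linear projection:

```python
import math


def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than allowed the error budget is being spent."""
    return error_rate / (1.0 - slo_target)


def hours_until_exhausted(budget_remaining: float, rate: float,
                          window_hours: float) -> float:
    """Linear projection: a burn rate of 1 spends the full budget in one window."""
    if rate <= 0:
        return math.inf
    return budget_remaining * window_hours / rate


# 99.9% SLO over 14 days, 0.5% of requests failing, 13% of budget left.
rate = burn_rate(error_rate=0.005, slo_target=0.999)
print(rate, hours_until_exhausted(0.13, rate, window_hours=14 * 24))
```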
Self-tuning thresholds (Team+)
Epok learns from your feedback. When you snooze, mute, or resolve alerts, the system adjusts sensitivity to reduce noise over time. No manual threshold tuning needed.
Custom rules — reference
Two flavours of user-defined rule. Threshold rules fire when a LogsQL query crosses a number over a window. Composite rules fire when two or more signals are simultaneously true. Both are plain JSON; both speak the same LogsQL you use in Explore.
Threshold rules
Endpoint: POST /api/v1/tenants/<id>/rules
{
"name": "Payment refund burst",
"query": "service:payment AND _msg:refund AND amount > 1000",
"condition_op": "gt",
"condition_value": 5,
"window_seconds": 300,
"severity": "critical",
"for_duration_seconds": 60,
"cooldown_seconds": 600,
"channel_ids": [12, 8],
"enabled": true
}
| Field | Type | Description |
|---|---|---|
| name | string | Human-readable, surfaced in alerts. |
| query | LogsQL | The query whose count is checked against the condition. Same syntax as Explore. |
| condition_op | enum | gt · gte · lt · lte · eq · neq |
| condition_value | number | Threshold the hit count must satisfy. |
| window_seconds | int | Look-back window. Default 300 (5 min). |
| severity | enum | info · warning · critical. Drives notification routing + paging behavior. |
| for_duration_seconds | int | Condition must hold for at least this long before firing. Eliminates flapping. |
| cooldown_seconds | int | Minimum time between consecutive fires for this rule. Default 0. |
| channel_ids | int[] | Notification channel IDs. Omit to use tenant defaults. |
| enabled | bool | Toggle without deleting. Default true. |
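A rule like the one above can be created from a script. A minimal sketch using only the standard library; it assumes the management API lives on `app.getepok.dev` (the host used in the API Reference examples) and uses `TENANT_ID` as a placeholder. The request is built but not sent:

```python
import json
import os
import urllib.request

API_BASE = "https://app.getepok.dev"  # assumption: same host as the API Reference examples


def build_rule_request(tenant_id: str, rule: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request that would create a threshold rule."""
    return urllib.request.Request(
        f"{API_BASE}/api/v1/tenants/{tenant_id}/rules",
        data=json.dumps(rule).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


rule = {
    "name": "Payment refund burst",
    "query": "service:payment AND _msg:refund AND amount > 1000",
    "condition_op": "gt",
    "condition_value": 5,
    "window_seconds": 300,
    "severity": "critical",
    "for_duration_seconds": 60,
    "cooldown_seconds": 600,
    "enabled": True,
}
req = build_rule_request("TENANT_ID", rule,
                         os.environ.get("EPOK_API_KEY", "YOUR_API_KEY"))
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=10)
```

Omitting `channel_ids`, as here, falls back to the tenant's default channels.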
Worked examples
fires: count gt 50 in 60s · critical
Brute-force / credential stuffing. 50 events in 60s is too many.
fires: count gt 10 in 60s · critical
Upstream DB is down or pool is exhausted. Pair with composite rule.
fires: count gt 3 in 300s · warning
Hard ceiling on user-visible latency. 3 occurrences in 5min crosses SLO budget.
fires: count gt 5 in 300s · critical
High-value refund cluster — page someone immediately for fraud review.
Composite rules
Endpoint: POST /api/v1/tenants/<id>/composite-rules
{
"name": "API degraded + DB struggling",
"expression": {
"op": "and",
"conditions": [
{
"op": "threshold",
"query": "service:api AND status_code:>=500",
"comparator": "gt",
"value": 10,
"window_seconds": 300
},
{
"op": "detector_active",
"detector_type": "database_intelligence"
}
]
},
"severity": "critical",
"cooldown_seconds": 300,
"channel_ids": [12],
"enabled": true
}
Leaf operators (measurements)
threshold: A LogsQL query crossing a number in a window. Requires query + value; comparator defaults to gt; window_seconds defaults to 300.
{ "op": "threshold",
  "query": "level:error",
  "comparator": "gt",
  "value": 50,
  "window_seconds": 300 }
alert_firing: A specific alert (by dedup-key, severity, or service) is currently active. All fields are optional and match anything when omitted.
{ "op": "alert_firing",
  "dedup_key": "lr:api:..." }
detector_active: A built-in detector type has produced anomalies. Pair built-in detection with your own threshold to filter noise.
{ "op": "detector_active",
  "detector_type": "silence" }
Branch operators (combine)
and (all conditions true), or (any true), not (single child false). Branches take a conditions array. Expressions can nest up to 5 levels deep; expression size is capped at 10KB.
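These limits can be checked locally before a rule is submitted. A minimal sketch of such a pre-flight check; it assumes depth counts every operator level including leaves and that the 10KB cap applies to the serialized JSON, neither of which is confirmed above, and the function names are hypothetical:

```python
import json

MAX_DEPTH = 5
MAX_BYTES = 10 * 1024
LEAF_OPS = {"threshold", "alert_firing", "detector_active"}
BRANCH_OPS = {"and", "or", "not"}


def depth(expr: dict) -> int:
    """Nesting depth, counting the expression itself as level 1."""
    if expr.get("op") in BRANCH_OPS:
        return 1 + max((depth(c) for c in expr.get("conditions", [])), default=0)
    return 1


def _check(expr: dict, errors: list[str]) -> None:
    op = expr.get("op")
    if op in LEAF_OPS:
        if op == "threshold" and not {"query", "value"}.issubset(expr):
            errors.append("threshold requires query and value")
    elif op in BRANCH_OPS:
        children = expr.get("conditions", [])
        if not children:
            errors.append(f"{op} needs a non-empty conditions array")
        if op == "not" and len(children) != 1:
            errors.append("not takes exactly one child")
        for child in children:
            _check(child, errors)
    else:
        errors.append(f"unknown op: {op!r}")


def validate(expr: dict) -> list[str]:
    """Return a list of problems; an empty list means the expression looks valid."""
    errors: list[str] = []
    if len(json.dumps(expr).encode()) > MAX_BYTES:
        errors.append("expression exceeds 10KB")
    if depth(expr) > MAX_DEPTH:
        errors.append("expression nests deeper than 5 levels")
    _check(expr, errors)
    return errors


expression = {
    "op": "and",
    "conditions": [
        {"op": "threshold", "query": "service:api AND status_code:>=500",
         "comparator": "gt", "value": 10, "window_seconds": 300},
        {"op": "detector_active", "detector_type": "database_intelligence"},
    ],
}
print(validate(expression))
```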
Version-control your rules
Rules are JSON; export with GET /rules, commit to your repo, apply from CI with POST. Put alert config next to service code so a deploy and its alerts ship together.
Test before deploy
Run the rule's LogsQL query in Explore against your last 24h. The condition fires on the count of hits over your window — same number Explore shows you.
Anti-patterns
- Don't alert on noisy level:info patterns; use the detectors.
- Set for_duration_seconds > 0 for any latency or rate rule to kill flapping.
- Composite rules with a single operand should be threshold rules; keep composites for real AND/OR logic.
Tier limits
- Trial: 5 threshold + 5 composite
- Team: 20 threshold + 5 composite
- Growth, Enterprise: unlimited
API Reference
Key endpoints for programmatic access. All endpoints require authentication via API key. Full OpenAPI reference →
Download the raw spec at /openapi.json to generate clients or import into Postman.
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /api/v1/alerts | List alerts (active + recent resolved) |
| GET | /api/v1/alerts/:id | Get alert detail with analysis |
| POST | /api/v1/alerts/:id/resolve | Manually resolve an alert |
| GET | /api/v1/streams | List monitored log streams |
| GET | /api/v1/new-errors | List new error patterns |
| GET | /api/v1/patterns | List detected log patterns |
| GET | /api/v1/search | Full-text log search |
| GET | /api/v1/facets | Field facets for filtering |
| GET | /api/v1/hits | Log volume histogram |
| WS | /ws/livetail/:tenant_id | WebSocket live tail (authenticate with API key in query string or cookie) |
| WS | /ws/alerts/:tenant_id | WebSocket alert stream (real-time incident updates) |
| GET | /api/v1/detectors | List registered detectors |
| POST | /api/v1/channels | Add notification channel |
| GET | /api/v1/channels | List notification channels |
| GET | /metrics | Prometheus metrics |
Example: List active alerts
curl 'https://app.getepok.dev/api/v1/alerts?state=firing' \
-H 'Authorization: Bearer YOUR_API_KEY'
Example: Search logs
curl 'https://app.getepok.dev/api/v1/search?query=level%3Aerror&start=-1h&limit=100' \
-H 'Authorization: Bearer YOUR_API_KEY'
Rate Limits & Errors
Epok enforces per-tenant quotas so one tenant can't degrade the platform for everyone else. All limits are documented; nothing is secret or negotiated on a case-by-case basis.
Ingest rate limit
Logs are rate-limited per tenant. When you exceed the limit, Epok returns HTTP 429 Too Many Requests with a Retry-After header (seconds). Wait at least that many seconds before retrying, and back off exponentially on repeated 429s to avoid a thundering herd on recovery.
| Tier | Ingest rate | API query rate | Daily volume |
|---|---|---|---|
| Trial | 500 events/sec | 1,200 req/min | 107 GB |
| Team | 500 events/sec | 1,200 req/min | 50 GB |
| Growth | 2,000 events/sec | 6,000 req/min | 167 GB |
Hitting the daily-volume ceiling pauses ingest until the next UTC day on the trial; paid tiers bill overage per GB. Live-tail sessions and saved-search counts have separate caps — see pricing.
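One way to stay under the per-tenant ingest rate in the table above is to throttle on the client before a 429 ever comes back. The sketch below uses a token bucket; `TokenBucket` and `throttled_send` are hypothetical names, and because this document does not say whether short bursts above the rate are tolerated, the bucket capacity defaults to one second's worth of events as a conservative assumption.

```python
import time


class TokenBucket:
    """Client-side limiter that keeps the send rate at or below `rate` events/sec."""

    def __init__(self, rate, capacity=None, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity if capacity is not None else rate)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def wait_time(self, n):
        """Reserve n tokens and return how many seconds to wait before sending."""
        if n > self.capacity:
            raise ValueError("batch is larger than the bucket capacity")
        self._refill()
        self.tokens -= n  # may go negative: the debt is paid off by waiting
        return max(0.0, -self.tokens / self.rate)


bucket = TokenBucket(rate=500)  # Trial and Team ingest rate from the table above


def throttled_send(batch, send):
    """Sleep as needed, then hand the batch to a caller-supplied send function."""
    delay = bucket.wait_time(len(batch))
    if delay:
        time.sleep(delay)
    return send(batch)
```

A limiter like this complements, rather than replaces, honoring Retry-After: several shippers sharing one tenant can still exceed the limit together.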
HTTP error codes
| Code | Meaning | What to do |
|---|---|---|
| 200 | OK | Successful response. For ingest, all logs were accepted. |
| 400 | Bad Request | Payload is malformed. Check NDJSON formatting, timestamp, and required fields. Response body has the specific issue. |
| 401 | Unauthorized | Missing, invalid, or expired credentials. Check your API key or re-authenticate via Google OAuth. |
| 403 | Forbidden | Authenticated but lacking permission. Some admin endpoints require the owner or admin role. |
| 404 | Not Found | Resource doesn't exist or your API key isn't scoped to its tenant. |
| 409 | Conflict | Duplicate resource (e.g. creating a tenant with an existing account_id). |
| 413 | Payload Too Large | Single log line exceeds 1 MB or batch exceeds the ingest size cap. Split into smaller batches. |
| 429 | Too Many Requests | Rate limit hit. Honor the `Retry-After` header and back off exponentially. |
| 500 | Internal Server Error | Server-side issue. Retry with backoff. If it persists, email support@getepok.dev with the request ID from the response headers. |
| 503 | Service Unavailable | Temporary overload or deploy in progress. Retry with backoff. |
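For the 413 row, splitting a bulk payload has to keep each action line with its document line. A minimal sketch, assuming every entry is a two-line pair as in the Quick Start (`{"create":{}}` followed by the log document); `split_bulk` is a hypothetical helper, and `max_bytes` is supplied by the caller because the batch size cap is not stated in this section.

```python
def split_bulk(payload, max_bytes):
    """Split an Elasticsearch-bulk NDJSON payload into chunks of at most max_bytes.

    Lines are kept in (action, document) pairs so each chunk is a valid bulk body.
    """
    lines = [ln for ln in payload.splitlines() if ln.strip()]
    if len(lines) % 2:
        raise ValueError("expected action/document line pairs")
    chunks, current, size = [], [], 0
    for i in range(0, len(lines), 2):
        pair = lines[i] + "\n" + lines[i + 1] + "\n"
        pair_size = len(pair.encode("utf-8"))
        if pair_size > max_bytes:
            raise ValueError("a single entry exceeds max_bytes")
        if current and size + pair_size > max_bytes:
            # Current chunk is full: close it and start a new one.
            chunks.append("".join(current))
            current, size = [], 0
        current.append(pair)
        size += pair_size
    if current:
        chunks.append("".join(current))
    return chunks
```

Each returned chunk can be posted on its own, so a 413 on a large batch becomes several smaller requests instead of a dropped payload.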
Recommended retry strategy
For ingest, buffer locally and retry idempotently. Exponential backoff with jitter prevents correlated retries after an upstream hiccup. A simple loop:
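import os
import random
import time

import httpx

# Assumption: the key is read from an EPOK_API_KEY environment variable.
API_KEY = os.environ["EPOK_API_KEY"]

def send(events, *, max_attempts=5):
    for attempt in range(max_attempts):
        r = httpx.post(
            "https://ingest.getepok.dev/insert/elasticsearch/_bulk",
            headers={"Authorization": f"Bearer {API_KEY}"},
            content=events,
            timeout=30,
        )
        # Only 429 and 5xx are worth retrying; any other status is final.
        if r.status_code < 500 and r.status_code != 429:
            return r
        # Honor Retry-After (seconds) when present, else back off exponentially.
        delay = float(r.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay + random.uniform(0, 0.5))
    raise RuntimeError(f"ingest failed after {max_attempts} attempts")
Migrating From Another Tool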
Moving from Datadog, Splunk, or Loki? Dual-ship for a day, verify parity, then cut over on your own schedule. The migration guide walks through each source with concrete Vector configs.
What's New
Release notes with the actual commits. Every change is traceable back to the code that shipped it.
Further Reading
20 Kubernetes Failures You Should Be Alerting On
CrashLoopBackOff, OOMKilled, ImagePullBackOff, and 17 more failure modes with automatic detection.
Catch New Errors Before Users Report Them
How error fingerprinting detects never-before-seen errors automatically.
Stop Writing Alert Rules by Hand
Why anomaly detection catches things that static thresholds miss.
What Log Management Actually Costs in 2026
Real pricing at 50 GB, 600 GB, and 3 TB across CloudWatch, Datadog, Grafana, Splunk, Elastic, and Epok.
Ready to get started?
Open Epok Dashboard
14-day trial includes every detector and full AI root cause analysis. No credit card required.