Documentation

Updated May 31, 2026

Get started with Epok in under 5 minutes. Send your first logs and let the intelligence engine do the rest.

The sandbox is read-only and signs you in as a demo tenant with pre-seeded logs. No sign-up required.

Quick Start

Send your first log entry. Replace YOUR_API_KEY with your key from Settings — see Authentication below for header formats.

terminal

bash

curl -X POST https://ingest.getepok.dev/insert/elasticsearch/_bulk \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '
{"create":{}}
{"_msg":"Application started","level":"info","service":"api","_time":"2026-02-21T00:00:00Z"}
'

That's it. Your logs are searchable in real time as soon as they arrive, and anomaly detection activates automatically.

Want to see it working first? The sandbox tenant has live logs, real alerts, and working detectors.

Open sandbox → Explore

Authentication

Epok uses API keys for log ingestion. You'll get a default API key when you sign up. Find it in Settings.

Include your API key in every request using one of these headers:

Authorization: Bearer epk_your_api_key

Authorization: Basic base64(epk_your_api_key:x)

X-API-Key: epk_your_api_key

Basic Auth is used by Loki-native shippers (FluentBit, Promtail, Grafana Alloy). Set the username to your API key and the password to any value.
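If you build requests by hand, all three header styles can be derived from the key. A minimal Python sketch, using only what this section states (Bearer, Basic Auth with the key as username and any password, or X-API-Key); the helper name epok_auth_headers is hypothetical:

```python
import base64


def epok_auth_headers(api_key: str) -> dict:
    """Return the three documented header styles, keyed by style name.

    Basic Auth uses the API key as the username and an arbitrary
    password ("x" here), as described in the Authentication section.
    """
    basic = base64.b64encode(f"{api_key}:x".encode()).decode()
    return {
        "bearer": {"Authorization": f"Bearer {api_key}"},
        "basic": {"Authorization": f"Basic {basic}"},
        "x-api-key": {"X-API-Key": api_key},
    }


if __name__ == "__main__":
    for style, headers in epok_auth_headers("epk_your_api_key").items():
        print(style, headers)
```

Any one of the three dictionaries can be passed as the headers argument of your HTTP client.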

Quickstart by platform

Pick your stack — each guide is a 5-minute end-to-end setup with copy-paste-ready config, the exact verification steps, and the gotchas we've seen on real deployments.

AWS

CloudWatch · ECS · EC2 · Lambda

~5–10 min to first log →

Google Cloud

Cloud Logging · GKE · Cloud Run

~5–10 min to first log →

Kubernetes

EKS · GKE · AKS · k3s · self-hosted

~5 min to first log →

Railway

Built-in HTTP log drain

~3 min to first log →

Vercel

Built-in log drain (Pro+)

~5 min to first log →

Not on this list? The endpoints in the protocol table below work with any shipper that speaks Elasticsearch Bulk, Loki Push, OTLP HTTP, FluentBit, Fluentd, Syslog, or plain JSON.

Supported Integrations

Epok accepts logs from any source. Pick the integration that fits your stack.

Protocol	Endpoint	Use With
Elasticsearch Bulk	/insert/elasticsearch/_bulk	curl, Logstash, Vector, Filebeat
Loki Push	/loki/api/v1/push	FluentBit, Promtail, Grafana Alloy, any Loki client
OTLP HTTP	/v1/logs	OpenTelemetry Collector, any OTEL SDK
FluentBit Native	/api/v1/fluent	FluentBit (with http output, alternative to Loki)
Fluentd	/api/v1/fluentd	Fluentd (out_http plugin)
Syslog (HTTP)	/api/v1/syslog	rsyslog, syslog-ng (via omhttp)
CloudWatch	/api/v1/cloudwatch	AWS Lambda subscription filter
GCP Cloud Logging	/api/v1/ingest	Cloud Function via Pub/Sub sink
Generic JSON	/api/v1/ingest	Any HTTP client, custom apps

Syslog formats: RFC 5424, RFC 3164, Cisco IOS, Fortinet FortiGate, Palo Alto Networks, and HP/Aruba ProCurve are all parsed automatically.

Configuration Examples

Copy-paste configs for every supported shipper. Replace YOUR_API_KEY with your key.

curl · POST /insert/elasticsearch/_bulk

The fastest way to test. Send a log line from your terminal.

terminal

bash

curl -X POST https://ingest.getepok.dev/insert/elasticsearch/_bulk \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '
{"create":{}}
{"_msg":"Application started successfully","level":"info","service":"api","_time":"2026-02-21T00:00:00Z"}
{"create":{}}
{"_msg":"GET /api/users 200 42ms","level":"info","service":"api","status_code":200,"duration_ms":42}
'

FluentBit · POST /loki/api/v1/push

Lightweight log shipper. Ideal for Docker, Kubernetes, and edge devices. Uses native Loki output with Basic Auth.

terminal

conf

# /etc/fluent-bit/fluent-bit.conf

[INPUT]
    Name         tail
    Path         /var/log/app/*.log
    Tag          app

[OUTPUT]
    Name         loki
    Match        *
    Host         ingest.getepok.dev
    Port         443
    TLS          On
    HTTP_User    YOUR_API_KEY
    HTTP_Passwd  x
    Labels       job=fluentbit, host=my-server
    drop_single_key on

Vector · POST /loki/api/v1/push

High-performance observability pipeline. Native Loki sink — same protocol as Promtail/Alloy.

terminal

toml

# vector.toml

[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]

[sinks.epok]
type = "loki"
inputs = ["app_logs"]
endpoint = "https://ingest.getepok.dev"

[sinks.epok.encoding]
codec = "json"

[sinks.epok.auth]
strategy = "bearer"
token = "YOUR_API_KEY"

[sinks.epok.labels]
service = "{{ service }}"
host = "{{ host }}"

Promtail / Grafana Alloy · POST /loki/api/v1/push

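If you already run Promtail or Grafana Alloy, point it at Epok's native Loki endpoint. The config below is for Promtail; in Alloy, set the same URL and Basic Auth credentials on a loki.write endpoint.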

terminal

yaml

# promtail-config.yml

clients:
  - url: https://ingest.getepok.dev/loki/api/v1/push
    basic_auth:
      username: YOUR_API_KEY
      password: x

scrape_configs:
  - job_name: app
    static_configs:
      - targets: [localhost]
        labels:
          app: api
          __path__: /var/log/app/*.log

Python · POST /loki/api/v1/push

Send logs directly from your application code.

terminal

python

# pip install httpx
import time

import httpx

resp = httpx.post(
    "https://ingest.getepok.dev/loki/api/v1/push",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "streams": [{
            "stream": {"app": "myapp", "env": "production"},
            "values": [[
                str(time.time_ns()),  # Loki timestamps are nanosecond epoch strings
                "User signup completed for user_id=4821",
            ]],
        }]
    },
    timeout=10.0,
)
resp.raise_for_status()  # raise on any 4xx/5xx response

Node.js · POST /insert/elasticsearch/_bulk

Send logs directly from a Node service. Uses built-in fetch — no dependencies.

terminal

javascript

// No npm install needed: fetch is built in on Node 18+.
// Run as an ES module (e.g. log.mjs) so the top-level await below works.

const API_KEY = process.env.EPOK_API_KEY;

async function log(entries) {
  const body = entries
    .flatMap((e) => [
      JSON.stringify({ create: {} }),
      JSON.stringify({ _time: new Date().toISOString(), ...e }),
    ])
    .join("\n") + "\n";

  const r = await fetch(
    "https://ingest.getepok.dev/insert/elasticsearch/_bulk",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        "Content-Type": "application/x-ndjson",
      },
      body,
    },
  );
  if (!r.ok) throw new Error(`epok ingest failed: ${r.status}`);
}

await log([
  { level: "info", service: "api", _msg: "signup ok user_id=4821" },
  { level: "error", service: "api", _msg: "payment failed order_id=9012" },
]);

Go · POST /insert/elasticsearch/_bulk

Stdlib-only Go client. Batches entries into a single request and uses an http.Client with a timeout.

terminal

go

package epok

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

type Entry struct {
	Time    string `json:"_time"`
	Msg     string `json:"_msg"`
	Level   string `json:"level"`
	Service string `json:"service"`
}

var client = &http.Client{Timeout: 30 * time.Second}

func Send(entries []Entry) error {
	var buf bytes.Buffer
	for _, e := range entries {
		buf.WriteString(`{"create":{}}` + "\n")
		if err := json.NewEncoder(&buf).Encode(e); err != nil {
			return err
		}
	}
	req, err := http.NewRequest("POST",
		"https://ingest.getepok.dev/insert/elasticsearch/_bulk", &buf)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("EPOK_API_KEY"))
	req.Header.Set("Content-Type", "application/x-ndjson")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("epok ingest failed: %s", resp.Status)
	}
	return nil
}

OpenTelemetry (OTLP) · POST /v1/logs

Native OTLP HTTP support. Works with any OpenTelemetry SDK or Collector. Use the otlphttp exporter (not otlp/gRPC).

terminal

yaml

# otel-collector-config.yml

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlphttp:
    endpoint: https://ingest.getepok.dev   # the exporter appends /v1/logs
    headers:
      Authorization: "Bearer YOUR_API_KEY"

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp]

Loki API · POST /loki/api/v1/push

Direct Loki push API. Works with any Loki-compatible client.

terminal

bash

curl -X POST https://ingest.getepok.dev/loki/api/v1/push \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "streams": [{
    "stream": {"app": "api", "env": "production"},
    "values": [
      ["1771632000000000000", "Application started successfully"],
      ["1771632001000000000", "GET /api/users 200 42ms"]
    ]
  }]
}'
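The Loki timestamps above are nanoseconds since the Unix epoch, sent as strings; 1771632000000000000 is 2026-02-21T00:00:00Z. A small Python sketch (the helper name loki_ns is hypothetical) that converts a timezone-aware datetime using integer arithmetic, so no precision is lost to floating point:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def loki_ns(ts: datetime) -> str:
    """Nanoseconds since the Unix epoch as a string, for Loki "values" entries.

    ts must be timezone-aware; integer math avoids float rounding.
    """
    delta = ts - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return str(seconds * 1_000_000_000 + delta.microseconds * 1_000)


if __name__ == "__main__":
    start = datetime(2026, 2, 21, tzinfo=timezone.utc)
    print([
        [loki_ns(start), "Application started successfully"],
        [loki_ns(start + timedelta(seconds=1)), "GET /api/users 200 42ms"],
    ])
```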

Fluentd · POST /api/v1/fluentd

Native Fluentd HTTP output. Tag-based service routing.

terminal

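conf

# /etc/fluentd/fluent.conf

<source>
  @type tail
  path /var/log/app/*.log
  pos_file /var/log/fluentd/app.log.pos
  tag app.logs
  # in_tail requires a <parse> section; "none" keeps each line as-is
  <parse>
    @type none
  </parse>
</source>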

<match app.**>
  @type http
  endpoint https://ingest.getepok.dev/api/v1/fluentd
  headers {"Authorization": "Bearer YOUR_API_KEY"}
  json_array false
  <format>
    @type json
  </format>
</match>

Logstash · POST /insert/elasticsearch/_bulk

Uses the standard Elasticsearch output. Point your existing Logstash pipeline at Epok and swap in your Epok API key as the username.

terminal

conf

# /etc/logstash/conf.d/epok.conf

output {
  elasticsearch {
    hosts         => ["https://ingest.getepok.dev"]
    index         => "logs"
    user          => "${EPOK_API_KEY}"
    password      => "x"
    ssl_enabled   => true
    http_compression => true
  }
}

Filebeat · POST /insert/elasticsearch/_bulk

Elastic's lightweight shipper. Uses the native Elasticsearch output — no plugin install.

terminal

yaml

# filebeat.yml

filebeat.inputs:
  - type: filestream
    id: app-logs
    paths:
      - /var/log/app/*.log

output.elasticsearch:
  hosts: ["https://ingest.getepok.dev"]
  # Basic Auth: API key as the username, any value as the password
  username: "${EPOK_API_KEY}"
  password: "x"
  compression_level: 3
  bulk_max_size: 1000

rsyslog · POST /api/v1/syslog

rsyslog ships with most Linux distros. Use its omhttp module to send RFC 5424 frames directly over HTTPS; omhttp is a contrib module, so some distributions package it separately.

terminal

conf

# /etc/rsyslog.d/50-epok.conf

module(load="omhttp")

action(
  type="omhttp"
  server="ingest.getepok.dev"
  serverport="443"
  usehttps="on"
  restpath="api/v1/syslog"
  httpcontenttype="text/plain"
  httpheaders=["Authorization: Bearer YOUR_API_KEY"]
  template="RSYSLOG_SyslogProtocol23Format"
  action.resumeRetryCount="-1"
  queue.type="LinkedList"
  queue.size="50000"
)

Splunk (migration) · via Vector shim

Epok doesn't emulate the Splunk HEC wire protocol. Instead, run Vector as a shim: point your existing HEC senders (Heavy Forwarders or HEC clients) at Vector's `splunk_hec` source, and Vector forwards the events to Epok through its `elasticsearch` sink.

terminal

toml

# vector.toml: Splunk HEC in, Epok out

[sources.splunk_in]
type = "splunk_hec"
address = "0.0.0.0:8088"
valid_tokens = ["YOUR_INTERNAL_HEC_TOKEN"]

[sinks.epok]
type = "elasticsearch"
inputs = ["splunk_in"]
endpoints = ["https://ingest.getepok.dev"]
bulk.index = "logs"

[sinks.epok.auth]
strategy = "basic"
user = "YOUR_API_KEY"
password = "x"

Syslog (native HTTP) · POST /api/v1/syslog

Native syslog HTTP endpoint. Accepts RFC 5424 and RFC 3164 frames as the request body. Simplest path for any tool that can POST — no relay needed.

terminal

bash

# Send a single syslog frame directly with curl
curl -X POST https://ingest.getepok.dev/api/v1/syslog \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: text/plain' \
  --data-binary '<134>1 2026-04-08T12:34:56Z host01 myapp 1234 - - User signup failed for user_id=4821'

# Or batch many frames in one POST (newline-delimited)
curl -X POST https://ingest.getepok.dev/api/v1/syslog \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: text/plain' \
  --data-binary @syslog-batch.txt
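The leading <134> in the single-frame example is the syslog PRI value, which RFC 5424 defines as facility * 8 + severity: local0 (16) * 8 + informational (6) = 134. A Python sketch that builds frames in the same shape as that example, with "-" for MSGID and structured data; the function names are hypothetical, and the batch body simply joins frames with newlines, as the batched curl example describes:

```python
from datetime import datetime, timezone
from typing import List, Optional

FACILITY_LOCAL0 = 16
SEVERITY_INFO = 6


def rfc5424_frame(
    msg: str,
    hostname: str,
    app_name: str,
    procid: str = "-",
    facility: int = FACILITY_LOCAL0,
    severity: int = SEVERITY_INFO,
    ts: Optional[datetime] = None,
) -> str:
    """Build one RFC 5424 line with no MSGID or structured data ("-" for both)."""
    pri = facility * 8 + severity  # PRI = facility * 8 + severity
    ts = (ts or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = ts.strftime("%Y-%m-%dT%H:%M:%SZ")  # second precision, UTC
    return f"<{pri}>1 {stamp} {hostname} {app_name} {procid} - - {msg}"


def batch_body(frames: List[str]) -> bytes:
    """Newline-delimited body for a batched POST to /api/v1/syslog."""
    return "\n".join(frames).encode()
```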

Syslog (via FluentBit relay) · UDP/TCP 514 → HTTP

For network appliances (Cisco, Fortinet, Palo Alto) and legacy systems that can only send raw UDP/TCP syslog. Use FluentBit as a local relay to receive syslog and forward over HTTP.

terminal

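conf

# /etc/fluent-bit/syslog-relay.conf
# Mode udp receives UDP syslog; set Mode tcp for TCP senders.
# Binding port 514 requires root (or CAP_NET_BIND_SERVICE).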

[INPUT]
    Name        syslog
    Listen      0.0.0.0
    Port        514
    Mode        udp

[OUTPUT]
    Name         loki
    Match        *
    Host         ingest.getepok.dev
    Port         443
    TLS          On
    HTTP_User    YOUR_API_KEY
    HTTP_Passwd  x
    Labels       job=syslog-relay

AWS CloudWatch · POST /api/v1/cloudwatch

Forward CloudWatch Logs via subscription filter. Native gzip decompression.

terminal

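python

# Lambda function invoked by a CloudWatch Logs subscription filter.
# CloudWatch → Lambda → Epok

import base64
import os

import urllib3

EPOK_URL = "https://ingest.getepok.dev/api/v1/cloudwatch"
API_KEY = os.environ["EPOK_API_KEY"]  # set as a Lambda environment variable
http = urllib3.PoolManager()


def handler(event, context):
    # The payload is base64-encoded gzip: strip the base64 layer and
    # forward the gzip bytes unchanged.
    compressed = base64.b64decode(event["awslogs"]["data"])
    resp = http.request(
        "POST",
        EPOK_URL,
        body=compressed,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Encoding": "gzip",
            "Content-Type": "application/json",
        },
        timeout=10.0,
    )
    if resp.status >= 300:
        # Fail the invocation so the error is visible and can be retried.
        raise RuntimeError(f"epok ingest failed: {resp.status}")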
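To exercise the handler without a real subscription filter, you can build a synthetic event. CloudWatch Logs delivers a base64-encoded, gzip-compressed JSON document with logGroup, logStream, and logEvents fields; this Python sketch reproduces that encoding. The account ID, filter name, and timestamps are placeholders, and the function names are hypothetical:

```python
import base64
import gzip
import json
from typing import Dict, List


def make_awslogs_event(log_group: str, log_stream: str, messages: List[str]) -> Dict:
    """Build an event shaped like a CloudWatch Logs subscription delivery."""
    payload = {
        "messageType": "DATA_MESSAGE",
        "owner": "123456789012",                    # placeholder account ID
        "logGroup": log_group,
        "logStream": log_stream,
        "subscriptionFilters": ["epok-forwarder"],  # placeholder filter name
        "logEvents": [
            {"id": str(i), "timestamp": 1771632000000 + i, "message": m}  # epoch ms
            for i, m in enumerate(messages)
        ],
    }
    compressed = gzip.compress(json.dumps(payload).encode())
    return {"awslogs": {"data": base64.b64encode(compressed).decode()}}


def decode_awslogs_event(event: Dict) -> Dict:
    """Undo the encoding: base64, then gzip, then JSON."""
    return json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
```

Passing make_awslogs_event(...) to handler with EPOK_API_KEY set sends a real request to Epok; decode_awslogs_event lets you inspect the payload without sending anything.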

GCP Cloud Logging · POST /api/v1/ingest

Forward Google Cloud Logging via Pub/Sub sink and a Cloud Function.

terminal

python

# GCP Cloud Logging → Pub/Sub → Cloud Function → Epok
# 1. Create a log sink that routes to a Pub/Sub topic
# 2. Deploy this Cloud Function as a Pub/Sub subscriber (add requests to requirements.txt)

import base64
import json
import os

import requests

EPOK_URL = "https://ingest.getepok.dev/api/v1/ingest"
API_KEY = os.environ["EPOK_API_KEY"]  # set as a function environment variable


def handle_log(event, context):
    # Pub/Sub delivers the LogEntry as base64-encoded JSON in event["data"]
    data = json.loads(base64.b64decode(event["data"]))
    entry = {
        "_msg": data.get("textPayload", json.dumps(data.get("jsonPayload", {}))),
        "level": data.get("severity", "info").lower(),
        "service": data.get("resource", {}).get("type", "gcp"),
        "_time": data.get("timestamp"),
    }
    resp = requests.post(
        EPOK_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=[entry],
        timeout=10,
    )
    # Fail on 4xx/5xx; redelivery happens only if retries are enabled on the function.
    resp.raise_for_status()

Generic JSON · POST /api/v1/ingest

Simplest format for custom applications. Send a JSON array or newline-delimited JSON.

terminal

bash

curl -X POST https://ingest.getepok.dev/api/v1/ingest \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '[
  {"_msg": "User signed up", "level": "info", "service": "auth", "user_id": 4821},
  {"_msg": "Payment processed", "level": "info", "service": "billing", "amount": 29.99}
]'
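For apps that can't add dependencies, the same request works with Python's standard library. A minimal sketch, assuming both body styles described above are accepted; the application/x-ndjson Content-Type for the NDJSON variant is an assumption (the curl example only shows application/json), and the function names are hypothetical:

```python
import json
import urllib.request
from typing import Dict, List

INGEST_URL = "https://ingest.getepok.dev/api/v1/ingest"


def build_array(entries: List[Dict]) -> bytes:
    """Body as a single JSON array, like the curl example."""
    return json.dumps(entries).encode()


def build_ndjson(entries: List[Dict]) -> bytes:
    """Body as newline-delimited JSON: one object per line."""
    return "".join(json.dumps(e) + "\n" for e in entries).encode()


def send(entries: List[Dict], api_key: str, ndjson: bool = False) -> int:
    """POST entries to the generic endpoint and return the HTTP status."""
    body = build_ndjson(entries) if ndjson else build_array(entries)
    req = urllib.request.Request(
        INGEST_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: NDJSON bodies are labeled application/x-ndjson.
            "Content-Type": "application/x-ndjson" if ndjson else "application/json",
        },
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

For example, send([{"_msg": "User signed up", "level": "info", "service": "auth"}], os.environ["EPOK_API_KEY"]) posts one entry as a JSON array.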

What Happens Next

Once your logs start flowing, Epok's intelligence engine activates automatically. No configuration needed.

Search and live tail work immediately

As soon as your first log arrives, you can search it and stream it live. No indexing delay.

New errors are detected from the first log

Epok fingerprints every error-level log message. When a never-before-seen error appears, it shows up in the New Errors feed within 5 minutes.

Silence detection activates within 1 hour

Epok learns each service's expected log cadence. If a service that was sending logs every 30 seconds goes quiet for 5 minutes, you'll get an alert.
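One simple way to model "expected cadence" is the median gap between a service's recent logs, alerting when the current silence exceeds a multiple of it. This Python sketch is illustrative only and is not Epok's algorithm: the multiplier of 10 is a hypothetical value chosen so the example above (30-second cadence, 5 minutes quiet) fires, since Epok does not publish its thresholds.

```python
from statistics import median
from typing import Sequence


def typical_gap(timestamps: Sequence[float]) -> float:
    """Median gap, in seconds, between consecutive log arrivals."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps to learn a cadence")
    return median(gaps)


def is_silent(timestamps: Sequence[float], now: float, multiplier: float = 10.0) -> bool:
    """True once the time since the last log reaches multiplier x the typical gap.

    multiplier=10 is a hypothetical value, not a published Epok threshold.
    """
    return now - max(timestamps) >= multiplier * typical_gap(timestamps)
```

The median, rather than the mean, keeps one unusually long gap (a restart, say) from inflating the learned cadence.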

Volume baselines build over 7 days

Log rate anomaly detection learns each service's normal patterns by hour of day and day of week. Detection is active from day one with wider thresholds and reaches full precision by day seven.

Detectors

Epok ships 20 detectors that run automatically on every connected log stream. All detectors are included on every tier, including the 14-day trial — there is no detection-quality gate.

Each entry below is paired with an example alert as it would appear in the product. Thresholds, baseline windows, and tuning details are intentionally not published — those are implementation choices that change as we learn from your feedback.

See live detectors firing in the sandbox →

Statistical (5)

Volume Anomaly · log_rate

Detects spikes, drops, and flatlines in log volume vs daily and weekly baselines per service.

EXAMPLE ALERT

api-gateway log volume dropped 87% from typical Wednesday-2am baseline

Silence Detection · silence

Catches services that stop logging when they normally log every N seconds. The most dangerous failure mode: no errors, just absence.

EXAMPLE ALERT

auth-service silent for 5 minutes (typical baseline 30 logs/min)

Outlier Detection · isolation_forest

Multi-dimensional outliers in log feature space. Catches subtle anomalies that single-axis thresholds miss.

EXAMPLE ALERT

api-gateway request cluster outlier: status=200 + latency=12s + body=8MB (1 in 50,000 vs baseline)

Error Rate Anomaly · error_rate

Per-service error percentage anomalies vs baseline, with sustained-elevation guards so a single noisy minute doesn't fire and slow ramps still get caught.

EXAMPLE ALERT

api-gateway: 4.2% error rate vs 0.3% baseline (14x normal)

Recurring Pattern Detection · recurring_pattern

Identifies log patterns that recur on a schedule — daily batch jobs, hourly cron runs, weekly reports — and flags when one fails to fire on its expected cadence.

EXAMPLE ALERT

nightly-backup: expected daily at 02:00 UTC, last seen 3 days ago (3 missed runs)

Error Intelligence (2)

New Error Detection · new_error

Catches errors that have never appeared in your 7-day baseline. On connect, the baseline is seeded from your last 7 days of historical logs. Push a week of history to get alerts on day one, or let the baseline warm up organically over the first week.

EXAMPLE ALERT

FATAL: connection pool exhausted (first seen 2 min ago in payment-service)

Pattern Clustering · pattern_cluster

Groups errors with similar templates so many variants of the same problem cluster into one alert. Surfaces brand-new clusters as they appear.

EXAMPLE ALERT

1,247 instances of new error pattern in 5 min: 'Deadlock detected on table users, retrying transaction...'

Domain-Specific (9)

Kubernetes Detection · k8s_intelligence

70+ rules for OOMKilled, CrashLoopBackOff, ImagePullBackOff, FailedScheduling, evictions, probe failures, and other Kubernetes failure modes.

EXAMPLE ALERT

pod payment-service-7f4b OOMKilled (3rd restart in 10 min)

AWS Service Detection · aws_intelligence

Patterns for RDS, S3, DynamoDB, ECS, EKS, IAM, KMS, Lambda, and 20+ other AWS services. Catches throttling, capacity events, IAM denials, and service-specific failure modes.

EXAMPLE ALERT

rds-primary: storage autoscaling triggered 3rd time in 2h (300 GB → 350 GB)

Serverless Detection · serverless_intelligence

Lambda timeouts, cold starts, throttling, init failures, runtime crashes, and concurrency limits across functions.

EXAMPLE ALERT

lambda payment-handler: 12 timeouts in 5 min, all hitting the 30s limit

Database Detection · database_intelligence

Connection pool exhaustion, deadlocks, slow queries, replication lag, schema migration errors, and transaction aborts across Postgres, MySQL, and MongoDB.

EXAMPLE ALERT

postgres-primary: 8 deadlocks on the orders table in 90 seconds

Dependency Detection · dependency_intelligence

Upstream service failures, circuit breaker trips, retry exhaustion, and cascading failures between services.

EXAMPLE ALERT

notification-service: 60% of outbound calls to email-relay returning 502

Web / HTTP Detection · web_intelligence

4xx and 5xx surges, slow endpoints, TLS handshake failures, gateway timeouts, and load balancer health events.

EXAMPLE ALERT

nginx-edge: 5xx rate jumped from 0.1% to 4.3% in 60 seconds

Security Event Detection · security_intelligence

Brute-force authentication attempts, anomalous auth failures, privilege escalations, and suspicious access patterns from your auth and audit logs.

EXAMPLE ALERT

23 failed SSH auth attempts to bastion-1 from a single IP in 90 seconds

Search Detection · search_intelligence

Slow queries, query failures, index issues, and scoring anomalies in Elasticsearch / OpenSearch / Solr.

EXAMPLE ALERT

elasticsearch-primary: query latency p99 above 8 seconds for 4 consecutive minutes

Infrastructure Detection · infrastructure_intelligence

Disk pressure, memory pressure, CPU steal, swap activity, kernel errors, and other host-level signals from system logs.

EXAMPLE ALERT

host worker-12: disk usage 94% on /var, growing 1.2 GB/hour

SLO & Performance (2)

Golden Signal Monitoring · golden_signal

Monitors latency, saturation, and per-service error rate: three of the four SRE golden signals. The fourth, traffic, is covered by Volume Anomaly.

EXAMPLE ALERT

checkout-service p99 latency: 4.2s vs baseline 380ms

SLO Monitoring · slo_monitor

Error budget tracking with burn rate prediction. Get warned before the SLO breaches, not after.

EXAMPLE ALERT

checkout-service SLO: 14-day error budget at 87% burn, projected to exhaust in 36h
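The projection in the SLO example follows standard error-budget arithmetic: burn rate is the observed error ratio divided by the budget (1 minus the SLO target), and a burn rate of 1.0 spends exactly the window's budget over the window. A Python sketch of that textbook calculation with a linear projection; Epok's own prediction model is not documented here, and the function names are hypothetical:

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Error-budget burn rate: 1.0 spends exactly the budget over the SLO window."""
    budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% target
    return error_ratio / budget


def hours_to_exhaustion(consumed: float, rate: float, window_hours: float) -> float:
    """Hours until the remaining budget is gone if `rate` holds (linear projection).

    consumed is the fraction of the window's budget already spent (0.87 = 87%).
    At burn rate r, the budget drains at r / window_hours per hour.
    """
    if rate <= 0:
        return float("inf")  # not burning: never exhausts
    return (1.0 - consumed) * window_hours / rate
```

For example, with a 99.9% target, a 0.12% error ratio gives a burn rate of about 1.2, and with 87% of a 14-day (336-hour) budget already spent, roughly 0.13 × 336 / 1.2 ≈ 36 hours remain, in line with the example alert.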

User-Defined (2)

Threshold Rules · threshold_rule

Custom alert conditions on any query, with cooldown and duration guards. Use when you need a hand-tuned alert beside automatic detection.

EXAMPLE ALERT

user-defined: error_rate > 5% for 3 minutes (checkout-service)

Composite Rules · composite_rule

Multi-condition alert rules combining several signals. Use when no single threshold captures the failure mode.

EXAMPLE ALERT

user-defined: (latency p99 > 2s AND error_rate spike) for 5 min on payment-service

Alert Management

Epok handles alert deduplication, grouping, escalation, and lifecycle automatically. Try in sandbox →

Deduplication

If the same anomaly (same detector, same service, same type) fires again while an incident is still open, Epok updates the existing alert with new evidence instead of sending another notification. Suppression windows stretch automatically for persistent issues so you never get paged about the same thing twice.
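The deduplication rule above can be modeled as one open alert per (detector, service, anomaly type) key. A minimal Python sketch of that bookkeeping; the class and method names are hypothetical, and the automatically stretching suppression windows are not modeled:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (detector, service, anomaly type)


@dataclass
class Alert:
    key: Key
    evidence: List[str] = field(default_factory=list)


class Deduplicator:
    """Hypothetical model of the dedup rule: at most one open alert per key."""

    def __init__(self) -> None:
        self.open: Dict[Key, Alert] = {}

    def fire(self, detector: str, service: str, anomaly_type: str, evidence: str) -> bool:
        """Record an anomaly; return True only when a new notification should go out."""
        key = (detector, service, anomaly_type)
        alert = self.open.get(key)
        if alert is not None:
            alert.evidence.append(evidence)  # incident still open: update, don't notify
            return False
        self.open[key] = Alert(key, [evidence])
        return True

    def resolve(self, detector: str, service: str, anomaly_type: str) -> None:
        """Close the alert so the next anomaly with this key notifies again."""
        self.open.pop((detector, service, anomaly_type), None)
```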

Severity escalation

Alerts that keep re-firing automatically escalate in severity. An INFO that refuses to resolve becomes a WARNING; a WARNING that persists becomes CRITICAL. Persistent problems get the attention they deserve without manual intervention.

Incident grouping

Multiple alerts from the same tenant within a short window are grouped into a single incident. Epok correlates related alerts across services so a cascade of failures produces one coherent incident instead of fifteen disconnected pages.

Auto-resolve

Alerts automatically resolve when the detector stops producing anomalies for that service. You can also manually resolve alerts from the dashboard with an optional note for the timeline.

Snooze and mute

Snooze an alert for a set duration during maintenance windows. Mute specific services or detector types to suppress known noisy patterns. Feedback from snooze/mute actions trains the self-tuning system.

Analysis Tools

When an alert fires, Epok automatically runs analysis to help you understand what happened, why, and what to do next. Deterministic analysis runs on every tier; AI-assisted explanations are included starting with the 14-day trial (capped by daily AI budget). Open an incident in the sandbox →

Root Cause Ranking

All tiers

Ranks potential root causes by scoring error patterns, causal language signals, timing correlation, and cross-service propagation. Outputs a ranked list of hypotheses.

Error categorization

All tiers

Classifies errors by failure type — connection, timeout, resource exhaustion, auth, configuration, data schema, rate limit, runtime crash, and more. Categories drive different investigation paths and RCA scoring.

What Changed (9 methods)

All tiers

Compares the anomaly window against a baseline period across 9 dimensions: new error patterns, volume shifts, field distribution changes, new log streams, disappeared streams, latency changes, status code shifts, new field values, and pattern frequency changes.

Blast Radius

All tiers

Determines which services, endpoints, and users are affected by an incident. Shows the scope of impact to help you prioritize response.

Cascade Timeline

All tiers

Reconstructs the sequence of failures across services. Shows which service failed first and how the failure propagated through dependencies.

Dimension Lift

All tiers

Identifies which field values are disproportionately represented in the anomaly. If 90% of errors come from region=us-east-1, Dimension Lift surfaces that automatically. AI-generated plain-language explanations are included starting with the trial.
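Dimension Lift's example (90% of errors from region=us-east-1) matches a standard lift calculation: a value's share of events in the anomaly window divided by its share in the baseline. A Python sketch of that generic computation, not Epok's implementation; events are plain dicts and the function name is hypothetical:

```python
from collections import Counter
from typing import Dict, List, Tuple


def dimension_lift(
    anomaly: List[Dict], baseline: List[Dict], field: str
) -> List[Tuple[str, float, float]]:
    """Rank values of `field` by lift = anomaly share / baseline share.

    Returns (value, anomaly_share, lift) tuples, highest lift first.
    A value never seen in the baseline gets lift = inf.
    """
    in_anomaly = Counter(str(e[field]) for e in anomaly if field in e)
    in_baseline = Counter(str(e[field]) for e in baseline if field in e)
    a_total = sum(in_anomaly.values())
    b_total = sum(in_baseline.values())
    ranked = []
    for value, count in in_anomaly.items():
        a_share = count / a_total
        b_share = in_baseline[value] / b_total if b_total else 0.0
        lift = a_share / b_share if b_share else float("inf")
        ranked.append((value, a_share, lift))
    return sorted(ranked, key=lambda row: row[2], reverse=True)
```

With 9 of 10 anomalous errors from us-east-1 against a 30% baseline share, us-east-1 ranks first with a lift of 3.0.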

Cross-service error matching

All tiers

Matches related errors across different services. When your API returns 500s and your database logs connection timeouts at the same time, Epok links them.

Service dependency graph

All tiers

Infers service-to-service dependencies from log patterns and error propagation. Visualizes which services depend on what.

Deploy correlation

All tiers

Detects recent deploys from log patterns (version strings, restart markers, config changes) and correlates anomalies with deploy timing.

AI incident narrative

Trial+

Plain-language summary of what happened, what's affected, and suggested next steps. Inlined into Slack alerts and the investigation view.

AI root-cause hypothesis

Trial+

LLM-assisted explanation on top of the deterministic RCA ranking. Turns signals into a readable theory of the incident.

Deep RCA

Trial+

On-demand, slower analysis that pulls more context (baseline comparison, correlated events, pattern history) and produces a longer write-up.

Dimension Lift explanation

Trial+

Natural-language explanation of why a dimension spiked — the shift, its scale, and whether it's the most likely cause.

Noise scoring

Trial+

LLM-scored noise rating on every alert to auto-tune suppression over time. Reduces alert fatigue without manual rule edits.

Natural-language query

Trial+

Ask "show me 5xx spikes from checkout in the last hour" and Epok translates to LogsQL. Scoped to your tenant.

AI Features

Every detector and deterministic analysis tool runs on every tier. AI-powered explanations sit on top: they turn detector output into readable prose, explain dimension shifts, auto-tune alert noise, and translate English into LogsQL. AI processes your logs only to serve your own tenant's features, and your data is never used for model training.

Feature	Tier	What it does
Incident narrative	Trial+	Plain-language summary of what happened, what's affected, and suggested next steps. Inlined into Slack alerts and the investigation view.
Root-cause hypothesis	Trial+	LLM-assisted explanation on top of deterministic RCA ranking. Turns signals into a readable theory of the incident.
Suggested actions	Trial+	Actionable next steps tailored to the incident — "restart pod X", "check migration 0042", "rate-limit caller Y".
Title rewrite	Trial+	Converts detector-generated alert titles into human-readable summaries for alerts list and notification channels.
Deep RCA	Trial+	On-demand slower analysis pulling more context (baseline comparison, correlated events, pattern history) to produce a longer write-up.
Dimension Lift explanation	Trial+	Natural-language explanation of why a dimension spiked — the shift, its scale, and whether it's the most likely cause.
Noise scoring	Trial+	LLM-scored noise rating on every alert to auto-tune suppression over time. Reduces alert fatigue without manual rule edits.
Natural-language query	Trial+	Type "show me 5xx spikes from checkout in the last hour" and Epok translates it to LogsQL. Scoped to your tenant.

Daily AI credits

1 credit = 1 AI action. Trial: 200/day. Team: 500/day. Growth: larger budget. Credits reset at 00:00 UTC. Alert narratives are generated eagerly; on-demand features (Deep RCA, NL query) consume credits per invocation.

Data privacy

Epok sends only the minimum necessary context (log samples, detector evidence, service names) to the AI provider. Payloads are not retained by the provider and are not used for model training. Your logs stay on Epok's servers; the LLM never gets bulk access.

Notifications

Configure where Epok sends alerts. Trial and Team include a limited number of notification channels; Growth and Enterprise include unlimited channels.

Slack

Incoming webhook integration. Alerts include severity, affected service, description, and a link to the investigation view. Starting with the trial, AI-generated incident narratives are included inline.

Add a Slack incoming webhook URL in Settings > Notification Channels.

PagerDuty

Native Events API v2 integration. Alerts map to PagerDuty incidents with severity, dedup key, and custom details. Resolved alerts auto-resolve in PagerDuty.

Add your PagerDuty integration key (Events API v2) in Settings > Notification Channels.

Webhook

Send alert JSON to any HTTP endpoint. Use this to integrate with OpsGenie, Microsoft Teams, Discord, or custom systems.

Add a webhook URL in Settings > Notification Channels. Epok sends a POST with the alert payload as JSON.

Email

Email notifications for alerts. Includes a summary with links to the dashboard for investigation.

Add email addresses in Settings > Notification Channels.

Delivery guarantees: Notifications are batched and retried with exponential backoff. Failed notifications go to a dead-letter queue and are recovered on restart.
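Retry with exponential backoff and a dead-letter queue is a common delivery pattern. A generic Python sketch of it; the attempt count, base delay, cap, and full-jitter strategy are hypothetical choices, since Epok's retry parameters are not documented, and the function name is illustrative:

```python
import random
import time
from typing import Any, Callable


def deliver_with_backoff(
    send: Callable[[Any], None],
    payload: Any,
    dead_letters: list,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to deliver one notification; park it in dead_letters if every attempt fails."""
    for attempt in range(max_attempts):
        try:
            send(payload)
            return True
        except Exception:  # any delivery error counts as a failed attempt
            if attempt == max_attempts - 1:
                break
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay, with full jitter.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    dead_letters.append(payload)
    return False
```

Jitter spreads retries from many failed deliveries so they don't all hit the receiving endpoint at the same moment.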

Team Management

Epok supports team collaboration with role-based access control.

Roles

Three roles: Owner (full access, can manage billing and delete tenant), Admin (manage members, API keys, settings), and Member (view alerts, search logs, investigate incidents).

Inviting team members

Owners and admins can create invite links in Settings. New members sign in with Google and are automatically added to your tenant with the role you specify.

Tier limits

Tier	Daily ingest	Retention	Users	API keys	Services
Trial	107 GB	14 days	3	2	10,000
Team	50 GB	30 days	10	5	10,000
Growth	167 GB	30 days	Unlimited	20	Unlimited
Custom	1024 GB	365 days	Unlimited	Unlimited	Unlimited

All tiers include every intelligence detector. See pricing for a full feature comparison.

Configuration

Epok works with zero configuration out of the box. All settings below are optional and can be adjusted in the dashboard.

Detection sensitivity

Volume anomaly detection calibrates itself to each service's normal traffic pattern and flags spikes, drops, and flatlines. You can adjust sensitivity per service if a stream is genuinely bursty by design, but the defaults work without tuning for almost every workload.

Threshold + composite rules

Custom rules for hard constraints. Threshold rules fire when a LogsQL query crosses a number; composite rules fire when multiple signals are simultaneously active. Full reference below →

Trial: 5 threshold + 5 composite. Team: 20 + 5. Growth: unlimited.

SLO monitoring

Define Service Level Objectives with error budget tracking. Epok monitors burn rate and predicts when your SLO will breach. Trial: 5 SLOs. Team: 5. Growth: unlimited.

Self-tuning thresholds (Team+)

Epok learns from your feedback. When you snooze, mute, or resolve alerts, the system adjusts sensitivity to reduce noise over time. No manual threshold tuning needed.

Custom rules — reference

There are two kinds of user-defined rule. Threshold rules fire when a LogsQL query crosses a number over a window. Composite rules fire when two or more signals are simultaneously true. Both are plain JSON, and both use the same LogsQL you write in Explore.

Threshold rules

Endpoint: POST /api/v1/tenants/<id>/rules

json

{
  "name": "Payment refund burst",
  "query": "service:payment AND _msg:refund AND amount:>1000",
  "condition_op": "gt",
  "condition_value": 5,
  "window_seconds": 300,
  "severity": "critical",
  "for_duration_seconds": 60,
  "cooldown_seconds": 600,
  "channel_ids": [12, 8],
  "enabled": true
}

Field	Type	Description
name	string	Human-readable, surfaced in alerts.
query	LogsQL	The query whose count is checked against the condition. Same syntax as Explore.
condition_op	enum	gt · gte · lt · lte · eq · neq
condition_value	number	Threshold the hit count must satisfy.
window_seconds	int	Look-back window. Default 300 (5 min).
severity	enum	info · warning · critical. Drives notification routing + paging behavior.
for_duration_seconds	int	Condition must hold for at least this long before firing. Eliminates flapping.
cooldown_seconds	int	Minimum time between consecutive fires for this rule. Default 0.
channel_ids	int[]	Notification channel IDs. Omit to use tenant defaults.
enabled	bool	Toggle without deleting. Default true.
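The field table translates directly into a request body. A Python sketch that assembles and sanity-checks a threshold rule using the documented field names, enums, and defaults; the helper name is hypothetical, and because the management API's host and authentication are not specified here, it stops at building the JSON rather than sending it:

```python
import json
from typing import List, Optional

CONDITION_OPS = {"gt", "gte", "lt", "lte", "eq", "neq"}
SEVERITIES = {"info", "warning", "critical"}


def threshold_rule(
    name: str,
    query: str,
    condition_op: str,
    condition_value: float,
    severity: str,
    window_seconds: int = 300,
    for_duration_seconds: Optional[int] = None,
    cooldown_seconds: int = 0,
    channel_ids: Optional[List[int]] = None,
    enabled: bool = True,
) -> dict:
    """Assemble a threshold-rule body using the field names and defaults in the table."""
    if condition_op not in CONDITION_OPS:
        raise ValueError(f"condition_op must be one of {sorted(CONDITION_OPS)}")
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(SEVERITIES)}")
    rule = {
        "name": name,
        "query": query,
        "condition_op": condition_op,
        "condition_value": condition_value,
        "window_seconds": window_seconds,
        "severity": severity,
        "cooldown_seconds": cooldown_seconds,
        "enabled": enabled,
    }
    if for_duration_seconds is not None:
        rule["for_duration_seconds"] = for_duration_seconds
    if channel_ids is not None:  # omitted: tenant default channels apply
        rule["channel_ids"] = channel_ids
    return rule


if __name__ == "__main__":
    body = threshold_rule(
        "Auth-failed burst", 'level:warn AND _msg:"auth failed"', "gt", 50,
        "critical", window_seconds=60,
    )
    print(json.dumps(body, indent=2))
```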

Worked examples

Auth-failed burst

query: level:warn AND _msg:"auth failed"
fires: count gt 50 in 60s · critical

More than 50 auth failures in 60 seconds points to brute force or credential stuffing.

Database connection storm

query: service:checkout AND _msg:"connection refused"
fires: count gt 10 in 60s · critical

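The upstream database is down or the connection pool is exhausted. Pair this with a composite rule (for example, ANDed with the database_intelligence detector) to confirm the cause.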

Slow request anomaly

query: service:api AND latency_ms:>5000
fires: count gt 3 in 300s · warning

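Hard ceiling on user-visible latency. Because the operator is gt 3, the rule fires on the fourth request slower than 5s within 5 min, which this example treats as crossing the SLO budget.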

Refund / fraud guard

query: service:payment AND _msg:"refund issued" AND amount:>1000
fires: count gt 5 in 300s · critical

High-value refund cluster — page someone immediately for fraud review.
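The reference doesn't spell out the evaluation loop. The sketch below is one way to reason about how condition_op, for_duration_seconds, and cooldown_seconds interact. It assumes the engine re-checks the window count at fixed ticks and counts every qualifying tick as a fire, leaving repeats to deduplication. fire_times is a hypothetical helper, not Epok's evaluator.

```python
import operator

OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt,
       "lte": operator.le, "eq": operator.eq, "neq": operator.ne}


def fire_times(samples, op, value, for_duration, cooldown):
    """Return the timestamps at which a threshold rule would fire.

    samples: (timestamp_seconds, window_count) pairs in time order, where
    window_count is the hit count over the rule's look-back window.
    """
    fired = []
    breach_start = None  # when the condition most recently became true
    last_fire = None
    for ts, count in samples:
        if not OPS[op](count, value):
            breach_start = None  # condition broke: restart the hold timer
            continue
        if breach_start is None:
            breach_start = ts
        held_long_enough = ts - breach_start >= for_duration
        cooled_down = last_fire is None or ts - last_fire >= cooldown
        if held_long_enough and cooled_down:
            fired.append(ts)
            last_fire = ts
    return fired


# Slow-request rule (gt 3), re-checked every 30s, 60s hold, 600s cooldown.
samples = [(0, 2), (30, 4), (60, 5), (90, 6), (120, 1), (150, 7), (180, 8), (210, 9)]
print(fire_times(samples, "gt", 3, 60, 600))  # [90]
```

The first breach starts at t=30 and fires at t=90, once it has held for 60s. The second breach also holds for 60s by t=210, but that tick falls inside the 600s cooldown, so the rule stays quiet.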

Composite rules

Endpoint: POST /api/v1/tenants/<id>/composite-rules

json

{
  "name": "API degraded + DB struggling",
  "expression": {
    "op": "and",
    "conditions": [
      {
        "op": "threshold",
        "query": "service:api AND status_code:>=500",
        "comparator": "gt",
        "value": 10,
        "window_seconds": 300
      },
      {
        "op": "detector_active",
        "detector_type": "database_intelligence"
      }
    ]
  },
  "severity": "critical",
  "cooldown_seconds": 300,
  "channel_ids": [12],
  "enabled": true
}

Leaf operators (measurements)

threshold

A LogsQL query crossing a number in a window. Requires query + value; comparator defaults to gt; window_seconds defaults to 300.

{ "op": "threshold",
  "query": "level:error",
  "comparator": "gt",
  "value": 50,
  "window_seconds": 300 }

alert_firing

A specific alert (matched by dedup_key, severity, or service) is currently active. All fields are optional; any field you omit matches every alert.

{ "op": "alert_firing",
  "dedup_key": "lr:api:..." }

detector_active

A built-in detector type has produced anomalies. Pair built-in detection with your own threshold to filter noise.

{ "op": "detector_active",
  "detector_type": "silence" }

Branch operators (combine)

and (true when all conditions are true), or (true when any is true), not (true when its single child is false). Branches take a conditions array. Expressions can nest up to 5 levels deep, and expression size is capped at 10 KB.
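To pre-flight a composite rule in CI, you can mirror the branch semantics and limits client-side. This sketch rests on three assumptions: a bare leaf counts as depth 1, not takes a one-element conditions array, and 10 KB means 10 * 1024 bytes of compact JSON. The server's own validation remains authoritative.

```python
import json

LEAF_OPS = {"threshold", "alert_firing", "detector_active"}
MAX_DEPTH = 5
MAX_BYTES = 10 * 1024  # assumption: 10 KB of compact JSON


def depth(node):
    """A leaf is depth 1; each and/or/not wrapper adds one level."""
    if node["op"] in LEAF_OPS:
        return 1
    # max() raises on an empty conditions array, which is invalid anyway.
    return 1 + max(depth(child) for child in node["conditions"])


def check(expression):
    """Raise ValueError if the expression breaks the documented limits."""
    size = len(json.dumps(expression, separators=(",", ":")).encode())
    if size > MAX_BYTES:
        raise ValueError(f"expression is {size} bytes; limit is {MAX_BYTES}")
    if depth(expression) > MAX_DEPTH:
        raise ValueError(f"expression nests deeper than {MAX_DEPTH} levels")


def evaluate(node, leaf_is_true):
    """Combine leaf results; leaf_is_true(leaf) -> bool supplies the measurements."""
    op = node["op"]
    if op in LEAF_OPS:
        return leaf_is_true(node)
    children = node["conditions"]
    if op == "and":
        return all(evaluate(c, leaf_is_true) for c in children)
    if op == "or":
        return any(evaluate(c, leaf_is_true) for c in children)
    if op == "not":
        if len(children) != 1:
            raise ValueError("not takes exactly one condition")
        return not evaluate(children[0], leaf_is_true)
    raise ValueError(f"unknown op: {op}")


# 5xx burst on the API, but only while the silence detector is NOT active.
expr = {"op": "and", "conditions": [
    {"op": "threshold", "query": "service:api AND status_code:>=500", "value": 10},
    {"op": "not", "conditions": [
        {"op": "detector_active", "detector_type": "silence"}]},
]}
check(expr)
print(depth(expr))                                             # 3
print(evaluate(expr, lambda leaf: leaf["op"] == "threshold"))  # True
```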

Version-control your rules

Rules are plain JSON: export them with GET /api/v1/tenants/<id>/rules, commit them to your repo, and apply them from CI with POST. Keeping alert config next to service code means a deploy and its alerts ship together.
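A minimal CI step for this workflow might POST every rules/*.json file, one rule per file. Several details are assumptions: the directory layout, the app.getepok.dev host for the rules API, and Bearer auth on this endpoint. This page also doesn't say whether re-posting an existing rule creates a duplicate or returns 409 Conflict, so check that before running the step on every build.

```python
import json
import pathlib
import urllib.request

BASE_URL = "https://app.getepok.dev"  # assumption: same host as the API examples


def rule_requests(rules_dir, tenant_id, api_key):
    """Yield one POST per *.json file in rules_dir, in filename order."""
    for path in sorted(pathlib.Path(rules_dir).glob("*.json")):
        rule = json.loads(path.read_text())  # raises on invalid JSON
        yield urllib.request.Request(
            f"{BASE_URL}/api/v1/tenants/{tenant_id}/rules",
            data=json.dumps(rule).encode(),
            headers={"Authorization": f"Bearer {api_key}",
                     "Content-Type": "application/json"},
            method="POST",
        )


def apply_rules(rules_dir, tenant_id, api_key):
    """Parse every file first, then send; urlopen raises HTTPError on 4xx/5xx."""
    requests = list(rule_requests(rules_dir, tenant_id, api_key))
    for req in requests:
        with urllib.request.urlopen(req, timeout=30) as resp:
            print(req.full_url, resp.status)
```

The sketch parses every file before sending anything, so one malformed file fails the build without leaving a half-applied rule set.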

Test before deploy

Run the rule's LogsQL query in Explore against the last 24h. The condition is evaluated against the hit count over the rule's window, which is the same number Explore shows for a time range of that length.

Anti-patterns

Don't alert on noisy level:info patterns — use the detectors.
Set for_duration_seconds > 0 for any latency or rate rule to kill flapping.
Composite rules with a single operand should be threshold rules — keep composites for real AND/OR logic.
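You can turn the anti-patterns above into a rough lint step. lint_rule is hypothetical and deliberately blunt. It flags every threshold rule whose for_duration_seconds is 0, although the text only asks this of latency and rate rules. It also matches level:info as a plain substring.

```python
LEAF_OPS = {"threshold", "alert_firing", "detector_active"}


def lint_rule(rule):
    """Return human-readable warnings for one rule dict (heuristic)."""
    warnings = []
    expression = rule.get("expression")
    if expression is not None:  # composite rule
        op = expression.get("op")
        operands = expression.get("conditions", [])
        if op in LEAF_OPS or (op in {"and", "or"} and len(operands) < 2):
            warnings.append("single-operand composite: make it a threshold rule")
        return warnings
    if "level:info" in rule.get("query", ""):
        warnings.append("alerts on level:info: let the detectors handle this")
    if rule.get("for_duration_seconds", 0) <= 0:
        warnings.append("for_duration_seconds is 0: the rule may flap")
    return warnings


print(lint_rule({"name": "Slow requests", "query": "service:api AND latency_ms:>5000",
                 "condition_op": "gt", "condition_value": 3}))
# ['for_duration_seconds is 0: the rule may flap']
```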

Tier limits

Trial: 5 threshold + 5 composite
Team: 20 threshold + 5 composite
Growth, Enterprise: unlimited

API Reference

Key endpoints for programmatic access. All endpoints require authentication via API key. Full OpenAPI reference →

Download the raw spec at /openapi.json to generate clients or import into Postman.

Method	Endpoint	Description
GET	/health	Health check
GET	/api/v1/alerts	List alerts (active + recent resolved)
GET	/api/v1/alerts/:id	Get alert detail with analysis
POST	/api/v1/alerts/:id/resolve	Manually resolve an alert
GET	/api/v1/streams	List monitored log streams
GET	/api/v1/new-errors	List new error patterns
GET	/api/v1/patterns	List detected log patterns
GET	/api/v1/search	Full-text log search
GET	/api/v1/facets	Field facets for filtering
GET	/api/v1/hits	Log volume histogram
WS	/ws/livetail/:tenant_id	WebSocket live tail (authenticate with API key in query string or cookie)
WS	/ws/alerts/:tenant_id	WebSocket alert stream (real-time incident updates)
GET	/api/v1/detectors	List registered detectors
POST	/api/v1/channels	Add notification channel
GET	/api/v1/channels	List notification channels
GET	/metrics	Prometheus metrics

Example: List active alerts

terminal

bash

curl 'https://app.getepok.dev/api/v1/alerts?state=firing' \
  -H 'Authorization: Bearer YOUR_API_KEY'

Example: Search logs

terminal

bash

curl 'https://app.getepok.dev/api/v1/search?query=level%3Aerror&start=-1h&limit=100' \
  -H 'Authorization: Bearer YOUR_API_KEY'
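The same two calls can be made with Python's standard library when curl is awkward. The helper names are hypothetical. get_json assumes the endpoint returns a single JSON document, since the response schema isn't documented on this page. urlencode produces the same level%3Aerror escaping as the curl example.

```python
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://app.getepok.dev"


def build_request(path, params=None):
    """Authenticated GET; percent-escapes LogsQL (':' -> %3A, ' ' -> %20)."""
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {API_KEY}"})


def get_json(path, params=None):
    """Fetch and decode, assuming the endpoint returns one JSON document."""
    with urllib.request.urlopen(build_request(path, params), timeout=30) as resp:
        return json.load(resp)


firing = build_request("/api/v1/alerts", {"state": "firing"})
search = build_request("/api/v1/search", {"query": "level:error", "start": "-1h", "limit": 100})
print(search.full_url)
# https://app.getepok.dev/api/v1/search?query=level%3Aerror&start=-1h&limit=100
```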

Rate Limits & Errors

Epok enforces per-tenant quotas so one tenant can't degrade the platform for everyone else. All limits are documented; nothing is secret or negotiated on a case-by-case basis.

Ingest rate limit

Logs are rate-limited per tenant. When you exceed the limit, Epok returns HTTP 429 Too Many Requests with a Retry-After header (in seconds). Wait at least that long, then retry with exponential backoff to avoid a thundering herd on recovery.

Tier	Ingest rate	API query rate	Daily volume
Trial	500 events/sec	1,200 req/min	107 GB
Team	500 events/sec	1,200 req/min	50 GB
Growth	2,000 events/sec	6,000 req/min	167 GB

On the trial, hitting the daily-volume ceiling pauses ingest until the next UTC day; paid tiers bill overage per GB instead. Live-tail sessions and saved-search counts have separate caps; see pricing.
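A shipper can pace itself with a client-side token bucket instead of discovering the ingest limit through 429s. This sketch rests on two assumptions that aren't stated above: Epok counts each log line as one event, and it tolerates short bursts of up to one second's allowance. It supplements honoring Retry-After; it does not replace it.

```python
import time


class TokenBucket:
    """Client-side pacing: refills `rate` tokens/sec, holds at most `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full
        self.clock = clock
        self.updated = clock()

    def try_acquire(self, n=1):
        """Take n tokens (one per event) if available; never blocks."""
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def acquire(self, n=1):
        """Block until n tokens are free; a batch above `burst` can never fit."""
        if n > self.burst:
            raise ValueError(f"batch of {n} exceeds burst {self.burst}; split it")
        while not self.try_acquire(n):
            time.sleep((n - self.tokens) / self.rate)
```

For a Team tenant, create TokenBucket(rate=500, burst=500) and call bucket.acquire(len(batch)) before each bulk POST. That keeps the average at or below 500 events/sec.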

HTTP error codes

Code	Meaning	What to do
200	OK	Successful response. For ingest, all logs were accepted.
400	Bad Request	Payload is malformed. Check NDJSON formatting, timestamp, and required fields. Response body has the specific issue.
401	Unauthorized	Missing, invalid, or expired credentials. Check your API key or re-authenticate via Google OAuth.
403	Forbidden	Authenticated but lacking permission. Some admin endpoints require the owner or admin role.
404	Not Found	Resource doesn't exist or your API key isn't scoped to its tenant.
409	Conflict	Duplicate resource (e.g. creating a tenant with an existing account_id).
413	Payload Too Large	Single log line exceeds 1 MB or batch exceeds the ingest size cap. Split into smaller batches.
429	Too Many Requests	Rate limit hit. Honor the `Retry-After` header and back off exponentially.
500	Internal Server Error	Server-side issue. Retry with backoff. If it persists, email support@getepok.dev with the request ID from the response headers.
503	Service Unavailable	Temporary overload or deploy in progress. Retry with backoff.
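For the 413 case, one approach is to pack events into Elasticsearch bulk bodies under a byte budget you choose. The batch cap isn't stated on this page, so treat max_bytes as an assumption and set it conservatively. The {"create":{}} action line matches the Quick Start payload.

```python
import json

ACTION_LINE = b'{"create":{}}\n'  # same action line as the Quick Start payload


def bulk_batches(events, max_bytes):
    """Yield NDJSON bulk bodies, each at most max_bytes long."""
    batch, size = [], 0
    for event in events:
        pair = ACTION_LINE + json.dumps(event, separators=(",", ":")).encode() + b"\n"
        if len(pair) > max_bytes:
            raise ValueError("one event alone exceeds max_bytes; it can't be sent")
        if batch and size + len(pair) > max_bytes:
            yield b"".join(batch)  # flush before the body would overflow
            batch, size = [], 0
        batch.append(pair)
        size += len(pair)
    if batch:
        yield b"".join(batch)
```

Each yielded body is a complete bulk payload, so a 413 can be handled by re-splitting with a smaller max_bytes.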

Recommended retry strategy

For ingest, buffer locally and retry idempotently. Exponential backoff with jitter prevents correlated retries after an upstream hiccup. A simple loop:

retry.py

python

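import random
import time
import httpx

API_KEY = "YOUR_API_KEY"  # from Settings
INGEST_URL = "https://ingest.getepok.dev/insert/elasticsearch/_bulk"

def send(events, *, max_attempts=5):
    """POST an NDJSON bulk body; retry on 429, 5xx, and network errors."""
    for attempt in range(max_attempts):
        retry_after = None
        try:
            r = httpx.post(
                INGEST_URL,
                headers={"Authorization": f"Bearer {API_KEY}",
                         "Content-Type": "application/json"},
                content=events,
                timeout=30,
            )
            # Success, or a 4xx that retrying won't fix: hand it back.
            if r.status_code < 500 and r.status_code != 429:
                return r
            retry_after = r.headers.get("Retry-After")
        except httpx.TransportError:
            pass  # timeout, connection reset, DNS failure: retry
        if attempt < max_attempts - 1:
            # Exponential backoff, but never sooner than Retry-After asks.
            delay = max(2 ** attempt, float(retry_after or 0))
            time.sleep(delay + random.uniform(0, 0.5))
    raise RuntimeError(f"ingest failed after {max_attempts} attempts")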

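Logs flow during incidents. On paid tiers, a traffic spike never pauses ingest: volume past your daily allowance is billed per GB rather than dropped. The per-second rate limit still applies, so keep retrying on 429. You can set budget alerts in Settings to be notified before you hit a limit.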

Migrating From Another Tool

Moving from Datadog, Splunk, or Loki? Dual-ship for a day, verify parity, then cut over on your own schedule. The migration guide walks through each source with concrete Vector configs.

Read the Migration Guide →

What's New

Release notes with the actual commits. Every change is traceable back to the code that shipped it.

Read the Changelog →

Further Reading