epok

Documentation

Updated May 31, 2026

Get started with Epok in under 5 minutes. Send your first logs and let the intelligence engine do the rest.

The sandbox is read-only, logged in as a demo tenant with pre-seeded logs. No sign-up required.

Quick Start

Send your first log entry. Replace YOUR_API_KEY with your key from Settings — see Authentication below for header formats.

terminal
bash
curl -X POST https://ingest.getepok.dev/insert/elasticsearch/_bulk \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '
{"create":{}}
{"_msg":"Application started","level":"info","service":"api","_time":"2026-02-21T00:00:00Z"}
'

That's it. Your logs appear in real time, and anomaly detection activates automatically.

Want to see it working first? The sandbox tenant has live logs, real alerts, and working detectors.

Open sandbox → Explore

Authentication

Epok uses API keys for log ingestion. You'll get a default API key when you sign up. Find it in Settings.

Include your API key in every request using any of these methods:

Authorization: Bearer epk_your_api_key

or

Authorization: Basic base64(epk_your_api_key:x)

or

X-API-Key: epk_your_api_key

Basic Auth is used by Loki-native shippers (FluentBit, Promtail, Grafana Alloy). Set the username to your API key and the password to any value.
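
The three header formats above can be built as follows. This is a minimal sketch: the helper name `auth_headers` and its `scheme` argument are illustrative and not part of any Epok client.

```python
import base64

def auth_headers(api_key, scheme="bearer"):
    """Return the HTTP header for one of the three documented auth methods."""
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "basic":
        # Username is the API key; the password can be any value ("x" here).
        token = base64.b64encode(f"{api_key}:x".encode()).decode()
        return {"Authorization": f"Basic {token}"}
    if scheme == "x-api-key":
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown scheme: {scheme}")
```

Use `scheme="basic"` only when a shipper cannot set a custom header; otherwise Bearer is the simplest option.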

Quickstart by platform

Pick your stack — each guide is a 5-minute end-to-end setup with copy-paste-ready config, the exact verification steps, and the gotchas we've seen on real deployments.

Not on this list? The protocol table below works for any shipper that speaks Elasticsearch Bulk, Loki Push, OTLP HTTP, FluentBit, Fluentd, Syslog, or plain JSON.

Supported Integrations

Epok accepts logs from any source. Pick the integration that fits your stack.

Protocol | Endpoint
Elasticsearch Bulk | /_bulk
Loki Push | /loki/api/v1/push
OTLP HTTP | /v1/logs
FluentBit Native | /api/v1/fluent
Fluentd | /api/v1/fluentd
Syslog (HTTP) | /api/v1/syslog
CloudWatch | /api/v1/cloudwatch
GCP Cloud Logging | /api/v1/ingest
Generic JSON | /api/v1/ingest

Syslog formats: RFC 5424, RFC 3164, Cisco IOS, Fortinet FortiGate, Palo Alto Networks, and HP/Aruba ProCurve are all parsed automatically.

Configuration Examples

Copy-paste configs for every supported shipper. Replace YOUR_API_KEY with your key.

curl · POST /insert/elasticsearch/_bulk

The fastest way to test. Send a log line from your terminal.

terminal
bash
curl -X POST https://ingest.getepok.dev/insert/elasticsearch/_bulk \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '
{"create":{}}
{"_msg":"Application started successfully","level":"info","service":"api","_time":"2026-02-21T00:00:00Z"}
{"create":{}}
{"_msg":"GET /api/users 200 42ms","level":"info","service":"api","status_code":200,"duration_ms":42}
'
FluentBit · POST /loki/api/v1/push

Lightweight log shipper. Ideal for Docker, Kubernetes, and edge devices. Uses native Loki output with Basic Auth.

terminal
bash
# /etc/fluent-bit/fluent-bit.conf

[INPUT]
    Name         tail
    Path         /var/log/app/*.log
    Tag          app

[OUTPUT]
    Name         loki
    Match        *
    Host         ingest.getepok.dev
    Port         443
    TLS          On
    HTTP_User    YOUR_API_KEY
    HTTP_Passwd  x
    Labels       job=fluentbit, host=my-server
    drop_single_key on
Vector · POST /loki/api/v1/push

High-performance observability pipeline. Native Loki sink — same protocol as Promtail/Alloy.

terminal
bash
# vector.toml

[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]

[sinks.epok]
type = "loki"
inputs = ["app_logs"]
endpoint = "https://ingest.getepok.dev"

[sinks.epok.encoding]
codec = "json"

[sinks.epok.auth]
strategy = "bearer"
token = "YOUR_API_KEY"

[sinks.epok.labels]
service = "{{ service }}"
host = "{{ host }}"
Promtail / Grafana Alloy · POST /loki/api/v1/push

If you already run Promtail or Grafana Alloy, point them at Epok. Native Loki protocol support.

terminal
bash
# promtail-config.yml

clients:
  - url: https://ingest.getepok.dev/loki/api/v1/push
    basic_auth:
      username: YOUR_API_KEY
      password: x

scrape_configs:
  - job_name: app
    static_configs:
      - targets: [localhost]
        labels:
          app: api
          __path__: /var/log/app/*.log
Python · POST /loki/api/v1/push

Send logs directly from your application code.

terminal
python
import time, httpx

resp = httpx.post(
    "https://ingest.getepok.dev/loki/api/v1/push",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "streams": [{
            "stream": {"app": "myapp", "env": "production"},
            "values": [[
                # Loki timestamps are nanoseconds since the epoch, sent as a string.
                str(time.time_ns()),
                "User signup completed for user_id=4821"
            ]]
        }]
    }
)
resp.raise_for_status()  # fail loudly if the push was rejected
Node.js · POST /insert/elasticsearch/_bulk

Send logs directly from a Node service. Uses built-in fetch — no dependencies.

terminal
javascript
// No npm install needed — fetch is built-in on Node 18+.

const API_KEY = process.env.EPOK_API_KEY;

async function log(entries) {
  const body = entries
    .flatMap((e) => [
      JSON.stringify({ create: {} }),
      JSON.stringify({ _time: new Date().toISOString(), ...e }),
    ])
    .join("\n") + "\n";

  const r = await fetch(
    "https://ingest.getepok.dev/insert/elasticsearch/_bulk",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        "Content-Type": "application/x-ndjson",
      },
      body,
    },
  );
  if (!r.ok) throw new Error(`epok ingest failed: ${r.status}`);
}

await log([
  { level: "info", service: "api", _msg: "signup ok user_id=4821" },
  { level: "error", service: "api", _msg: "payment failed order_id=9012" },
]);
Go · POST /insert/elasticsearch/_bulk

Stdlib-only Go client. Batch entries, use http.Client with timeout.

terminal
go
package epok

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

type Entry struct {
	Time    string `json:"_time"`
	Msg     string `json:"_msg"`
	Level   string `json:"level"`
	Service string `json:"service"`
}

var client = &http.Client{Timeout: 30 * time.Second}

func Send(entries []Entry) error {
	var buf bytes.Buffer
	for _, e := range entries {
		buf.WriteString(`{"create":{}}` + "\n")
		if err := json.NewEncoder(&buf).Encode(e); err != nil {
			return err
		}
	}
	req, err := http.NewRequest("POST",
		"https://ingest.getepok.dev/insert/elasticsearch/_bulk", &buf)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("EPOK_API_KEY"))
	req.Header.Set("Content-Type", "application/x-ndjson")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("epok ingest failed: %s", resp.Status)
	}
	return nil
}
OpenTelemetry (OTLP) · POST /v1/logs

Native OTLP HTTP support. Works with any OpenTelemetry SDK or Collector. Use the otlphttp exporter (not otlp/gRPC).

terminal
bash
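# otel-collector-config.yml

# The logs pipeline below references an "otlp" receiver, so it must be defined.
receivers:
  otlp:
    protocols:
      http:

exporters:
  otlphttp:
    endpoint: https://ingest.getepok.dev
    headers:
      Authorization: "Bearer YOUR_API_KEY"

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp]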
Loki API · POST /loki/api/v1/push

Direct Loki push API. Works with any Loki-compatible client.

terminal
bash
curl -X POST https://ingest.getepok.dev/loki/api/v1/push \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "streams": [{
    "stream": {"app": "api", "env": "production"},
    "values": [
      ["1771632000000000000", "Application started successfully"],
      ["1771632001000000000", "GET /api/users 200 42ms"]
    ]
  }]
}'
Fluentd · POST /api/v1/fluentd

Native Fluentd HTTP output. Tag-based service routing.

terminal
bash
# /etc/fluentd/fluent.conf

<source>
  @type tail
  path /var/log/app/*.log
  tag app.logs
</source>

<match app.**>
  @type http
  endpoint https://ingest.getepok.dev/api/v1/fluentd
  headers {"Authorization": "Bearer YOUR_API_KEY"}
  json_array false
  <format>
    @type json
  </format>
</match>
Logstash · POST /insert/elasticsearch/_bulk

Drop-in Elasticsearch output. Point your existing Logstash pipeline at Epok and flip API keys.

terminal
bash
# /etc/logstash/conf.d/epok.conf

output {
  elasticsearch {
    hosts         => ["https://ingest.getepok.dev"]
    index         => "logs"
    user          => "${EPOK_API_KEY}"
    password      => "x"
    ssl_enabled   => true
    http_compression => true
  }
}
Filebeat · POST /insert/elasticsearch/_bulk

Elastic's lightweight shipper. Uses the native Elasticsearch output — no plugin install.

terminal
bash
# filebeat.yml

filebeat.inputs:
  - type: filestream
    id: app-logs
    paths:
      - /var/log/app/*.log

output.elasticsearch:
  hosts: ["https://ingest.getepok.dev"]
  api_key: "${EPOK_API_KEY}"
  compression_level: 3
  bulk_max_size: 1000
rsyslog · POST /api/v1/syslog

Built-in on most Linux distros. Use the omhttp module to ship RFC 5424 frames directly over HTTPS.

terminal
bash
# /etc/rsyslog.d/50-epok.conf

module(load="omhttp")

action(
  type="omhttp"
  server="ingest.getepok.dev"
  serverport="443"
  usehttps="on"
  restpath="api/v1/syslog"
  httpcontenttype="text/plain"
  httpheaders=["Authorization: Bearer YOUR_API_KEY"]
  template="RSYSLOG_SyslogProtocol23Format"
  action.resumeRetryCount="-1"
  queue.type="LinkedList"
  queue.size="50000"
)
Splunk (migration) · via Vector shim

Epok doesn't emulate the Splunk HEC wire protocol. Instead, run Vector as a shim: point your existing HEC senders (for example a Heavy Forwarder) at Vector and let Vector forward to Epok. Vector's `splunk_hec` source pairs cleanly with its `elasticsearch` sink.

terminal
bash
# vector.toml: drop-in Splunk HEC shim

[sources.splunk_in]
type = "splunk_hec"
address = "0.0.0.0:8088"
token = "YOUR_INTERNAL_HEC_TOKEN"

[sinks.epok]
type = "elasticsearch"
inputs = ["splunk_in"]
endpoint = "https://ingest.getepok.dev"
bulk.index = "logs"

[sinks.epok.auth]
strategy = "basic"
user = "YOUR_API_KEY"
password = "x"
Syslog (native HTTP) · POST /api/v1/syslog

Native syslog HTTP endpoint. Accepts RFC 5424 and RFC 3164 frames as the request body. Simplest path for any tool that can POST — no relay needed.

terminal
bash
# Send a single syslog frame directly with curl
curl -X POST https://ingest.getepok.dev/api/v1/syslog \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: text/plain' \
  --data-binary '<134>1 2026-04-08T12:34:56Z host01 myapp 1234 - - User signup failed for user_id=4821'

# Or batch many frames in one POST (newline-delimited)
curl -X POST https://ingest.getepok.dev/api/v1/syslog \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: text/plain' \
  --data-binary @syslog-batch.txt
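
The `<134>` prefix in the frame above is the RFC 5424 priority value: facility × 8 + severity (local0 = 16, informational = 6). A minimal sketch of building frames and a newline-delimited batch body follows; the helper names `rfc5424_frame` and `batch_body` are illustrative, not part of any Epok client.

```python
from datetime import datetime, timezone

def rfc5424_frame(msg, *, host, app, procid="-", facility=16, severity=6, when=None):
    """Format one RFC 5424 line: <PRI>1 TIMESTAMP HOST APP PROCID MSGID SD MSG."""
    pri = facility * 8 + severity  # local0 (16) and informational (6) give 134
    when = when or datetime.now(timezone.utc)
    ts = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # MSGID and STRUCTURED-DATA are left as the nil value "-".
    return f"<{pri}>1 {ts} {host} {app} {procid} - - {msg}"

def batch_body(frames):
    """Join frames into a newline-delimited body for a single POST."""
    return "\n".join(frames) + "\n"
```

Write the result of `batch_body(...)` to a file and send it with `--data-binary`, or pass it as the request body from your own HTTP client.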
Syslog (via FluentBit relay) · UDP/TCP 514 → HTTP

For network appliances (Cisco, Fortinet, Palo Alto) and legacy systems that can only send raw UDP/TCP syslog. Use FluentBit as a local relay to receive syslog and forward over HTTP.

terminal
bash
# /etc/fluent-bit/syslog-relay.conf

[INPUT]
    Name        syslog
    Listen      0.0.0.0
    Port        514
    Mode        udp

[OUTPUT]
    Name         loki
    Match        *
    Host         ingest.getepok.dev
    Port         443
    TLS          On
    HTTP_User    YOUR_API_KEY
    HTTP_Passwd  x
    Labels       job=syslog-relay
AWS CloudWatch · POST /api/v1/cloudwatch

Forward CloudWatch Logs via subscription filter. Native gzip decompression.

terminal
python
# Create a Lambda subscription filter that POSTs to Epok.
# CloudWatch → Lambda → Epok

import base64, urllib3

EPOK_URL = "https://ingest.getepok.dev/api/v1/cloudwatch"
API_KEY = "YOUR_API_KEY"
http = urllib3.PoolManager()

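def handler(event, context):
    # The subscription payload is base64-encoded gzip. Decode the base64 layer
    # and forward the gzip bytes unchanged; Epok decompresses them.
    compressed = base64.b64decode(event["awslogs"]["data"])
    resp = http.request("POST", EPOK_URL,
        body=compressed,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Encoding": "gzip",
            "Content-Type": "application/json"
        })
    # Raise on failure so the invocation is marked as failed instead of
    # silently dropping the batch.
    if resp.status >= 300:
        raise RuntimeError(f"epok ingest failed: {resp.status}")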
GCP Cloud Logging · POST /api/v1/ingest

Forward Google Cloud Logging via Pub/Sub sink and a Cloud Function.

terminal
python
# GCP Cloud Logging → Pub/Sub → Cloud Function → Epok
# 1. Create a log sink that routes to a Pub/Sub topic
# 2. Deploy this Cloud Function as a Pub/Sub subscriber

import base64, json, requests

EPOK_URL = "https://ingest.getepok.dev/api/v1/ingest"
API_KEY = "YOUR_API_KEY"

def handle_log(event, context):
    data = json.loads(base64.b64decode(event["data"]))
    entry = {
        "_msg": data.get("textPayload", json.dumps(data.get("jsonPayload", {}))),
        "level": data.get("severity", "info").lower(),
        "service": data.get("resource", {}).get("type", "gcp"),
        "_time": data.get("timestamp"),
    }
    requests.post(EPOK_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=[entry])
Generic JSON · POST /api/v1/ingest

Simplest format for custom applications. Send a JSON array or newline-delimited JSON.

terminal
bash
curl -X POST https://ingest.getepok.dev/api/v1/ingest \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '[
  {"_msg": "User signed up", "level": "info", "service": "auth", "user_id": 4821},
  {"_msg": "Payment processed", "level": "info", "service": "billing", "amount": 29.99}
]'
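
From application code, the two accepted body shapes (a JSON array or newline-delimited JSON) can be produced as in the sketch below. The helper names and the default batch size of 500 are assumptions for illustration, not documented Epok limits.

```python
import json

def as_json_array(entries):
    """Encode entries as a single JSON array body."""
    return json.dumps(entries)

def as_ndjson(entries):
    """Encode entries as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(e) + "\n" for e in entries)

def batches(entries, size=500):
    """Split a list of entries into chunks so each POST stays small.

    The default size is an arbitrary illustration value.
    """
    for i in range(0, len(entries), size):
        yield entries[i:i + size]
```

Send each chunk as one POST to /api/v1/ingest with the same Authorization header as the curl example.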

What Happens Next

Once your logs start flowing, Epok's intelligence engine activates automatically. No configuration needed.

1. Search and live tail work immediately

As soon as your first log arrives, you can search it and stream it live. No indexing delay.

2. New errors are detected from the first log

Epok fingerprints every error-level log message. When a never-before-seen error appears, it shows up in the New Errors feed within 5 minutes.

3. Silence detection activates within 1 hour

Epok learns each service's expected log cadence. If a service that was sending logs every 30 seconds goes quiet for 5 minutes, you'll get an alert.

4. Volume baselines build over 7 days

Log rate anomaly detection learns your normal patterns per service, per hour, per day of week. Early detection is active from day one with wider thresholds. Full precision by day seven.

Detectors

Epok ships 20 detectors that run automatically on every connected log stream. All detectors are included on every tier, including the 14-day trial — there is no detection-quality gate.

Each entry below is paired with an example alert as it would appear in the product. Thresholds, baseline windows, and tuning details are intentionally not published — those are implementation choices that change as we learn from your feedback.

Statistical (5)

Detects spikes, drops, and flatlines in log volume vs daily and weekly baselines per service.

EXAMPLE ALERT

api-gateway log volume dropped 87% from typical Wednesday-2am baseline

Catches services that stop logging when they normally log every N seconds. The most dangerous failure mode: no errors, just absence.

EXAMPLE ALERT

auth-service silent for 5 minutes (typical baseline 30 logs/min)

Outlier Detection · isolation_forest

Multi-dimensional outliers in log feature space. Catches subtle anomalies that single-axis thresholds miss.

EXAMPLE ALERT

api-gateway request cluster outlier: status=200 + latency=12s + body=8MB (1 in 50,000 vs baseline)

Per-service error percentage anomalies vs baseline, with sustained-elevation guards so a single noisy minute doesn't fire and slow ramps still get caught.

EXAMPLE ALERT

api-gateway: 4.2% error rate vs 0.3% baseline (14x normal)

Identifies log patterns that recur on a schedule — daily batch jobs, hourly cron runs, weekly reports — and flags when one fails to fire on its expected cadence.

EXAMPLE ALERT

nightly-backup: expected at 02:00 UTC, last seen 9 hours ago (3 missed runs)

Error Intelligence (2)

Catches errors that have never appeared in your 7-day baseline. On connect, the baseline seeds from your last 7 days of historical logs — push history for day-1 alerts, or wait a week for organic warm-up.

EXAMPLE ALERT

FATAL: connection pool exhausted (first seen 2 min ago in payment-service)

Pattern Clustering · pattern_cluster

Groups errors with similar templates so many variants of the same problem cluster into one alert. Surfaces brand-new clusters as they appear.

EXAMPLE ALERT

1,247 instances of new error pattern in 5 min: 'Deadlock detected on table users, retrying transaction...'

Domain-Specific (9)

Kubernetes Detection · k8s_intelligence

70+ rules for OOMKilled, CrashLoopBackOff, ImagePullBackOff, FailedScheduling, evictions, probe failures, and other Kubernetes failure modes.

EXAMPLE ALERT

pod payment-service-7f4b OOMKilled (3rd restart in 10 min)

AWS Service Detection · aws_intelligence

Patterns for RDS, S3, DynamoDB, ECS, EKS, IAM, KMS, Lambda, and 20+ other AWS services. Catches throttling, capacity events, IAM denials, and service-specific failure modes.

EXAMPLE ALERT

rds-primary: storage autoscaling triggered 3rd time in 2h (300 GB → 350 GB)

Serverless Detection · serverless_intelligence

Lambda timeouts, cold starts, throttling, init failures, runtime crashes, and concurrency limits across functions.

EXAMPLE ALERT

lambda payment-handler: 12 timeouts in 5 min, all hitting the 30s limit

Database Detection · database_intelligence

Connection pool exhaustion, deadlocks, slow queries, replication lag, schema migration errors, and transaction aborts across Postgres, MySQL, and MongoDB.

EXAMPLE ALERT

postgres-primary: 8 deadlocks on the orders table in 90 seconds

Dependency Detection · dependency_intelligence

Upstream service failures, circuit breaker trips, retry exhaustion, and cascading failures between services.

EXAMPLE ALERT

notification-service: 60% of outbound calls to email-relay returning 502

Web / HTTP Detection · web_intelligence

4xx and 5xx surges, slow endpoints, TLS handshake failures, gateway timeouts, and load balancer health events.

EXAMPLE ALERT

nginx-edge: 5xx rate jumped from 0.1% to 4.3% in 60 seconds

Security Event Detection · security_intelligence

Brute-force authentication attempts, anomalous auth failures, privilege escalations, and suspicious access patterns from your auth and audit logs.

EXAMPLE ALERT

23 failed SSH auth attempts to bastion-1 from a single IP in 90 seconds

Search Detection · search_intelligence

Slow queries, query failures, index issues, and scoring anomalies in Elasticsearch / OpenSearch / Solr.

EXAMPLE ALERT

elasticsearch-primary: query latency p99 above 8 seconds for 4 consecutive minutes

Infrastructure Detection · infrastructure_intelligence

Disk pressure, memory pressure, CPU steal, swap activity, kernel errors, and other host-level signals from system logs.

EXAMPLE ALERT

host worker-12: disk usage 94% on /var, growing 1.2 GB/hour

SLO & Performance (2)

Latency, saturation, and per-service error rate. Three of the four SRE golden signals — traffic is covered by Volume Anomaly.

EXAMPLE ALERT

checkout-service p99 latency: 4.2s vs baseline 380ms

SLO Monitoring · slo_monitor

Error budget tracking with burn rate prediction. Get warned before the SLO breaches, not after.

EXAMPLE ALERT

checkout-service SLO: 14-day error budget at 87% burn, projected to exhaust in 36h
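
To make the "projected to exhaust" figure in the example alert concrete, here is one common way to project budget exhaustion: a linear extrapolation of the burn so far. Epok does not publish its prediction method, so treat this as an illustrative sketch only; the function name and inputs are assumptions.

```python
def hours_to_exhaustion(budget_consumed, hours_elapsed):
    """Linearly project how many hours remain until the error budget is spent.

    budget_consumed: fraction of the budget already burned (0 < x < 1).
    hours_elapsed:   hours over which that fraction was burned.
    """
    if not 0 < budget_consumed < 1:
        raise ValueError("budget_consumed must be between 0 and 1 (exclusive)")
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    burn_per_hour = budget_consumed / hours_elapsed
    return (1 - budget_consumed) / burn_per_hour
```

Under this model, a budget that is 87% burned about 241 hours into a 14-day window has roughly 36 hours left at the same rate.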

User-Defined (2)

Threshold Rules · threshold_rule

Custom alert conditions on any query, with cooldown and duration guards. Use when you need a hand-tuned alert beside automatic detection.

EXAMPLE ALERT

user-defined: error_rate > 5% for 3 minutes (checkout-service)

Composite Rules · composite_rule

Multi-condition alert rules combining several signals. Use when no single threshold captures the failure mode.

EXAMPLE ALERT

user-defined: (latency p99 > 2s AND error_rate spike) for 5 min on payment-service

Alert Management

Epok handles alert deduplication, grouping, escalation, and lifecycle automatically. Try in sandbox →

Deduplication

If the same anomaly (same detector, same service, same type) fires again while an incident is still open, Epok updates the existing alert with new evidence instead of sending another notification. Suppression windows stretch automatically for persistent issues so you never get paged about the same thing twice.

Severity escalation

Alerts that keep re-firing automatically escalate in severity. An INFO that refuses to resolve becomes a WARNING; a WARNING that persists becomes CRITICAL. Persistent problems get the attention they deserve without manual intervention.

Incident grouping

Multiple alerts from the same tenant within a short window are grouped into a single incident. Epok correlates related alerts across services so a cascade of failures produces one coherent incident instead of fifteen disconnected pages.

Auto-resolve

Alerts automatically resolve when the detector stops producing anomalies for that service. You can also manually resolve alerts from the dashboard with an optional note for the timeline.

Snooze and mute

Snooze an alert for a set duration during maintenance windows. Mute specific services or detector types to suppress known noisy patterns. Feedback from snooze/mute actions trains the self-tuning system.

Analysis Tools

When an alert fires, Epok automatically runs analysis to help you understand what happened, why, and what to do next. Deterministic analysis runs on every tier; AI-assisted explanations are included starting with the 14-day trial (capped by daily AI budget). Open an incident in the sandbox →

Root Cause Ranking

All tiers

Ranks potential root causes by scoring error patterns, causal language signals, timing correlation, and cross-service propagation. Outputs a ranked list of hypotheses.

Error categorization

All tiers

Classifies errors by failure type — connection, timeout, resource exhaustion, auth, configuration, data schema, rate limit, runtime crash, and more. Categories drive different investigation paths and RCA scoring.

What Changed (9 methods)

All tiers

Compares the anomaly window against a baseline period across 9 dimensions: new error patterns, volume shifts, field distribution changes, new log streams, disappeared streams, latency changes, status code shifts, new field values, and pattern frequency changes.

Blast Radius

All tiers

Determines which services, endpoints, and users are affected by an incident. Shows the scope of impact to help you prioritize response.

Cascade Timeline

All tiers

Reconstructs the sequence of failures across services. Shows which service failed first and how the failure propagated through dependencies.

Dimension Lift

All tiers

Identifies which field values are disproportionately represented in the anomaly. If 90% of errors come from region=us-east-1, Dimension Lift surfaces that automatically. AI-generated plain-language explanations are included starting with the trial.
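
One common way to quantify "disproportionately represented" is lift: a value's share of the anomaly window divided by its share of the baseline. The sketch below illustrates that idea; it is not Epok's published scoring, and the function name and sample data are hypothetical.

```python
from collections import Counter

def dimension_lift(anomaly_values, baseline_values):
    """Return {value: lift}, where lift = share in anomaly / share in baseline."""
    anomaly, baseline = Counter(anomaly_values), Counter(baseline_values)
    anomaly_total, baseline_total = sum(anomaly.values()), sum(baseline.values())
    lifts = {}
    for value, count in anomaly.items():
        base_share = baseline[value] / baseline_total if baseline_total else 0.0
        # A value never seen in the baseline gets infinite lift.
        lifts[value] = (count / anomaly_total) / base_share if base_share else float("inf")
    return lifts

# Hypothetical data: 90% of anomaly-window errors come from us-east-1,
# but that region is only half of baseline traffic.
anomaly = ["us-east-1"] * 90 + ["eu-west-1"] * 10
baseline = ["us-east-1"] * 50 + ["eu-west-1"] * 50
lifts = dimension_lift(anomaly, baseline)
```

A lift well above 1 marks a value that is over-represented in the anomaly; a lift below 1 marks one that is under-represented.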

Cross-service error matching

All tiers

Matches related errors across different services. When your API returns 500s and your database logs connection timeouts at the same time, Epok links them.

Service dependency graph

All tiers

Infers service-to-service dependencies from log patterns and error propagation. Visualizes which services depend on what.

Deploy correlation

All tiers

Detects recent deploys from log patterns (version strings, restart markers, config changes) and correlates anomalies with deploy timing.

AI incident narrative

Trial+

Plain-language summary of what happened, what's affected, and suggested next steps. Inlined into Slack alerts and the investigation view.

AI root-cause hypothesis

Trial+

LLM-assisted explanation on top of the deterministic RCA ranking. Turns signals into a readable theory of the incident.

Deep RCA

Trial+

On-demand, slower analysis that pulls more context (baseline comparison, correlated events, pattern history) and produces a longer write-up.

Dimension Lift explanation

Trial+

Natural-language explanation of why a dimension spiked — the shift, its scale, and whether it's the most likely cause.

Noise scoring

Trial+

LLM-scored noise rating on every alert to auto-tune suppression over time. Reduces alert fatigue without manual rule edits.

Natural-language query

Trial+

Ask "show me 5xx spikes from checkout in the last hour" and Epok translates to LogsQL. Scoped to your tenant.

AI Features

Every detector and deterministic analysis tool runs on every tier. AI-powered explanations sit on top: they turn detector output into readable prose, explain dimension shifts, auto-tune alert noise, and translate English into LogsQL. AI runs against your logs only at your tenant's request; your data is never used for model training.

Feature | Tier
Incident narrative | Trial+
Root-cause hypothesis | Trial+
Suggested actions | Trial+
Title rewrite | Trial+
Deep RCA | Trial+
Dimension Lift explanation | Trial+
Noise scoring | Trial+
Natural-language query | Trial+

Daily AI credits

1 credit = 1 AI action. Trial: 200/day. Team: 500/day. Growth: larger budget. Credits reset at 00:00 UTC. Alert narratives are generated eagerly; on-demand features (Deep RCA, NL query) consume credits per invocation.

Data privacy

Epok sends only the minimum necessary context (log samples, detector evidence, service names) to the AI provider. Payloads are not retained by the provider and are not used for model training. Your logs stay on Epok's servers; the LLM never gets bulk access.

Notifications

Configure where Epok sends alerts. Trial and Team include channels; Growth and Enterprise are unlimited.

Slack

Incoming webhook integration. Alerts include severity, affected service, description, and a link to the investigation view. On Trial tier and above, AI-generated incident narratives are included inline.

Add a Slack incoming webhook URL in Settings > Notification Channels.

PagerDuty

Native Events API v2 integration. Alerts map to PagerDuty incidents with severity, dedup key, and custom details. Resolved alerts auto-resolve in PagerDuty.

Add your PagerDuty integration key (Events API v2) in Settings > Notification Channels.

Webhook

Send alert JSON to any HTTP endpoint. Use this to integrate with OpsGenie, Microsoft Teams, Discord, or custom systems.

Add a webhook URL in Settings > Notification Channels. Epok sends a POST with the alert payload as JSON.

Email

Email notifications for alerts. Includes a summary with links to the dashboard for investigation.

Add email addresses in Settings > Notification Channels.

Delivery guarantees: Notifications are batched and retried with exponential backoff. Failed notifications go to a dead-letter queue and are recovered on restart.

Team Management

Epok supports team collaboration with role-based access control.

Roles

Three roles: Owner (full access, can manage billing and delete tenant), Admin (manage members, API keys, settings), and Member (view alerts, search logs, investigate incidents).

Inviting team members

Owners and admins can create invite links in Settings. New members sign in with Google and are automatically added to your tenant with the role you specify.

Tier limits

Tier | Daily ingest | Retention | Users | API keys | Services
Trial | 107 GB | 14 days | 3 | 2 | 10,000
Team | 50 GB | 30 days | 10 | 5 | 10,000
Growth | 167 GB | 30 days | Unlimited | 20 | Unlimited
Custom | 1024 GB | 365 days | Unlimited | Unlimited | Unlimited

All tiers include every intelligence detector. See pricing for full feature comparison.

Configuration

Epok works with zero configuration out of the box. All settings below are optional and can be adjusted in the dashboard.

Detection sensitivity

Volume anomaly detection calibrates itself to each service's normal traffic pattern and flags spikes, drops, and flatlines. You can adjust sensitivity per service if a stream is genuinely bursty by design, but the defaults work without tuning for almost every workload.

Threshold + composite rules

Custom rules for hard constraints. Threshold rules fire when a LogsQL query crosses a number; composite rules fire when multiple signals are simultaneously active. Full reference below →

Trial: 5 threshold + 5 composite. Team: 20 + 5. Growth: unlimited.

SLO monitoring

Define Service Level Objectives with error budget tracking. Epok monitors burn rate and predicts when your SLO will breach. Trial: 5 SLOs. Team: 5. Growth: unlimited.

Self-tuning thresholds (Team+)

Epok learns from your feedback. When you snooze, mute, or resolve alerts, the system adjusts sensitivity to reduce noise over time. No manual threshold tuning needed.

Custom rules — reference

Two flavours of user-defined rule. Threshold rules fire when a LogsQL query crosses a number over a window. Composite rules fire when two or more signals are simultaneously true. Both are plain JSON; both speak the same LogsQL you use in Explore.

Threshold rules

Endpoint: POST /api/v1/tenants/<id>/rules

json
{
  "name": "Payment refund burst",
  "query": "service:payment AND _msg:refund AND amount:>1000",
  "condition_op": "gt",
  "condition_value": 5,
  "window_seconds": 300,
  "severity": "critical",
  "for_duration_seconds": 60,
  "cooldown_seconds": 600,
  "channel_ids": [12, 8],
  "enabled": true
}
Field | Type | Description
name | string | Human-readable, surfaced in alerts.
query | LogsQL | The query whose count is checked against the condition. Same syntax as Explore.
condition_op | enum | gt · gte · lt · lte · eq · neq
condition_value | number | Threshold the hit count must satisfy.
window_seconds | int | Look-back window. Default 300 (5 min).
severity | enum | info · warning · critical. Drives notification routing + paging behavior.
for_duration_seconds | int | Condition must hold for at least this long before firing. Eliminates flapping.
cooldown_seconds | int | Minimum time between consecutive fires for this rule. Default 0.
channel_ids | int[] | Notification channel IDs. Omit to use tenant defaults.
enabled | bool | Toggle without deleting. Default true.
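
The field table can be turned into a small client-side builder that applies the documented defaults and rejects unknown enum values before you POST. This is a sketch: the helper names are illustrative, and the default of 0 for `for_duration_seconds` is an assumption because the table does not state one.

```python
import operator

# condition_op values from the field table, mapped to Python comparisons.
CONDITION_OPS = {
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
    "eq": operator.eq, "neq": operator.ne,
}
SEVERITIES = {"info", "warning", "critical"}

def threshold_rule(name, query, condition_op, condition_value, *, severity,
                   window_seconds=300, for_duration_seconds=0,
                   cooldown_seconds=0, channel_ids=None, enabled=True):
    """Build a JSON-serialisable body for POST /api/v1/tenants/<id>/rules."""
    if condition_op not in CONDITION_OPS:
        raise ValueError(f"unknown condition_op: {condition_op}")
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    rule = {
        "name": name,
        "query": query,
        "condition_op": condition_op,
        "condition_value": condition_value,
        "window_seconds": window_seconds,
        "severity": severity,
        "for_duration_seconds": for_duration_seconds,  # default is an assumption
        "cooldown_seconds": cooldown_seconds,
        "enabled": enabled,
    }
    if channel_ids is not None:  # omit the field to use tenant defaults
        rule["channel_ids"] = list(channel_ids)
    return rule

def condition_met(rule, hit_count):
    """Check a hit count (as shown in Explore) against the rule's condition."""
    return CONDITION_OPS[rule["condition_op"]](hit_count, rule["condition_value"])
```

`condition_met` is handy for dry runs: take the hit count Explore reports for your query and window, and check whether the rule would have fired.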

Worked examples

Auth-failed burst
query: level:warn AND _msg:"auth failed"
fires: count gt 50 in 60s · critical

Brute-force / credential stuffing. 50 events in 60s is too many.

Database connection storm
query: service:checkout AND _msg:"connection refused"
fires: count gt 10 in 60s · critical

Upstream DB is down or pool is exhausted. Pair with composite rule.

Slow request anomaly
query: service:api AND latency_ms:>5000
fires: count gt 3 in 300s · warning

Hard ceiling on user-visible latency. 3 occurrences in 5min crosses SLO budget.

Refund / fraud guard
query: service:payment AND _msg:"refund issued" AND amount:>1000
fires: count gt 5 in 300s · critical

High-value refund cluster — page someone immediately for fraud review.

Composite rules

Endpoint: POST /api/v1/tenants/<id>/composite-rules

json
{
  "name": "API degraded + DB struggling",
  "expression": {
    "op": "and",
    "conditions": [
      {
        "op": "threshold",
        "query": "service:api AND status_code:>=500",
        "comparator": "gt",
        "value": 10,
        "window_seconds": 300
      },
      {
        "op": "detector_active",
        "detector_type": "database_intelligence"
      }
    ]
  },
  "severity": "critical",
  "cooldown_seconds": 300,
  "channel_ids": [12],
  "enabled": true
}

Leaf operators (measurements)

threshold

A LogsQL query crossing a number in a window. Requires query + value; comparator defaults to gt; window_seconds defaults to 300.

{ "op": "threshold",
  "query": "level:error",
  "comparator": "gt",
  "value": 50,
  "window_seconds": 300 }
alert_firing

A specific alert (by dedup-key, severity, or service) is currently active. All fields optional — match anything when omitted.

{ "op": "alert_firing",
  "dedup_key": "lr:api:..." }
detector_active

A built-in detector type has produced anomalies. Pair built-in detection with your own threshold to filter noise.

{ "op": "detector_active",
  "detector_type": "silence" }

Branch operators (combine)

and (all conditions true), or (any true), not (single child false). Branches take a conditions array. Expressions can nest up to 5 levels deep; expression size is capped at 10KB.
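
Before submitting a composite rule, you can check the nesting and size limits locally. The sketch below makes two assumptions the text does not spell out: the root node counts as level 1, and size is measured as the UTF-8 byte length of compact JSON with 10KB taken as 10 × 1024 bytes. The function names are illustrative.

```python
import json

LEAF_OPS = {"threshold", "alert_firing", "detector_active"}
BRANCH_OPS = {"and", "or", "not"}
MAX_DEPTH = 5            # documented nesting limit
MAX_BYTES = 10 * 1024    # documented 10KB cap (byte interpretation assumed)

def expression_depth(node):
    """Return the nesting depth of an expression, counting the root as 1."""
    op = node.get("op")
    if op in LEAF_OPS:
        return 1
    if op not in BRANCH_OPS:
        raise ValueError(f"unknown op: {op!r}")
    children = node.get("conditions") or []
    if not children:
        raise ValueError(f"{op!r} needs a non-empty conditions array")
    if op == "not" and len(children) != 1:
        raise ValueError("'not' takes exactly one child")
    return 1 + max(expression_depth(child) for child in children)

def validate_expression(expr):
    """Raise ValueError if the expression breaks a limit; return its depth."""
    depth = expression_depth(expr)
    if depth > MAX_DEPTH:
        raise ValueError(f"expression nests {depth} levels deep (max {MAX_DEPTH})")
    size = len(json.dumps(expr, separators=(",", ":")).encode("utf-8"))
    if size > MAX_BYTES:
        raise ValueError(f"expression is {size} bytes (max {MAX_BYTES})")
    return depth
```

Running this in CI next to your rule files catches malformed expressions before they reach the API.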

Version-control your rules

Rules are JSON; export with GET /rules, commit to your repo, apply from CI with POST. Put alert config next to service code so a deploy and its alerts ship together.
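One way to keep exported rules diff-friendly is to write each rule to its own file with sorted keys. The sketch below assumes the export yields a list of JSON objects that each carry a name field, as the composite rule example does. rule_filename and write_rules are hypothetical helper names, and the fetch and apply steps are left out.

```python
import json
import re
import tempfile
from pathlib import Path


def rule_filename(rule):
    """Derive a stable, filesystem-safe file name from the rule's name."""
    slug = re.sub(r"[^a-z0-9]+", "-", rule["name"].lower()).strip("-")
    return f"{slug}.json"


def write_rules(rules, directory):
    """Write one file per rule with sorted keys so diffs only show real changes."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for rule in rules:
        path = out / rule_filename(rule)
        text = json.dumps(rule, indent=2, sort_keys=True) + "\n"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


# Usage with a throwaway directory; in CI this would be a folder in the repo.
exported = [{"severity": "critical", "name": "API degraded + DB struggling", "enabled": True}]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_rules(exported, tmp)
    first_name = paths[0].name
    written = paths[0].read_text(encoding="utf-8")
```

Sorting keys means two exports of an unchanged rule produce byte-identical files, so a commit only appears when a rule really changed.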

Test before deploy

Run the rule's LogsQL query in Explore against your last 24h. The condition fires on the count of hits over your window, which is the same number Explore shows you.
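The same check can be approximated offline from a list of hit timestamps. This sketch assumes a sliding window that ends at each hit; how Epok aligns its evaluation windows is not specified here. Only gt appears in these docs, so the other comparator names are assumptions for illustration.

```python
import operator
from bisect import bisect_right

# "gt" is documented; "gte", "lt" and "lte" are assumed names.
COMPARATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def max_count_in_window(timestamps, window_seconds):
    """Largest number of hits inside any window of the given length."""
    ts = sorted(timestamps)
    best = 0
    for i, end in enumerate(ts):
        # Index of the first hit strictly newer than (end - window_seconds).
        start = bisect_right(ts, end - window_seconds)
        best = max(best, i - start + 1)
    return best


def would_fire(timestamps, value, window_seconds=300, comparator="gt"):
    """True if the busiest window satisfies the rule's comparison."""
    count = max_count_in_window(timestamps, window_seconds)
    return COMPARATORS[comparator](count, value)


# Hit times in seconds: four hits within the first 60s, then two much later.
hits = [0, 10, 20, 30, 400, 410]
busiest = max_count_in_window(hits, 60)                 # 4
noisy = would_fire(hits, value=3, window_seconds=60)    # 4 > 3
quiet = would_fire(hits, value=10, window_seconds=60)   # 4 > 10 is false
```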

Anti-patterns

  • Don't alert on noisy level:info patterns — use the detectors.
  • Set for_duration_seconds > 0 for any latency or rate rule to kill flapping.
  • Composite rules with a single operand should be threshold rules — keep composites for real AND/OR logic.

Tier limits

  • Trial: 5 threshold + 5 composite
  • Team: 20 threshold + 5 composite
  • Growth, Enterprise: unlimited

API Reference

Key endpoints for programmatic access. All endpoints require authentication via API key. Full OpenAPI reference →

Download the raw spec at /openapi.json to generate clients or import into Postman.

Method  Endpoint
GET     /health
GET     /api/v1/alerts
GET     /api/v1/alerts/:id
POST    /api/v1/alerts/:id/resolve
GET     /api/v1/streams
GET     /api/v1/new-errors
GET     /api/v1/patterns
GET     /api/v1/search
GET     /api/v1/facets
GET     /api/v1/hits
WS      /ws/livetail/:tenant_id
WS      /ws/alerts/:tenant_id
GET     /api/v1/detectors
POST    /api/v1/channels
GET     /api/v1/channels
GET     /metrics

Example: List active alerts

terminal
bash
curl 'https://app.getepok.dev/api/v1/alerts?state=firing' \
  -H 'Authorization: Bearer YOUR_API_KEY'

Example: Search logs

terminal
bash
curl 'https://app.getepok.dev/api/v1/search?query=level%3Aerror&start=-1h&limit=100' \
  -H 'Authorization: Bearer YOUR_API_KEY'
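When the query comes from code rather than a hand-written URL, it helps to let a library do the percent-encoding. A small sketch using Python's standard library follows. search_url is a hypothetical helper, and the parameter names are taken from the curl example above.

```python
from urllib.parse import urlencode


def search_url(query, start="-1h", limit=100):
    """Build the search URL with the LogsQL query safely percent-encoded."""
    params = urlencode({"query": query, "start": start, "limit": limit})
    return f"https://app.getepok.dev/api/v1/search?{params}"


url = search_url("level:error")
# url == "https://app.getepok.dev/api/v1/search?query=level%3Aerror&start=-1h&limit=100"
```

Queries with quotes or spaces, such as service:checkout AND _msg:"connection refused", go through the same function without manual escaping.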

Rate Limits & Errors

Epok enforces per-tenant quotas so one tenant can't degrade the platform for everyone else. All limits are documented; nothing is secret or negotiated on a case-by-case basis.

Ingest rate limit

Ingest is rate-limited per tenant. When you exceed the limit, Epok returns HTTP 429 Too Many Requests with a Retry-After header (in seconds). Wait at least that long before retrying, and back off exponentially on repeated 429s to avoid a thundering herd on recovery.

Tier    Ingest rate       API query rate  Daily volume
Trial   500 events/sec    1,200 req/min   107 GB
Team    500 events/sec    1,200 req/min   50 GB
Growth  2,000 events/sec  6,000 req/min   167 GB

Hitting the daily-volume ceiling pauses ingest until the next UTC day on the trial; paid tiers bill overage per GB. Live-tail sessions and saved-search counts have separate caps — see pricing.

HTTP error codes

Code  Meaning                What to do
200   OK                     Successful response. For ingest, all logs were accepted.
400   Bad Request            Payload is malformed. Check NDJSON formatting, timestamp, and required fields. Response body has the specific issue.
401   Unauthorized           Missing, invalid, or expired credentials. Check your API key or re-authenticate via Google OAuth.
403   Forbidden              Authenticated but lacking permission. Some admin endpoints require the owner or admin role.
404   Not Found              Resource doesn't exist or your API key isn't scoped to its tenant.
409   Conflict               Duplicate resource (e.g. creating a tenant with an existing account_id).
413   Payload Too Large      Single log line exceeds 1 MB or batch exceeds the ingest size cap. Split into smaller batches.
429   Too Many Requests      Rate limit hit. Honor the `Retry-After` header and back off exponentially.
500   Internal Server Error  Server-side issue. Retry with backoff. If it persists, email support@getepok.dev with the request ID from the response headers.
503   Service Unavailable    Temporary overload or deploy in progress. Retry with backoff.
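The table maps onto a small decision function for client code. This is a sketch of one reasonable reading, not an official client. next_action and its return labels are hypothetical, and treating every 5xx as retryable is an assumption that extends the 500 and 503 rows.

```python
def next_action(status):
    """Map an HTTP status code to what the caller should do next."""
    if 200 <= status < 300:
        return "done"    # request accepted
    if status == 429 or status >= 500:
        return "retry"   # transient: honor Retry-After and back off
    if status == 413:
        return "split"   # payload too large: send smaller batches
    return "fix"         # 400/401/403/404/409: retrying unchanged won't help


actions = {code: next_action(code) for code in (200, 400, 401, 413, 429, 500, 503)}
```

Keeping "fix" separate from "retry" matters for ingest buffers: a malformed batch that is retried forever blocks every batch queued behind it.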

Recommended retry strategy

For ingest, buffer locally and retry idempotently. Exponential backoff with jitter prevents correlated retries after an upstream hiccup. A simple loop:

retry.py
python
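import os
import random
import time

import httpx

# Assumption: the key is supplied via an environment variable (the name is illustrative).
API_KEY = os.environ["EPOK_API_KEY"]

def send(events, *, max_attempts=5):
    """POST an NDJSON bulk payload, retrying on 429 and 5xx responses."""
    for attempt in range(max_attempts):
        r = httpx.post(
            "https://ingest.getepok.dev/insert/elasticsearch/_bulk",
            headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
            content=events,
            timeout=30,
        )
        # Anything other than 429 or 5xx is final: success, or a client error a retry won't fix.
        if r.status_code < 500 and r.status_code != 429:
            return r
        try:
            # Prefer the server's Retry-After (seconds); otherwise back off exponentially.
            delay = float(r.headers.get("Retry-After", 2 ** attempt))
        except ValueError:
            delay = 2 ** attempt  # Retry-After was not a plain number
        time.sleep(delay + random.uniform(0, 0.5))  # jitter de-correlates retries
    raise RuntimeError(f"ingest failed after {max_attempts} attempts")

Logs flow during incidents. Epok never drops logs during a traffic spike on paid tiers. Overage is billed per GB; you can set budget alerts in Settings to be notified before you hit a limit.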

Migrating From Another Tool

Moving from Datadog, Splunk, or Loki? Dual-ship for a day, verify parity, then cut over on your own schedule. The migration guide walks through each source with concrete Vector configs.

What's New

Release notes with the actual commits. Every change is traceable back to the code that shipped it.

Ready to get started?

Open Epok Dashboard

14-day trial includes every detector and full AI root cause analysis. No credit card required.