Twenty detectors.
Two intelligence layers.
Six statistical detectors learn your baseline. Fourteen rule packs fire from minute one. Every detector runs on every tier.
Statistical detectors
Learn your baseline over seven days, then alert on deviations.
New Error Detection
Catches errors that have never appeared in your 7-day baseline. New means new, not an existing pattern getting noisier.
Silence Detection
Catches services that stop logging when they normally log every N seconds. The most dangerous failure mode: no errors, just absence.
Volume Anomaly
Detects spikes, drops, and flatlines in log volume vs daily and weekly baselines per service.
Log Rate
Detects sustained shifts in overall log throughput. Catches the slow-leak case that volume anomaly misses on short windows.
Golden Signal Monitoring
Latency, traffic, error rate, and saturation thresholds per service. The four signals that matter for any production system.
Multi-dim Outlier
Outliers in log feature space. Catches subtle anomalies that single-axis thresholds miss.
Error intelligence
Identify error messages that haven't been seen before. Cluster the ones that have.
Pattern Clustering
Groups errors with similar templates so many variants of the same problem cluster into one alert.
Semantic Clustering
Errors with different text but the same meaning fold together. "connection refused to database" and "cannot connect to postgres: ECONNREFUSED" become one pattern.
Changepoint Detection
Permanent traffic shifts after a deploy read as a new baseline, not a prolonged anomaly. No false-positive storms after a real product change.
Domain rule packs
Nine curated packs encoding the most common incident patterns. Fire from minute one.
Kubernetes
70+ rules for OOMKilled, CrashLoopBackOff, ImagePullBackOff, FailedScheduling, evictions, probe failures, and more.
AWS
Lambda throttles, RDS connection saturation, SQS queue depth, NLB target health, S3 SlowDown errors, IAM throttling.
Serverless
Cold starts, invocation timeouts, init duration drift, memory exhaustion across Lambda, Cloud Run, Functions.
Database
Connection pool exhaustion, deadlocks, slow queries, replication lag, schema migration errors, transaction aborts.
Dependency
Upstream service failures, circuit breaker trips, retry exhaustion, cascading failures between services.
Web / HTTP
5xx surges, 4xx anomalies, TLS handshake failures, gateway timeouts, slow responses, payload anomalies.
Security
Brute-force auth, privilege escalation, anomalous access patterns, secret-leak signatures in logs.
Search
Slow Elasticsearch / Opensearch queries, index issues, scoring anomalies, cluster yellow/red transitions.
Infrastructure
Disk pressure, memory pressure, swap usage, kernel errors, hardware failure signatures.
SLO & custom
When you need a rule the built-in detectors don't cover.
SLO Burn Rate
Error budget burn-rate alerts with short + long window confirmation. Warned before the SLO breaches, not after.
Threshold + Composite Rules
When you need a rule the built-in detectors don't cover. Threshold and composite rules let you write your own — no DSL to learn, just LogsQL.
When the built-in detectors don't cover it.
Two ways to extend: threshold rulesfor "alert when this LogsQL query crosses a number," and composite rulesfor "alert when threshold X AND detector Y are both active." No DSL to learn — just the same LogsQL you use to search.
Fires when a LogsQL query crosses a number.
Same query language as search. Window, condition, severity, and routing are all explicit. Hold-down with for_duration_seconds eliminates flapping.
{
"name": "Payment refund burst",
"query": "service:payment AND _msg:refund AND amount > 1000",
"condition_op": "gt",
"condition_value": 5,
"window_seconds": 300,
"severity": "critical",
"for_duration_seconds": 60,
"cooldown_seconds": 600,
"channel_ids": [12, 8]
}- Auth-failed bursts: level:warn AND _msg:"auth failed" > 50 in 1m
- Retry-rate spikes: service:checkout AND _msg:retry > 10× baseline
- Specific value thresholds: latency_ms > 5000 for any service
Fires when two or more signals are simultaneously true.
Combine a threshold rule with a detector firing, or two detectors at once. Express "API is degraded AND the database is in a known bad state" as a single page — not two.
{
"name": "API degraded + DB struggling",
"expression": {
"op": "and",
"conditions": [
{ "op": "threshold",
"query": "service:api AND status_code:>=500",
"comparator": "gt",
"value": 10,
"window_seconds": 300 },
{ "op": "detector_active",
"detector_type": "database_intelligence" }
]
},
"severity": "critical",
"cooldown_seconds": 300
}- threshold — a LogsQL query crossing a number in a window
- alert_firing — a specific alert dedup-key is active
- detector_active — a built-in detector type is firing
Visual builder or JSON — you pick. Both speak the same schema the API does.


