May 26, 2026 · 6 min read

Six Rule Packs at 100% Precision and 100% Recall

How six detector rule packs hit 100% precision and 100% recall on a labeled corpus — and the CI gate that makes sure they stay there. Security, web, database, dependency, search, infrastructure.

Tags: detection, rule packs, benchmarks, precision, recall

A statistical detector that runs at 100% precision and 100% recall doesn't exist. There's always a threshold somewhere, and somewhere downstream of that threshold there's a false positive waiting to happen.

But a rule pack — a curated set of patterns encoding hard-won incident knowledge for a specific domain — can hit 100/100 on a labeled corpus. Not because it's clever. Because the rules are deterministic and the corpus tells the truth.

This is the engineering story of how Epok's six domain rule packs got there, and the CI gate that keeps them there.

What a rule pack actually is

Six of Epok's twenty detectors are rule packs:

Security — brute-force auth, privilege escalation, anomalous access patterns
Web / HTTP — 4xx and 5xx surges, TLS handshake failures, gateway timeouts
Database — connection pool exhaustion, deadlocks, slow queries, replication lag
Dependency — upstream failures, circuit breaker trips, retry exhaustion
Search — slow Elasticsearch queries, index issues, scoring anomalies
Infrastructure — disk pressure, memory pressure, swap, kernel errors

Each pack is a set of patterns matched against log content and structure. Each rule pairs a "fires when" condition (e.g. "5 or more auth_failed events for the same user within 60 seconds") with an alert template ("Brute-force attempt on user X from IP Y: 23 failed attempts in 90 seconds").
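To make the "fires when" idea concrete, here is a minimal sketch of the brute-force example as a sliding-window rule. It is an illustration, not Epok's implementation: the `LogEvent` shape, the `brute_force_rule` function, and the choice to alert once per user are all assumptions made for the example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class LogEvent:
    ts: float   # seconds since some epoch
    kind: str   # e.g. "auth_failed" (hypothetical event name)
    user: str
    ip: str


def brute_force_rule(
    events: Iterable[LogEvent], threshold: int = 5, window_s: float = 60.0
) -> Iterator[str]:
    """Yield one alert per user, the first time `threshold` failures fall within `window_s` seconds."""
    recent = defaultdict(deque)  # user -> timestamps of failures still inside the window
    alerted = set()
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.kind != "auth_failed":
            continue
        window = recent[ev.user]
        window.append(ev.ts)
        # Drop failures that are older than the window relative to this event.
        while ev.ts - window[0] > window_s:
            window.popleft()
        if len(window) >= threshold and ev.user not in alerted:
            alerted.add(ev.user)
            span = ev.ts - window[0]
            yield (
                f"Brute-force attempt on user {ev.user} from IP {ev.ip}: "
                f"{len(window)} failed attempts in {span:.0f} seconds"
            )


# Five failures, ten seconds apart: inside the 60-second window, so the rule fires.
events = [LogEvent(float(t), "auth_failed", "alice", "203.0.113.7") for t in range(0, 50, 10)]
alerts = list(brute_force_rule(events))
print(alerts[0])
# Brute-force attempt on user alice from IP 203.0.113.7: 5 failed attempts in 40 seconds
```

The point of the sketch is that nothing in it is learned or tuned at runtime: the same input always produces the same output, which is what makes a 100/100 score on a fixed corpus possible at all.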

The patterns aren't AI-derived. They're written by engineers who have seen these failure modes in production. Encoded once. Run forever.

The labeled corpus

A rule pack is only as good as the corpus it's tested against. Ours is a curated set of log slices across each domain, each labeled with the ground-truth answer: should this fire, and if so, what should the alert say?

A pack passes only if every positive case fires (recall) AND every negative case stays silent (precision). The corpus has both. The negative cases are where most products fail — they're the ones that look like the real thing but aren't. A failed login followed immediately by a successful one is a typo, not a brute-force attack. The corpus has examples of both, and the rules have to know the difference.
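A small sketch of how a labeled corpus can be scored, assuming each case is just a list of events plus a ground-truth `should_fire` label. The `LabeledCase` type and both rule functions are hypothetical, written only to show why the negative cases matter.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, event kind); an assumed shape


@dataclass(frozen=True)
class LabeledCase:
    name: str
    events: Sequence[Event]
    should_fire: bool  # ground-truth label


def score(rule: Callable[[Sequence[Event]], bool], corpus: List[LabeledCase]) -> Tuple[float, float]:
    """Return (precision, recall) of `rule` over the labeled corpus."""
    tp = sum(1 for c in corpus if c.should_fire and rule(c.events))
    fp = sum(1 for c in corpus if not c.should_fire and rule(c.events))
    fn = sum(1 for c in corpus if c.should_fire and not rule(c.events))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def naive_rule(events: Sequence[Event]) -> bool:
    # Fires on any failed login: catches the attack, but also the typo.
    return any(kind == "auth_failed" for _, kind in events)


def strict_rule(events: Sequence[Event], threshold: int = 5, window_s: float = 60.0) -> bool:
    # Fires only when `threshold` failures fall within `window_s` seconds.
    fails = sorted(ts for ts, kind in events if kind == "auth_failed")
    return any(
        fails[i + threshold - 1] - fails[i] <= window_s
        for i in range(len(fails) - threshold + 1)
    )


corpus = [
    LabeledCase("brute force", [(float(t), "auth_failed") for t in range(0, 50, 10)], True),
    LabeledCase("typo then success", [(0.0, "auth_failed"), (3.0, "auth_ok")], False),
]

print(score(naive_rule, corpus))   # (0.5, 1.0)
print(score(strict_rule, corpus))  # (1.0, 1.0)
```

Without the "typo then success" case, both rules would score 100/100 and the corpus could not tell them apart.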

The CI gate

The corpus is wired into a single test that runs on every pull request. A pack that drops below 100/100 on any category fails the test. The PR doesn't merge.

What this means in practice: a rule pack improvement that catches a new failure mode but breaks an existing case fails CI. So does a refactor that accidentally weakens an existing rule. So does a regression nobody would have caught by eye.
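The gate itself can be very small. The sketch below assumes each pack's corpus run has already been reduced to `(expected, fired)` pairs; the `pack_errors` and `ci_gate` names and the result layout are illustrative, not the actual test in Epok's repository.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: pack name -> list of (expected_to_fire, actually_fired) per corpus case.
Results = Dict[str, List[Tuple[bool, bool]]]


def pack_errors(cases: List[Tuple[bool, bool]]) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) for one pack."""
    fp = sum(1 for expected, fired in cases if fired and not expected)
    fn = sum(1 for expected, fired in cases if expected and not fired)
    return fp, fn


def ci_gate(results: Results) -> int:
    """Return a process exit code: 0 if every pack is at 100/100, 1 otherwise."""
    failed = False
    for pack in sorted(results):
        fp, fn = pack_errors(results[pack])
        if fp or fn:
            # Any false positive breaks precision; any false negative breaks recall.
            print(f"FAIL {pack}: {fp} false positive(s), {fn} false negative(s)")
            failed = True
    return 1 if failed else 0


results = {
    "security": [(True, True), (False, False)],
    "database": [(True, True), (False, True)],  # one negative case fired
}
exit_code = ci_gate(results)
print(exit_code)  # 1
```

There is no tolerance parameter on purpose. A single mismatched case in any pack is enough to return a non-zero exit code, which is what blocks the merge.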

This is the most boring kind of engineering discipline and also the most useful. The corpus stays honest, the rules stay honest, and the "100% precision and recall" claim on the marketing site stays honest. We don't have a quarterly QA cycle. We have a test that runs in 4 seconds on every PR.

Why precision matters more than recall here

A common mistake in detection product design is treating precision and recall as a single tradeoff to be tuned. For rule packs specifically, they aren't. Precision is the budget. Recall is the spec.

If a security rule pack fires on 90% of brute-force attacks but also fires twice a week on legitimate SSH from a sleepy laptop, the on-call mutes the channel within a month. Once precision falls below what the on-call will tolerate, recall is irrelevant: nobody is listening to the alerts.

The 100% precision target isn't aspirational. It's the price of admission for the detector to stay subscribed-to.

Catch what rules can — and only what rules can

The reason rule packs work at 100/100 is that they encode patterns that are unambiguously bad. SSH brute-force is unambiguously bad. Connection pool exhausted is unambiguously bad. A deadlock storm on the orders table is unambiguously bad.

For the failure modes that aren't unambiguous — gradual latency drift, novel error patterns, volume anomalies that depend on the hour-of-day baseline — rule packs are the wrong tool. Statistical detectors handle those, with explicit learning periods, calibrated thresholds, and confidence annotations on every alert.

Epok ships both. Six rule packs at 100/100 for the patterns where determinism wins. Five statistical detectors for the patterns where it doesn't. Every alert tagged with which lane caught it.
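One way to picture the lane tag is as a required field on the alert record. This is a sketch under assumptions: the `Lane` and `Alert` names, the field layout, and the rule that only statistical alerts carry a confidence value are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Lane(Enum):
    RULE_PACK = "rule_pack"
    STATISTICAL = "statistical"


@dataclass(frozen=True)
class Alert:
    detector: str
    message: str
    lane: Lane
    confidence: Optional[float] = None  # assumed: only statistical alerts carry one

    def __post_init__(self) -> None:
        # Statistical alerts must say how sure they are; rule-pack alerts are deterministic.
        if self.lane is Lane.STATISTICAL and self.confidence is None:
            raise ValueError("statistical alerts need a confidence annotation")


rule_alert = Alert("security", "Brute-force attempt on user alice", Lane.RULE_PACK)
stat_alert = Alert("latency", "p95 latency drifting upward", Lane.STATISTICAL, confidence=0.8)
print(rule_alert.lane.value, stat_alert.lane.value)  # rule_pack statistical
```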

What's next

Three more rule packs are in the queue: observability (logging pipeline failures — the meta layer), runtime (JVM, Python, Go runtime crashes), and scheduler (cron job failures, batch job missed runs).

Each one ships only after it hits 100/100 on its own labeled corpus. That's the rule. The bar isn't arbitrary — it's what keeps the pager useful at 3am.

Try Epok against your own logs. Every rule pack runs on every tier, including the 14-day full-feature trial. No credit card. Start at app.getepok.dev.

