Show HN: Cordon – Reduce large log files to anomalous sections
Cordon uses transformer embeddings and density scoring to identify what's semantically unique in log files, filtering out repetitive noise.
The core insight: a critical error repeated 1000x is "normal" (semantically dense). A strange one-off event is anomalous (semantically isolated).
Outputs XML-tagged blocks with anomaly scores. Designed to reduce large logs as a form of pre-processing for LLM analysis.
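To make the density idea concrete, here's a minimal sketch of the approach. This is not Cordon's actual code; the model name, neighbor count, cutoff, and tag format are illustrative assumptions:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    def anomalous_lines(lines, k=5, keep_frac=0.05):
        # Embed every line; any sentence encoder works, this one is an example.
        model = SentenceTransformer("all-MiniLM-L6-v2")
        emb = model.encode(lines, normalize_embeddings=True)
        sims = emb @ emb.T            # cosine similarity (unit vectors)
        np.fill_diagonal(sims, -1.0)  # ignore self-similarity
        # Density = mean similarity to the k nearest neighbors. A message
        # repeated 1000x sits in a dense cluster; a one-off is isolated.
        density = np.sort(sims, axis=1)[:, -k:].mean(axis=1)
        cutoff = np.quantile(density, keep_frac)  # percentile-based threshold
        for line, d in zip(lines, density):
            if d <= cutoff:
                print(f'<anomaly score="{1 - d:.2f}">{line}</anomaly>')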
Architecture: https://github.com/calebevans/cordon/blob/main/docs/architec...
Benchmark: https://github.com/calebevans/cordon/blob/main/benchmark/res...
Trade-offs: intentionally ignores repetitive patterns, uses percentile-based thresholds (relative, not absolute).
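On the threshold trade-off: a percentile cut always flags roughly the same fraction of lines, whatever the file's absolute score distribution looks like, whereas a fixed absolute cutoff over- or under-selects depending on the log. A toy illustration with synthetic scores (not Cordon output):

    import numpy as np

    # Two synthetic score distributions standing in for a "quiet" and a "noisy" log.
    quiet = np.random.default_rng(0).normal(0.90, 0.02, 10_000)
    noisy = np.random.default_rng(1).normal(0.60, 0.15, 10_000)
    for name, scores in (("quiet", quiet), ("noisy", noisy)):
        cutoff = np.quantile(scores, 0.05)  # 5th percentile, relative to this file
        flagged = np.mean(scores <= cutoff)
        print(f"{name}: cutoff={cutoff:.2f}, flagged={flagged:.1%}")

Both files flag about 5% of lines, but at very different cutoff values; an absolute cutoff of, say, 0.8 would flag almost nothing in the quiet log and most of the noisy one.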
Also, please feel free to try the online demo: https://huggingface.co/spaces/calebdevans/cordon
Just a quick update: I’ve added support for remote embedding models in the most recent release (v0.3.0).
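For anyone curious what "remote embedding model" means in practice: instead of loading a model locally, the embedding call goes out over HTTP. A generic sketch against an OpenAI-compatible /v1/embeddings server; this is not Cordon's actual interface, and the endpoint, model name, and env var are assumptions:

    import os
    import requests

    def remote_embed(lines, endpoint="https://api.openai.com/v1/embeddings",
                     model="text-embedding-3-small"):
        # Standard OpenAI-style embeddings request: batch of strings in,
        # one embedding vector per string out.
        resp = requests.post(
            endpoint,
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": model, "input": lines},
            timeout=30,
        )
        resp.raise_for_status()
        return [item["embedding"] for item in resp.json()["data"]]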