An open-source Kubernetes operator that watches your namespaces 24x7, correlates signals across pods, nodes, and services, and autonomously performs root cause analysis the moment an incident occurs.
The traditional SRE loop takes minutes or hours. RCA Operator collapses it to seconds using real-time signal correlation and AI-assisted analysis.
IncidentReport CR, notifies via Slack/PagerDuty, and optionally triggers autonomous remediation.NotifyOnly to AutoRemediate — you choose how autonomous the operator is. Start cautious, graduate to full automation.RCAAgent and IncidentReport are first-class Kubernetes objects. Full audit trail, queryable via kubectl, GitOps-compatible out of the box.Built on battle-tested controller-runtime patterns. Each component is independently testable and replaceable.
A clear, phased roadmap from MVP to a CNCF-ready, multi-cluster operator with full web UI.
Choose kubectl or Helm. Both paths have your operator watching your cluster in under 5 minutes.
RCA Operator is Apache 2.0 licensed and actively developed. Every contribution — bug reports, docs, playbooks, or code — moves us closer to production-ready.
make test, and open a Pull Request.RCA Operator is free, open source, and ready to deploy into your cluster today. Your on-call rotation will thank you.