Root Cause Analysis for Microservice Systems via Hierarchical Reinforcement Learning from Human Feedback

Lu Wang, Chaoyun Zhang, Ruomeng Ding et al.

2023 · 23 citations

In microservice systems, the identification of root causes of anomalies is imperative for service reliability and business impact. This process is typically divided into two phases: (i) constructing a service dependency graph that outlines the sequence and structure of system components that are invoked, and (ii) localizing the root cause components using the graph, traces, logs, and Key Performance Indicators (KPIs) such as latency. However, both phases are not straightforward due to the highly dynamic and complex nature of the system, particularly in large-scale commercial architectures like…
