Seeing Like an SRE: Site Reliability Engineering as High Modernism

Rik Farrow
I recently spent some time trying to write a set of general guidelines for what to monitor in a software system. I came up with this list:
Latency distribution and successful/unsuccessful request counts (plus error types) for all RPCs served.
Latency distribution and success rate for every other service this one depends on, as well as circuit breaker trips.
Last success time for anything that’s supposed to happen periodically.
Percentage utilisation for resources (quotas, rate limits, physical and logical system resources), as well as saturation signals for the same, and errors or timeouts.
Number of instances that are up and healthy/unhealthy, restart counts, and running versions of binaries.
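
To make a couple of these items concrete, here is a minimal sketch of the first and third: a latency distribution with success and error counts for one RPC, and a last-success check for a periodic task. It is only an illustration. The names `RpcMetrics`, `PeriodicTaskMonitor`, `LATENCY_BUCKETS`, the bucket boundaries, and the `slack` factor are all assumptions of this sketch, not part of any particular monitoring library; a real system would export these values to whatever metrics backend it already uses.

```python
import bisect
import time
from collections import Counter

# Hypothetical bucket upper bounds in seconds; choose values that suit the service.
LATENCY_BUCKETS = [0.01, 0.05, 0.1, 0.5, 1.0, 5.0]


class RpcMetrics:
    """Latency distribution plus success/error counts for one RPC method."""

    def __init__(self):
        # One counter per bucket, plus a final overflow bucket.
        self.bucket_counts = [0] * (len(LATENCY_BUCKETS) + 1)
        self.successes = 0
        self.errors = Counter()  # error type -> count

    def record(self, latency_s, error_type=None):
        # bisect_left returns the index of the first bucket whose
        # upper bound is >= latency_s (or the overflow index).
        index = bisect.bisect_left(LATENCY_BUCKETS, latency_s)
        self.bucket_counts[index] += 1
        if error_type is None:
            self.successes += 1
        else:
            self.errors[error_type] += 1


class PeriodicTaskMonitor:
    """Tracks the last success time of something that should run periodically."""

    def __init__(self, expected_interval_s, clock=time.monotonic):
        self.expected_interval_s = expected_interval_s
        self.clock = clock  # injectable so the check can be tested
        self.last_success = None

    def mark_success(self):
        self.last_success = self.clock()

    def is_overdue(self, slack=2.0):
        # Overdue if it has never succeeded, or if the last success is
        # older than `slack` times the expected interval.
        if self.last_success is None:
            return True
        age = self.clock() - self.last_success
        return age > slack * self.expected_interval_s
```

The point of recording the last success time, rather than counting failures, is that it also catches the case where the task silently stops running at all.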
