Details

Introduction. The production environment at Google, from the viewpoint of an SRE
Principles. Embracing risk
Service level objectives
Eliminating toil
Monitoring distributed systems
The evolution of automation at Google
Release engineering
Simplicity
Practices. Practical alerting from time-series data
Being on-call
Effective troubleshooting
Emergency response
Managing incidents
Postmortem culture: learning from failure
Tracking outages
Testing for reliability
Software engineering in SRE
Load balancing at the frontend
Load balancing in the datacenter
Handling overload
Addressing cascading failures
Managing critical state: distributed consensus for reliability
Distributed periodic scheduling with Cron
Data processing pipelines
Data integrity: what you read is what you wrote
Reliable product launches at scale
Management. Accelerating SREs to on-call and beyond
Dealing with interrupts
Embedding an SRE to recover from operational overload
Communication and collaboration in SRE
The evolving SRE engagement model
Conclusions. Lessons learned from other industries.