Site reliability engineering : How Google runs production systems / edited by Betsy Beyer ... and others.

Beyer, Betsy, editor.; Jones, Chris (Computer engineer), editor.; Petoff, Jennifer, editor.; Murphy, Niall Richard, editor.

2016

HD9696.8.U64 G6666 2016eb

Available Online

Details

Title

Site reliability engineering : How Google runs production systems / edited by Betsy Beyer ... and others.

ISBN

9781491951187 (electronic book)
1491951184 (electronic book)
9781491951170 (electronic book)
1491951176 (electronic book)
9781491929124

Published

Sebastopol, CA : O'Reilly Media, 2016.

Language

English

Description

1 online resource (xxiv, 524 pages) : illustrations

Call Number

HD9696.8.U64 G6666 2016eb

Dewey Decimal Classification

620.00452

Bibliography, etc. Note

Includes bibliographical references and index.

Access Note

Access limited to authorized users.

Source of Description

Description based on print version record.

Added Author

Beyer, Betsy, editor.
Jones, Chris (Computer engineer), editor.
Petoff, Jennifer, editor.
Murphy, Niall Richard, editor.

Available in Other Form

Print version: Site reliability engineering. Sebastopol, CA : O'Reilly, 2016 9781491929124


Table of Contents

Introduction. The production environment at Google, from the viewpoint of an SRE
Principles. Embracing risk
Service level objectives
Eliminating toil
Monitoring distributed systems
The evolution of automation at Google
Release engineering
Simplicity
Practices. Practical alerting from time-series data
Being on-call
Effective troubleshooting
Emergency response
Managing incidents
Postmortem culture: learning from failure
Tracking outages
Testing for reliability
Software engineering in SRE
Load balancing at the frontend
Load balancing in the datacenter
Handling overload
Addressing cascading failures
Managing critical state: distributed consensus for reliability
Distributed periodic scheduling with Cron
Data processing pipelines
Data integrity: what you read is what you wrote
Reliable product launches at scale
Management. Accelerating SREs to on-call and beyond
Dealing with interrupts
Embedding an SRE to recover from operational overload
Communication and collaboration in SRE
The evolving SRE engagement model
Conclusions. Lessons learned from other industries.
