Description:
Embark on a journey of excellence with our Site Reliability Engineering (SRE) training course. Discover how SRE, born at Google in the early 2000s, applies software engineering principles to infrastructure and operations to create ultra-scalable and highly reliable software systems. Dive into the history, principles, and practices of SRE, learn how it differs from DevOps, and gain hands-on experience in budgeting, planning, monitoring, and best practices. Elevate your skills and learn to make your systems run smoothly, efficiently, and reliably.
Course Code/Duration:
BDT139 / 2 Days
Learning Objectives:
- Understand the core principles of Site Reliability Engineering (SRE) and its origins at Google, including how it incorporates software engineering into infrastructure and operations to create highly reliable software systems.
- Differentiate between SRE and DevOps, and grasp the roles and responsibilities associated with SRE.
- Gain practical skills in budgeting, planning, monitoring, and implementing best practices for SRE, enabling you to enhance the scalability and reliability of software systems.
Prerequisites:
- None
Audience:
- Developers and developer teams looking to incorporate the principles of SRE into practice.
Course Outline:
The course includes presentations, demonstrations, and hands-on labs.
Module 1: The Basics of Site Reliability Engineering
- Reliability in Modern Applications
- The Impact of Failure and Determining Your Reliability Objectives
- Accepting Failure and Making It Part of the Design Process
- SRE is a Mindset
Module 2: Gaining Resilience and Reliability on AWS
- AWS Global, Regional, and Zonal Architecture Design
- Amazon’s Global Storage Services – S3
- Running Resilient Databases on AWS – RDS and DynamoDB
- Fault Tolerant Computation on AWS – Lambda and EC2
- Core Resilience Principles for AWS – Load Balancing and Auto Scaling
Module 3: Accepting Failure in Multi-Tier Applications
- Typical Three-Tier Application Resilience and Why It Fails in the Cloud
- Designing in Resilience with Microservices
- Managing State
- Typical Application Reliability Patterns
Module 4: Deploying Applications on AWS
- Optimizing and Migrating the Code
- Creating Containers with CodeBuild
- The Architecture of Microservices
- Using Kubernetes and ECS in AWS
- Deploying ECS and RDS
- The Problem with What We’ve Just Built
Module 5: Designing Applications
- Overview of Failure Mode Analysis
- Multi-Regional Support
- Microservices Design
- Authentication and Authorization
- Code Deployment with CodePipeline
- Application Telemetry and Tracing
- Application Analytics
- Aurora and Its Advantages Over MySQL
Module 6: Deploying a Resilient, Fault Tolerant Application
- Running and Scaling Applications on EKS
- Deploying App Mesh
Module 7: Surviving Failure on a Global Scale
- Review: AWS Global Architecture and What We Have Just Built
- Global Tools: Route 53, CloudFront
- Going Global: What This Means for Users and Developers
- Operational Changes Required for a Global Application
- Course Summary
Training material provided:
Yes (Digital format)