What is SRE?
What is SRE? And how is it different from Ops?
SRE pursues continuous improvement by applying a methodology for reducing toil, preventing problems from recurring, and being proactive. Ops just fixes incidents.
What is the difference between SRE and DevOps?
DevOps is about implementing a toolchain for continuous delivery. SRE is also about toolchains, but it integrates operations and methodology as well; it's a philosophy for managing IT.
What are the functions of the SRE Team?
Eliminate toil to focus on enhancement, manage the risk derived from changes, handle failures, and prevent recurrence
What are the best practices for Toil Management?
50% of time on operations, 50% on identifying toil and automating it.
Please help me understand service level in SRE?
SLI, SLO, SLA
What are the phases to work on incidents?
triage, examine, diagnose, test, cure
What are the items we should have for postmortem?
timeline, evidence, analysis, root cause (RC), lessons learned, corrective actions, education
What is Observability?
Monitoring the target environment by reviewing events, tracking metrics, finding patterns in logs, and checking the health metrics of the infrastructure and applications
What is SRE? And how is it different from Ops?
SRE is an approach to achieving service reliability through principal tenets such as eliminating toil, working with service levels, and managing failure.
What is the difference between SRE and DevOps?
The DevOps team is involved in the application coding phase
What are the functions of the SRE Team?
reduce toil, manage incidents, improve observability
What are the best practices for Toil Management?
understanding toil, limiting toil, and eliminating it
Please help me understand service level in SRE?
It is a way to measure reliability
What are the phases to work on incidents?
examine, diagnose, remediate
What are the items we should have for postmortem?
problem summary, timeline, lessons learned
What is Observability?
It's more than monitoring: it's gaining visibility into many aspects of a system, such as SLOs, SLIs, SLAs, alerts, and more
Q1 What is SRE? And how is it different from Ops?
A cultural approach that takes aspects of software engineering and applies them to infrastructure and operations problems. Unlike traditional Ops, SRE uses code, automation, and teamwork to boost reliability, not just manual tasks.
Q2 What is the difference between SRE and DevOps?
SRE focuses on reliability using software and automation. DevOps is about merging development and operations teams to streamline software delivery. SRE emphasizes reliability; DevOps emphasizes collaboration.
Q3 What are the functions of the SRE Team?
Reliability
Automation
Monitoring
Collaboration
Q4 What are the best practices for Toil Management?
Automation
Documentation
Analysis
Task Reduction
Q5 Please help me understand service level in SRE?
Service Level Objectives
Service Level Indicators
Service Level Agreements
Q6 What are the phases to work on incidents?
monitoring alert
Response
Mitigate
Root Cause Analysis
Document
Q7 What are the items we should have for postmortem?
Incident details, timeline, actions taken, lessons learned, blameless postmortem
Q8 What is Observability?
Having the tools and systems to understand how software and services work in real time, enabling quick problem detection, diagnosis, and resolution.
#What is SRE? And how is it different from Ops?
SRE focuses more on site reliability and availability. Ops (operations) just means the work of maintaining the system.
#What is the difference between SRE and DevOps?
DevOps includes development and operations, and SRE only includes operations.
#What are the functions of the SRE Team?
– Eliminating toil
– Managing risk
– Handling failure
#What are the best practices for Toil Management?
1. Identify and define toil
2. Quantify toil
3. Prioritize automation
4. Automate
#Please help me understand service level in SRE?
1. SLA – service level agreement
2. SLO – service level objective
3. SLI – service level indicator (key metrics)
#What are the phases to work on incidents?
1. Detection
2. Logging
3. Notification
4. Response
5. Resolution
6. Documentation
7. Postmortem
#What are the items we should have for postmortem?
1. Incident info
2. Root cause analysis
3. Timeline of actions
4. Document
5. Follow-up
#What is Observability?
1. Metrics
2. Events
3. Logs
4. Traces
What is SRE? And how is it different from Ops?
SRE is the way all the teams and components should work together.
Ops is only the team that performs the tasks, whereas SRE also covers all the other operations around them.
What is the difference between SRE and DevOps?
DevOps is more about the way of developing; SRE is a wider principle covering how the whole product or site functions.
What are the functions of the SRE Team?
Joining all the other parts together: communicate, automate, and remove toil as much as possible.
What are the best practices for Toil Management?
Keep toil to at most 50% of time by automating everything possible 🙂
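The 50% ceiling can be checked mechanically once time is tracked. Below is a minimal sketch, assuming a hypothetical `WorkLog` record in which an engineer's hours are already split into toil and engineering work; the names and the schema are illustrative and not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class WorkLog:
    """Hours logged in one period, split by kind of work (hypothetical schema)."""
    toil_hours: float         # manual, repetitive, automatable operational work
    engineering_hours: float  # project work, including automation

TOIL_CAP = 0.50  # the 50% ceiling mentioned in the answer above


def toil_fraction(log: WorkLog) -> float:
    """Return the share of logged time that was spent on toil."""
    total = log.toil_hours + log.engineering_hours
    if total == 0:
        return 0.0
    return log.toil_hours / total


def over_toil_cap(log: WorkLog, cap: float = TOIL_CAP) -> bool:
    """True when toil exceeds the cap, a signal to prioritise automation."""
    return toil_fraction(log) > cap
```

For example, a week with 24 hours of toil and 16 hours of engineering work is over the cap, while an even 20/20 split is exactly at the limit and is not flagged.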
Help me understand service level in SRE?
SLI – parameters of the service
SLO – target parameters for the service over a time window
SLA – contractual parameters for how available the service should be
What are the phases to work on incidents?
detect, first aid, diagnose, resolution, postmortem
What are the items we should have for postmortem?
analyze the whole chain of events, record all actions that were taken, RCA, a list of all involved people and components, lessons learned
What is Observability?
The capability to see all available logs and information, trace events, and visualize them
1. What is SRE? SRE vs Ops
SRE is a person or team focusing on availability, performance, incident management, and monitoring of services. One of the main goals of SRE is to automate manual work and to be involved from the start of the SDLC instead of just running and supporting production.
2. SRE vs DevOps
DevOps is a mindset of automating everything; everyone can take part in DevOps. SRE is a specific role focusing on availability, performance, incident management, and monitoring.
3. SRE functions
Automate manual work, manage risk, and handle failure
4. Best practices for toil management
Evaluate and measure how much time and resources each piece of toil takes.
Estimate how much time it would take to automate the toil.
Choose which automation or process change would be the most cost-effective or most important.
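The three steps above amount to a simple payback calculation. The sketch below is one way to do it, assuming toil is measured in hours per week and automation effort in one-off hours; the function names and the ranking rule (shortest payback first) are illustrative assumptions, and importance is not modelled.

```python
def breakeven_weeks(toil_hours_per_week: float, automation_hours: float) -> float:
    """Weeks until the one-off automation effort is repaid by the toil it removes."""
    if toil_hours_per_week <= 0:
        raise ValueError("toil_hours_per_week must be positive")
    return automation_hours / toil_hours_per_week


def rank_candidates(candidates: dict[str, tuple[float, float]]) -> list[str]:
    """Sort task names by shortest payback first.

    candidates maps a task name to (toil hours per week, hours to automate).
    """
    return sorted(candidates, key=lambda name: breakeven_weeks(*candidates[name]))
```

A task that costs 1 hour a week and takes 3 hours to automate pays back in 3 weeks, so it would rank ahead of a task that costs 5 hours a week but needs 40 hours of automation work.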
5. Service levels
SLA = service level agreement. The service level agreed with the customer.
SLO = service level objective. Agreed within the organization or SRE team as the acceptable level of service. This level needs to be higher (stricter) than the SLA, with a realistic margin so that the service level can be improved before the SLA is breached.
SLI = service level indicator. Shows the actual status of a service through latency, traffic volume, error count, and platform/server saturation.
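The relationship between the three can be sketched in a few lines. This is an illustration only: it assumes an availability SLI computed from request counts, and the status labels and any threshold values passed in are hypothetical.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """Measured availability: the fraction of requests that succeeded."""
    if total_requests == 0:
        return 1.0  # no traffic, nothing failed
    return (total_requests - failed_requests) / total_requests


def service_status(sli: float, slo: float, sla: float) -> str:
    """Classify the measured SLI against the internal SLO and the contractual SLA."""
    if slo <= sla:
        raise ValueError("the SLO should be stricter than the SLA to leave a margin")
    if sli >= slo:
        return "ok"
    if sli >= sla:
        return "slo_missed"   # inside the margin: act before the SLA is breached
    return "sla_breached"
```

The middle state is the margin described above: the internal objective has been missed, but the level agreed with the customer still holds.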
6. Phases for incident
Triage, get services up
Examine, understand the problem
Diagnose, find the root cause of the problem
Test, verify you understand the problem
Cure, fix the problem long term
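The five phases form a fixed sequence, which can be modelled as an ordered enumeration. This is a minimal sketch; the `Phase` and `next_phase` names are illustrative and do not come from any incident-management tool.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """Incident phases in the order they are worked, with their goals."""
    TRIAGE = "get services up"
    EXAMINE = "understand the problem"
    DIAGNOSE = "find the root cause"
    TEST = "verify you understand the problem"
    CURE = "fix the problem long term"


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None once the incident is cured."""
    phases = list(Phase)  # Enum iteration preserves definition order
    idx = phases.index(current)
    return phases[idx + 1] if idx + 1 < len(phases) else None
```

Keeping the order in one place means an incident tracker cannot skip from triage straight to cure without passing through the phases in between.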
7. Postmortem
Document the issue
Identify the root cause
List the fixes that should be made so the problem does not happen again
8. Observability
Ability to observe and monitor the current state of a service based on logs, events, metrics, and traces
1.A. SREs possess the technical knowledge, are engaged from architecture through to operations, and have complete information about the product. Ops teams are the bridge between the operations teams and the business. They do not have technical knowledge of the product.
2.A. DevOps teams focus on developing the software and delivering it with quality. SRE focuses on software that is easily manageable, scalable, and resilient. SRE also works on existing projects to operate them more effectively, enhancing the deliverables using SRE principles such as automation and observability.
3.A. Eliminating toil, managing risks and handling failures
4.A. Toil should not exceed 50% of total workload
5.A. SLI, SLO and SLA
6.A. Triage, examine, diagnose, test and cure
7.A. Identify the root cause and have it well documented. It should be neutral, blame-free and published within the organisation
8.A. Metrics, events, logs and traces
What is SRE? And how is it different from Ops?
An engineering approach to IT operations. SRE is focused on availability and efficiency, eliminating toil, managing risk, and avoiding failures.
What is the difference between SRE and DevOps?
SRE applies software engineering to design the operations function. DevOps is about building and running.
What are the functions of the SRE Team?
Work to keep the highest availability and efficiency. Document problems, share knowledge and solutions.
What are the best practices for Toil Management?
Identify, measure, prioritize
What are the phases to work on incidents?
Triage, examine, diagnose, test, cure
What are the items we should have for postmortem?
Issues in discussion, action items, assignments, lessons learned
What is Observability?
A collection of metrics, logs, and traces that can be used to monitor or control the state of a system
What is SRE? And how is it different from Ops?
SRE is an approach to operations that uses software as its primary tool for managing systems. SRE differs from Ops because it is more prescriptive and more feasible on internal migrations.
What is the difference between SRE and DevOps?
The major difference is that DevOps teams create software and then refine it, whereas SRE teams work with already-built software to ensure it functions correctly and cooperates with other software and systems.
What are the functions of the SRE Team?
The main functions of the SRE team are to eliminate toil, work on service levels, and manage failure.
What are the best practices for Toil Management?
The best practices for toil management include data-driven analysis, a toil reduction backlog, cost-benefit analysis, and automation projects.
Please help me understand service level in SRE?
SLI, SLO and SLA
SLI: SLIs are the metrics used to measure the level of service provided to end users.
SLO: SLOs are the target levels of service measured by SLIs. They are typically expressed as a percentage over a period of time.
SLA: SLAs are the agreements that outline the level of service end users can expect from service providers, along with terms such as service credits or subscription extensions.
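Because an SLO is a percentage over a period of time, it translates directly into an allowance of downtime. Here is a minimal sketch, assuming an availability SLO and a default 30-day window; both the function name and the window length are illustrative choices.

```python
def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO tolerates over the window."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_percent) / 100
```

With these assumptions, a 99.9% SLO over 30 days leaves roughly 43 minutes of downtime, and a 99% SLO leaves about 7.2 hours.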
What are the phases to work on incidents?
The incident phases are triage, examine, diagnose, test, and cure.
What are the items we should have for postmortem?
A postmortem is the record of an incident, its impact, the actions taken to mitigate or resolve it, the root cause, and the follow-up actions to prevent the incident from recurring.
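The parts of that definition map naturally onto a record type, which makes it easy to spot an incomplete postmortem. This is a sketch only; the `Postmortem` class and its field names are hypothetical and simply mirror the items listed in the answer.

```python
from dataclasses import dataclass, field


@dataclass
class Postmortem:
    """Record of one incident, mirroring the items named in the answer above."""
    incident: str = ""
    impact: str = ""
    actions_taken: list[str] = field(default_factory=list)
    root_cause: str = ""
    follow_up_actions: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of the sections that are still empty."""
        names = ["incident", "impact", "actions_taken", "root_cause", "follow_up_actions"]
        return [name for name in names if not getattr(self, name)]
```

A review step could refuse to close the incident while `missing_sections()` still returns anything.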
What is Observability?
The ability to monitor your system to discover and diagnose problems as they occur.
Site reliability engineering
DevOps is the development methodology
SRE is a component of DevOps
SRE functions: eliminating toil, managing risk, handling failure
Toil: labor-intensive, repetitive, automatable
Incidents: past experience, issue complexity
Postmortem: blameless analysis, continuous improvement
Observability: end-to-end vision of the customer's business process
Q1- What is SRE?
A1 – Site Reliability Engineering is a discipline that combines engineering and operations to ensure the reliability, performance, and scalability of systems. SREs use data and automation to monitor, troubleshoot, and improve systems.
Q2 – How does SRE differ from Ops?
A2 – SRE teams are more proactive than traditional Ops teams. They use data and analytics to identify potential problems before they occur. They also use automation to streamline and speed up their work.
Q3 – What is the difference between SRE and DevOps?
A3 – SRE and DevOps are both approaches to software development and operations that emphasize collaboration and automation. However, SRE is more focused on the reliability and performance of systems, while DevOps is more focused on the speed and frequency of software releases.
Q4- What are the functions of an SRE team?
A4 – SRE responsibilities include:
a) Monitoring systems for performance and reliability issues
b) Troubleshooting and solving incidents
c) Implementing and managing automation
d) Working with developers to improve the reliability of software
Q5 – What are the best practices for toil management?
A5 – Toil is repetitive, manual work that is necessary to keep systems running. It can be a major drain on SRE teams’ time and energy. Here are some best practices for toil management:
a) Identify and automate as much toil as possible
b) Use data and analytics to identify and prioritize toil
c) Delegate toil to less skilled workers
d) Outsource toil to third-party vendors
Q6 – What is service level in SRE?
A6 – A service level objective, or SLO, is a set of targets that define the reliability, performance, and availability of a system. SREs use SLOs as a measure of progress and to ensure that they are meeting the needs of their customers.
Q7 – What are the phases to work on incidents?
A7 – The four phases of incident response are:
a) Detection: The incident is detected and reported
b) Triage: The severity of the incident is assessed and a response plan is developed.
c) Resolution: The incident is solved and the system is restored to normal operation.
d) Postmortem: The incident is analyzed to identify the root cause and prevent similar incidents from happening in the future.
Q8 – What are the items we should have for a postmortem?
A8 – A postmortem includes the following items:
a) A description of the incident with a timeline of the events
c) RCA – root cause analysis (5 Whys, Ishikawa fishbone diagram)
d) Recommendations to prevent similar incidents (lessons learned)
Q9 – What is observability?
A9 – Observability is the ability to understand the internal state of a system by observing its outputs. SREs use observability to monitor systems for performance and reliability and to prevent issues.
What is SRE? And how is it different from Ops?
– SRE uses software as the primary tool to manage systems
– SRE is engaged with the development team from the beginning of a new project
What is the difference between SRE and DevOps?
What are the functions of the SRE Team?
– Think about eliminating toil, managing risk, and handling failure
What are the best practices for Toil Management?
– Think about the effort needed to automate a task versus the time it saves.
Please help me understand service level in SRE?
– SLI.- The metrics used to manage the level of service.
– SLO.- Agreements with customers or other parties that usually don't result in a penalty; they are used to know the status of the level of service.
– SLA.- Agreements with customers or other parties that could result in loss of money, credits, or a penalty if the metrics fall short.
What are the phases to work on incidents?
– Analysis
– Engage proper teams.
– Fix the issue
– Monitoring
What are the items we should have for postmortem?
– Time of the incident
– Logs
– Actions that happened during the incident.
What is Observability?
– It’s the evolution of monitoring.
What is SRE? And how is it different from Ops?
>SRE uses tools and automation for smoother operations and reliability, while the Ops team's task is to maintain infrastructure.
What is the difference between SRE and DevOps?
>SRE focuses on reliability using software and automation. DevOps is about merging development and operations teams to streamline software delivery. SRE emphasizes reliability; DevOps emphasizes collaboration.
What are the functions of the SRE Team?
>Eliminate toil, manage risks & handle failure.
What are the best practices for Toil Management?
>Automate manual and repetitive work.
>Prioritize issues. Focus on issues that make a difference.
>Implement Observability.
Please help me understand service level in SRE?
>Service Level Indicator (SLI), Service Level Objective (SLO) & Service Level Agreement (SLA).
What are the phases to work on incidents?
>Triage, examine, diagnose, test, & cure
What are the items we should have for postmortem?
>Document incident and its resolution.
>Identify Root cause & apply fix.
>Continuous improvement.
What is Observability?
>MELT: Metrics, Events, Logs & Traces.
What is SRE? And how is it different from Ops?
Difference: SRE uses automation, has stronger engineering skills, and works together with the Dev team to ensure system/service stability.
What is the difference between SRE and DevOps?
What are the functions of the SRE Team?
What are the best practices for Toil Management?
Please help me understand service level in SRE?
What are the phases to work on incidents?
What are the items we should have for postmortem?
What is Observability?
What is SRE? And how is it different from Ops?
An SRE is tasked with ensuring collaboration between Dev and Ops through automation and the enhancement of processes and tools
What is the difference between SRE and DevOps?
While DevOps focuses on ensuring rapid release of stable, secure software, SRE focuses more on a set of practices and metrics to improve collaboration and service delivery. SRE is closer to the business and is also a bridge between the business and Dev and Ops.
What are the functions of the SRE Team?
The SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
What are the best practices for Toil Management?
Best practice: start small and then improve.
Please help me understand service level in SRE?
SLO (a range of SLIs) / SLI (indicator) / SLA (agreement)
SRE mostly focuses on SLOs and SLIs. SLAs help to define both.
What are the phases to work on incidents?
Prep
Identify
Containment
Recover
Learn
Re-test
What are the items we should have for post-mortems?
Assembling background on the incident:
When did it start
When was it detected
What was the effect
What was observed by the end client
What actions were taken
…
What is Observability?
It sits at the intersection of metrics, traces, and logs
Observability is tooling that enables SREs to debug their systems.
What is SRE? And how is it different from Ops?
What is the difference between SRE and DevOps?
They have much in common, but SRE focuses more on reliability while DevOps looks more to development.
What are the functions of the SRE Team?
Eliminate toil, manage risk, handle failures
What are the best practices for Toil Management?
Identify, prioritize, limit time spent on toil
Please help me understand service level in SRE?
SLI – The metrics used to monitor the health
SLO – The objective you must reach to meet the agreed target
SLA – The agreed values with customers/users for a certain metric
What are the phases to work on incidents?
Triage, examine, diagnose, test, cure
What are the items we should have for postmortem?
Problem summary, lessons learned, action items, timeline
What is Observability?
A mechanism that helps to understand and explain unexpected system behavior with the help of logs, traces and metrics.
What is SRE? And how is it different from Ops?
SRE – Site Reliability Engineering
SRE works on operations tasks but must identify and automate repetitive ones, and works on incidents and postmortems. It's an upgraded version of Ops with an end-to-end vision
What is the difference between SRE and DevOps?
DevOps teams develop, create, and deliver software and then refine it, while SRE works with built software to ensure it functions at the required level, optimizing systems and making sure resources are available
What are the functions of the SRE Team?
Eliminating toil, managing risk and handling failure
What are the best practices for Toil Management?
Data analysis to identify toil, prioritizing projects, cost-benefit analysis, automation projects, build costs vs toil costs
Please help me understand service level in SRE?
SLI – a quantifiable measure of service reliability
SLO – a goal: a reliability target for an SLI
SLA – consequences
What are the phases to work on incidents?
triage, examine, diagnose, test and cure
What are the items we should have for postmortem?
Blameless analysis
What is Observability?
Metrics, events, logs and traces to identify a deviation in a complex system
What is SRE and how is it different from Ops?
An SRE is in charge of helping to keep an application working as well as it can during the expected time. To do this, they need to be involved in the planning, design, development, testing, etc. of the application.
The Ops team is only responsible for keeping the applications running, most of the time without knowing how they work, and reacting to events after they have already happened.
What is the difference between SRE and DevOps?
They have different goals. The main goal of DevOps teams is to create new applications and deliver new functions as fast as they can; they are minimally interested in how those applications will be maintained during their life cycle. SREs are involved in almost the same phases as DevOps teams, but they also focus on keeping the application running, the quality of the service, automation, and instrumentation.
What are the functions of SRE Team?
Eliminating toil, working with service levels and managing failures
What are the best practices for Toil Management?
Identify and measure toil
Prioritize toil reduction
Adopt toil-reduction techniques
Please help me understand service level in SRE?
There are 3 main service levels in SRE
SLI.- Service Level Indicators. These are the metrics that we use to measure a service, such as availability and latency; they are presented as percentages
SLO.- Service Level Objectives. These are the target levels for a system's availability (the operating level that we want to achieve); they are presented as percentages
SLA.- Service Level Agreements. These are the contractual agreements between the service provider and the users about the levels of service provided; failing to reach them usually carries consequences for the provider
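Latency becomes a percentage once it is counted as "the share of requests served fast enough". A minimal sketch follows, assuming latencies are collected in milliseconds; the function names are illustrative and the threshold and objective are whatever the team chooses.

```python
def latency_sli(latencies_ms: list[float], threshold_ms: float) -> float:
    """Percentage of requests served within threshold_ms."""
    if not latencies_ms:
        return 100.0  # no requests, so none were slow
    fast = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return 100.0 * fast / len(latencies_ms)


def meets_slo(sli_percent: float, slo_percent: float) -> bool:
    """True when the measured indicator reaches the objective."""
    return sli_percent >= slo_percent
```

If three of four requests finish within the threshold, the SLI is 75%, which would miss an objective of 95%.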
What are the phases to work on incidents?
Triage, Examine, Diagnose, Test and Cure
What are the items we should have for postmortems?
Log information, events, capacity graphs.
What is Observability?
It is a strategy for keeping the most relevant and important information for measuring a system's state, based on the data generated by its components.
1. What is SRE? And how is it different from Ops?
It is full-stack systems thinking and coding skills, with a data-driven focus on application service availability
The difference lies in different goals, skills, and tools. Ops covers only support, maintenance and capacity
SRE covers the full cycle: design, development, acceptance, delivery, support, maintenance and capacity
2. What is the difference between SRE and DevOps?
DevOps covers only design, development, testing, acceptance and delivery; SRE covers design, development, acceptance, delivery, support, maintenance and capacity (NOT TESTING)
3. What is the function of the SRE team?
Agreed delivery, Full agency, after the event
4. What are the best practices for toil management?
Identify and automate, prioritize projects
5. Please help me understand service level in SRE?
There are 3 service levels: SLI, SLO, SLA
But SRE focuses on the SLO and SLI
6. What are the phases to work on incidents?
Triage, Examine, diagnose, test, cure
7. What are the items we should have for postmortem?
Document the incident and its resolution, and identify the root cause and the fix
8. What is observability?
It means looking at logs, traces, and graphs to detect performance degradation or problems that cause impact
1. What is SRE? And how is it different from Ops?
SRE is a set of practices used in the deployment, testing, and operation of software. Ops (Operations) is a team that is responsible
for BAU, i.e. keeping the software running within the agreed SLAs. Ops is integrated into SRE practices as SREs,
where they have an extra function as a feedback loop for the DevOps team, allowing better use of resources and improving how the software works overall.
2. What is the difference between SRE and DevOps?
SREs are responsible for maintenance, while DevOps are responsible for testing
3. What are the functions of the SRE Team?
Eliminating Toil
Working to Service Levels
Managing Failure
4. What are the best practices for Toil Management?
->Identify and Measure Toil
->Engineer Toil Out of the System
->Reject the Toil
->Use SLOs to Reduce Toil
->Promote Toil Reduction as a Feature
->Start Small and Then Improve
5. Please help me understand service level in SRE?
There are 3 Definitions in SRE that connect to Service Level
SLA SL Agreement – agreement that was made with the client
SLO SL Objectives – objectives that the SRE team must hit to meet that agreement
SLI SL Indicators – the real numbers of app performance
6. What are the phases to work on incidents?
-> Triage – get back to “good enough” state
-> Examine – understand problem/identify trigger
-> Diagnose – find the possible cause
-> Test – identify the problem cause
-> Cure – fix the problem/document solution
7. What are the items we should have for postmortem?
-> Date/Authors/Reviewers/Incident Commander
-> Action Items
-> Timeline
-> Executive summary
-> Problem summary
-> Lessons Learned
8. What is Observability?
Monitoring + Metrics+ Tracing + Logs
What is SRE? And how is it different from Ops?
The SRE role is committed to achieving system stability and participates in the development and delivery process for new functionality from end to end. The difference from Ops is that the SRE team is involved in the SDLC and Ops as one entity, while the Ops team only operates and supports the system.
What is the difference between SRE and DevOps?
The main difference between SRE and DevOps is that DevOps is focused on shortening the SDLC and speeding up software delivery, while SRE is involved in the SDLC and Ops as one entity.
What are the functions of the SRE Team?
The functions of the SRE team are to provide system reliability, optimization through tools and automation, reduction of toil, and fulfilment of SLIs, SLOs and SLAs.
What are the best practices for Toil Management?
Identify manual, repetitive, automatable, reactive tasks.
Please help me understand service level in SRE?
Service levels in SRE are divided into three categories: the SLA agreed with clients, the SLO as an internal team agreement, and the SLI as the performance numbers.
What are the phases to work on incidents?
Preparation, Identification, Containment, Eradication, Recovery, Lessons Learned, Ongoing Improvement.
What are the items we should have for postmortem?
A high-level summary
RCA
Steps taken to diagnose, assess, and resolve
A timeline of significant activity
Learnings and next steps
What is Observability?
Observability is a proactive approach to analyze and optimize the systems.
What is SRE? And how is it different from Ops?
SRE focuses more on site reliability and availability.
What is the difference between SRE and DevOps?
DevOps includes development and operations.
What are the functions of the SRE Team?
– Eliminating toil
– Managing risk
What are the best practices for Toil Management?
. Identify and define toil
. Quantify toil
. Automate
Please help me understand service level in SRE?
SLA
SLO
SLI
What are the phases to work on incidents?
1. Detection
2. Logging
3. Notification
4. Response
5. Resolution
6. Documentation
7. Postmortem
What are the items we should have for postmortem?
1. Incident info
2. Root cause analysis
3. Timeline of actions
4. Document
5. Follow-up
What is Observability?
Monitoring the target environment by reviewing events, tracking metrics, finding patterns in logs, and checking the health metrics of the infrastructure and applications
SRE is a methodology that focuses on ensuring the reliability and scalability of cloud-enabled infrastructure, solutions, and services
SRE is more focused on ensuring the reliability of applications in production environments while DevOps is more focused on building and deploying applications
Building software, supporting and fixing issues, optimizing processes, being on call, and conducting postmortem reviews.
Identify and quantify toil, automate repetitive tasks, eliminate nontactical/reactive work, set an upper bound on toil, conduct post-incident reviews
Service Level Indicators (SLIs) – separate indicators to measure; Service Level Objectives (SLOs) – a set of SLIs treated as the internal level of service; Service Level Agreements (SLAs) – SLOs that were agreed with the customer
Detection, Triage, Postmortem
Incident information, Find root cause, Timeline of actions, Document, Follow-up
Observability is the ability to measure the internal states of a system by examining its outputs. Observability is key to reducing repetitive, predictable, and manual tasks that are related to maintaining a service
1 What is SRE? And how is it different from Ops?
The primary concern of Ops is to keep the systems running in a steady state, while SRE focuses on improvements in speed, stability, and task automation
2 What is the difference between SRE and DevOps?
DevOps is a culture of work that aims to break down barriers between development and operations teams to facilitate faster and better work. On the other hand, SRE is a specific implementation of DevOps where automation is heavily used to achieve reliability at scale.
3 What are the functions of the SRE Team?
– Ensuring the continuous functionality of systems
– Automating ways to keep applications functioning
– Monitoring websites/services to discover errors
– Being proactive in fixing known issues and determining ways to prevent future downtime or hiccups
4 What are the best practices for Toil Management?
– identify toil
– perform cost analysis
– automate
5 Please help me understand service level in SRE?
This one is defined with 3 terms: Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).
– SLIs are a quantitative measure of some aspect of the level of service that is provided.
– SLOs are the target values or ranges of values for a service level that is measured by an SLI.
– SLAs are contracts that spell out the agreed consequences of meeting (or missing) the SLOs they contain.
6 What are the phases to work on incidents?
– Detecting
– Communication with engineers
– Assessing the impact and applying a severity level
– Communicating with customers
– Escalating to the right responders
– Delegating incident response roles
– Resolving
7 What are the items we should have for postmortem?
A clear outline of what happened during the incident, including a timeline of events
A detailed analysis of the root cause(s) of the incident
A list of actions taken to mitigate or resolve the incident
A set of follow-up actions to prevent the incident from recurring
8 What is Observability?
In SRE, observability refers to the ability to infer a system’s internal state(s) by examining its external outputs. It provides actionable insights into when errors occur within a system and why they occur, enabling engineers to take corrective action right away, reducing downtime, and ensuring that systems remain dependable and highly available. Observability requires collecting data from all levels of the system, not just at the application level, which includes logging, tracing, and metrics.
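To make "external outputs" concrete, the sketch below emits the three signals named above for a single request using only the Python standard library. It is an illustration under stated assumptions: `handle_request`, the `request_count` counter, and the JSON field names are hypothetical, and a real service would hand these signals to dedicated logging, metrics, and tracing backends.

```python
import json
import time
import uuid
from collections import Counter

request_count = Counter()  # metric: number of requests by outcome


def handle_request(path: str, ok: bool) -> str:
    """Record a metric and return a structured log line for one request."""
    trace_id = uuid.uuid4().hex  # trace: an id to correlate this request across services
    start = time.perf_counter()
    outcome = "success" if ok else "error"
    request_count[outcome] += 1
    duration_ms = (time.perf_counter() - start) * 1000
    # log: one machine-readable line per request
    return json.dumps({
        "trace_id": trace_id,
        "path": path,
        "outcome": outcome,
        "duration_ms": round(duration_ms, 3),
    })
```

Because the log line carries the trace id and the outcome, an engineer can move from a rising error count to the individual requests behind it.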
What is SRE, and how is it different from Ops?
# Site Reliability Engineering focuses on actually solving problems by “engineering them away”, rather than just fixing them fast until the same issue pops up again.
# It also focuses on eliminating repetitive manual tasks through automation.
# It also makes sure the service is as reliable as it needs to be, but not more.
This is done by using various software technologies and code.
What is the difference between SRE and DevOps?
# DevOps integrates development and operations skills and tasks into one team of people.
# An SRE team exists alongside the development team and works closely with it.
What are the functions of an SRE team?
# Using automation to get rid of the manual, repetitive work that makes up toil. This way, supporting more services does not require more people.
# Managing the risk of service availability failures according to the agreed levels. Calculating how much work is needed to reach a given reliability, but not more.
# Handling incidents that led to a service outage or degradation: learning from them, documenting them, and making sure they won’t happen again, or, if they do, that appropriate actions are at hand, preferably automated.
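The "agreed levels" point in the list above can be made concrete by converting an availability target into the downtime it allows. The sketch below assumes a 30-day window by default; the function name and the targets are illustrative only.

```python
def allowed_downtime_minutes(availability_target: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability target permits over the window."""
    if not 0 < availability_target <= 1:
        raise ValueError("availability_target must be in (0, 1]")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - availability_target)


# Each extra "nine" cuts the allowed downtime by a factor of ten.
for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} -> {allowed_downtime_minutes(target):.1f} minutes per 30 days")
```

Seeing how quickly the allowance shrinks helps explain the "but not more" part: each additional nine costs considerably more engineering effort for a benefit users may not notice.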
What are the best practices for Toil Management?
# Probably the best practice is not to manage toil, but to eliminate it.
# First you need to identify what counts as toil: work that is manual, repetitive, automatable, and adds no lasting value.
# Then either automate the specific task or, better, implement a solution that makes the task go away, so you don’t need to act on it at all.
Please help me understand service level in SRE?
Service Level
# Objective – the availability the service needs to meet
# Indicator – metrics, produced by tests against the service, that show whether it is meeting the target SLO
# Agreement – the commitment to the customer, over an agreed period of time, that the SLO will be met
What are the phases to work on incidents?
# triage
# examine
# diagnose
# test
# cure
What are the items we should have for a postmortem?
# Documentation of what happened, why it happened, what the solution was, and suggestions for how to use software, code, and automation so the incident does not happen again.
What is Observability?
# Observing service / software behaviour to be able to pinpoint anomalies when the software does not behave as expected.
What is SRE, and how is it different from Ops?
SRE is site reliability engineering.
It is a practice that uses software engineering to automate tasks like production management, change management, incident management, etc., whereas Ops does many of these things manually in its day-to-day operations.
What is the difference between SRE and DevOps?
DevOps is all about the core development of a product or application.
They are not working against each other.
SRE is in charge of automating everything in the deployment of an application or product.
DevOps is about core development; they deal directly with customer expectations and add new features as requested.
SRE works on the implementation of the core, or, we could say, on the deployment.
They give feedback to DevOps if a product is not working correctly.
If they find issues with the app or product, they come back with feedback to the DevOps team, which in turn modifies the code and releases it again.
What are the functions of an SRE team?
Reduce manual work.
Automate wherever there is an opportunity.
Proactively monitor the running app/product and predict failures before they actually happen.
Fix issues at their end where possible, or involve developers if the change has to be made at the app/product level.
What are the best practices for Toil Management?
Identify what is causing toil.
Automate it
Document it
Then monitor it
Please help me understand service level in SRE?
Service levels are a way to measure the reliability of a service.
What are the phases to work on incidents?
Detect
Log it
Work on it
Resolve it
Notify on resolution
RCA if required
Document it.
What are the items we should have for postmortem?
Document the issue
Action items
RCA
Follow-ups
What is Observability?
In SRE practice, it allows us to detect and diagnose issues before they cause much trouble for the customer.