On-Call Best Practices for Mendix Cloud Apps

Hello folks,

Our team is developing an incident response process for a Mendix Cloud app. We’re making progress on defining our expectations for uptime, escalation paths, etc., but we’re reconsidering our on-call process:

- One person out of a pool of four is assigned for any given day.
- On their day, they monitor for automated outage alerts and customer-reported outages.
- That person responds to the alert themselves or escalates.

Our requirements for uptime, response time, and hours of operation put a lot of pressure on the one person on call, so our team is asking:

- Can you describe the on-call process for your Mendix Cloud app?
- How do you delegate the on-call process for incident response?
- How have users’ expectations influenced your incident response process? Have they influenced your application design or environment?
- What changes to your app or Mendix Cloud configuration have helped reduce the number of incidents? Have any changes increased the number of incidents?

There are abundant resources on this subject, like Google’s SRE manual, but we’d love to know what works for you.