This is a response to a tweet asking:
"Why is there no competition to PagerDuty/Opsgenie? People in my team say it’s “just connecting to the Twilio API” but if it were that easy, there’d probably be a ton of competition."
PagerDuty is the market-leading incident alerting tool. OpsGenie is Atlassian's incident management tool, which is widespread thanks to distribution. If you're a JIRA or Confluence customer, it's trivial to connect OpsGenie, and to use it together with your existing Atlassian products.
However, there are actually a lot of alternatives to these well-known tools! They are just pretty hard to find, because PagerDuty has such excellent brand awareness.
Let's change this!
Here's what a software engineer said after reading this post:
"We decided to go with PagerDuty because it seemed like a shallow competitive field. This is a helpful blog post maybe would have done something different had I had this list."
20 incident management and alerting alternatives to PagerDuty
Below are 20 alternatives, and one sentence on how they describe themselves on their website.
- ZenDuty. "End-to-end incident alerting, on-call management and response orchestration platform."
- incident.io (disclaimer: I'm an investor). "With a beautifully simple interface, powerful workflow automation, and integrations with all your existing tools, prepare for incident management like never before." Note that the cofounder and CPO at incident.io contributed to the article Incident review best practices.
- Jeli. "Every incident is an opportunity that reveals how your organization really works. Jeli allows you to see that opportunity." Note that the Head of Research at Jeli contributed to the article Incident review best practices.
- FireHydrant. "Reduce manual work, get everyone on the same page, and improve time to resolution with a fully-customizable platform that works with all the tools you love."
- Spike. "We alert you of your incidents via Phone calls, Whatsapp, Telegram, SMS, Email, Slack, Microsoft Teams, and Discord before your customers do."
- ilert. "Manage on-call, respond to incidents and communicate them via status pages using a single application."
- Blameless. "Assemble responders, manage communications and restore service without ever leaving Slack."
- Rootly. "Manage incidents directly from Slack. Build a consistent and automated response process."
- Moogsoft. "Ensure continuous availability with automated noise reduction, correlation, and collaboration across your incident workflow."
- SquadCast. "Deliver and scale super-reliable services with one platform for all your Reliability workflows. Fix issues faster and optimize savings."
- Datadog incident management. "Track and collaborate on incidents from start to finish all within a unified platform. No context switching or manual processes."
- Grafana OnCall. "An easy-to-use on-call management tool that will help reduce toil in on-call management through simpler workflows and interfaces that are tailored specifically for engineers."
- Splunk On-Call (formerly: VictorOps). "The tools to fix major incidents faster."
- OnPage. "Ensure that critical notifications rise above the clutter® and are always received by the right on-call teams."
- AWS Systems Manager Incident Manager. "Designed to help you mitigate and recover from incidents affecting your applications hosted on AWS."
- AlertOps. "Transform real-time operational intelligence into automated incident response."
- XMatters. "Automate operations workflows, ensure applications are always working, and deliver remarkable products."
- Better Uptime. "Get notified with a radically better infrastructure monitoring platform."
- Transposit. "An AI-powered incident management platform. Create automations in seconds, and resolve incidents faster."
- OpsGenie. The best-known competitor. See my warning on OpsGenie reliability below though.
And some solutions that do much more than incident management and alerting, but include these capabilities as well:
- Coralogix: a full-on observability platform, with alerting capabilities as well.
- ServiceNow: a workflow system that comes with alerting capabilities.
Open source alternatives
- Iris by LinkedIn: a highly configurable and flexible service for paging and messaging.
- Oncall by LinkedIn: a calendar tool designed for scheduling and managing on-call shifts. It can be used as source of dynamic ownership info for paging systems like Iris.
Alerting vs incident management differences
Note that most competitors are not an apples-to-apples match for PagerDuty on features. Here are the areas where they differ:
- Multi-channel alerting. PagerDuty is best known for its multi-channel alerting capabilities: it can deliver alerts via push notifications, emails, text messages and phone calls. It can build escalation chains of responders across multiple channels, and organize pretty complex oncall teams. Vendors like ZenDuty, Spike, ilert and many others have similar alerting and alert chaining capabilities.
- Incident management. What happens once the right people have been alerted? Well, the incident needs to be debugged and mitigated, and communication needs to happen with the relevant stakeholders. Once mitigation happens, a postmortem needs to be completed, and follow-up work needs to be done. This is the part where PagerDuty doesn't offer nearly as many capabilities as many of the competitor products. Vendors like incident.io, FireHydrant, Blameless and several others tend to focus more on this area.
- Learning from incidents. Once the incident is mitigated, and follow-up actions are complete, are we done? In better organizations: no! Teams use incidents to learn from: the learnings are circulated, shared as stories, and made easy to reference. New joiners to the company will often read through historic incidents to understand how things played out, and prepare for what to do when they go oncall. PagerDuty, to my knowledge, offers nothing for this phase. Jeli is the best example of a tool focused on this area, but other tools with a major focus on incident management often offer such capabilities as well.
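To make the "alert chaining" idea from the first bullet concrete, here is a minimal sketch of an escalation chain in Python. It is not how PagerDuty or any vendor listed here implements it: the `Step`, `escalate` and `fake_send` names are hypothetical, and the sender is a stand-in for a real push/SMS/phone provider. A real product would also wait for a timeout between steps, which is left out here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    responder: str  # who gets notified (hypothetical names)
    channel: str    # "push", "email", "sms" or "phone"


# A sender delivers one notification and reports whether it was acknowledged.
Sender = Callable[[Step, str], bool]


def escalate(policy: List[Step], alert: str, send: Sender) -> Tuple[Optional[Step], List[Step]]:
    """Walk the chain in order and stop at the first acknowledged step.

    Returns the acknowledging step (or None) and every step that was tried.
    """
    tried: List[Step] = []
    for step in policy:
        tried.append(step)
        if send(step, alert):
            return step, tried
    return None, tried


def fake_send(step: Step, alert: str) -> bool:
    # Stand-in for a real provider call: pretend only phone calls get acknowledged.
    return step.channel == "phone"


policy = [
    Step("primary", "push"),
    Step("primary", "sms"),
    Step("primary", "phone"),
    Step("secondary", "phone"),
]

acked_by, tried = escalate(policy, "checkout error rate above threshold", fake_send)
print(acked_by)
print(len(tried))
```

The chain itself is the easy part. The hard parts are reliable delivery on every channel, schedules, overrides, and everything that happens after someone acknowledges the alert, which is where the tools above differ.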
In the age of always-connected smartphones with push notifications, choosing an incident management product that is less heavyweight on multi-channel alerting, but more focused on incident management might be a reasonable tradeoff.
As an example of the above tradeoffs, here's how incident.io compare themselves to PagerDuty (I am an investor in incident.io):
"On paper, PagerDuty covers entire lifecycle of an incident, but in our experience – and the experiences of our customers – we’ve observed it to be strongest in the alerting phases, and weaker at helping them to respond to and learn from incidents. At incident.io, we believe that we’re the most sensible incident response and management tool for companies looking to do more than just alert.
Today, PagerDuty is both a integration with and an alternative to incident.io. We offer a powerful integration with their on-call management and alerting capabilities, allowing you to trigger incident.io incidents from PagerDuty alerts, and to escalate to other folks as necessary.
PagerDuty is great (and probably the most popular tool) for alerting engineering teams when something goes wrong. However, PagerDuty doesn’t offer as much when it comes to incident coordination, response and follow-up — arguably the most important aspects of incident management."
This characterization of incident.io applies to many other tools, such as Rootly, Blameless and FireHydrant. Their multi-channel alerting capabilities are more limited, but their overall incident management focus could be more relevant, depending on your needs and the maturity of your team.
Be wary of OpsGenie's past reliability incident. I recommend exercising caution regarding OpsGenie, given that in 2022, OpsGenie was down for hundreds of customers for 2 weeks. Yes: for 2 weeks! For a real-time alerting system! Unlucky customers had no option but to move to alternative alerting/incident management providers. Such a long downtime for a real-time alerting system is not acceptable. It is also truly puzzling that Atlassian did not prioritize restoring OpsGenie over less critical Atlassian products, even as impacted customers were begging them to prioritize OpsGenie first and foremost.
I have since talked with engineers on the OpsGenie team who said it felt like Atlassian, after buying the company, rushed the integration of OpsGenie onto their unified internal stack, ignoring warnings that an outage in the Atlassian identity system would take OpsGenie down. OpsGenie is clearly a more critical system than the likes of JIRA or Confluence, but it does not seem to be treated with priority within the Atlassian stack.