Downtime, outages and failures: Understand their true costs
- April 11, 2019
- Written by: Gad Cohen
This content is brought to you by Evolven. Evolven Change Analytics is a unique AIOps solution that tracks and analyzes all actual changes to the enterprise cloud environment. Evolven helps leading companies reduce the number of incidents, reduce problem resolution time and eliminate unauthorized changes.
When it comes to mission-critical applications or data center performance quality, companies are willing to invest heavily. Unfortunately, these investments do not always fully pay off.
Against system failure
Despite the effort invested in infrastructure resilience, many IT organizations continue to struggle with database, hardware and software failures that last from a few minutes to several days, completely shutting down the business and causing large losses.
The world of IT outages can seem strange at times.
Despite the variety of advanced solutions and the growing amount of data being collected by leading enterprise software vendors and IT departments - from ERP to CRM and more - outages remain a valid and serious threat to the industry.
On the other hand, IT outages have somehow become an inherently accepted, even expected, part of business life.
IT downtime review
While IT professionals experience downtime from time to time and then focus their efforts on overcoming it, the business organization as a whole suffers the "financial pain" that is usually quite significant.
In the past, we've taken a closer look at the various ways IT downtime can impact business outcomes (you can read more about this here: Cost and scope of unplanned outages). In doing so, we considered different aspects, from direct sales losses and damage to reputation to indirect effects such as reduced productivity.
Now, I want to return to the topic and examine how organizations should address and assess threats to their IT operations, including systems, applications and data, by looking at robust (and established) benchmarks that represent the potential costs behind downtime and disruption.
Measuring the failures of big brands
When should the industry start measuring the financial impact of major brand disruptions, like the one that occurred recently at Facebook, the one that reached hundreds of thousands of Lloyds Bank customers, or the Jetstar failure that caused hundreds of flight delays?
In other words, at what point is an outage "significant enough" that a cost analysis becomes valuable for the industry to learn from and predict the impact of future outage incidents?
Apparently, at some point the disruption creates an impact that cannot be ignored from a PR standpoint. This is the point of no return, and it is followed by estimates of the financial impact.
The cost of downtime varies significantly between industries. The size of the affected company is of course a critical but not the only important factor. The role of the IT systems in the company is also crucial.
Defining a numeric value behind an IT outage means pre-defining its impact on multiple business and organizational aspects so the entire industry can learn and optimize accordingly.
A failure of a critical application can result in two different types of losses:
- Application service outage: The impact of downtime varies by application and organization;
- Data Loss: The potential loss of data due to a system failure can have significant legal and financial implications.
I'm sure you'll agree that today's data centers should never go down; applications must remain available 24/7, and internal (let alone external) end users around the world must be confident that data centers are always available (for critical data and application availability).
Well, reality bites. This is not the case in the back office (i.e. within the data center). No organization enjoys 100% uptime. Should you try to achieve 100%? Of course. However, you also need to develop a deep understanding of the impact of downtime and ways to minimize it.
Worst outage nightmare in history? It has probably happened to you...
Some past outages have turned into PR disasters, like the legendary Virgin Blue disaster in 2010 or the recent one that struck Facebook.
Why? The massive impact probably had something to do with it.
As a reminder, Virgin Blue's outage prevented passengers from boarding flights for 11 days (!!), resulting in negative press, damaged reputations and millions in losses.
More specifically, Virgin Blue's reservation management company, Navitaire, eventually compensated Virgin Blue with more than $20 million (Navitaire's booking decision gives Virgin $20 million in compo).
There are many other incidents that still attract media attention. Here's a recent one: a USA Today article on the Wells Fargo outage, which prevented customers from accessing their accounts for many hours.
It's safe to say that anyone in IT would agree that failures or disruptions are VERY bad for business. They are undesirable, very damaging financially and must be combated with all available means.
Configuration errors are key
The IT Process Institute's Visible Operations Handbook has reported in the past that "80% of unplanned outages are due to poorly planned changes made by administrators ('operations staff') or developers" (Visible Operations).
Enterprise Management Associates reported that 60% of availability and performance failures are due to misconfigurations.
How much does it cost?
Downtime can cost organizations $5,600 per minute and up to $300,000 per hour in web application downtime (according to a 2014 Gartner analysis).
[Figure: Average cost per hour of downtime for enterprise servers, worldwide, 2017-2018]
Application maintenance costs are increasing at 20% annually. But that can't solve all your problems. Previous industry studies have found that at least a quarter of the downtime surveyed is due to configuration errors. (How much will you spend on app downtime this year?).
How common is downtime or disruption?
Granted, downtime can be a financial nightmare. That part is clear. But if you want to properly assess the risk potential of business disruption, the immediate question should be, "How likely is that?"
Source: Data Center Knowledge
Clearly, failures are too common to justify the thought, "I probably won't have a major failure." Now the question is how to calculate the risk specific to your business.
Production and app downtime costs explained
Unplanned outages are resolved by IT. However, as I mentioned earlier, these outages ultimately impact the entire organization.
An important part of a complete downtime risk assessment process is estimating how much money you will lose per hour (or minute, or whatever time interval you choose) due to the downtime.
For organizations that rely solely on the ability of data centers to provide IT and network services to customers, such as telecom providers or e-commerce companies, downtime can be particularly costly, with the cost of a single event reaching $1 million (more than $11,000 per minute) according to expert estimates.
In a USA Today survey of 200 data center managers, more than 80% said their downtime costs exceeded $50,000 per hour. More than 25% reported downtime costs of more than $500,000 per hour (!!).
According to another survey, while companies cannot achieve zero downtime, one in ten companies indicated that their availability should be greater than 99.999%.
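To put an availability target such as 99.999% in perspective, it helps to convert it into a yearly downtime budget. The following is a minimal sketch only; the function name and the 365-day year are assumptions made for illustration and do not come from the survey.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year: 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (100 - availability_percent) / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, an availability of 99.999% leaves only a little over five minutes of downtime per year.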
To get a solid understanding of the impact of production and release downtime, let's take a look at how the consequences of downtime manifest themselves.
Downtime costs: per year or per incident?
A 2017 study found that 46% of 400 IT decision makers experienced more than four hours of IT-related downtime in 12 months; 23% said they incur costs between $12,000 and more than $1 million per hour.
More than 35% admitted they are unsure of the cost of a business interruption.
If you ask Delta Airlines, which had to cancel 280 flights due to disruptions in 2017, the losses from a single disruption incident could reach more than 150 million US dollars.
A few years ago, Dun & Bradstreet reported that 59% of Fortune 500 companies experience at least 1.6 hours of downtime per week.
If you take an average Fortune 500 company with at least 10,000 employees and assume that it pays them an average of $56 an hour, then (assuming all employees are affected by the downtime) the labor component alone of 1.6 hours of weekly downtime for a company this size would be $896,000 per week, which works out to over $46 million per year (Assessing the financial impact of downtime).
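The arithmetic behind these labor figures can be reproduced with a short sketch. It assumes, as a simplification, that every one of the 10,000 employees is idle for the full 1.6 hours each week; the function name is made up for this example.

```python
def weekly_labor_cost(employees: int, hourly_rate: float, downtime_hours: float) -> float:
    """Labor cost of downtime per week, assuming every employee is idle for the whole outage."""
    return employees * hourly_rate * downtime_hours


# Figures quoted in the text: 10,000 employees, $56 per hour, 1.6 hours of downtime per week
weekly = weekly_labor_cost(employees=10_000, hourly_rate=56, downtime_hours=1.6)
yearly = weekly * 52  # 52 weeks per year

print(f"Weekly labor cost: ${weekly:,.0f}")  # Weekly labor cost: $896,000
print(f"Yearly labor cost: ${yearly:,.0f}")  # Yearly labor cost: $46,592,000
```

In practice only part of the workforce is idled by any given outage, so this is best read as an upper bound on the labor component.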
The reality is of course more complicated, since many parameters have to be taken into account, such as the time of the event (weekday or weekend? day or night?) and much more. However, understanding the cost of downtime will go a long way toward assessing your potential risk and the return on investment from tools that can help minimize the impact of downtime.
Could the industry learn from the past and minimize collateral damage during an outage?
How have things changed since the past?
So we already know that downtime and outages are still happening today and that the industry is not yet able to eliminate them. But how have costs changed over time? Are these incidents less harmful today?
In 2010, a poll by Coleman Parkes found that IT downtime costs companies a total of more than 127 million hours per year in employee productivity, an average of 545 hours per company.
In 2009, the average cost of downtime varied significantly by industry, from about $90,000 per hour in the media industry to about $6.48 million per hour for major online brokers (How to quantify downtime).
According to a survey of IT managers over the years, companies are increasingly aware of the direct financial cost of computer failures. Research has found that one in five businesses is losing $12,000 an hour due to system downtime (How to quantify downtime).
As mentioned above, a subsequent analysis by Gartner in 2014 found average costs of $5,600 per minute and more than $300,000 per hour.
As early as 2004, a conservative estimate by Gartner put the cost of computer network downtime at $42,000 per hour. As a result, a company that suffers 175 hours of downtime per year can lose more than $7 million per year. However, the cost of each disruption affects every business differently, so it's important to know how to calculate the exact financial impact (How to quantify downtime).
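The Gartner figures quoted here can be checked with simple arithmetic. The sketch below uses function names invented for this example and assumes that every minute or hour of downtime costs the same, which real outages rarely do.

```python
MINUTES_PER_HOUR = 60


def hourly_cost(cost_per_minute: float) -> float:
    """Convert a per-minute downtime cost into a per-hour cost."""
    return cost_per_minute * MINUTES_PER_HOUR


def annual_cost(cost_per_hour: float, downtime_hours_per_year: float) -> float:
    """Yearly downtime cost, assuming every hour of downtime costs the same."""
    return cost_per_hour * downtime_hours_per_year


# Gartner's 2014 figure: $5,600 per minute
print(f"${hourly_cost(5_600):,.0f} per hour")        # $336,000 per hour
# Gartner's 2004 estimate: $42,000 per hour, applied to 175 hours of downtime
print(f"${annual_cost(42_000, 175):,.0f} per year")  # $7,350,000 per year
```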
It makes sense to think that the cost of disruption will only increase over time (since we now rely more on data systems). Here's how to understand why past data can be multiplied by a significant number to reflect current reality...
Every minute counts
More than a decade ago, the average cost of data center downtime across all industries was estimated at approximately $5,600 per minute (Unplanned IT outages cost more than $5,000 a minute), an estimate that, according to Gartner, remained the same until 2014. The earlier Ponemon Institute study calculated the minimum, median, mean, and maximum cost per minute of unplanned outages, based on information from 41 data centers. The highest cost of an unplanned outage was over $11,000 per minute.
On average, the cost of an unplanned outage is likely to be over $5,000 per minute.
It only becomes more significant
A 2013 study saw an increase of more than 41% over the previous averages described above and an average cost of more than $7,900 per minute.
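As a quick sanity check, the increase from the earlier $5,600 per minute to $7,900 per minute can be recomputed. This is only an illustrative sketch; the helper name is an assumption, and the two dollar figures are the rounded values quoted in the text.

```python
def percent_increase(old_value: float, new_value: float) -> float:
    """Relative increase from old_value to new_value, in percent."""
    if old_value == 0:
        raise ValueError("old_value must be non-zero")
    return (new_value - old_value) / old_value * 100


# $5,600 per minute (earlier average) versus $7,900 per minute (2013 study)
print(f"Increase: {percent_increase(5_600, 7_900):.1f}%")  # Increase: 41.1%
```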
The 2015 ITIC survey clearly showed that the cost per hour (compared to 2008 data) increased by 25-30%.
Impact of downtime per year
A previous Gartner analysis calculated that downtime can average 87 hours per year. Obviously this is the sum of many interruptions from a few minutes to several hours (The average large enterprise experiences 87 hours of network downtime per year).
How have things changed?
A later survey from 2011 found that while the industry has been successful in addressing the downtime epidemic and reducing its incidence, we are still seeing significant downtime and huge revenue losses (according to one source, an outage resulted in more than 3 million users, apparently WhatsApp users, switching to Telegram).
The impact on reputation and loyalty
How much is your company's reputation worth? This can be extremely difficult to assess, as can the long-term impact of a damaged reputation and its impact on sales and profitability.
In this case, the cost of downtime includes lost customers (both short- and long-term) and other tangible items that reflect the cost of reputation degradation, such as an organization's profile.
What parameters should affect its calculation?
When attempting to estimate the cost of downtime, there are obvious direct costs (e.g., lost business during downtime). However, there are also many indirect costs to consider, such as personnel expenses or the above-mentioned reputation problems.
Personnel costs come from the cost of the "war room" effort aimed at getting IT systems up and running again, the cost of falling behind on all other scheduled tasks, the cost of extra staff pay such as overtime (if applicable) and more. Add to this the value of data loss, emergency maintenance fees (especially if the outage occurs outside of business hours), and additional repair costs that can persist long after service is restored.
It goes without saying that you should consider these costs when estimating the impact of downtime, as they are often very high; but even a rough estimate can be extremely helpful in understanding the risks and deciding what level of technology to rely on to combat them.
There's also the impact of lost sales. To get an accurate estimate of total lost sales, the figure needs to be adjusted upward to reflect the true lifetime value of customers who permanently switch to a competitor. Take, for example, the Facebook (and WhatsApp) outage mentioned above (Unconscious Costs: Denying the true cost of network downtime). What is the revenue loss due to these users seeing fewer billable ad impressions?
Stock down 25%
Even if it is difficult to quantify so many parameters, they are still substantial and meaningful. For example, when Amazon.com was offline for several hours in its early days, its stock dropped by 25% in a single day (Unconscious Costs: Denying the true cost of network downtime)!
In the Amazon cloud outage, for example, the company struggled to bring its cloud services back online. As a result, many customers questioned the reliability of its cloud and Amazon's communications surrounding the outage. Other customers felt they should be compensated for downtime as part of their SLA.
I know you're curious: In terms of SLA, Amazon's EC2 SLA was not breached despite the nearly four-day outage (Seven lessons from the Amazon outage).
The cost of downtime: Calculate it yourself
How much will you lose due to unexpected server or business application downtime?
According to various sources, the easiest way to calculate potential lost revenue during an outage is to use this equation:
| LOST REVENUE | = | (GR/TH) x I x H |
| GR | = | gross yearly revenue |
| TH | = | total yearly business hours |
| I | = | percentage impact |
| H | = | number of hours of downtime |
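The equation above can be sketched in a few lines of code. This is an illustration only: I is interpreted here as the fraction of revenue affected by the outage (between 0 and 1), and the company figures in the example are hypothetical.

```python
def lost_revenue(gross_yearly_revenue: float,
                 total_yearly_business_hours: float,
                 impact: float,
                 downtime_hours: float) -> float:
    """Lost revenue = (GR / TH) x I x H.

    impact is read as the fraction of revenue affected (0.0 to 1.0),
    which is an interpretation made for this sketch.
    """
    if total_yearly_business_hours <= 0:
        raise ValueError("total_yearly_business_hours must be positive")
    if not 0 <= impact <= 1:
        raise ValueError("impact must be between 0 and 1")
    revenue_per_hour = gross_yearly_revenue / total_yearly_business_hours
    return revenue_per_hour * impact * downtime_hours


# Hypothetical company: $20.8 million yearly revenue, 2,080 business hours per year,
# half of revenue affected, 4 hours of downtime
example_loss = lost_revenue(20_800_000, 2_080, impact=0.5, downtime_hours=4)
print(f"Estimated lost revenue: ${example_loss:,.0f}")  # Estimated lost revenue: $20,000
```

This covers lost revenue only; the indirect costs discussed earlier, such as labor, recovery and reputation damage, would come on top of it.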
How to minimize the risk of disruptions and downtime?
Downtime and failures are catastrophic, but they don't have to be overly shocking. By using solutions that focus on getting to the root of the problem, failures can be prevented before they happen.
Evolven Change Analytics is a unique AIOps solution that targets changes that are the true cause of performance incidents. Evolven helps enterprise IT and cloud operations teams prevent and remediate incidents before problems arise.
Contact us to see how we are helping leading companies reduce incidents and MTTR.
What is downtime? What are the costs associated with downtime? ›
Downtime cost is defined as any profit that a company loses when its equipment or network stops functioning. The cost of downtime implies not only direct financial loss but can also affect your company in at least four other ways.
What is the real cost of downtime? ›
For the Fortune 1000, the average total cost of unplanned application downtime per year is $1.25 billion to $2.5 billion. The average cost of an infrastructure failure is $100,000 per hour.
What is the difference between downtime and outage? ›
Downtime occurs when a system can't complete its primary function. It can be broken up into two types: IT outages and brownouts. IT brownouts occur when a system is slowed or partially available. This might mean customers can access your site, but pages load slowly or dynamic features like "add to cart" don't function.
What is the meaning of outage cost in business? ›
Outage Costs means the actual increased costs of replacement energy incurred by Transmission Owner during an Outage calculated in accordance with this section and does not include costs that would have been incurred notwithstanding the Generating Facility interconnection.
Common categories of downtime include excessive tool changeover, excessive job changeover, lack of operator, and unplanned machine maintenance.
What are some examples of downtime? ›
Downtime has many causes, including shutdowns for maintenance (known as scheduled downtime), human errors, software or hardware malfunctions, and environmental disasters such as power outages, fires, flooding or major temperature changes.
What are the two major considerations when calculating the cost of downtime? ›
Calculating Downtime Cost
The duration of the downtime and the cost incurred per minute you're offline are the two variables that most affect the financial impact of an outage.
TDC is a methodology for analyzing all cost factors associated with downtime and using this information for cost justification and day-to-day management decisions. Most likely, this data is already being collected in your facility and need only be consolidated and organized according to the TDC guidelines.
What are the two types of downtime? ›
Downtime falls into two categories: planned and unplanned. Planned downtime is notable because it offers advance warning and gives users a chance to prepare. Planned downtime is usually done for upgrades or maintenance to the network infrastructure.
How do you explain downtime? ›
a time during a regular working period when an employee is not actively productive. an interval during which a machine is not productive, as during repair, malfunction, maintenance.
How do you define an outage? ›
an interruption or failure in the supply of power, especially electricity. the period during which power is lost: a two-hour outage on the East Coast.
How much does 1 hour of downtime cost the average business? ›
About 98% of organizations claim only one hour of downtime costs over $100,000. Looking at each industry's breakdown, we'll find out if this is true. In the IT industry, downtime is typically calculated at about $5,600 per minute.
How do companies keep their costs down? ›
Cost-cutting measures may include laying off employees, reducing employee pay, closing facilities, streamlining the supply chain, downsizing to a smaller office or moving to a less expensive building or area, and reducing or eliminating outside professional services, such as advertising agencies and contractors.
What are the financial impacts of downtime? ›
The cost of downtime = downtime duration x per-minute cost.
You can use around $400 as a cost-per-minute figure for small enterprises. In the case of large and medium businesses, use $10,000. Many people only associate downtime costs with lost revenue.
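As a rough sketch of this rule of thumb, the calculation can be written as follows. The per-minute rates are the ones quoted above; the dictionary keys and the function name are illustrative assumptions.

```python
# Per-minute rates quoted in the text; the keys are illustrative labels only.
COST_PER_MINUTE_USD = {
    "small": 400,
    "medium_or_large": 10_000,
}


def downtime_cost(minutes: float, business_size: str) -> float:
    """Cost of downtime = downtime duration x per-minute cost."""
    if minutes < 0:
        raise ValueError("minutes must not be negative")
    return minutes * COST_PER_MINUTE_USD[business_size]


# A 30-minute outage under each rate
print(f"Small business: ${downtime_cost(30, 'small'):,.0f}")                      # $12,000
print(f"Medium or large business: ${downtime_cost(30, 'medium_or_large'):,.0f}")  # $300,000
```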
Human Error: Whether accidental or due to negligence, human error is one of the most common causes of unplanned downtime. An employee unintentionally deleting data, accidentally unplugging a cable or not following standard protocols can lead to costly downtime.
What is downtime also called as? ›
DOWNTIME stands for Defect, Overproduction, Waiting, Non-Utilized Talent, Transportation, Inventory, Motion, and Extra Processing.
How do you handle downtime at work? ›
- Offer to help a colleague or manager. ...
- Organize and clean your workspace. ...
- Go for a walk. ...
- Clean your email inbox. ...
- Read industry news. ...
- Compile a list of contacts. ...
- Record your voicemail greeting. ...
- Write a note of appreciation.
Downtime behavior determines how events related to a CI are handled when received while that CI was in downtime. To access: Administration > Event Processing > Automation > Downtime Behavior. Alternatively, click Downtime Behavior.
What is downtime for maintenance? ›
In manufacturing, "downtime" occurs when an unplanned event halts production for a period of time. This event can be a malfunction, repair, or changeover of tools or equipment. Maintenance downtime in particular is when a machine is not operating or being productive due to required maintenance work.
What is managing downtime? ›
Downtime management enables you to exclude periods of time from being calculated for events, alerts, or views that can skew CI data. To access. Administration > Service Health > Downtime Management. Alternatively, click Downtime Management.
What is a high cost of downtime? ›
How Much Does Downtime Cost a Company? The average cost of downtime is significant. Each minute costs an average of $9,000, according to the Ponemon Institute, bringing the downtime cost per hour to over $500,000.
What is the industry standard for downtime? ›
World Class Standards For Downtime
Aim for unscheduled downtime to be 10% or less.
For example, the average automotive manufacturer loses $22,000 per minute when the production line stops. That quickly adds up. Overall, unplanned downtime costs industrial manufacturers as much as $50 billion a year. Downtime costs aren't limited to direct labor, production or finances.
How do you optimize maintenance and operation costs? ›
- Eliminate tasks that do not correspond to any failure mode.
- Instead of “fixing”, find a cure.
- Optimise work orders.
- Avoid reactive maintenance.
- Negotiate contracts with current suppliers.
- Know the life cycle of your assets.
- Cut down on day-to-day wastage.
- Optimise MRO inventory.
All manufacturing downtime reduces overall output by stopping production. Unplanned downtime can cost 15 times more than planned downtime. The loss of revenue during any type of asset maintenance can be as high as $3 million per incident.
What is reliability downtime? ›
Equipment downtime analysis is an important part of any reliability strategy. It involves assessing the percentage of time a piece of equipment is not operational due to factors such as maintenance, repair and/or replacement.
What is downtime formula? ›
To get a quick estimate of your company's probable downtime costs, use the following formula, based on the size of your business and the number of minutes your most recent incident lasted: Downtime cost = minutes of downtime x cost-per-minute.
Why is downtime important? ›
Downtime gives us time and space to enjoy our personal lives and get personal tasks done. It grants us time with family, friends, and our hobbies. On a brain level, it allows us to reach homeostasis and is a necessary break from the aroused state, Dr. Hanson says.
What is an outage problem? ›
An Internet outage or Internet blackout or Internet shutdown is the complete or partial failure of the internet services.
What is an unplanned outage called? ›
An unplanned outage (also called an unscheduled outage) is typically caused by a failure.
What is outage impact? ›
Impact Outage is defined as a percentage of the total number of current active Tenants who are considered Down.
What are three techniques used to reduce cost in a business? ›
Combination: Bundle goods and services across an organization to reduce costs. Elimination: Remove unnecessary products, processes, benefits, and workflows. Optimization: Streamline processes and workflows to reduce bottlenecks and redundancies. Substitution: Use cheaper products or services.
What is the strategy to reduce the cost? ›
Cost reduction strategies are practices and principles designed to optimize operational efficiency. They cover all aspects of running a business, from hiring employees to booking flights. Successful implementation works by streamlining processes, allocating resources effectively, and eliminating waste.
What is the best way to reduce cost? ›
- Make a plan. You need to evaluate where your business is now and where you want to take it in the future. ...
- Track expenses diligently. ...
- Benchmark against your industry. ...
- Manage variable costs. ...
- Get tough on fixed costs. ...
- Invest in technology. ...
- Offer incentives to staff.
Importance of Reducing Unplanned Downtime
Waiting on parts or the necessary personnel to fix an issue takes time and could mean the machine is going to stay down for longer. Longer downtime means less time making product, directly affecting the bottom line.
The first way to measure your equipment downtime is in actual time. For a given asset (or set of assets), record the amount of time during each month that the asset is broken down. Keeping a running tally and comparing it to past months will help you know when an asset is having more issues than normal.
What is downtime and how can IT affect a business? ›
Network downtime means that your customers can't access your online services. They can't find or buy your services and products. If your potential customers can't access your website, then it will affect your revenue. Also, your existing customers can't access your products and services.
What is the explanation of downtime? ›
The term downtime is used to refer to periods when a system is unavailable. The unavailability is the proportion of a time-span that a system is unavailable or offline. This is usually a result of the system failing to function because of an unplanned event, or because of routine maintenance (a planned event).
What is meant by the term downtime? ›
: time during which production is stopped especially during setup for an operation or when making repairs. : inactive time (such as time between periods of work) napping during our downtime.
What is downtime and its causes? ›
Downtime is a period during which production or business processes come to a halt due to application unavailability, technical glitch, network outage or natural disaster.
How do you manage downtime? ›
- Know the best windows of time for planned downtime based on your company's production cycle. ...
- Prioritize all your assets and know which should be handled first. ...
- Implement clear guidelines and well-defined standard operating procedures (SOPs) for each repeated operation.
- Plan for Recovery. The best way to ensure a fast recovery is to plan ahead. ...
- Keep Everything Up to Date. ...
- Educate Your Workforce. ...
- Install a Backup Power System. ...
- Test Your Infrastructure. ...
- Consider Disaster Recovery as a Service.
What is downtime at work? ›
It is a period during which a piece of equipment or a machine is not functional or cannot work. It may be due to technical failure, machine adjustment, maintenance, or non-availability of inputs such as materials, labor, power.
What is the importance of downtime? ›
A little downtime is important for your brain health. Research has found that taking breaks can improve your mood, boost your performance and increase your ability to concentrate and pay attention. When you don't give your mind a chance to pause and refresh, it doesn't work as efficiently.
What is another word for downtime? ›
A break or intermission in work or activity. break. pause. intermission. interlude.