Wednesday February 10th 2016 started with my websites, and also the client sites hosted with Heart Internet rather than other hosting providers, going offline. This can happen from time to time to all hosting companies, both big and small.
You should not judge a company based on something going wrong - but rather on how they respond to it. This is where things went downhill very quickly for myself and all the customers trying to reach out to you.
You actively chose not to inform any customer that there was a problem, the nature of the problem or the likely duration of the problem.
I choose the word 'actively' specifically as you had avenues to use. Your 'updates' since have suggested that you could not email, had too many tickets to handle and a number of other reasons you were unable to communicate with people.
But the truth is you had two methods, neither of which was affected by the seriousness of the failures it later turned out you were dealing with. You chose to ignore one and to confuse on the other.
Some time ago you created a dedicated server status site, webhostingstatus.com, to inform customers of problems. It was poorly advertised and often failed to show planned maintenance which would be referred to in later messages as the cause of new problems. It was wiped clean of all history as quickly as possible. This removed much of its value for those people reviewing their websites' issues.
So with neither being updated, both the concern and the frustration grew. Your phones went unanswered; even the Head of Customer Services' direct line was constantly engaged.
So the problem must be massive
A customer who sought information tweeted the Heart Internet account and was told to check the dedicated status page for updates. Yet it would take another 40 minutes or more before ANY information was provided.
The first message provided announced that you were suffering from a distributed denial of service (DDoS) attack. You have given this as the reason for a number of service interruptions in the past few months, which makes it an increasing cause for grave concern.
Almost 40 minutes later a new status message appeared suggesting you were also suffering from a brief power interruption. At this point it was anyone's guess as to what would fail next.
Drips of information
I pushed for answers: something on the blog, the status page to be updated, smoke signals even. Things very slowly started to drip out.
There was no DDoS attack anymore - although how anyone confuses ZERO power with a DDoS attack is beyond belief. Between the problems starting, being told it was a DDoS attack and then being told it was a power problem, over an hour had passed.
I got my way and a blog post appeared that offered very little in the way of useful or actionable information that we as customers could work with.
Questions were going unanswered on Twitter, email and tickets for those who were able to use them. I chose to write a comment on the blog post in the hope that the questions the post left unanswered would gain answers.
The comment was not published until I complained via Twitter and to this day remains unanswered. Some points have been spoken about but others have not.
A whole week goes by
Over the many days in the week since the problems started, some things began to work and then failed, and some were not available for days.
We learnt that despite your many processes in place, a lot of corruption occurred during the outage and then during subsequent rebuilds.
The situation is still not fully restored 8 days later.
We can all agree that the unexpected can happen, but it turns out pretty much each contingency failed one way or another in the chain. This was explained in your more recent blog posts.
As a customer I expect a certain level of value for the money paid. Part of that is clear, timely and accurate information when something happens.
You might say "Everyone was focussed on the problem so no-one was available". But that is the problem. This is where you dropped the ball, and not just once. Someone SHOULD be tasked with informing customers at the time and then updating them within a reasonable timeframe - even if only to communicate that you are still working on it.
Here are a few of the reasons why informing people straight away would have helped YOU and us (your customers). The beneficiary is shown in parentheses after each point.
- Ticket attempts would be significantly lower. (You)
- Emails to support would be significantly lower. (You)
- Tweets for updates would be significantly lower. (You)
- Calls to the main number would be significantly lower. (You)
- One place to find information. (Customer)
- Knowing what to say to clients if you are a reseller. (Customer)
- Have a reasonable sense of "at least they are on it". (Customer)
Any lessons learned?
I do not believe so. The account of how many things went wrong needs to be matched by a plan to show how they cannot occur again.
Important information goes without any real announcement.
The new server status site, status.heartinternet.uk, will provide up-to-date information, we are told. I say "told" as this was not mentioned in any email about the problems. It was however buried at the bottom of one post. The new place to gain updates is not even mentioned on webhostingstatus.com, the previously recommended source of updates.
In order for confidence to start returning:
- ALL incidents big or small should be indicated on the new status page.
- ANY maintenance work (which started this whole series of events) should be included and announced via a second source like email.
- The Twitter account must be actively used to announce any incident that lasts more than 5 minutes.
- Some idea of a plan that will be/is in place to handle the group of small failures that contributed to this whole mess. *
- An uptime guarantee - like all other service providers. Until then, sales people should not suggest there is one.
- Publicise the new support website everywhere. Emails, website, blog, Twitter - honesty should be your policy. Being honest up front is a great sign you treat honesty and trust above all other concerns.
- Support page archive must hold at least the past 28 days of activity and each entry be updated with a status to show if it has been rectified and also marked with the seriousness of the problem.
- Honestly thank your customers for sticking with you - it has been a tough week and for some it continues. A heartfelt letter from the top - rather than a blog post from a staffer.
* It has been reported this whole event was the problem - but in truth it was many small issues combined together that worked against you.
Less unforeseen, simply more unplanned.
Core 13 Ltd
Yesterday I reached out to the Head of Customer Services for Heart Internet so I could include any words directly from Heart Internet. I still await any response.
Judge others by their actions, not their words.