11th April outage report

You may have noticed a bit of downtime in the last 24 hours.
As is the way with these things, I tend to get punished by servers whenever I am scheduled to fly.
Our host, Digital Ocean, had a major outage in the SFO2 datacenter yesterday.

This was closely followed by a global outage of their control panel:

The Cloud Panel and API connectivity were affected by our outage in SFO2. Multiple redundant power sources in our SFO2 datacenter failed temporarily causing a large number of Droplets and core systems, including networking, to go down. We have restored power and networking, and are currently working to bring all affected Droplets safely online.

Following the full power recovery, we started to experience delays in event processing. Our engineering team was able to isolate the root cause. We will update the status page once all events are processing without delay and publish further details to our blog once we’ve conducted a full post mortem.

While this was happening, I was packing for a trip to the Australian desert and hoping to grab a few hours' sleep before a 9am flight!

Digital Ocean was back online in the morning, but Clarionhub was still down. The server seemed to be running fine, but Docker would not start. I was pretty sure I knew what was wrong, but it was not something I could fix in the 15 minutes before the taxi arrived, and we have no wifi on the domestic flights yet, or at least not all of them!

As expected, there had been a kernel update that removed aufs support, causing Docker to fail to start.

The magic incantation that did the final trick was…

sudo apt-get -y install linux-image-extra-$(uname -r) aufs-tools
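For anyone who hits the same symptom, here is a rough sketch of how you might check whether the running kernel currently offers aufs before reaching for the install command above. The `has_aufs` helper is a made-up name for illustration, not something from Docker or the server setup described here. It only looks at what the kernel has registered, so a negative result can also mean the module exists but is not loaded yet (in which case `sudo modprobe aufs` would be the next thing to try).

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): succeed if a
# /proc/filesystems-style listing contains an "aufs" entry.
# The optional argument is the file to inspect; it defaults to the
# kernel's own list of registered filesystems.
has_aufs() {
    grep -qw aufs "${1:-/proc/filesystems}" 2>/dev/null
}

if has_aufs; then
    echo "aufs is registered on kernel $(uname -r)"
else
    echo "aufs is not registered on kernel $(uname -r)"
fi
```

If the check reports that aufs is not registered right after a kernel update, that matches the failure described above.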

I am not due to be on a plane again for a few weeks so we should be good for a bit longer :smiley:

Thanks for hanging around, happy ClarionHub-ing!!!


Have fun on your outback walkabout mate

