WebNX's Ogden, Utah datacentre outage

Xon · Apr 5, 2021

Ogden datacenter issue - WebNX

Hello Everyone, Now that we have a better understanding of what happened we would like to give everyone an update. One of our old generators that have worked for years and was recently load tested had a mechanical failure and caught fire resulting in power being cut to our core routers and fire...

webnx.com

Apparently a generator stopped and caught fire, the fire suppression system was water based, and the fire department then said "no" to the building having power.

While the timing was nasty (overnight and during a public holiday), the lack of communication is probably the worst part. Even if the updates are just repeats of "we are still investigating", that at least communicates that they are doing something.

I've got a number of websites down, over a couple different clients.

Thankfully I've got very recent backups due to using ZFS and snapshot shipping (using a tool called zrepl), and plenty of practice doing restores. Sadly, building an entire new production environment is still somewhat painful.
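
For readers unfamiliar with snapshot shipping, the idea can be sketched roughly as follows. This is only an illustration of the general technique using the standard `zfs snapshot`, `zfs send -i` and `zfs receive` commands; it is not zrepl's configuration or implementation. The dataset names, the backup host and the `ship-` snapshot prefix are all made up for the example, and the functions only build the command strings rather than run them.

```python
import shlex
from datetime import datetime, timezone
from typing import List, Optional


def snapshot_name(dataset: str, now: datetime) -> str:
    """Build a timestamped snapshot name (the 'ship-' prefix is hypothetical)."""
    return f"{dataset}@ship-{now.strftime('%Y%m%dT%H%M%SZ')}"


def ship_commands(dataset_snapshot: str, target_host: str, target_dataset: str,
                  prev_snapshot: Optional[str] = None) -> List[str]:
    """Return the shell commands for one snapshot-and-ship cycle.

    With prev_snapshot set, only the delta between the two snapshots is
    sent (zfs send -i). Without it, the full snapshot is sent.
    """
    send = ["zfs", "send"]
    if prev_snapshot is not None:
        send += ["-i", prev_snapshot]
    send.append(dataset_snapshot)
    receive = ["ssh", target_host, "zfs", "receive", target_dataset]
    return [
        shlex.join(["zfs", "snapshot", dataset_snapshot]),
        f"{shlex.join(send)} | {shlex.join(receive)}",
    ]


if __name__ == "__main__":
    # Hypothetical pool, dataset and host names, for illustration only.
    new_snap = snapshot_name("tank/www", datetime.now(timezone.utc))
    for cmd in ship_commands(new_snap, "backup.example.net", "backup/www",
                             prev_snapshot="tank/www@ship-20210404T120000Z"):
        print(cmd)
```

Because each cycle after the first only sends the changes since the previous snapshot, it can run frequently, which is what keeps the window of lost data small when the primary server disappears.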

MySiteGuy · Apr 5, 2021

Xon said:
Thankfully I've got very recent backups due to using ZFS and snapshot shipping (using a tool called zrepl), and plenty of practice doing restores. Sadly, building an entire new production environment is still somewhat painful.

Sorry to hear this. "Somewhat" is an understatement, even with the best of backups.

Forsaken · Apr 6, 2021

Xon said:
Sadly building an entire new production environment is still somewhat painful

I may be moving 3 dedicated servers to a different data center due to recent instability at the one they're hosted at. Each server has a different purpose, so that'll be fun.

MattW · Apr 6, 2021

Xon said:
While the timing was nasty (overnight and during a public holiday), the lack of communication is probably the worst part. Even if the updates are just repeats of "we are still investigating", that at least communicates that they are doing something.

^^ This

Communication is key in these types of incidents. Most people understand outages can and do happen, but it's how this is then communicated that makes the difference. To be fair to OVH, Octave has done a pretty decent job of ensuring communication has been happening pretty regularly with their own DC fire.

Xon · Apr 7, 2021

I've ended up renting a replacement server and my core sites are now back up. Now I play the waiting game with WebNX to figure out what is happening with the previous servers.

And my add-on site is back up! https://atelieraphelion.com/

There looks to have been ~20 minutes of lost data, but I can't see any payment records in PayPal/Stripe for missing transactions, so no missing licenses.

ENF · Apr 8, 2021

Xon said:
I've ended up renting a replacement server and my core sites are now back up. Now I play the waiting game with WebNX to figure out what is happening with the previous servers.

And my add-on site is back up! https://atelieraphelion.com/

There looks to have been ~20 minutes of lost data, but I can't see any payment records in PayPal/Stripe for missing transactions, so no missing licenses.

@Xon Thanks for your efforts as always. Hope the rest can be resolved quickly.

V3NTUS · Apr 10, 2021

A similar thing happened at OVH last month, and we were offline for 10 days.

Xon · Apr 10, 2021

If you have PayPal subscriptions, IPN notifications start timing out after 3 days. Thankfully, I got things back online before that happened.

V3NTUS · Apr 12, 2021

Xon said:
If you have PayPal subscriptions, IPN notifications start timing out after 3 days. Thankfully, I got things back online before that happened.

Yeah, I also had Stripe webhooks disabled after a few days, as the servers were offline for around 10 days! I had to go through every single purchase and resend the event to XenForo so it could process the payments/renewals correctly.
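
For anyone facing the same cleanup, the manual resend can be thought of as an idempotent replay. The sketch below is only an illustration, not Stripe's or XenForo's API: it assumes the missed events have already been exported as dictionaries with `id` and `created` fields, and `handler` is a hypothetical stand-in for whatever processes a payment on the forum side.

```python
from typing import Callable, Dict, Iterable, List, Set


def replay_missed_events(events: Iterable[Dict],
                         already_processed: Set[str],
                         handler: Callable[[Dict], None]) -> List[str]:
    """Feed missed events to handler, oldest first, skipping known ids.

    already_processed is updated in place, so running the replay twice
    does not process the same payment or renewal twice.
    """
    replayed = []
    for event in sorted(events, key=lambda e: e["created"]):
        if event["id"] in already_processed:
            continue  # seen before the outage, or replayed in an earlier run
        handler(event)
        already_processed.add(event["id"])
        replayed.append(event["id"])
    return replayed


if __name__ == "__main__":
    # Made-up events for illustration only.
    missed = [
        {"id": "evt_2", "created": 200, "type": "invoice.paid"},
        {"id": "evt_1", "created": 100, "type": "invoice.paid"},
    ]
    seen = {"evt_1"}
    print(replay_missed_events(missed, seen, handler=lambda e: None))
```

Replaying in creation order matters for renewals, since a later event for the same subscription usually assumes the earlier one has already been applied.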

ichpen · Apr 12, 2021

Yep, got hit by this. WebNX had been great until the s hit the fan, then radio silence until recently. I cannot stress enough how important over-communication is in times of crisis, even if it's for the sake of communicating something.

I've moved 1 server and am waiting on another to dry, as it was in the splash zone. Unfortunately I forgot to back up one VM, so I'm at their mercy at the moment, or I would have moved the second.
