Webnx's Ogden Utah datacentre outage

Xon

Well-known member

Apparently a generator stopped and caught fire, the fire suppression system was water-based, and then the fire department said "no" to the building having power.

While the timing was nasty, overnight and during a public holiday, the lack of communication is probably the worst part. Even if those communications are just repeats of "we are still investigating", it at least communicates that they are doing something.

I've got a number of websites down, over a couple different clients.

Thankfully I've got very recent backups due to using ZFS and snapshot shipping (using a tool called zrepl), and plenty of practice doing restores. Sadly, building an entire new production environment is still somewhat painful :(
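
For anyone unfamiliar with snapshot shipping, here is a rough sketch of the underlying idea: send the difference between two ZFS snapshots to another machine over SSH. This is not how zrepl is implemented or configured. It only builds the kind of `zfs send | ssh ... zfs recv` pipeline you could run by hand. The helper name `build_ship_command`, the dataset names, and the host are all hypothetical.

```python
import shlex
from typing import Optional


def build_ship_command(
    dataset: str,
    new_snap: str,
    remote_host: str,
    remote_dataset: str,
    prev_snap: Optional[str] = None,
) -> str:
    """Build a shell pipeline that ships a ZFS snapshot to a remote host.

    Hypothetical helper for illustration only. It returns the command
    as a string and does not execute anything.
    """
    new_ref = f"{dataset}@{new_snap}"
    if prev_snap is None:
        # Full send: the first copy of the dataset.
        send = ["zfs", "send", new_ref]
    else:
        # Incremental send: only the blocks changed since prev_snap.
        send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", new_ref]
    recv = ["zfs", "recv", remote_dataset]
    # The remote command is passed to ssh as one quoted argument.
    return (
        f"{shlex.join(send)} | "
        f"ssh {shlex.quote(remote_host)} {shlex.quote(shlex.join(recv))}"
    )


if __name__ == "__main__":
    # Assumed example names; replace with your own pool, dataset, and host.
    print(build_ship_command("tank/www", "snap2", "backup.example.com",
                             "backup/www", prev_snap="snap1"))
```

Because incremental sends only carry the changes since the previous snapshot, they can run often, which is what keeps the window of lost data small after an outage like this one.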
 

MySiteGuy

Well-known member

Xon said:
Apparently a generator stopped and caught fire, the fire suppression system was water-based, and then the fire department said "no" to the building having power.

While the timing was nasty, overnight and during a public holiday, the lack of communication is probably the worst part. Even if those communications are just repeats of "we are still investigating", it at least communicates that they are doing something.

I've got a number of websites down, over a couple different clients.

Thankfully I've got very recent backups due to using ZFS and snapshot shipping (using a tool called zrepl), and plenty of practice doing restores. Sadly, building an entire new production environment is still somewhat painful :(

Sorry to hear this. "Somewhat" is an understatement, even with the best of backups.
 

Forsaken

Well-known member
Xon said:
Sadly, building an entire new production environment is still somewhat painful :(

I may be moving 3 dedicated servers to a different data center due to recent instability at the one it's hosted at. Each server has a different purpose, so that'll be fun.
 

MattW

Well-known member
Xon said:
While the timing was nasty, overnight and during a public holiday, the lack of communication is probably the worst part. Even if those communications are just repeats of "we are still investigating", it at least communicates that they are doing something.

^^ This

Communication is key in these types of incidents. Most people understand outages can and do happen, but it's how this is then communicated that makes the difference. To be fair, OVH's Octave has done a pretty decent job of ensuring communication has been happening pretty regularly with their own DC fire.
 

Xon

Well-known member
I've ended up renting a replacement server and my core sites are now back up. Now I play the waiting game with Webnx to figure out what is happening with the previous servers.

And my add-on site is back up! https://atelieraphelion.com/

There looks to have been ~20 minutes of lost data, but I can't see any payment records in PayPal/Stripe for missing transactions, so no missing licenses :D
 

ENF

Well-known member
Xon said:
I've ended up renting a replacement server and my core sites are now back up. Now I play the waiting game with Webnx to figure out what is happening with the previous servers.

And my add-on site is back up! https://atelieraphelion.com/

There looks to have been ~20 minutes of lost data, but I can't see any payment records in PayPal/Stripe for missing transactions, so no missing licenses :D

@Xon Thanks for your efforts as always. Hope the rest can be resolved quickly.
 

Xon

Well-known member
If you have PayPal subscriptions, IPN notifications start timing out after 3 days. Thankfully, I got things back online before that happened.
 

V3NTUS

Well-known member
Xon said:
If you have PayPal subscriptions, IPN notifications start timing out after 3 days. Thankfully, I got things back online before that happened.

Yeah, I also had Stripe webhooks disabled after a few days, as the servers were offline for around 10 days! I had to go through every single purchase and resend the event to XenForo so it could process the payments/renewals correctly.
 

ichpen

Well-known member
Yep, got hit by this. Webnx has been great until the s hit the fan, then radio silence until recently. I cannot stress enough how important over-communication is in times of crisis, even if it's just for the sake of communicating something.

I've moved 1 server and am waiting on another to dry, as it was in the splash zone. Unfortunately I forgot to back up one VM, so I'm at their mercy at the moment; otherwise I would have moved the second.
 