It generally should have cleared quickly. I only discovered late last night what was really going on, and it turned out to be a really frustrating situation because it was (a) out of my control and (b) really easy to prevent (but because of (a), I couldn't do anything). The downtime ended up being inevitable once we decided to change DNS settings.
We had been using DNS from the registrar (Namecheap/Enom) and it had actually been like that since day 1. We have had a few issues with it due to DDoSes directed at others, but for the most part it has been fine. However, yesterday morning, there were some sporadic issues that led to DNS resolution failing for some people. That was the final straw and led to us moving DNS (to Route53; if you're accessing us now, you're using it).
Here's the issue though. As soon as we switched off the Namecheap DNS v1 servers (and this would have happened even if we had only been moving to their DNS v2 servers), DNS v1 stopped resolving our domain. The old servers appear to be meant to redirect to the new canonical servers, but that doesn't appear to work correctly. This means that if someone is still hitting the old DNS server, the domain fails until their local (intermediate) DNS server updates and realizes that the canonical records are elsewhere. That seemed to happen very quickly on a number of servers, but others appear to still be holding out now (>24 hours later). Because DNS is so localized and some intermediate servers ignore TTLs, it's somewhat hard to know how many people are still affected, but it should hopefully be very few; in terms of raw traffic, there hasn't been any noticeable difference.
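For anyone who wants to check whether a particular intermediate resolver is still stuck, here is a rough sketch of how that could be done with nothing but the Python standard library. It is not what we actually ran; the function names (`build_query`, `response_status`, `check_resolver`) are made up for illustration, and the domain and resolver IPs in the usage comment are placeholders. It sends a plain A-record query straight to a resolver of your choosing and reads the response code out of the reply header. A resolver that is still pointed at the old servers would most likely time out or return SERVFAIL (rcode 2), though that part is my assumption.

```python
import socket
import struct


def build_query(domain, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet (RFC 1035) for `domain`.

    qtype 1 is an A record. query_id is an arbitrary illustrative value.
    """
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def response_status(packet):
    """Return (rcode, answer_count) read from a DNS response header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    return flags & 0x000F, ancount


def check_resolver(resolver_ip, domain, timeout=3.0):
    """Ask one resolver for `domain` over UDP and summarize what came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(domain), (resolver_ip, 53))
            data, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable hosts
            return "no response"
    rcode, ancount = response_status(data)
    if rcode == 0 and ancount > 0:
        return "resolves"
    return "fails (rcode=%d, answers=%d)" % (rcode, ancount)


# Example usage (placeholder domain; 8.8.8.8 and 1.1.1.1 are public resolvers):
# for ip in ("8.8.8.8", "1.1.1.1"):
#     print(ip, check_resolver(ip, "example.com"))
```

This only tells you about the resolvers you can reach and query yourself, so it doesn't solve the problem of knowing how many people are behind an ISP resolver that is still holding the old records.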
If we had been on DNS v2, none of this should have been an issue, because it would have kept resolving the domain and allowed a graceful transition. When I realized what was going on, I contacted the DNS provider about it, but it seems there was nothing they could do. I contemplated going back (which should have resolved the issue immediately), but we had already gone through (hopefully) most of the "pain" of the transition, and undoing it would have just caused the whole issue again at some later point.
Ridiculously frustrating situation.