Recovery maintenance scheduled for Saturday, March 3
Last Thursday and Friday, as I'm sure you're aware, Central Web Services experienced a major service interruption that brought down the majority of our websites, at least intermittently, for a significant part of both days. First off, I'd like to take this opportunity to apologize to those of you who depend on our services. Second, I want to assure you that we have identified the root cause and are in the process of implementing safeguards against this kind of thing happening again.
We do, however, have some more maintenance to do along the way, and the first part of that is scheduled for tomorrow evening. To resolve the outage last week, our primary network-attached storage (NAS) server had to be brought back online in a cross-site storage configuration; essentially, this means that although our NAS server is in one data center, its disks are in another data center several miles away. This introduces a new (albeit small) element of risk into the overall web service architecture.
To mitigate this risk, CWS is coordinating with the ITSS infrastructure team to reconfigure the NAS so that its data and its services all operate out of the same data center. Because of the way the backend storage system works, it is highly unlikely that this maintenance work will result in any interruption of service. In the event that web services do need to be taken offline, we anticipate no more than one hour's worth of downtime.
The infrastructure team plans to begin this process at 7:00 pm tomorrow (Saturday, March 3, 2012). Again, we do not expect any downtime. However, if you are relying on your website for something unusually critical during the scheduled maintenance window, please let us know; we can reschedule if absolutely necessary, since the risk we're mitigating is relatively small.
The CWS team