What happened in the recent Hotmail outage

On December 31, 2010, a number of our users reported that their email messages and folders were missing from their Hotmail accounts. I want to take a little time to explain what happened, and what steps we’ve taken to fix this problem and prevent it from happening in the future.

In Hotmail, one way we monitor the health of the email service is through automated tests. We set up a number of accounts with different configurations, and then use automated tests to log into these accounts, simulate normal user activity and behavior, and report when errors are found. We use scripts to create and delete these test accounts in bulk. The way we delete a test account is to remove its record from a group of directory servers that route users and incoming mail to the correct mailbox. 

On December 30th, an error in one of these scripts inadvertently removed the directory records of a small number of real user accounts along with a set of test accounts. Please note that the email messages and folders of impacted users were not deleted; only their inbox location in the directory servers was removed. Therefore, when they logged in, a new mailbox was automatically created for them on a new storage server that didn’t contain their old messages and folders. This is why the accounts received the “Welcome to Hotmail” message.
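
As a rough illustration of this behavior, here is a simplified Python sketch. It is not the actual Hotmail code: the `Directory` class, the `sign_in` function, and the server names are all hypothetical, and the real system is certainly more involved. It only shows how removing a directory record, while leaving the stored mail untouched, can make an existing account look brand new at sign-in.

```python
# Hypothetical model of directory-based mailbox routing (illustration only).

class Directory:
    """Maps an account name to the storage server that holds its mailbox."""

    def __init__(self):
        self.records = {}

    def lookup(self, account):
        return self.records.get(account)


def sign_in(directory, storage, account, new_server):
    """Return the mailbox for `account`, provisioning one if no record exists."""
    server = directory.lookup(account)
    if server is None:
        # No directory record: the account looks new, so an empty mailbox
        # is created on a new server and seeded with a welcome message.
        server = new_server
        storage.setdefault(server, {})[account] = ["Welcome to Hotmail"]
        directory.records[account] = server
    return storage[server][account]


storage = {"server-a": {"alice": ["old message 1", "old message 2"]}}
directory = Directory()
directory.records["alice"] = "server-a"

before = sign_in(directory, storage, "alice", "server-b")

# Simulate the faulty script: only the directory record is removed.
del directory.records["alice"]
after = sign_in(directory, storage, "alice", "server-b")

# The old messages still exist on the original server; they are just unreachable.
old_data_intact = storage["server-a"]["alice"]
```

In this sketch, `before` holds the two old messages, `after` holds only the welcome message, and `old_data_intact` shows the original mail is still sitting on the first server.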

The issue was first reported on December 30th, and initially our support teams were unable to trace the source of the problem. A “ticket” (a notification that an issue needs investigation) was entered into our issue alert system on December 31st. This issue was one that had not arisen before, and at first, we did not assign it to the correct team for action. Additionally, because there was a relatively small number of reports, the volume wasn’t high enough to set off alarms. This meant we had a ticket in the system that was getting no action.

We raised the priority of the ticket on January 1st after continued reports, and by that evening, we’d identified the root cause of the problem. Our first step was to restore these users’ entries in the directory servers, which we did early on the morning of January 2nd (PST). We then merged their old email messages and folders with any new mail they’d received throughout the day on January 2nd. This required multiple passes to capture all the accounts and messages, so for some users, service wasn’t completely restored until January 5th. We completed the merge for 16,035 users on January 2nd, and by January 5th had completed it for the remaining 1,320 users who were affected by this particular issue.
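
For readers who want the totals implied by these figures, the short Python calculation below adds them up. It assumes the two numbers above cover every affected account, which is how the word "remaining" reads; the variable names are just labels for this example.

```python
# Figures taken from the paragraph above.
restored_jan_2 = 16_035      # merges completed on January 2nd
restored_by_jan_5 = 1_320    # remaining merges completed by January 5th

# Assumption: these two groups together make up all affected accounts.
total_affected = restored_jan_2 + restored_by_jan_5
share_jan_2 = restored_jan_2 / total_affected

print(total_affected)          # 17355
print(f"{share_jan_2:.1%}")    # 92.4%
```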

100% data recovery

I am happy to report that no user data was permanently lost in this particular incident; that is, we had 100% recovery of existing email and folders in the affected accounts. The only unfortunate exception is that, if you were affected by this incident and you didn’t sign in to your account between the time of the incident and the time your account was restored, then any messages sent to your account during that time would have bounced.

What we’ve learned

To prevent similar problems in the future, we’ve taken the following actions:


  • We are updating our infrastructure to use a separate code path for provisioning and removing test accounts, so that our testing no longer risks affecting real user accounts. 
  • We are changing our issue alert process so that when multiple users report missing data, these issues get a higher priority and immediate action.
  • We are updating our feedback process so that we can more clearly communicate status to affected customers through the support forums.
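
To make the first two actions more concrete, here is a minimal Python sketch of what such safeguards could look like. It is an illustration under stated assumptions, not the actual Hotmail implementation: the `monitor-test-` naming prefix, the `is_test` flag, the `missing_data` category, and the threshold of two reporters are all hypothetical.

```python
# Hypothetical sketch; names and thresholds are assumptions, not Hotmail code.
TEST_PREFIX = "monitor-test-"   # assumed naming convention for test accounts


def remove_test_accounts(directory, requested):
    """Delete only records that are verifiably test accounts.

    Returns (removed, refused) so that refusals can be logged and reviewed.
    """
    removed, refused = [], []
    for account in requested:
        record = directory.get(account)
        is_test = (
            record is not None
            and record.get("is_test") is True
            and account.startswith(TEST_PREFIX)
        )
        if is_test:
            del directory[account]
            removed.append(account)
        else:
            # Anything that is not clearly a test account is left untouched.
            refused.append(account)
    return removed, refused


def ticket_priority(category, distinct_reporters):
    """Escalate as soon as more than one user reports missing data."""
    if category == "missing_data" and distinct_reporters >= 2:
        return "high"
    return "normal"


directory = {
    "monitor-test-001": {"is_test": True, "server": "s1"},
    "alice": {"is_test": False, "server": "s2"},
}
removed, refused = remove_test_accounts(directory, ["monitor-test-001", "alice"])
```

Here the test account is removed while the request to delete `alice` is refused, and a missing-data ticket with several distinct reporters is rated `high` regardless of overall report volume.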

Other reports

We’ve also received reports of unrelated data loss issues, including people who set up a POP client (an email program on their computer or mobile phone) that, unbeknownst to them, was automatically deleting their messages from the server. Others found, after investigation, that their accounts had been closed as a result of not having signed in for 270 days.

If you think you’re missing email from your account, first check this Solution Center article on the most common reasons for missing email in Hotmail. If you don’t find a solution, be sure to report it in the Hotmail Solution Center Forums, as the more reports we get, the more quickly we can figure out and address your problem.

We apologize to the Hotmail users who were affected by this issue. Our commitment to protecting your data is a top priority for the entire Windows Live team. We will continue to investigate new incident reports as they come in, and we’ll share new information about these on this blog.

Mike Schackwitz

Windows Live Hotmail team
