Windows Home Server Hard Drive Failure
What do you do when a hard drive fails?
A Western Digital drive died in my system recently, so I decided to document the steps I went through and everything that happens during this process. I’m completely out of USB ports on my MediaSmart Server, so I decided to shut it off and re-organize. I’ve decided to use a USB hub for my printer and the battery backup, and I’ll keep all data drives on their own dedicated ports. When I powered it back up I noticed the top drive light was not on. Sure enough, I went back to my workstation and the notification icon was an angry RED color.
I tried re-seating the drive in question. It also felt cool to the touch when I thought it would be a tad warm even from casual use. My server is in a basement equipment room, so it’s nice and cool, and that might explain it.
I opened the console and here is what I found.
Here is what the Disk Management Add-In reports. The non-storage drive is the one I am using with KeepVault. KeepVault runs a nightly backup of my shares to this drive. It’s also the ioSafe drive.
Clicking the Network Critical notification also informs me that my backup database is feeling ill.
I’m also seeing these popups. The first one was for the Music folder, now the Photos folder. I got a few more, then finally a Network Critical notification. (Isn’t it fun to sneak a peek at what is running on other people’s toolbars? Think you can name all the apps in mine?)
My only option with the so-called “Missing Drive” is to remove it but I think I’ll poke around the Event Viewer via Remote Desktop to see if WHS is doing anything in the background. I also looked at Disk Manager and the drive is not there so I’m now convinced I have a bad drive. I guess my server has been trying to tell me that all along?
I’ll remove the offending drive. (You know I’m going to check it on another computer, though!)
Several Hours!
Now is where I start thinking. Which folders did I NOT have duplication turned on for? DVDs. Yep, my movie collection. Well, about 20 or so DVD rips. I was just about to start ripping some more DVDs for a review too. I may have to start over if that folder was on the dead drive. I also think that the Software and Public folders were not duplicated. That might not have been very smart on my part. My Software share was growing. I have started putting everything that I download there. Betas, drivers, anti-virus stuff. That’s going to suck if I have to re-download some of the beta stuff. Luckily, my Win7 ISOs are still on my desktop!
Wait and see. 8:45PM start. 10:20 Finish
Once again, let’s triage the situation. What is bad, what is good?
Still seeing the database error.
Still seeing file conflict errors.
Under the shared folders tab I see some disturbing news.
This is the status of all my shared folders. The healthy folders are mostly unused or empty folders.
Fix it
It’s time to dive in and see what I can do to get rid of these errors. The first thing is to pull the bad drive out and restart the server. This is recommended by the Knowledge Base article linked from this error message.
Further reading of this article suggests that by repairing the backup database I will lose all my PC backups. That is not good news but at least I have my shares intact.
After the reboot I see my shared folders are all healthy.
Disk Management is reporting more usage on two drives leading me to think that the duplication efforts were successful.
I still have the “backup database errors” critical notification so it looks like I’ll be clicking the repair button and losing my backups. Someone alert Yakuza that I’ll be downloading his BDBB Add-In real soon!
I let the process run all night so I don’t know how long it took. Here are the results.
Click Next and the backup service will start again. It also re-checks database consistency. Even though it says it’s starting the backup service, I have backups of all computers that were on last night, so I feel good about that. I never went very long without protection. The whole process lasted over 4 hours. It took longer because I paused and read a whole bunch of stuff to make sure I didn’t goof anything up. In the end, I followed all the prompts and clues from WHS and it took care of everything.
Conclusion:
I suppose there is a lesson in this but there is also a question. Why can I not duplicate computer backups automatically from day one?
Did I lose anything important? I lost two backups that I wouldn’t mind having back. One was a family member’s laptop that I backed up just to have the image in case the family member had trouble in the future. I have the factory restore disks for this laptop, so all is not lost here.
The other was an old laptop that needs a small repair. I had a new drive for it and never got around to the repair. I guess Win7 will go on it now.
On the bright side, this issue did clear up a bunch of space for me! I can’t find much to complain about except for the fact that I cannot duplicate backups “out of the box.” I’m certainly aware of the BDBB Add-In, but don’t normal users expect to be protected 100% when they purchase WHS? Even so, I only went 4 hours without backups. There was only a slim chance of failure during that 4-hour window. I can’t imagine the bad luck of having a computer hard drive fail during this vulnerable period. I bet it has happened to someone, or it will.
I suppose I will take a deeper look at BDBB from Alex Kuretz and enable duplication of my backups via this Add-In.
Off to find a good hard drive deal!
See Also: The Home Server Show 39 – OS System Drive Failure Options
Category: Windows Home Server




Sorry for your troubles. I've been there. I haven't had to go to your lengths, though. I shut the server down, removed the drive from the internal enclosure and put it in a different enclosure. On restart, it worked, luckily. Seems the power supplies on some of these external drives can go bad from time to time.
I put the drive in a test machine and it is most certainly bad. It came up once but otherwise was very flaky. I was just about to toss it out but decided to check the warranty on it. Come to find out, it's under warranty so I'll ship it to WD and hopefully have a replacement soon.
[...] Lastly, I’m running MSS EX470 with 2GB RAM. I’m thinking about adding a few more. BDBB for instance. My recent hard drive failure taught me a lesson. [...]
Thanks for the article. It's good to know what happens when a drive fails. On your advice I've also gotten the BDBB add-in and it's running right now. Great stuff.
Also, thanks for the article. I had a drive fail; luckily, no database issues. My first reaction was to reboot, but my X510 DataVault would not even boot up with the bad Samsung 1TB drive (my first and hopefully last Samsung HDD failure). It was clicking, hissing, banging, and thrashing about; it sounded really sick. I had to boot up without the faulty drive and then re-insert the drive once WHS was up and running. Even then, the WHS Console would freeze off and on with the bad drive in bay 2, but luckily I was able to remove the drive and then restart WHS. All backups and share data were fine, and I put in another Samsung 1TB drive and it is working fine again.
Nice article, Dave! I had a drive fail on me this morning, and when I searched Google for a good article to walk me through the steps, this was the top result. Of the top 5 articles that I reviewed, this one was perfect. I had implemented BDBB about a month ago, so I got all of my machine backups recovered back to the storage pool. Thanks again.
From what I understand, Microsoft has developed Windows 7 to include a backup system which copies the complete hard drive as an image, which is easier to re-install. If you want to use it and have Windows 7, do the following:
Open MS Access.
Click on the Tools menu. Choose “Database Utilities.”
Click on “Compact and Repair Database.”
A window named, “Database to Compact From” will open.
Choose the database that you’re going to Compact and Repair. Click on “Compact.”
It’s easy, right? Make sure to apply this method in your office databases so that frustration from a corrupt database will be avoided.
Hope this helps a little.
I have made the server fail by pulling drives just to see what would happen. I think it is better to know when you put so much trust in a WHS. The WHS 2003 passed; what a great program. The trouble I had was restoring a computer. I am now trying WHS 2011. No more DE; I hope it will work as well. Systems fail, that's a fact; it's just a matter of when. It's nice to know you have a backup. Do I with 2011? Not sure about RAID; I really like DE.
The right thing to do with WHS is to put WHS on a virtual machine (buy the OEM from Newegg for $50) that is running on top of a RAID array. I use OpenSolaris with ZFS to be able to add space and replace drives as needed. I can then just add another drive if I need more space, and I can still do duplication to make the WHS server more easily recovered, but ZFS snapshots will get you "exact" backups, instantly, with the ability to recover back to a particular day if the OS trashes something.
In this configuration, you are using ZFS/RAID to handle drive failure and the OS trashing disks. WHS then just becomes a simple VM'd application that will work trivially to do your backups. ZFS was designed to give enterprise-quality data management with consumer-grade devices. ZFS uses checksums on all blocks, and the ZFS scrub mechanism does block validation on every block that is used, and will move blocks as the surface decays. It tells you how much is going wrong with the pool so that you can watch for escalating degradation and replace a drive before it's a real problem.
What I'm trying to find out is how to remove disks from WHS once they've failed. I've already physically removed them, but they will not remove from the WHS console (which is the only thing tracking them anymore). The drives were garbage, but now WHS seems in love with the memory of them. Is there a .bin or something that I can edit to get them out of my hair? Maybe a registry key I need to kill? ANY help would be greatly appreciated.
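For anyone curious what "checksums on all blocks" plus a scrub buys you, here is a rough toy sketch in Python. It is not how ZFS is actually implemented: the MirroredPool class, its method names, and the use of SHA-256 are all made up for illustration, and a two-way mirror is assumed. It only shows the idea of checking every stored block against its recorded checksum and repairing a bad copy from a good one.

```python
import hashlib


def checksum(block: bytes) -> str:
    # SHA-256 is only a stand-in here for whatever checksum the pool uses.
    return hashlib.sha256(block).hexdigest()


class MirroredPool:
    """Hypothetical toy pool: every block is kept on two 'drives' with one checksum."""

    def __init__(self):
        self.copies = [[], []]  # two drives, each a list of blocks
        self.sums = []          # expected checksum for each block index

    def write(self, data: bytes) -> int:
        for drive in self.copies:
            drive.append(bytearray(data))
        self.sums.append(checksum(data))
        return len(self.sums) - 1

    def scrub(self) -> dict:
        """Validate every block on every drive; repair bad copies from a good one."""
        report = {"repaired": 0, "unrecoverable": 0}
        for i, expected in enumerate(self.sums):
            good = [d for d in self.copies if checksum(bytes(d[i])) == expected]
            bad = [d for d in self.copies if checksum(bytes(d[i])) != expected]
            if not bad:
                continue
            if good:
                for drive in bad:
                    drive[i] = bytearray(good[0][i])
                    report["repaired"] += 1
            else:
                # No copy matches its checksum, so nothing can be restored.
                report["unrecoverable"] += 1
        return report


demo = MirroredPool()
idx = demo.write(b"family photos")
demo.copies[0][idx][0] ^= 0xFF  # simulate silent corruption on one drive
print(demo.scrub())             # {'repaired': 1, 'unrecoverable': 0}
```

The counts in the report are the toy equivalent of watching the pool for escalating degradation: a repaired count that keeps climbing on one drive is the cue to replace it.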