RESET Forums (homeservershow.com)

MOBO failure during HDD Removal process – how to restore?


GDog

OK, I looked through the posts for the last couple of months, and I found one or two that address SOME of my issues, but not all. So, I will take a stab at starting a new thread. I apologize for the length of my post, but I’m not sure what is important to mention and what is not.

 

What makes my situation unique, I feel, is the way my MOBO failed and what was going on when it did. Here’s what happened:

 

My system:

MSI P35 Platinum, Pentium Dual Core E5200, 2GB RAM with video card installed (not headless)

System Drive: WD 500GB Black

Data Drives: 4 x 2TB WD Green EADS drives connected to MOBO ports

System Backup HDD: 1 x 1TB WD EACS connected to …

1 x SATA PCI card SiI 3114, 4-port

Actual amount of Data: Approx 6.5TB

 

History of Events leading up to MOBO Failure:

I did not want to set the entire “Software” folder to duplicate, just one folder inside. So, I created a separate folder outside of the Software folder just for that purpose and set THAT folder to Duplicate. After a while – seems like shortly after it finished the Duplication process – I started to get a bunch of File Conflict Errors. I went in and corrected or deleted the files, but they kept coming back. So, I started a chkdsk of all the volumes, like Microsoft says we’re supposed to do. This process went on for 2-1/2 days and was only finished part way through the third drive. Even though not finished, the second drive in the order had a lot of errors, so I decided to terminate the process and just replace that drive.

 

At this point, I’m still not suspecting any hardware failures except the HDDs, which, as I knew, are prone to failure. So I put in a new 2TB WD Green HDD (call it HDD-2), added it to the pool and started the Removal process on the suspect drive (call it HDD-1). It took 2 days to finish due to several File Conflict errors, and I had to restart the process several times to get it finished. Once HDD-1 was out of the system, I put it on the bench and tested it fully. I couldn’t find anything wrong with it! Plus, WHS was still throwing up File Conflict Errors. Thinking I had just pegged the wrong drive, I ran chkdsk on the specific drive/volume I had just put in earlier (HDD-2). It found errors, so now I am starting to think maybe that drive was bad too. So, I added HDD-1 BACK to the pool and started the Removal process for HDD-2. After 2 days, it was only maybe 40% finished. NOW, I’m starting to think something ELSE is wrong, so I terminated the Removal process and tried to reboot.

 

Reboot failed. The system would not even POST: no video AT ALL, no sound, no beeps, NOTHING, except four red lights on the diagnostic LEDs on the MOBO, indicating a failed CPU. Purchased a NEW CPU, a Pentium Dual Core E5400 (can’t get the E5200 anymore). Installed it and still nothing. Final conclusion: BAD MOBO. Purchased a new MOBO, a Supermicro C2SBC-Q with the Intel Q35 chipset.

 

SO, given this series of events, what would be, in your expert opinions, the best method to use to rebuild my WHS to ensure that I can restore all my data?

 

Thanks so much for everyone’s help!

Gary

Edited by GDog

GDog,

 

Assuming you stick with an Intel socket 775 board, you should be able to just put in the board and reboot. If you change to a different motherboard type (e.g., Core i3), it might just fire up and detect the new hardware, or worst case you will need to do a server reinstall. If you go to AMD, then you will have to do a server reinstall. The important thing is that after you install the board, try it, and if it does not boot correctly, boot from the install CD and "MAKE SURE" that you select the server reinstall option, not the new installation. This is very important, or you will cream your data. I have been through three separate recoveries like the one you are describing, and it is fairly painless if you heed that warning. Good luck, and please read my response on the new forum locations.


Thanks PCDoc,

I tried your advice and just hooked everything up and fired it up. Like I stated in my other post, I decided to go with the Supermicro C2SBC-Q MOBO, which is a socket 775 board. As far as booting up goes, it was successful. After it finished booting, there were numerous drivers that didn't install, so I just put the CD that came with the board in the drive, and eventually WHS found all the drivers it needed and my Device Manager was free of yellow!

 

Now, on to that drive removal. After a short burn-in and a check to make sure all systems looked OK, I started the drive removal process again. This time, I noticed that the drive that was giving me all the errors is a 2TB WD Green drive with a manufacture date of October 2009. A Google search on this confirmed what I thought I had read previously: that WD was having trouble with drives of that manufacture date failing prematurely. Sure enough, the removal process has been going on now for over 40 hours and is only about 75% finished. A look at the system event logs shows literally THOUSANDS of drive-related errors for Drive #3. Checking Properties for any of those errors shows this comment: "The Device, HardDisk3 has a Bad Block." Looks like a failing drive to me, eh? Hopefully this "Removal Process" will finish in another day or so, so that I can save as much of my data as possible and RMA that HDD.
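
For anyone who wants to tally messages like this per disk, here is a minimal sketch in Python. It assumes the System event log has been exported to plain text, one message per line, and that each message names the disk as "HardDiskN" alongside the words "Bad Block" as quoted above. The function name and the sample lines are hypothetical, made up only for illustration.

```python
import re
from collections import Counter

# Assumed message shape: "...HardDisk<N>...Bad Block..." (case-insensitive).
# This also matches the "\Device\Harddisk<N>\..." spelling, which is an assumption.
BAD_BLOCK = re.compile(r"harddisk(\d+).*bad block", re.IGNORECASE)


def count_bad_blocks(lines):
    """Return a Counter mapping disk number -> number of bad-block messages."""
    counts = Counter()
    for line in lines:
        match = BAD_BLOCK.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample lines standing in for an exported event log.
sample = [
    "The Device, HardDisk3 has a Bad Block.",
    r"The device, \Device\Harddisk3\D, has a bad block.",
    "The Device, HardDisk1 has a Bad Block.",
    "Some unrelated event",
]

for disk, hits in count_bad_blocks(sample).most_common():
    print(f"HardDisk{disk}: {hits} bad-block messages")
```

A disk whose count dwarfs the others is the likely candidate to remove from the pool first.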

 

So, it is looking like I had a multiplicity of issues, a failing hard drive AND a failed MOBO at the same time! What's the likelihood of THAT? Sheesh!

 

Anyway, thanks so much for your help. I will let everyone know how it all works out and how I like that Supermicro MOBO.

 

Cheers!

GDog


Sorry for your troubles, and good luck. Hopefully the HDD coughs up the files before it stops working altogether. *fingers crossed* In the future, have a server backup scheme for instances just like this. You won't regret it.

 

Anyone think that the two components dying or damaged at the same time might point to a PSU problem?


  • 2 weeks later...

Warning! Somewhat Long Post.

 

UPDATE:

Thanks Guys! It’s been a while since I posted anything, but I have been far from idle in trying to get my server running error-free again. I’m STILL not nearly finished, but it is taking SOOOO darn long to fix everything, I thought I would sign on and update everyone as to what is going on.

 

As I said before, I took your advice and just hooked everything up and tried a boot. As you know, IT WORKED! It booted right up and found all my drives. ALSO, on a hunch, I re-installed the old Pentium E5200 just to see if MAYBE it wasn’t the CPU at all. It wasn’t. The E5200 is working just fine. Guess I pulled the trigger on that purchase a little too soon, eh? Whatever – I can always use it in another build.

 

As soon as I got it running smoothly and installed all the updated MOBO drivers, I started getting File Conflict errors again. Shoot! Except THIS time, I *KNEW* for sure that it wasn’t my CPU OR my MOBO that was giving me this problem. It HAD to be the HDD’s.

 

So I started CHKDSK again, but after 4 days of running, I interrupted it because it had only finished 2 HDDs but had already identified at least ONE HDD with massive errors and seemed to be having LOTS of trouble reading the third one. I saved the info to a .TXT file, wrote down the HDD identifying info and proceeded to try a soft Power Off. After about 10 minutes, the system powered down.

 

Then, I put in a new WD20EADS HDD I had in reserve, powered back up and started a Drive Removal on one of the faulty drives (Volume 10). This drive was about 2/3 full before the removal, but it was taking FOREVER! Sure enough, a check of the Event Log showed massive read errors. After 4-1/2 days, it finally finished. I pulled the drive and tried to run it through the WD testing on another PC. It failed WD’s own testing, so it is going back to WD on Monday. Spinrite would not even run on this drive, throwing up a “Division Overflow Error”. Curiously, the Manufacture Date on the drive is October 2009, the SAME date a lot of other people have had trouble with on this particular HDD.

 

While all of this was going on, I was running Spinrite Level 4 on one of a pair of Brand New WD20EADS HDD’s I had just purchased. NOTE: I will NEVER put another HDD into service on any of my Servers without FULLY testing it first. A full Format is just not enough by itself. If anyone knows of a better way to do this than Spinrite, I would sure like to know about it, mainly because Spinrite wants SIX full days to do a Level 4 on a single 2TB HDD. And that is if it does NOT find anything wrong. Sheesh!
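
To put that six-day figure in perspective, here is a back-of-the-envelope sketch in Python. It assumes 2TB means 2 x 10^12 bytes (the decimal convention drive makers use) and that the scan covers the whole drive once at a steady pace; the function name is hypothetical.

```python
def avg_throughput_mb_s(capacity_bytes, days):
    """Average rate in MB/s (decimal megabytes) needed to cover a drive in the given time."""
    seconds = days * 24 * 60 * 60
    return capacity_bytes / seconds / 1e6


# 2TB drive, six full days for one Level 4 pass (figures from the post above).
print(round(avg_throughput_mb_s(2e12, 6), 2))  # 3.86
```

That works out to under 4 MB/s on average, which is why a full surface test of a drive this size takes days rather than hours.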

 

Next, I put in the freshly Spinrite’d HDD and proceeded to start a Drive Removal on one of the other suspected faulty HDD’s (Volume E). And THAT’s where I am right now – waiting for that Drive Removal to finish. I am THREE full days into this Drive Removal and it is only 80% finished.

 

 

COMMENTARY: This is an INSANE amount of time and effort to correct these problems, and calls into question (at least for me) the reliability of the WHS system and the overall usefulness of this method of protecting one’s data. I have spent an inordinate amount of time and money on a system that was supposed to save me BOTH.

 

At the very least, I am beginning to question the wisdom of using drives with such large capacities. When one of them goes out, it not only places a HUGE amount of data at risk, but also involves a HUGE amount of time to R&R. I would sure like to hear what others think about this, and what, if anything, you are doing to mitigate this “problem”.

 

Thanks for following my lengthy story guys!

 

-Gary


I don't THINK it is a PSU problem. I was very careful to select the right one. It is a high end, high efficiency Corsair 450W unit. I calculated that with 8-Green drives + 1 7200RPM standard drive, MOBO etc, that the 450W unit would be at 65-80% of capacity, which would put it in its most energy efficient operating range. If you guys think it *is*, I have another 650Watt High efficiency Antec Earth unit that I can slip in there.

-Gary


At this point, it would not hurt to rule out that variable. The concern is not that the PSU is too small or over capacity, but rather that something may be wrong with it, such as an unstable rail.


Yes, that's what I was getting at. But the fact that you have a bad HDD with a manufacturing date that others are having problems with could mean you may have gotten to the root of the problem. I'm interested in the fact that you are using Spinrite to pre-qualify your new drives. Seems like a good idea, except for the time it takes.

I must be the only guy that does not like Spinrite. I do not think it is necessary to pre-qualify hard drives using this method. There is definitely an issue there, but I do not think that going through Spinrite will provide any assurances at all. What is the date code on your 2TB drives? I have about 18 of them and I am not aware of any issues. I would be interested in what vintage you are using.


I'm a big fan of SpinRite to fix drives. I don't pre-qualify them. The only time I use it is when someone brings me their "dead" machine and they have no back-ups. I SpinRite it, get the data, then dump the drive.

 

I actually have a box that has a swappable bay so I can slap in a drive and run spinrite on it. During the next update cycle of my home computers, I'll combine the boxes and have one box for both IDE and SATA drives to run spinrite.
