RESET Forums (homeservershow.com)

Delayed Write Failure -- HDD failure


Jesse

Hello,

 

Lately I have been getting a ton of delayed write failures on my WHS v1 box.

 

The box hung on an attempted restart, and after more than an hour I finally did a hard reset. It came back up and ran check disk, which found and repaired some errors. When I finally got the box up and running, it was running pretty slowly and I had some issues accessing the shared folders. More delayed write errors popped up, so I disabled write caching and rebooted. The reboot took a good long while, during which I got more delayed write errors. The delayed write errors have not stopped with write caching disabled, but there are fewer of them.

 

Poking around on Google turned up imminent disk failure as the most likely cause of delayed write failures, although the WHS Console shows all drives as healthy. I installed the WHS SMART add-in, and it indicated that two of the drives were less than perfect but not on death's door. Nevertheless, I went ahead and tried to have the disk removal wizard remove one of the less-than-perfect drives. After a few hours it hung.

 

All the files on the D partition are backed up externally, and all the important ones have folder duplication turned on. Since the disk removal wizard is not working, could I not just pull and replace the drives one at a time? Kind of an ugly solution, but it should be no different than recovering from an all-out disk failure.

 

Anyone have any other suggestions about what could be causing this problem besides drive failure? I would hate to just yank the drives and find I still have the problem...

 

TIA

 

Jesse


Jesse, I've had delayed write errors before, and in that case it ultimately turned out to be a bad SATA port; the drive was fine. I have also seen them with bad drives, and I suppose a bad cable could cause them too. If possible, try a different port, cable, or controller, or run the vendor diagnostics.

 

As for pulling the drives: when I've had problems with the drive removal wizard and everything was duplicated, I just removed the physical drive and then removed the drive from WHS (if I remember right, I just ran the removal wizard on the missing drive and ignored the warnings). So you can pull the physical drive without losing duplicated files, and restore unduplicated ones from backup.


Does sound like an interesting dilemma. Not sure what type of server you have, but before you resort to that, re-seat your cables just to rule out the obvious, reboot, and try the wizard again. If that does not work, and assuming it is not the primary drive, what you described should work: you should be able to replace the drive. Keep in mind you may lose your client backups, so you may want to download and run BDBB if you have not already done so and store your backups on an external drive. Just pulling and replacing the drive should leave the original intact, so it would be worth a shot. I am not aware of any other quick solutions for failing hardware.

 

 

Good luck and keep us posted.


Hello and thanks for the help.

 

I have tried re-seating all the cables and power connectors to the drives.

 

The OS drive is a RAID 1 mirror using Intel chipset RAID on the mobo (Asus P5E-WS Pro). The Intel Matrix Storage utility indicates that the OS mirror is in good health. It also indicates that two of the other drives attached to the mobo are in good health. The last drive is attached to the mobo's Marvell controller, so I can only go by the SMART data for that drive.

 

A bad port on the mobo has crossed my mind. I really wanted to use this hardware for my next server (SBS 2011 Essentials), so I am hoping this is not the case. Is there any reason I could not shut down and try moving the drives to unused SATA ports? I know this is no problem on my 3ware RAID controller, but I have not tried it on the mobo ports. The OS should recognize the drives just fine even if the ports have changed, right?

 

Thanks again.

 

Jesse


Hi,

 

Just walked in from a very short trip and checked on the server. No delayed write errors in the event log since early yesterday morning. I needed some audio files off of the server before leaving and it hung, so I rebooted it, and it once again ran check disk. It found and repaired errors, I got the files onto my iPhone, and we headed out on our trip. I got home and found that no more errors had occurred. Maybe check disk fixed it?

 

Jesse


If I were you, I would keep an eye on this. It could be corrupted files or indexes, but it could also be a drive or a port that is on the way out.

 

About a month and a half ago, I started getting regular delayed write errors on my own system. I replaced the SATA cables, but the errors continued, so I set up a quick batch file to run once a week:

 

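rem Stop the pdl and whsbackup services before checking the volumes
net stop pdl
net stop whsbackup
rem /x forces the volume to dismount first; /r locates bad sectors and recovers readable data
chkdsk D: /x /r
rem C: is the system volume and is in use, so chkdsk offers to schedule this check for the next restart
chkdsk C: /x /r
rem Check each data volume mounted under C:\fs, each in its own window (FOR variables must be a single letter)
for /d %%v in (C:\fs\*) do start chkdsk %%v /x /r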

That seemed okay for a bit, then the errors started again, and I noticed the server was taking longer and longer to browse certain shares. Long story short, it turned out to be the disk that was failing.

I used SeaTools to check the disk (I removed it from the pool, installed it in a separate system, and checked it offline).

Edited by generious

I don't want to beat a dead horse, but I wanted to remind you that SpinRite is a great tool for maintaining drives. In this case it can also give you some info on whether it's really the drive or something else: if SR finds no problems on the drive, then I would bet it's fine. It would at least eliminate one possibility.

 

Just this past weekend, I resolved an issue with my HTPC (it wouldn't back up to my WHS v1 at all, not even manually, and gave failure messages). I ran SR on the disk, and now it's running like a top and backing up great.

Edited by ikon

Hi,

 

Of course, the errors popped up again after a few days. I brought the server down and started testing the various drives using the manufacturer's drive test utilities. A 1TB Hitachi deathstar came up bad. Everything is backed up and everything important is duplicated, so I went ahead and powered it back up and had the wizard remove the now missing drive. After about 19 hours it was still trying to remove the drive.

 

I am just going to pick up a few new drives and move to 2011.

 

ikon: no dead horse with SpinRite. If you've had success with something, you should let people know about it.

 

Thanks for all the help.

 

Jesse
