RESET Forums (homeservershow.com)

VERY Strange Drive Removal Error Pattern


GDog

I have a thread running about how to restore my WHS system, which has involved the replacement of some error-laden HDDs, but THIS is so strange, I decided to start a separate thread on it. I became aware of this error pattern 2 days ago and decided to watch it for another 2 days before I put it on the Forum. Here's what is happening:

 

I am in the middle of a Drive Removal process on a 2.0TB WD HDD that has been running now for over 120 hours. It has MANY errors, and it is obvious that WHS is struggling to save the data on that drive and transfer it to the new drive. Almost from the beginning, I have been checking the System section of the Event Viewer to get a handle on just how many errors WHS is finding. The numbers are MASSIVE. What is strange, however, is WHEN it is finding the errors. Basically, the system is reporting Disk Errors continuously for about 18 straight hours from approximately 4pm each day until approximately 10:30am the next day. The only time it is NOT reporting Disk Errors is between approximately 10:30am and approximately 4:00pm - EVERY DAY! This is the ONLY time in a 24-hour period that it does NOT find errors, and the only time when the green bar makes any significant progress. During this roughly 5-hour period, it finds NO errors at all. Here is my log of these times (starting with the second day, when I started keeping a record):

 

11-06-10, 10:42am to 3:59pm NO ERRORS --- 4:00pm to 11:07am CONTINUOUS ERRORS every 3-6 Min

11-07-10, 11:08am to 4:03pm NO ERRORS --- 4:04pm to 10:22am CONTINUOUS ERRORS every 3-6 Min

11-08-10, 10:23am to 4:04pm NO ERRORS --- 4:05pm to 10:14am CONTINUOUS ERRORS every 3-6 Min

11-09-10, 10:15am to 4:03pm NO ERRORS --- 4:04pm to Now (09:22pm) CONTINUOUS ERRORS every 3-6 Min

This information is taken directly from the Event Viewer.
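
For anyone who wants to check the window lengths, here is a minimal Python sketch that works out how long each error-free window in the log above lasted. It assumes the dates are written MM-DD-YY and that each clean window starts and ends on the same day; the variable and function names are made up for illustration.

```python
from datetime import datetime

# Error-free windows copied from the log above as (start, end) pairs.
# Assumption: dates are MM-DD-YY and both times fall on the same day.
CLEAN_WINDOWS = [
    ("11-06-10 10:42am", "11-06-10 3:59pm"),
    ("11-07-10 11:08am", "11-07-10 4:03pm"),
    ("11-08-10 10:23am", "11-08-10 4:04pm"),
    ("11-09-10 10:15am", "11-09-10 4:03pm"),
]

TIME_FORMAT = "%m-%d-%y %I:%M%p"


def window_hours(start, end):
    """Return the length of one window in hours."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds() / 3600


hours = [window_hours(start, end) for start, end in CLEAN_WINDOWS]
average_hours = sum(hours) / len(hours)

for (start, _), length in zip(CLEAN_WINDOWS, hours):
    print(f"{start}: {length:.2f} clean hours")
print(f"average: {average_hours:.2f} clean hours per day")
```

The error periods could be worked out the same way, but they cross midnight, so the end time would need the next day's date.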

 

I don't know about you guys, but I find this pattern VERY strange and unexplainable. With only 5 to 5.5 hours per day of clean reading, NO WONDER it is taking so long to remove the drive! This makes absolutely no sense to me AT ALL! Can anyone out there explain this? I'm all ears.

Thanks,

Gary


That's just weird. Maybe the solution will provide more insight into the working of WHS and DE.

 

No kidding!! OR ... maybe the solution will provide insight into the workings of spinning hard drives in general. OR, the way spinning hard drives work in conjunction with WHS. All I know is I can't explain it. Maybe this is a common pattern. Maybe only ONE of the drive's heads (4 or 6, I can't remember exactly how many) is causing the read errors. Both 4 and 6 divide evenly into 24 hours. Makes you wonder, but I don't know enough about the inner workings to see a positive correlation.

 

I have to leave the office for a while, but I will be back around 11:30am (PST). If the procedure holds true to the pattern, WHS will have started its daily clean reads. I will, of course, check it and report back to the Forum.

-Gary


When are your backups running and do you have anti-virus running on your server?

Since he's in the middle of a Drive Removal, I believe the back-ups would be suspended, since all other data access is suspended during the drive removal.


When are your backups running and do you have anti-virus running on your server?

 

Backups are suspended as AllToga states. I do not have ANY anti-virus SW running on my server at any time. I am thinking about installing MS Security Essentials soon however.

 

Since he's in the middle of a Drive Removal, I believe the back-ups would be suspended, since all other data access is suspended during the drive removal.

 

Correct, AllToga.


UPDATE:

When I left the Office at 9:35am, the system was reporting disk errors every 3-6 minutes, as I have stated earlier. At that time, the Disk Removal Process was approximately 91% finished. When I returned to the office at 11:35am, I checked the Event Viewer and what do you know? The last Disk error was reported at 10:21am! (Predictable, NO?) After that, there were NO more errors. The Disk Removal appears to have finished (100%) at approximately 11:44am. It says the Removal was successful and tells me I can go ahead and disconnect it from the Server.

 

Wow! This process took 136.6 hours total (5.69 days) to complete. That is a VERY long time. The good news is that WHS is not reporting any lost data. Therefore, it must have managed to correct all the errors that it found and recovered the data. Would this be a safe assumption?

 

Another point: since we know that it was only reading cleanly (without errors) for about 5 to 5.5 hours per day, we could approximate that the time needed to remove a cleanly readable 2.0TB disk would be about 29.9 hours (5.69 days x 5.25 average hours per day). All the extra time it took was in trying to read marginal data blocks. Is THIS a logical assumption?
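
The estimate above can be redone as a quick Python sketch. It rests on the same assumption made in the post, namely that useful progress only happened during the error-free hours, and it uses 5.25 hours as the midpoint of the 5 to 5.5 hour range. Starting from the full 136.6 hours rather than a rounded day count gives a figure just under 30 hours.

```python
TOTAL_HOURS = 136.6         # total removal time reported in this post
CLEAN_HOURS_PER_DAY = 5.25  # assumed midpoint of the 5 to 5.5 hour range

# Assumption: the removal only made real progress during clean hours.
total_days = TOTAL_HOURS / 24
clean_hours = total_days * CLEAN_HOURS_PER_DAY

print(f"total run time: {total_days:.2f} days")
print(f"estimated clean-read time: {clean_hours:.1f} hours")
```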

 

Let me know what you guys think. I am also still quite puzzled about the error reporting pattern. I would like to hear a plausible explanation for that.

 

Thanks,

-Gary

Edited by GDog

Hmmmm ...does sound ...weird.

 

but like a lot of things ..in computers .....until the 'Unknown' becomes 'known', it can be weird.

 

 

Question: Could room temperature be a factor between 10:30am and 4:00pm??

 

PS Glad you got your data off.

 

You could do some data verification or compares ...using file duplicate finder programs eg Duplicate Cleaner.

These programs usually have a byte by byte file comparison mode. ..if you can afford the time!!!!
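
For illustration only, here is a rough Python sketch of what a byte-by-byte comparison between two folders could look like. This is not how Duplicate Cleaner works internally; it just uses the standard library's `filecmp` module, and the function name `find_mismatches` and the folder arguments are made up for the example.

```python
import filecmp
from pathlib import Path


def find_mismatches(source_dir, copy_dir):
    """Return relative paths that are missing from copy_dir or differ byte for byte."""
    source_dir, copy_dir = Path(source_dir), Path(copy_dir)
    mismatches = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = copy_dir / rel
        # shallow=False compares file contents instead of trusting size and timestamps.
        if not dst.is_file() or not filecmp.cmp(src, dst, shallow=False):
            mismatches.append(rel)
    return mismatches
```

Calling it with the original folder and the copied folder returns a list of files worth a second look. Reading every byte of a multi-TB share this way will take a long time, as noted above.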

Might be better to move the ailing drive to another machine, if available, to do more work on it.

 

cheers,

Phil.

Edited by manphil


Hey Phil,

Thanks for your contribution! I was about to write an update, but I will just reply to your post and do the update here. First, however, I would like to know more about your recommendation of "Duplicate Cleaner". I could really use something like that! Is this a WHS add-in? If not, is it designed to be run remotely? Can it be run on the WHS itself? I do not run headless, so if it is not an Add-in, does it read the shares and find dups there? How do you use it?

 

BTW: Room temp was controlled at 75F.

 

MY UPDATE:

I tagged the WRONG drive as one of the bad drives (I thought I had TWO bad drives)! After I removed what I THOUGHT was a bad drive, it turned out to be OK (I *think*). As soon as it was out, I ran Spinrite level 5 on it, and it is now 60% finished (3-1/2 days) and not a single error. OOPS! Turns out the REAL culprit was likely one of the other drives. I tried the Drive Removal process on that one, but after 36 hours of ZERO green bar progress and what looked like a hundred thousand disk errors in the event log, I decided to just yank the drive and save what I could off it before it dies completely and I cannot even access it. I shut the Server down and pulled it. After I "Removed" the "Missing" drive, the server has been running without that drive for 10 hours now without a single File Conflict or CRC error. I think I got it now.

 

I am currently copying files off the drive manually. Luckily, a lot of the files on it were duplicated, so I don't have to copy everything. To do this, I am using a very EXCELLENT file synchronizing program called "Beyond Compare". If you haven't tried this one, you should. It is BY FAR the best one out there (IMHO). It is not free, but you get what you pay for. I have been using it for several years and I have never seen a single bug. It has a very intuitive UI and it is WHS friendly. One thing I really like about it (for this purpose) is, when it cannot read a file, it doesn't just stop or crash. It skips the file, notes it in a savable Log and moves on to the next file. PERFECT for copying files off a drive with KNOWN errors on it.
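
For anyone without Beyond Compare, the skip-and-log behavior described above can be roughly approximated with a short Python script. This is only a sketch of the idea, not how Beyond Compare does it; the function name `salvage_copy` and its arguments are made up for the example, and it assumes the log file and destination folder live on a healthy drive.

```python
import shutil
from pathlib import Path


def salvage_copy(source_dir, dest_dir, log_path):
    """Copy every file it can; log and skip any file that fails to copy."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    copied, skipped = 0, 0
    with open(log_path, "w", encoding="utf-8") as log:
        for src in sorted(source_dir.rglob("*")):
            if not src.is_file():
                continue
            dst = dest_dir / src.relative_to(source_dir)
            try:
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)  # copies contents plus timestamps
                copied += 1
            except OSError as exc:
                # A read error on one file should not stop the whole run.
                log.write(f"SKIPPED {src}: {exc}\n")
                skipped += 1
    return copied, skipped
```

One caveat: a copy that fails partway through can leave a partial file at the destination, so anything listed in the log should be checked or deleted on the destination side.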

 

As soon as I get what I can off the REAL bad drive, I am going to run Spinrite Level 5 on it and see if it can save the drive. First, however, I will run it through the WD tests to see if it craps out there. If it does, I might just return it for a replacement drive. Either way, the drive will be Spinrite'd PRIOR to going back into MY WHS - that's for sure. No more BLIND FAITH on these gargantuan Multi-TB drives. Too much time & trouble when they go out.

 

That's all I have time for right now. I will keep everyone posted - that is, if you are interested. Apparently, not much interest in my last post. You're the only one. If nobody replies (cares), I won't bother with it anymore.

Gary

Edited by GDog

...I am using a very EXCELLENT file synchronizing program called "Beyond Compare". If you haven't tried this one, you should. It is BY FAR the best one out there ...

I also LOVE Beyond Compare. I've had two use scenarios. First was before WHS, I had two boxes with 4 200GB drives attached to each, and I would run Beyond Compare to back up one to the other. It had the option to delete orphans off of the second, and it worked really well. My second usage of it was to compare custom code files. We had a custom software application at work, and when we went to support, they would send a "standard code fix" that we would need to compare to our customizations and combine into a final fix for us. Beyond Compare was able to scan each file, identify the differences and merge them as needed. NICE program.

