Storage Spaces Failure

• August 10, 2012

I have been testing and using Storage Spaces on Server 2012 Essentials since its release.  I use it to back up two systems and to hold some movies, music, and photos.  Mostly testing and very light use.  On Tuesday I got a surprise.

Before I get into the details of the error, let me first explain the setup.  I have Storage Spaces set to “Parity” mode using three Western Digital Green drives.  All three drives are connected to a HighPoint 2680 (non-RAID) and are pooled together in one 3.7 TB parity volume using only Storage Spaces.  I did not load any of the HighPoint software, and the drives are configured in legacy mode.  This all sits in a dedicated box built on a Gigabyte H55 board with an i3-530.  The drives have been configured this way since I put the box together and started using Server 2012 Essentials.

On Tuesday, I woke up to the wailing sound of an alarm.  When I ran to my office, I assumed it would be a UPS, but it turned out to be the built-in alarm on the HighPoint card.  The odd thing is that the drivers for that card were not loaded, nor was the card configured for RAID.  I was very surprised the alarm even went off, but mostly I was curious as to why.

I opened the dashboard and saw that I had an error in my Storage Spaces.  It told me that one or more drives were not healthy, but it did not tell me which one or why.  Looking a bit further, I saw that the problem was with the Storage Spaces drive, but the information provided was not very informative.  The warning only told me to go to Storage Spaces and manage my storage.  At one point it did give me the option to repair; it ran a check disk on all three drives and found no errors.  Shortly after that it began a repair process which took many hours to complete.  When it finished, it still did not tell me anything other than that it was completed (not much help).  Check disk showed OK and the volume was rebuilding, but I was unhappy with not knowing what had happened, so during the rebuild I decided to load the WebUI from HighPoint to view the health of each drive.  I looked at the properties of each drive and all appeared to be good, with no errors or health conditions reported.

The troubling thing about this is that it happened for no apparent reason.  The system did not lose power and is hooked up to a UPS.  None of the drives showed any SMART or health issues.  The controller did not flag errors of any type, and the Event Viewer did not show any problems either, yet the volume found it necessary to do a 6-hour rebuild.  There are several things that concern me about this little issue.  First, there does not appear to be any way to know which drive is causing the problem.  Had this been a production system, I would not have been able to identify the faulty drive.  Even if it flagged a drive as bad, there does not seem to be any way to track that back to a specific controller port or drive location.  Second, there is the unexplained reason for the rebuild in the first place.  What caused it and why?  The software should tell me something.  Lastly, since this has only been running for about 10 days, the robustness of Storage Spaces is now a big concern to me.  I have never seen this happen in two years of running RAID 24/7.  This might be a fluke, but it makes me doubt how stable and robust Storage Spaces really is.  Granted, I did not lose data, but this is still a bit too flaky for me to trust right now.  I would definitely wait for the first service pack if you are going to trust your data entirely to Storage Spaces.

 



Category: BYOB Hardware

Comments (6)


  1. C. Bratcher says:

    This doesn't sound good at all.

  2. John Wills says:

    I installed Drive Bender yesterday and was very pleased with the ease of install and the increase in write speeds.

  3. fredamn76 says:

    John: Do you run it on Windows 8?

  4. SusiBiker says:

    Hi pcdoc.
    I was having a similar problem with a HighPoint card, a PCI-X RocketRAID 2220. It would work for weeks, then the alarm would go off. I could not identify the drive that was apparently at fault, and neither the SMART information nor the Win7x64Ult Event Viewer could either. It was my main data storage area, consisting of 8x2TB Samsung HD203WI drives.
    I eventually got really worried, and bought a Synology DS2411+, and set that up with those drives and four others in a Hybrid RAID Array with 2xHDD fail-over.
    Many weeks went by… I then retired the HighPoint 2220 as it looked like that was the cause of my woes.
    Then… Occasionally, one particular HD203WI, would drop out of the RAID. I took it out. Ran a quick SMART test – passed. Ran a full SMART test – passed again. Did a surface scan – passed with flying colours.
    Put it back in the Syno box. Quick and Full SMART tests – passed.
    Shrugged, and set to repairing the array. This took a couple of days. No problems.
    More weeks went by…
    The drive drops out again. More tests – passed.
    Swapped the drive to a different slot. Ran tests – all passed.
    More weeks…
    Drive drops out again.
    Conclusions…?
    1. I write too much.
    2. The drive is poorly.
    3. I can't return the drive as broken, because it passes all tests, every time!
    4. Bought another drive, and used the dodgy one as a "backup" for non-critical files.
    5. HighPoint 2220 back in action for about two months – no problems.
    6. You may have an intermittent drive failure that only management software/managed hardware can track.
    Regards,
    Susi xx

  5. Jeff Hare says:

    I wonder whether this might be an issue with using drives not intended for RAID. There could be TLER (time-limited error recovery) settings that are too long for RAID, and too much delay can cause them to drop out of the array.

    On the flip side, RAID drives used in a desktop (or, gasp, A/V (DVR/TiVo) drives with TLER=0, where pretty close is good enough) might give up too soon and assume the RAID controller will correct the error.

    Bottom line? There's a difference in the firmware settings between different classes of drive that make them suitable or unsuitable for reliable use in different hardware/software environments. So, your drives may be perfectly fine but have firmware settings that aren't suitable for the hardware/software using them.
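
The timeout mismatch described in this comment can be sketched as a toy model. This is only an illustration of the reasoning, not a description of how any particular controller, drive firmware, or Storage Spaces itself behaves. The `Drive` class, the `drops_out` helper, and every timing value below are hypothetical and chosen for the example; they are not taken from a datasheet.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    # Longest time (seconds) the drive may spend retrying a bad sector
    # before it reports back to the host. Hypothetical value.
    max_recovery_s: float


def drops_out(drive: Drive, controller_timeout_s: float) -> bool:
    """Return True if the host would give up waiting before the drive
    finishes its internal error recovery."""
    return drive.max_recovery_s > controller_timeout_s


# Assumed host/controller patience, for illustration only.
CONTROLLER_TIMEOUT_S = 8.0

drives = [
    Drive("desktop-class drive (long recovery)", 60.0),
    Drive("RAID-class drive (recovery time capped)", 7.0),
]

for d in drives:
    verdict = "dropped from array" if drops_out(d, CONTROLLER_TIMEOUT_S) else "stays in array"
    print(f"{d.name}: {verdict}")
```

In this model a drive can be perfectly healthy and still be dropped, simply because it spends longer on error recovery than the host is willing to wait, which would be consistent with a drive that passes every SMART test afterwards.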

  6. Steve says:

    You aren't alone. This describes my long bout of testing with Storage Spaces.
    http://blogs.technet.com/b/mspfe/archive/2013/02/
