RESET Forums (homeservershow.com)
JazJon

Drives Dropping? Important fix for Sans Digital multi-bay eSATA enclosure users!

JazJon

I wanted to share an important fix for Sans Digital multi-bay eSATA enclosure users!

Both my Sans Digital TR5M-B and TR5M-PB 5-bay eSATA enclosures would randomly DROP out of Windows when I was moving a lot of files around. The only way to get them back was to reboot the server (power cycling the enclosure didn't fix it). Well, it turns out there was a simple fix all along. I wish I had researched this more 12 months ago! The fix below applies to the 4, 5, 8, etc. bay enclosures.

 

Here's the enclosure I use the most

http://www.sansdigital.com/towerraid-plus/tr5mbp.html

 

The fix:

http://www.sansdigital.com/index.php?option=com_kunena&Itemid=190&func=view&catid=10&id=4250#4296

 

Problem / question: Currently the disk system is set up as just single pass-through disks. Any time there is extremely heavy I/O, all the disks drop out and the controller cannot see them anymore. Restarting the TowerRAID system and unplugging the cables does not help; only a reboot of the server does. This is getting frustrating, as the drives never had an issue when they were all connected to a 3ware 9550sx-12. Is there another HBA I can try to solve this issue? I do not want to use RAID, just single-disk pass-through.

 

SOLVED

Turn off "Link State Power Management" under PCI Express in the advanced settings of your power plan (Control Panel > Power Options).
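
For anyone who would rather script the change than click through Control Panel, here is a minimal sketch that is not part of the original fix. It assumes the `powercfg` aliases `SUB_PCIEXPRESS` and `ASPM` (as listed by `powercfg /aliases` on recent Windows versions; check that they exist on your system first) and that the value `0` means "Off". The helper names `build_commands` and `apply` are made up for illustration.

```python
import platform
import subprocess

# Assumed powercfg aliases for "PCI Express > Link State Power Management".
# Assumed values: 0 = Off, 1 = Moderate power savings, 2 = Maximum power savings.
SUBGROUP = "SUB_PCIEXPRESS"
SETTING = "ASPM"
OFF = "0"


def build_commands(scheme="SCHEME_CURRENT"):
    """Return the powercfg invocations that switch the setting off."""
    return [
        # Plugged-in (AC) value
        ["powercfg", "/setacvalueindex", scheme, SUBGROUP, SETTING, OFF],
        # On-battery (DC) value
        ["powercfg", "/setdcvalueindex", scheme, SUBGROUP, SETTING, OFF],
        # Re-activate the scheme so the new values take effect
        ["powercfg", "/setactive", scheme],
    ]


def apply(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            if platform.system() != "Windows":
                raise RuntimeError("powercfg is only available on Windows")
            subprocess.run(cmd, check=True)


apply(build_commands())  # dry run: only prints the three commands
```

To actually change the setting, call `apply(build_commands(), dry_run=False)` from an elevated (administrator) prompt on the server. The dry run is the default so you can review the commands first.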

pcdoc

Thanks for the update. I know there are a few here who use SD enclosures.

Renny

Thanks Jaz. I have not had this problem with my TRMB4 but I will implement the fix anyhow.

JazJon

I've had mystery eSATA drops on my EX495 ever since WHS V1, and then still on WHS 2011. I finally found the fix and am happy.

I let Matt know about this. He's the dev of the Home Server SMART (monitoring) WHS add-in. Here is his interesting response.

 

"You’ve made some interesting discoveries/observations here. It’s intriguing to me that you could reproduce the problem by subjecting the enclosure to intense I/O, or by allowing SMART tools to run against it under minimal I/O. The link state power management problem seems to affect a lot of different hardware. I have an OCZ Octane 128GB SSD in both my work-issued laptop (by day I’m a Microsoft SharePoint consultant for HP) and my personal laptop. Both exhibited a peculiar behavior of freezing for 30 seconds at seemingly random times during the day. The system would always become responsive, so it was more of an annoyance than anything. The system event log would show an error along the lines of “the device \\Harddisk0\ did not respond within the timeout period.”

 

In both cases the guilty party was an Intel ICH SATA/RAID controller and the fix was to go into the Registry and turn off the—you guessed it—link state power management!

 

According to the SATA specification, there are many different commands you can send to a device: things like IDENTIFY DEVICE, SMART READ DATA, etc. If a device doesn’t recognize a command, the device should return an error code so that the program that issued the command knows the operation failed.

 

Something I found out in developing WindowSMART and Home Server SMART, particularly when I got to the part where I started supporting device self-testing (short, extended, conveyance), was that there are a LOT of devices that don’t conform to the specification. Rather than returning an error code, the device just seems to die silently.

 

As an example, you can send a command to the device and it’ll return, as a number of minutes, the length of each test it supports. If the return value is zero, the test is not supported by the device. In the example of the OCZ Octane 128, this particular SSD doesn’t support any of the tests. Of course, in the UI, I dim the button that lets you run a test if the device doesn’t support it. Out of curiosity, I did want to see what happens if I send the test command to the OCZ Octane. Doing so effectively renders the laptop inoperative. The resolution is to do a full power cycle on the laptop, after which the device returns to normal operation.

 

The moral of the story here is that I’m guessing the enclosure and/or some (or all) of the devices within it don’t support the link state power management command, so if the host sends that command to the device(s), they die silently and stop responding to commands. And Windows eventually detects they’re no longer responding and finally takes them offline.

 

By the same token, when a SMART tool sends commands to the devices, it’s possible one of them doesn’t recognize a command and it locks up, and seemingly all of them follow suit. Probably because commands start queuing up in the I/O controller due to the locked-up device. And eventually the whole controller goes down.

 

Matt"
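
Matt's point about checking the reported test length before offering a self-test can be sketched roughly as below. This is only an illustration, not code from WindowSMART or Home Server SMART: the `SelfTestDurations` type, the helper names, and the example durations are all hypothetical, and actually reading the values from a drive is left out.

```python
from dataclasses import dataclass


@dataclass
class SelfTestDurations:
    """Minutes the drive reports for each self-test; 0 means unsupported."""
    short: int
    extended: int
    conveyance: int


def supported_tests(durations):
    """Return the names of the tests that may safely be offered in a UI."""
    return [name for name, minutes in vars(durations).items() if minutes > 0]


def can_run(durations, test_name):
    """Refuse to send a test command the drive says it does not support."""
    return test_name in supported_tests(durations)


# Hypothetical drives: one reports no tests at all (as Matt describes for the
# OCZ Octane 128), the other uses made-up durations purely for illustration.
no_tests = SelfTestDurations(short=0, extended=0, conveyance=0)
typical = SelfTestDurations(short=2, extended=255, conveyance=0)

print(supported_tests(no_tests))  # []
print(supported_tests(typical))   # ['short', 'extended']
```

The idea is simply to never send a command the device has already said it cannot handle, since, as Matt found, some devices hang instead of returning an error.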

Edited by JazJon

ikon

This is verrry interesting info. Looks like everyone should just disable Link State Power Management by default.

JazJon

I am trying to have the ultimate movie library via Media Center/extenders etc. My roommates kept telling me the video shares were not working, and the whole reason was this one stupid Link State setting! I dug deep into Google and there it was: finally, answers.

 

I guess I should have posted this in the following area below but whoops. (is it ok where it's at?)

kam

I had the same problem as you. Turning off "Link State Power Management" didn't work for me.

I had already replaced the PSUs that came with the enclosures, so I was sure the problem had nothing to do with the PSU. At first I guessed it was a bad eSATA cable, so I spent $60 USD on eSATA cables from different brands, including C2G and StarTech.

However, the problem remained the same. I then thought it was likely a compatibility issue with the controller chipset, so I spent another $200 USD on eSATA controller cards using different chipsets, including Marvell, Silicon Image, ASMedia, and JMicron. I bought all of them and tested them one by one, but none of them fixed the problem.

I am glad to tell you I finally found the root cause and got it fixed.

You can simply Google "Ultimate fix for Sansdigital TowerRAID Enclosure" for the solution; I have explained the root cause in detail there.

 

Direct Link: http://kamserver.com/2018/04/24/ultimate-fix-for-sansdigital-towerraid-enclosure/
