RESET Forums (homeservershow.com)

B120i RAID 1 Logical drive failing inexplicably


dude1922@outlook.com

I have a MicroServer Gen8 with two HGST Deskstar 6TB HDDs in RAID 1 for all data storage and client backups, plus a Crucial 120GB SSD for WHS 2011. I have also applied the latest SPP (P01456_001_spp-2017.10.1-SPP2017101.2017_1027.10).

Everything worked fine for the past year, but recently the server has started reporting the RAID 1 logical drive on the 6TB drives as failed at seemingly random times. Powering off and restarting the server fixes it, but a few days later it fails again.

I moved all data from the 6TB drives to other drives I have, then switched the server to AHCI mode and ran a battery of tests on each 6TB HDD, including full reformatting, using both Seagate's SeaTools and HGST's WinDFT (Windows Drive Fitness Test). No issues were found after days of testing.

Since then I have switched the server back to RAID mode, rebuilt the RAID 1 array, and enabled BitLocker on the entire drive. Nothing failed during that process, but later on, during a client backup, the logical drive failed again. I can't find any clue as to what the problem could be. I don't believe the HGST HDDs are the problem, since they passed the battery of tests.

What could this be?

schoondoggy

Have you checked the log files in SSA to see if there are errors showing?

dude1922@outlook.com

There is a lot of information in the logs. It happened again today shortly after 5 AM; here are the logs from that time period. It looks like the controller sees the HDD as unplugged, but that doesn't make sense.

Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:32:19 AM
Event ID:      24582
Task Category: None
Level:         Information
Description:
A SATA physical drive located in bay 3 was inserted. The drive can be found in box 0 which is attached  to port 3I of array controller B120i [Embedded].
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:32:19 AM
Event ID:      24595
Task Category: None
Level:         Error
Computer:      HPSERVER
Description:
A drive failure notification has been received for the SATA physical drive located in bay 3.  This drive can be found in box 0 which is connected to port 3I of the array controller B120i [Embedded].  The failure reason received from the HP Smart Array firmware is: REMOVED_IN_HOT_PLUG.

Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:32:19 AM
Event ID:      24582
Task Category: None
Level:         Information
Description:
A SATA physical drive located in bay 3 was removed. The drive can be found in box 0 which is attached  to port 3I of array controller B120i [Embedded].
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:49 AM
Event ID:      24582
Task Category: None
Level:         Information
Description:
A SATA physical drive located in bay 4 was inserted. The drive can be found in box 0 which is attached  to port 4I of array controller B120i [Embedded].
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:49 AM
Event ID:      24601
Task Category: None
Level:         Information
Description:
Logical drive 2 configured on array controller B120i [Embedded] is in a failed state but has had one or  more drive replacements and is now ready to change to a status of "OK".  However, this status change will not occur until the logical drive is re-enabled. Please re-enable the logical drive via the HP Array Configuration Utility, HP Smart Storage Administrator or by rebooting the system.
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:49 AM
Event ID:      24582
Task Category: None
Level:         Information
Description:
A SATA physical drive located in bay 3 was inserted. The drive can be found in box 0 which is attached  to port 3I of array controller B120i [Embedded].
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:48 AM
Event ID:      24600
Task Category: None
Level:         Error
Description:
Logical drive 2 of array controller B120i [Embedded] has encountered a status change from: 
Status: INTERIM RECOVERY MODE 
to 
Status: FAILED
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:48 AM
Event ID:      24595
Task Category: None
Level:         Error
Description:
A drive failure notification has been received for the SATA physical drive located in bay 4.  This drive can be found in box 0 which is connected to port 4I of the array controller B120i [Embedded].  The failure reason received from the HP Smart Array firmware is: REMOVED_IN_HOT_PLUG.
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:48 AM
Event ID:      24582
Task Category: None
Level:         Information
Description:
A SATA physical drive located in bay 4 was removed. The drive can be found in box 0 which is attached  to port 4I of array controller B120i [Embedded].
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:48 AM
Event ID:      24605
Task Category: None
Level:         Warning
Description:
Due to an unrecoverable write error, the recovery of logical drive 2 configured on array controller B120i [Embedded] was  aborted while rebuilding a physical drive. 
The physical drive which was being rebuilt is located in bay 0 of box 0 which is connected to port ?? of  array controller B120i [Embedded] .
The physical drive that reported the write error is located in bay 0 of box 0 which is connected to  port ?? of array controller B120i [Embedded].
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:48 AM
Event ID:      24598
Task Category: None
Level:         Information
Description:
Logical drive 2 of array controller B120i [Embedded] has encountered a status change from: 
Status: RECOVERING 
to 
Status: INTERIM RECOVERY MODE
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:48 AM
Event ID:      24595
Task Category: None
Level:         Error
Description:
A drive failure notification has been received for the SATA physical drive located in bay 3.  This drive can be found in box 0 which is connected to port 3I of the array controller B120i [Embedded].  The failure reason received from the HP Smart Array firmware is: REMOVED_IN_HOT_PLUG.
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:48 AM
Event ID:      24582
Task Category: None
Level:         Information
Description:
A SATA physical drive located in bay 3 was removed. The drive can be found in box 0 which is attached  to port 3I of array controller B120i [Embedded].
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:48 AM
Event ID:      24598
Task Category: None
Level:         Information
Description:
Logical drive 2 of array controller B120i [Embedded] has encountered a status change from: 
Status: READY FOR RECOVERY 
to 
Status: RECOVERING
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:48 AM
Event ID:      24598
Task Category: None
Level:         Information
Description:
Logical drive 2 of array controller B120i [Embedded] has encountered a status change from: 
Status: INTERIM RECOVERY MODE 
to 
Status: READY FOR RECOVERY
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:48 AM
Event ID:      24582
Task Category: None
Level:         Information
Description:
A SATA physical drive located in bay 3 was inserted. The drive can be found in box 0 which is attached  to port 3I of array controller B120i [Embedded].
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:18 AM
Event ID:      24598
Task Category: None
Level:         Information
Description:
Logical drive 2 of array controller B120i [Embedded] has encountered a status change from: 
Status: OK 
to 
Status: INTERIM RECOVERY MODE
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:18 AM
Event ID:      24595
Task Category: None
Level:         Error
Description:
A drive failure notification has been received for the SATA physical drive located in bay 3.  This drive can be found in box 0 which is connected to port 3I of the array controller B120i [Embedded].  The failure reason received from the HP Smart Array firmware is: REMOVED_IN_HOT_PLUG.
 
Log Name:      System
Source:        Cissesrv
Date:          1/21/2018 5:19:18 AM
Event ID:      24582
Task Category: None
Level:         Information
Description:
A SATA physical drive located in bay 3 was removed. The drive can be found in box 0 which is attached  to port 3I of array controller B120i [Embedded].
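Read oldest-first, this excerpt shows bay 3 dropping out at 5:19:18, a rebuild starting when it reappears, then bay 3 dropping again and bay 4 dropping at 5:19:48, after which logical drive 2 goes to FAILED. In other words, both drives lost contact with the controller within the same second. Event Viewer lists the newest event first, which makes that cascade easy to misread. Below is a minimal Python sketch that reorders records pasted in this Log Name / Date / Event ID / Level / Description layout into a timeline. It assumes US-style dates (M/D/YYYY h:mm:ss AM/PM), an English locale for AM/PM, and a newest-first paste order; `parse_records` and `print_timeline` are hypothetical helper names, not part of any HP or Windows tool.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Hypothetical helper for Event Viewer text pasted in the layout quoted above
# ("Log Name:", "Date:", "Event ID:", "Level:", "Description:" on separate lines).
FIELD = re.compile(r"^(Date|Event ID|Level):\s+(.*)$")

Record = Tuple[datetime, int, str, str]


def parse_records(text: str) -> List[Record]:
    """Return (timestamp, event_id, level, first_description_line), oldest first."""
    records: List[Record] = []
    # Each record starts with a "Log Name:" line; text before the first one is ignored.
    for chunk in re.split(r"^Log Name:", text, flags=re.MULTILINE)[1:]:
        fields = {}
        description = ""
        # str.strip() also removes non-breaking spaces left over from copy/paste.
        lines = [line.strip() for line in chunk.splitlines()]
        for i, line in enumerate(lines):
            match = FIELD.match(line)
            if match:
                fields[match.group(1)] = match.group(2).strip()
            elif line == "Description:":
                # Keep only the first non-empty line of the description.
                description = next((rest for rest in lines[i + 1:] if rest), "")
                break
        if "Date" in fields and "Event ID" in fields:
            # Assumed date format, e.g. "1/21/2018 5:19:48 AM".
            when = datetime.strptime(fields["Date"], "%m/%d/%Y %I:%M:%S %p")
            records.append((when, int(fields["Event ID"]), fields.get("Level", ""), description))
    # Assumes the paste is newest-first. Reversing first and then relying on
    # Python's stable sort keeps same-second events in the order they occurred.
    records.reverse()
    records.sort(key=lambda record: record[0])
    return records


def print_timeline(text: str) -> None:
    """Print one line per event: time, event ID, level, description."""
    for when, event_id, level, description in parse_records(text):
        print(f"{when:%H:%M:%S}  {event_id}  {level:<11}  {description}")
```

Usage: call `print_timeline(pasted_text)`, where `pasted_text` holds the copied event records. Sorting by timestamp alone would scramble events logged within the same second, which is why the list is reversed before the stable sort.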

schoondoggy

Have you checked the power and data cables? Is the failing drive always in the same slot?

dude1922@outlook.com

I first had the HDDs in slots 1 and 2 (SSD in slot 4) for about a week, then moved them to slots 3 and 4 (SSD in slot 1) for another week. I got the same results in both arrangements. Note that during all this time I had no failures with the SSD.

Right now I am running in AHCI mode with everything on one 6TB drive, using the other 6TB drive as a daily backup of the first. It is not an ideal setup, but I hope it keeps running without failures.

dude1922@outlook.com

Last night an HDD failed in AHCI mode too, during a backup of one of the PCs with 2TB of data. I am puzzled as to why days of testing the HDDs with SeaTools and WinDFT didn't detect any failure.

schoondoggy

What slot was the drive in? Have you checked the SATA cable connection at the motherboard? I have seen several cases where a bit of stress on the cable connection at the motherboard causes a drive to lose its connection. When troubleshooting problems like this, it can be helpful to keep track of which drive is failing and in which slot, so you can tell whether the problem follows the drive or stays with the slot.
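One way to keep that record is to tally, per incident, which bays the controller reported as removed. Below is a minimal Python sketch; it assumes the Cissesrv message wording quoted in the log excerpt earlier in this thread, and `tally_by_bay` is a hypothetical helper, not an HP tool. The logs name bays, not drives, so the drive-to-bay mapping (for example by serial number) still has to be noted by hand.

```python
import re
from collections import Counter
from typing import Dict

# Patterns assume the Cissesrv wording quoted in this thread.
BAY_CHANGE = re.compile(r"located in bay (\d+) was (removed|inserted)")
# Failure notices put a period right after the bay number ("... in bay 3.  This drive ...").
HOT_PLUG_FAILURE = re.compile(r"located in bay (\d+)\..*?REMOVED_IN_HOT_PLUG")


def tally_by_bay(log_text: str) -> Dict[str, Counter]:
    """Count removal, insertion and REMOVED_IN_HOT_PLUG failure messages per bay."""
    tally = {"removed": Counter(), "inserted": Counter(), "hot_plug_failures": Counter()}
    for bay, action in BAY_CHANGE.findall(log_text):
        # action is "removed" or "inserted", matching the dictionary keys.
        tally[action][int(bay)] += 1
    # Without re.DOTALL, ".*?" stays on one line, so each match is a single message.
    for bay in HOT_PLUG_FAILURE.findall(log_text):
        tally["hot_plug_failures"][int(bay)] += 1
    return tally
```

Comparing the tallies from several incidents, before and after moving drives between bays, shows whether the removals follow a particular drive or stay with a particular bay. If every bay in use shows removals in the same incident, as in the excerpt above, something shared by all the drives (cabling or power) becomes a more likely suspect than any single drive.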

dude1922@outlook.com

I finally got to the bottom of this problem. I had a 1-to-3 power splitter cable between the PSU and the drive cage, which I used to power an additional hard drive in the CD-ROM bay. I removed the splitter cable, and for the past 3 weeks the server has been running reliably.

Thanks, schoondoggy, for the guidance; it really helped me here.

Dvampoul
On 2/26/2018 at 8:49 PM, dude1922@outlook.com said:

I finally got to the bottom of this problem. I had a 1-to-3 power splitter cable between the PSU and the drive cage, which I used to power an additional hard drive in the CD-ROM bay. I removed the splitter cable, and for the past 3 weeks the server has been running reliably.

Thanks, schoondoggy, for the guidance; it really helped me here.

I had the same problem after adding a PSU splitter.

I spent more than 40 hours troubleshooting: formatting disks, nuking disks, changing array settings, scrubbing RAID metadata with DoD wipes, and so on, and the problem was the one mentioned above!

Thanks!
