RESET Forums (homeservershow.com)

A few setup questions


CaptainFred


Why do you say that about BackBlaze? It looks as if they have a lot of drives and therefore a lot of hardware in use to analyse for failures and errors. Although, as someone points out, surely they should be using enterprise-grade drives, not desktop drives?! Bit odd that they don't... if that's right that they don't. Unless profits are that tight.

 

The entire Backblaze operating model revolves around using cheap consumer-grade components, including HDDs, to lower the cost of their storage pods as much as possible. With the appropriate amount of redundancy you can use consumer-grade HDDs without much of a problem, as proven by Backblaze. Enterprise disks are 2-3x the price of consumer-grade ones, with the only notable difference being the MTTF; if your setup allows you to lose twice as many disks, twice as fast, and you are replacing them under warranty as Backblaze does, you are golden.
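
As a rough, purely illustrative sketch of that trade-off (the prices and annual failure rates below are made-up placeholders, not Backblaze's actual figures), the expected spend per drive slot can be compared like this:

# Illustrative only: hypothetical prices and annual failure rates, not Backblaze's numbers
awk 'BEGIN {
    years = 5
    c_price = 120; c_afr = 0.06   # consumer drive: cheaper, assumed higher annual failure rate
    e_price = 300; e_afr = 0.03   # enterprise drive: 2-3x the price, assumed lower failure rate
    # worst case: every failure replaced at full retail price (warranty replacement only improves this)
    printf "consumer:   ~$%.0f per slot over %d years\n", c_price * (1 + c_afr * years), years
    printf "enterprise: ~$%.0f per slot over %d years\n", e_price * (1 + e_afr * years), years
}'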


The Google document is a good one and so is this:

http://static.usenix.org/events/fast07/tech/schroeder/schroeder.pdf

 

BackBlaze info is anecdotal at best. I do not put much value in their findings. It has been discussed in other threads.

 

The B120i is a firmware-based RAID using the Intel SATA controller, but I would not lump it in with the rest of the firmware RAIDs. It is fully compatible with all of the HP SmartArray controllers as it uses the same code. It is limited in performance due to the load it puts on the CPU, but it is a much better firmware RAID solution than the other firmware RAIDs.

 

 

How can you possibly say that the Google document is good, considering their data is not available and the results they report are opaque as to drive make and model, and yet call Backblaze data "anecdotal"? A study based on what seems to add up to around 50,000 drives, with all of the data available for scrutiny, is anything but anecdotal. If anything it is much more valid and independently verifiable than the opaque data from Google's study.

 

My personal data, based on the drives at various clients and on my own systems, numbering in the low hundreds of drives, could be considered anecdotal - but it, oddly enough, approximates the Backblaze numbers. The failure rate of my 1TB Barracudas (ST31000333AS, ST31000340AS; many of the replacements were ST31000524AS and ST31000528AS), which I got 7 or so years ago in the last month they came with 5-year warranties, was about 120% within the warranty period of those 5 years. In other words, for 10 drives originally purchased I have had 12 replacements. Three of the original drives haven't failed yet, so the other 7 have had 12 replacements between them. So the failed ones got replaced, and most of the replacements then failed and got replaced within 5 years of the original purchase date. THAT you can call anecdotal, but not when somebody has actually looked at a sample size of about 50,000 drives and had the balls to name names, unlike Google.

 

 


 

Well, they actually did do a more limited analysis of their enterprise drives in other servers (not in their bulk storage pods):

https://www.backblaze.com/blog/enterprise-drive-reliability/

 

Granted, the sample size on their enterprise drives was nearly 40x smaller, but it was still based on enough drives for the findings to be, at the very least, not dismissible.

 

Just because it says "enterprise" on it doesn't make it more reliable.

 

There is no consistency in anything BackBlaze does. You cannot formulate supportable conclusions when your samples come from many different sources.

In my professional experience, higher-than-average failure rates tend to be related to mishandling during installation or use. We don't know much about their assembly process, but their 'pods' are very densely packed, which is a potential heat issue:

https://en.wikipedia.org/wiki/Backblaze#/media/File:StoragePod.jpg

The drives appear to be free-standing, not physically mounted, which is a potential vibration issue:

https://www.backblaze.com/blog/wp-content/uploads/2009/08/backblaze-storage-pod-partially-assembled-large.jpg

During the drive shortage they sourced drives from retail channels and 'shucked' USB drives. Most retail and all USB drives are not designed for array use. Removing a drive from an external case can be very hard on the drive.

 

What do you mean by "different sources"? Different suppliers? From what they were saying they buy drives retail, so damage in transit is almost certainly going to be pretty evenly distributed between makes/models. All of the drives in question are used in the same environments, which means the comparison is as valid as is theoretically possible. Also, since all of their pods are put together the same way by the same people regardless of the drives used, I don't see how you can argue that the massive yet consistent failure rate across different models of the same manufacturer, with a large discrepancy in failure rates between drives by different manufacturers, could possibly be significantly attributable to mishandling during the assembly process. Are you suggesting they deliberately drop the Seagates a few times before fitting them into the chassis?

 

The point is that all drives are treated equally, and regardless they still exhibit very distinctly different failure rates between drives of different manufacturers.

Also, they don't use RAID "arrays", they use custom Reed-Solomon encoding.

But since you mentioned RAID, the main and only thing that distinguishes disks sold for RAID use from those sold for desktop use is that the desktop ones have had TLER (Time Limited Error Recovery) removed from their feature set, specifically to sabotage their use in an array. HGST are the only manufacturer that hasn't removed TLER from their drives. WD, OTOH, sell you the same dog-slow 5400 rpm disk labeled "red" (more expensive) or "green" based purely on the former having firmware that hasn't had the TLER feature removed, while the latter, instead of TLER, insanely makes the disk disappear for minutes at a time trying to read back a sector that has gone bad (the normal TLER timeout is 7 seconds, which translates to roughly 700 read attempts in a 5400 rpm disk - if the sector hasn't come back as readable after 700 attempts, it's not going to come back as readable on the following 70,000 either).
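
For reference, smartctl can read the SCT Error Recovery Control (ERC) timeouts that implement TLER, and attempt to set them on drives that still allow it; /dev/sda below is just a placeholder device:

# Read the current SCT ERC (TLER) read/write timeouts, if the drive supports them
smartctl -l scterc /dev/sda

# Try to set both timeouts to 7.0 seconds (values are in tenths of a second);
# desktop drives that have had TLER removed will reject this
smartctl -l scterc,70,70 /dev/sda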

 

 

'Designed' may not be the best choice of words on my part. It is possible to find desktop drives, NAS drives and USB drives that appear to be identical builds, but the firmware is different on each to handle the specific needs of each application. My concern is more about using desktop or USB drives in these types of applications and how they deal with sleep/head parking, error handling and the like.

 

The firmware difference is mainly in disabling TLER on everything but the more expensive "NAS" grade drives.



In my opinion, the Google paper is good information because they do not focus on drive vendors; they look at SMART info on why drives fail, and they found that much of the time drives fail without warning. At the time this was good info, as many users expected that SMART would always advise them of an imminent failure.

Because of the way BB procures drives, there is no way to determine where the drives came from. Two drives may have the same base part number, but may be designed for different use cases or assembled in different factories.

I am not saying that there are no issues with this family of Seagate drives. Obviously BB had a lot of bad ones, as have reviewers on Amazon and NewEgg. I believe it is difficult to draw conclusions when there is no way to research the origin of the drives. Also, as I have stated before, regardless of the drive vendor, using consumer drives in enterprise/professional environments voids the warranty.

As for your thought that all drives were handled the same, my reply would be that we don't know that for sure. Also, some drive models handle abuse better than others. We don't know much about their manufacturing process, but we do have a picture of drives piled up in an avalanche-ready stack; how many of these hit the floor?

https://gigaom.com/2012/10/09/how-to-add-5-5-petabytes-and-get-banned-from-costco-during-a-hard-drive-crisis/

I am a bit suspicious of the drive-handling expertise when I see a trunk full of drives.

https://www.backblaze.com/blog/backblaze_drive_farming/

I am not suggesting that they deliberately drop Seagates, but Seagate uses different factories. Perhaps drives from a specific factory are more susceptible to shock, or perhaps the drives from a specific factory have a high fail rate. We will never know, because BB does not drill down into the data that far. Also, we don't know that all drives are handled the same. Some are removed from USB enclosures, some from retail packaging and some from case packs. After that point I would assume they are all handled the same.

TLER is not the only thing that is different in drive firmware. WD sells Green drives as desktop drives and as USB externals. I believe these two will have different firmware, specific to their applications. The differences may be small, but they are different.

My big frustration with BB is that if they went a few steps further they could have offered more meaningful information. Here is a review on Amazon; by using more detailed drive info the author was able to determine a higher fail rate on a certain build of Seagate 2TB drives. Although his sample size is small and he does not claim to be taking a scientific approach, his detective work is sound. The same part number from Seagate could have different numbers of platters and be built at different factories. This is the drill-down I would expect to see from BB:

http://www.amazon.com/gp/customer-reviews/R205L2P2CHJ02D/ref=cm_cr_pr_viewpnt?ie=UTF8&ASIN=B005T3GRN2#R205L2P2CHJ02D

 

I do believe that this family of drives from Seagate has a higher than average fail rate, but we need more information to understand why.



 

IMO it doesn't matter what factory or channel the drives come from; over enough drives an overarching trend emerges, which is exactly what we want to know about. It doesn't matter whether individual drives were handled the same, it is the overall trend over tens of thousands of drives that matters, and this clearly emerges. As a buyer, I don't have a way of telling at ordering time any more about the drives than make and model, so statistics aggregated by this information are exactly what I want to know. It would take extraordinarily bad luck for all models of a particular manufacturer to be handled differently from all others.

You can see from the statistics that all HGST models have about half the failure rate of all WD models, which have about 1/3 of the failure rate of all Seagate models. That kind of pattern would be extraordinarily implausible as a coincidence. The numbers really do seem to show that the most unreliable WD and HGST drives are several times more reliable than the most reliable model of Seagate drive. This kind of consistent trend is quite difficult to argue with, even if there is no data to drill down to the assembly plant level. And as I said before, it's not like I can place an order on, say, amazon.co.uk for 20 4TB disks and then demand that they all come from a specific factory but not another. So the overall trend is really the only thing that is useful, especially when it is that pronounced.
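
For what it's worth, Backblaze publishes the raw daily drive stats, so anyone can redo the per-manufacturer aggregation themselves. A minimal sketch, assuming the downloaded CSVs have the model in the third column and a 0/1 failure flag in the fifth (check the header of the files you actually get):

# Tally drive-days and failures per manufacturer, then a rough annualised failure rate
awk -F, 'FNR > 1 {
    mfr = "Other"
    if ($3 ~ /^ST/)                   mfr = "Seagate"
    else if ($3 ~ /^(HGST|Hitachi)/)  mfr = "HGST/Hitachi"
    else if ($3 ~ /^WDC/)             mfr = "WD"
    else if ($3 ~ /^TOSHIBA/)         mfr = "Toshiba"
    days[mfr]++; fails[mfr] += $5
}
END {
    for (m in days)
        printf "%-14s %10d drive-days %6d failures  AFR ~%.2f%%\n",
               m, days[m], fails[m], 36500 * fails[m] / days[m]
}' *.csv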

 

And on the subject of a trunk full of drives, do you really think that parcels containing drives get handled any differently by couriers than any other parcel? That is seriously wishful thinking. Manufacturers pack disks appropriately to survive all but the most egregiously bad handling.

 

Where do you get the idea that any one particular use of retail drives voids the warranty? Where does it say so in the warranty statement of any drive manufacturer? The main difference between enterprise and consumer drives is the length of warranty offered, and in some cases the lack of TLER. Plus some models (e.g. 15,000 rpm ones) aren't available in a non-enterprise variety.



If the BB study gives you the info you want, good for you. I prefer to know why something is failing.

Under what the warranty does not cover - 'Commercial use':

http://www.seagate.com/support/warranty-and-replacements/limited-consumer-warranty/

And this one under warranty limitations - 'The product was not used for its intended function (for example, desktop drives used in an Enterprise environment)':

http://support.wdc.com/Warranty/warrantyPolicy.aspx

 

There are bigger differences between enterprise and consumer drives than warranty and TLER.



 

 

I care a lot less about the reason, outside of my control, that a disk failed than I do about buying a disk that is least likely to fail based on the data available to me at purchase time. If your buying strategy is different, I would love to hear the reasoning and the data you use to drive your disk purchase decisions. Before all manufacturers equalized their warranties I used to buy on longest warranty, and factored the cost as if I were leasing the drive for the warranty term at the purchase cost. Now that they all offer the same length warranties, the only other things you can buy on are cost and reliability. But since costs are also reasonably similar...

 

Those warranty clauses you mentioned are ultimately unprovable and unenforceable. I have a microserver that is on 24/7, serving no commercial purpose. How do you reckon they are going to tell that apart from commercial 24/7 use? Seagate have certainly never turned down my warranty claims over a drive's uptime being approximately its age.

 

Also, HGST warranty terms have no such exclusions - as if we needed any more reasons to prefer them.

 

As for there being bigger differences: yes, there is one - a 2-3x higher price. But there is evidence that there aren't any that demonstrably improve reliability, and no evidence to the contrary.

Edited by gordan


My purchasing strategy is buying the right drive for the application: NAS-rated drives for RAID controller applications, desktop drives for desktops, and so on. I compare price, performance and power consumption. I have had great luck with refurbished drives, especially enterprise versions.



 

These days I only buy HGST 3.5" and 2.5" and Toshiba 2.5" drives. I also only buy disks with TLER for any use (HGST and 2.5" Toshibas all have it). Seagate I don't buy due to astronomical failure rates in any application I used them in, even though, credit to them, their SMART attributes don't lie. WD and Samsung are reasonably reliable, but I don't buy them any more because their disks outright lie about their SMART statistics (e.g. on many if not all models, pending sectors disappear after overwriting and reallocated sector counts don't go up, implying that either the disk reused a questionable sector or it is outright lying about reallocated counts; either way, not good). Toshiba drives have been reasonably reliable for me, and they are honest in their SMART counters. HGST have been the most reliable and have honest SMART. I also only buy 7200 rpm disks (because life's too short). If power consumption is an issue, I get SSDs.
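
A quick way to watch for the behaviour described above is to snapshot the relevant raw SMART counters before and after a full overwrite of a drive with pending sectors; on an honest drive, a cleared pending sector should reappear as a reallocation (attribute names as reported by smartctl, /dev/sda as a placeholder):

# Print the raw values of the counters of interest (column 10 of smartctl -A output)
for attr in Reallocated_Sector_Ct Current_Pending_Sector Offline_Uncorrectable; do
    smartctl -A /dev/sda | awk -v a="$attr" '$2 == a { print a ": " $10 }'
done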

 

Just based on recent personal and commercial experience.

Edited by gordan


Many SSDs actually draw as much power as their 2.5" spinning counterparts. For laptops I find Samsung and SanDisk consistently offer the lowest power draw.

As for using TLER drives in everything, there is no standard time set for TLER. It can vary widely, from 7 seconds down to 100 milliseconds.

 

Another thought on why I like more detail on why drives fail is to better identify risk for the people using them. If someone has a four-drive array using the Seagate 3TB drives that BB has called into question, better detail on which versions are failing would help them decide whether to replace them or stay with them.



 

True, but most SSDs, especially of recent generations, in fact draw a tiny fraction of the 2-4W that a typical spinning 2.5" disk will draw. Intel drives are specced at 100mW, for example.

 

Most RAID controllers seem to set TLER at 7 seconds, but otherwise it always has to be set manually as far as the drive is concerned, e.g. smartctl -l scterc,70,70. Proper RAID controllers (the dinosaurs that they are) will do this for you.
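
Worth adding that on most drives the scterc setting is volatile and typically doesn't survive a power cycle, so without a RAID controller to reassert it, something along these lines in rc.local (or an equivalent boot script) re-applies it; the device names are just examples:

# Re-apply a 7 second error recovery timeout on every boot; adjust the device list for your system
for dev in /dev/sd[a-d]; do
    smartctl -l scterc,70,70 "$dev" > /dev/null || echo "SCT ERC not accepted on $dev"
done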

 

If I had a 4-disk array of Seagates, I'd be replacing them post-haste if my restore process was inconvenient (e.g. I didn't have an on-site backup and restore downtime was an issue - let's face it, restoring 6-9TB of data is going to take a long time whatever you do). If there was a redundant server with non-Seagate drives, I'd probably just live with it, rely on failover to deal with it, and live with the reduced throughput while failed disks are getting replaced. Having said that, if I had a similar redundant server with non-Seagate drives in n+2 redundancy, I'd probably swap two of the disks between the servers to avoid both being bitten too badly at the same time when the inevitable begins to happen.

