
MTBF

  • 15-11-2007 4:23pm
    #1
    Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    So....hard drives come with an MTBF value (mean time between failures). The values for these seem fairly substantial. 500,000 hours, 600,000 hours, etc.

    I had a squizz on Wikipedia and my understanding is this: they take a large number of drives and test them for a short time (let's say 10,000 drives tested for 50 hours with 1 failure... therefore MTBF = 500,000 hours).

    Is MTBF a useful indicator of how long my drive will last? I just had a look at the SMART data on two of my Samsung P120s. I'm considering RAID 1-ing them as my OS / games drives. They've both been up for around 7,000 hours. How long should they really last (a mean value is fine)? Surely it's not another 60-odd years. ;)
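
    For what it's worth, here's the back-of-the-envelope arithmetic as I understand it, written as a quick Python sketch (the drive count, test hours and failure count are just the made-up numbers above, not anything measured):

    # Rough sketch of how a population test turns into an MTBF figure.
    # The numbers are the made-up ones from the example above.
    drives = 10000          # drives under test
    hours = 50              # hours each drive is tested
    failures = 1            # failures observed during the test

    device_hours = drives * hours       # total operating hours accumulated
    mtbf = device_hours / failures      # mean time between failures
    print(f"MTBF ~ {mtbf:,.0f} hours")  # -> MTBF ~ 500,000 hours

    # The same figure as a rough annual failure rate, assuming a constant
    # failure rate (which real drives don't have):
    hours_per_year = 24 * 365
    afr = hours_per_year / mtbf
    print(f"Implied annual failure rate ~ {afr:.2%}")   # roughly 1.75%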


Comments

  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Here's the actual query I used:
    sudo smartctl -A /dev/sdd
    
    smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/
    
    === START OF READ SMART DATA SECTION ===
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   253   100   051    Pre-fail  Always       -       0
      3 Spin_Up_Time            0x0007   100   100   025    Pre-fail  Always       -       5952
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       546
      5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       7162
     10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0
     11 Calibration_Retry_Count 0x0012   253   002   000    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       544
    190 Temperature_Celsius     0x0022   169   112   000    Old_age   Always       -       23
    194 Temperature_Celsius     0x0022   169   112   000    Old_age   Always       -       23
    195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       8757
    196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    200 Multi_Zone_Error_Rate   0x000a   253   100   000    Old_age   Always       -       0
    201 Soft_Read_Error_Rate    0x000a   253   100   000    Old_age   Always       -       0
    202 TA_Increase_Count       0x0032   253   253   000    Old_age   Always       -       0
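
    If anyone wants to pull that Power_On_Hours figure out without eyeballing the table, something like this quick Python sketch should do it (assumes smartctl is installed and an attribute table laid out like the one above; point it at the right /dev/sdX):

    import subprocess

    # Run `smartctl -A` and fish the Power_On_Hours raw value out of the table.
    def power_on_hours(device="/dev/sdd"):
        out = subprocess.run(
            ["sudo", "smartctl", "-A", device],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.splitlines():
            fields = line.split()
            if len(fields) >= 10 and fields[1] == "Power_On_Hours":
                return int(fields[9])          # RAW_VALUE is the last column
        raise ValueError("Power_On_Hours attribute not found")

    hours = power_on_hours()
    print(f"{hours} hours on the clock (~{hours / 24:.0f} days, ~{hours / 8760:.1f} years)")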
    


  • Closed Accounts Posts: 12,401 ✭✭✭✭Anti


    There are other things to factor in, like operating temperature and humidity. Have they been moved or bumped? Power surges.

    I don't realistically think there is a way to work out the lifetime on them, to be honest.


  • Registered Users, Registered Users 2 Posts: 68,317 ✭✭✭✭seamus


    Wikipedia explains this very well:
    MTBF is not to be confused with life expectancy. MTBF is an indication of reliability. A device (e.g. hard drive) with an MTBF of 100,000 hours is more reliable than one with an MTBF of 50,000. However, this does not mean the 100,000-hour MTBF HD will last twice as long as the 50,000 MTBF HD. How long the HD will last is entirely dependent on its life expectancy. A 100,000 MTBF HD can have a life expectancy of 2 years while a 50,000 MTBF HD can have a life expectancy of 5 years, yet the HD that's expected to break down after 2 years is still considered more reliable than the 5-year one. Using the 100,000 MTBF HD as an example and putting MTBF together with life expectancy, it means the HD system should on average fail once every 100,000 hours provided it is replaced every 2 years. Another way to look at this is: if there are 100,000 units of this drive, all of them are in use at the same time, and any failed drive is put back in working order immediately after the failure, then 1 unit is expected to fail every hour (due to the MTBF factor).
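
    A quick sanity check on that last claim, as a throwaway Python sketch (the fleet size and MTBF are just the figures from the quote):

    # Wikipedia's example: 100,000 drives in service, 100,000-hour MTBF,
    # failed drives swapped out immediately.
    fleet = 100000
    mtbf_hours = 100000

    failures_per_hour = fleet / mtbf_hours
    print(failures_per_hour)   # 1.0 -> about one failure somewhere in the fleet every hour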

    In practice, I've found that drives tend to outlive their useful life. In the course of the thousand or so machines I've tended to in my shortish working life, I can only recall two random hard drive failures (as opposed to drops, water immersion, etc.). I still have a number of <10GB drives which saw fairly constant use during their lives and still work fine now, but are too small to be of any use.

    As hard drive capacities and efficiencies improve, you'll find that your great 120GB drives will probably still work fine in 5 years' time, but will be slower, heavier and less efficient than the 2TB model on sale for €80.


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Anti wrote: »
    There are other things to factor in, like operating temperature and humidity. Have they been moved or bumped? Power surges.

    Not been moved much. Operating temperature is 23°C (I have an Antec 900, so there's air flowing directly over all my HDDs).
    Anti wrote: »
    I don't realistically think there is a way to work out the lifetime on them, to be honest.

    Ah, it was more... x = y, i.e. "generally they'll last around 4 years-ish", but it's grand. As seamus has pointed out, most drives outlive their usefulness (hadn't really thought about that, tbh). With RAID 1 on these 200GB jobbies, I suppose they'll probably last a good few years.


  • Registered Users, Registered Users 2 Posts: 2,259 ✭✭✭Shiny


    I remember reading an article by a university which attempted to find out the causes of hard drive failures.
    They had multiple scenarios with constant heavy loads, random access, high temperatures, 24-hour operation, etc. They found that these variables didn't have as much of an effect on the hard drives as one would have thought.

    They found that the single biggest factor in hard drive failure was old age.

    If you really, really want, I can go and try to find the article. :rolleyes:


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Interesting.

    Ah, I believe ye, but it'd make a nice read. :) How bored are you? That's the real question here.


  • Registered Users, Registered Users 2 Posts: 2,259 ✭✭✭Shiny




  • Registered Users, Registered Users 2 Posts: 32,417 ✭✭✭✭watty


    Statistics and Probability are non-intuitive to humans.

    An MTBF tells you nearly nothing about YOUR drive.
    But it is an accurate guide to how many may fail in the overall population. Also to be considered is that many products have a bathtub curve of failures: most failures may be in the infancy AND old age of the product, so the MTBF is misleading, and individual drives may fail much, much earlier or much, much later.

    Obviously a drive with a higher MTBF may be more reliable for you than one with a lower MTBF. But you can't know how long your drive will last. Irrespective of MTBF, it may last another 10 years or fail today. You can't know.

    But if you have a data centre with 10,000 drives, the MTBF will give you an indication of how many drives you should have spare on site and how often a replacement might occur. But never which particular drive will fail.
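
    As a rough illustration of that data-centre case, here's a small Python sketch (the fleet size is the 10,000 above; the 500,000-hour MTBF is just an assumed figure, and it pretends the failure rate is constant, which the bathtub curve says it isn't):

    import math

    # Expected failures per year for a big population, assuming a constant
    # failure rate. Only a population-level guide, never a per-drive prediction.
    drives = 10000
    mtbf_hours = 500000
    hours_per_year = 24 * 365

    expected_failures_per_year = drives * hours_per_year / mtbf_hours
    print(f"Expected failures per year: ~{expected_failures_per_year:.0f}")   # ~175

    # A Poisson model gives a feel for how many spares cover a month's failures.
    monthly_rate = expected_failures_per_year / 12
    spares = 0
    cumulative = math.exp(-monthly_rate)                  # P(0 failures in a month)
    while cumulative < 0.99:                              # cover ~99% of months
        spares += 1
        cumulative += math.exp(-monthly_rate) * monthly_rate ** spares / math.factorial(spares)
    print(f"Spares covering ~99% of months: {spares}")    # roughly two dozen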

    Drive failures I have seen since 1980
    :: > 40 Infant failures in 1st month.
    :: Approx 10 due to abuse (one server moved while running and one knocked over by a cleaner). I would have sacked the technician, who had been told NEVER to move a live PC/server, and the owner of the other server had been told that under a workbench was an unsuitable location.
    :: About 5 at 1 to 2 years.
    :: Approx 5 or so really really old drives (>6 years).

    One reason you need RAID 5 for 15K rpm SCSI drives is that they are generally in servers (more important data), and in my experience 10K rpm and 15K rpm SCSI drives failed much more often (outside of infant, really old or abuse cases) than 5400 rpm IDE drives.

    Of course, with 5 drives you are 5 times more likely to get a failure. This is why striping, or volume sets made from multiple drives to get more speed or storage without fault tolerance, is stupidity.
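
    To put rough numbers on that, a Python sketch (assumes independent drives with a constant ~3% annual failure rate, which is only a guess, and ignores rebuild windows):

    # Chance of losing data in a year: single drive, 5-drive stripe (no
    # redundancy), and a 2-drive mirror. Assumed figures, not measurements.
    p = 0.03                                # assumed annual failure probability per drive

    single = p
    stripe_5 = 1 - (1 - p) ** 5             # any one of 5 drives failing loses the stripe set
    mirror_2 = p ** 2                       # both mirrored drives failing within the year

    print(f"Single drive:   {single:.1%}")     # 3.0%
    print(f"5-drive stripe: {stripe_5:.1%}")   # ~14.1%
    print(f"2-drive mirror: {mirror_2:.2%}")   # ~0.09%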


  • Registered Users, Registered Users 2 Posts: 1,065 ✭✭✭Snowbat


    I'm surprised no one has yet mentioned the disk reliability paper by Google engineers - interesting read:
    http://labs.google.com/papers/disk_failures.pdf
    http://hardware.slashdot.org/article.pl?sid=07/02/18/0420247
    Khannie wrote: »
    Operating temperature is 23°C (I have an Antec 900, so there's air flowing directly over all my HDDs).
    Maybe a bit cold? Google's drives between 30 and 40°C show increased reliability, and "there is a clear trend showing that lower temperatures are associated with higher failure rates". However, you're almost in the 25-50°C window where temperature seems to have minimal effect on failure rates.


  • Registered Users, Registered Users 2 Posts: 32,417 ✭✭✭✭watty


    I had one old server that, if it was shut down for a while, needed a blow heater on it for an hour or so, or most of the drives wouldn't spin up. It was in an outhouse.


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Nice read, that Google paper. Pity they won't reveal manufacturer information.

