Something in my PC is eating drives

  • 29-10-2017 3:59pm
    #1
    Registered Users, Registered Users 2 Posts: 7,181 ✭✭✭


    This is a bit of an odd one.

    I had two SSDs in my system: A 250GB 840 Evo, and a 960GB SSD Ultra (I think) from Sandisk.

    A couple of months ago, the Sandisk was giving me disk write errors in Steam during updates and the like (basically the only thing on that SSD). They got worse to the point that I couldn't play games because they couldn't update at all.

    Set up an RMA with Sandisk, but they refused, saying the drive reported fine based on the report I sent them using their software. I reformatted, and it worked.

    The same thing came back a couple of weeks ago, but this time after a week or so, the computer wouldn't boot - or even POST - with this thing installed. Took it out, all's fine. This time Sandisk set up an RMA. So I have only the Evo in at the moment.

    Except that the same thing happened to the Evo this morning.

    Tried it in another machine, and on boot CHKDSK started, found a tonne of errors, but it seems to have corrected them, because I'm back in the OS.

    Checked for the most up-to-date BIOS and chipset drivers already. Thermals seem within margins as well.

    Anyone have any idea what could be causing this?


Comments

  • Registered Users, Registered Users 2 Posts: 10,299 ✭✭✭✭BloodBath


    Have you tested your RAM?


  • Registered Users, Registered Users 2 Posts: 7,181 ✭✭✭Serephucus


    I haven't. I figured I'd get more application errors if that was the problem. I'll leave Memtest running tonight though and see what happens.


  • Closed Accounts Posts: 29,930 ✭✭✭✭TerrorFirmer


    I actually had the same problem recently. Write fails and unable to download games, or even stuff from the web or torrents. I'd just installed new RAM so I figured it must be the RAM - nope, extensive testing, nothing. So I figured it was the drive (even though it was new), replaced it and same crap. I was actually completely dumbfounded... especially since it actually resolved itself and has been 100% since without me actually doing anything.

    I wonder was it something to do with Win updates because I cannot for the life of me figure out otherwise what was going on.


  • Registered Users, Registered Users 2 Posts: 12,708 ✭✭✭✭Skerries




  • Registered Users, Registered Users 2 Posts: 36,170 ✭✭✭✭ED E


    You aren't doing BCLK overclocking, perhaps? That can really f'ck SATA (really doubt you are).


    Your fix actions (format, CHKDSK) are likely just triggering the drive to do its own repairs (re-allocs). The re-alloc count is the primary metric you need to care about. If many cells are going bad, though, re-allocation has only a very limited ability to keep the drive going.

    I'd suggest letting SpinRite run on both at Level 2 to catch anything that's currently on the way out.


    Then take a log of the SMART data. Write a few TB of random crap to the disk (use a benchmark tool), then capture SMART again. If the re-alloc count is significantly higher, the disk is worn out and needs an RMA no matter what they say.

    I've dealt with Sandisk for RMA recently and while slow they were very helpful (spent about €40 in UPS shipping for a €15 mSD though :pac: )


    Update:
    For example, 128GB Crucial piece of junk CX series. 14TB written. 0 Reallocs.
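
The before/after SMART comparison suggested above can be sketched in a few lines of Python. This is only a rough sketch, assuming the SMART data has been saved as CrystalDiskInfo-style text rows (ID, current, worst, threshold, 12-digit raw hex value, attribute name). The helper names `parse_smart` and `diff_smart` are made up for illustration and are not part of any tool. The sample rows are copied from the two dumps posted further down this thread.

```python
import re

# One attribute row: ID, current, worst, threshold, 12-hex-digit raw value, name.
# The underscores in the value columns are padding used in the text export.
ROW = re.compile(
    r"^\s*([0-9A-F]{2})\s+(\S+)\s+(\S+)\s+(\S+)\s+([0-9A-F]{12})\s+(.+?)\s*$"
)

def parse_smart(text):
    """Return {attribute name: raw value as int} for every attribute row found."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m.group(6)] = int(m.group(5), 16)
    return attrs

def diff_smart(before, after):
    """Return {name: (before, after)} for attributes whose raw value changed."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }

# Sample rows taken from the two dumps in this thread.
BEFORE = """
ID Cur Wor Thr RawValues(6) Attribute Name
05 100 100 _10 000000000000 Reallocated Sector Count
B1 _94 _94 __0 000000000044 Wear Leveling Count
F1 _99 _99 __0 000771314D21 Total LBAs Written
"""
AFTER = """
ID Cur Wor Thr RawValues(6) Attribute Name
05 100 100 _10 000000000000 Reallocated Sector Count
B1 _93 _93 __0 00000000004F Wear Leveling Count
F1 _99 _99 __0 0008D4A4F1D1 Total LBAs Written
"""

changes = diff_smart(parse_smart(BEFORE), parse_smart(AFTER))
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```

An attribute that does not show up in `changes` (here, Reallocated Sector Count) kept the same raw value across the write test, which is the case you are hoping for.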


  • Registered Users, Registered Users 2 Posts: 14,012 ✭✭✭✭Cuddlesworth


    When testing memory now, I don't bother with memtest, I use stressapptest. It's found errors for me in seconds, that memtest couldn't find in hours.


  • Registered Users, Registered Users 2 Posts: 7,181 ✭✭✭Serephucus


    24 hours of Memtest and no errors. I'll give StressAppTest a look. To be honest, from a quick glance I'm too tired to give a **** figuring out how to run it. I'll have a look in the morning when my brain is awake.

    Nope, not doing any BCLK OCing. A little OC, technically, as it's undervolted, but it's not by a lot, and absolutely nothing else about the machine leads me to believe it's that.

    No reallocated sectors on this disk at all. Wear levelling count is at 94, so I don't think it's the drives, given the same thing is happening to both, and this one at least shows as fine.

    Full SMART:
    (1) Samsung SSD 840 EVO 250GB
    Model : Samsung SSD 840 EVO 250GB
    Firmware : EXT0DB6Q
    Serial Number : [ snip ]
    Disk Size : 250.0 GB (8.4/137.4/250.0/250.0)
    Buffer Size : Unknown
    Queue Depth : 32
    # of Sectors : 488397168
    Rotation Rate : ---- (SSD)
    Interface : Serial ATA
    Major Version : ACS-2
    Minor Version : ATA8-ACS version 4c
    Transfer Mode : SATA/600 | SATA/600
    Power On Hours : 15558 hours
    Power On Count : 2126 count
    Host Writes : 15241 GB
    Wear Level Count : 68
    Temperature : 33 C (91 F)
    Health Status : Good (100 %)
    Features : S.M.A.R.T., 48bit LBA, NCQ, TRIM
    APM Level : ----
    AAM Level : ----
    Drive Letter : C:

    -- S.M.A.R.T.
    ID Cur Wor Thr RawValues(6) Attribute Name
    05 100 100 _10 000000000000 Reallocated Sector Count
    09 _96 _96 __0 000000003CC6 Power-on Hours
    0C _97 _97 __0 00000000084E Power-on Count
    B1 _94 _94 __0 000000000044 Wear Leveling Count
    B3 100 100 _10 000000000000 Used Reserved Block Count (Total)
    B5 100 100 _10 000000000000 Program Fail Count (Total)
    B6 100 100 _10 000000000000 Erase Fail Count (Total)
    B7 100 100 _10 000000000000 Runtime Bad Block (Total)
    BB 100 100 __0 000000000000 Uncorrectable Error Count
    BE _67 _45 __0 000000000021 Airflow Temperature
    C3 200 200 __0 000000000000 ECC Error Rate
    C7 100 100 __0 000000000000 CRC Error Count
    EB _99 _99 __0 000000000254 POR Recovery Count
    F1 _99 _99 __0 000771314D21 Total LBAs Written
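
The raw values in the dump above are hexadecimal, so they are easier to sanity-check once decoded. A small sketch in Python follows. The 512-byte LBA size is an assumption rather than something the dump states, and `lbas_to_gib` is a made-up helper name.

```python
SECTOR_BYTES = 512  # assumption: the drive counts written LBAs in 512-byte units

def lbas_to_gib(lbas, sector_bytes=SECTOR_BYTES):
    """Convert a count of written LBAs to GiB (2**30 bytes)."""
    return lbas * sector_bytes / 2**30

# Raw values copied from the S.M.A.R.T. table above.
power_on_hours = int("000000003CC6", 16)
power_on_count = int("00000000084E", 16)
wear_level_count = int("000000000044", 16)
host_writes_gib = lbas_to_gib(int("000771314D21", 16))

print("Power-on hours  :", power_on_hours)
print("Power-on count  :", power_on_count)
print("Wear level count:", wear_level_count)
print("Host writes     : %.1f GiB" % host_writes_gib)
```

Under that assumption the decoded figures line up with the summary fields in the same dump (15558 hours, 2126 power-ons, wear level 68, and roughly 15241 GB of host writes), which suggests the "GB" in the report is counted in units of 2^30 bytes.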


  • Registered Users, Registered Users 2 Posts: 36,170 ✭✭✭✭ED E


    Are you running any AV suites that might have hooked disk writes?


  • Registered Users, Registered Users 2 Posts: 7,181 ✭✭✭Serephucus


    How do you mean?


  • Registered Users, Registered Users 2 Posts: 36,170 ✭✭✭✭ED E


    The real time protection is done (mostly) by intercepting system calls like those to the file system. If say Kaspersky is grabbing each write, checking it, then writing it - it has the ability to scramble the data too if it goes screwy.
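
One rough way to check whether something on the write path is scrambling data is to write random data, read it back, and compare hashes. The Python sketch below assumes nothing about any particular AV product, and `write_and_verify` is a made-up name. Note the read-back may be served from the OS cache rather than the flash itself, so a match says more about the software path than about the drive.

```python
import hashlib
import os
import tempfile

def write_and_verify(path, total_bytes, block_bytes=1024 * 1024):
    """Write random data to path, read it back, return True if the hashes match."""
    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(block_bytes, remaining))
            written.update(block)
            f.write(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_bytes), b""):
            read_back.update(block)
    return written.digest() == read_back.digest()

if __name__ == "__main__":
    # Small demo in a temporary directory; point the path at the suspect drive
    # and raise the size for a real test.
    with tempfile.TemporaryDirectory() as d:
        ok = write_and_verify(os.path.join(d, "verify.bin"), 8 * 1024 * 1024)
        print("match" if ok else "MISMATCH")
```

Running it once with real-time protection enabled and once with it disabled would give a crude comparison, though a clean result in both cases does not rule the AV out.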


  • Registered Users, Registered Users 2 Posts: 14,012 ✭✭✭✭Cuddlesworth


    Serephucus wrote: »
    24 hours of Memtest and no errors. I'll give StressAppTest a look. To be honest, from a quick glance I'm too tired to give a **** figuring out how to run it. I'll have a look in the morning when my brain is awake.

    Create a live Linux Mint USB stick and boot into it.

    Go into a console, enter "sudo apt-get install stressapptest" and enter your password if prompted.

    Run the following command when installed.

    stressapptest -W -s 3600


  • Registered Users, Registered Users 2 Posts: 7,181 ✭✭✭Serephucus


    Update after running about 3TB through the drive. All looks fine, not a huge amount of reallocations.

    I'll give the Linux live CD a go. Use it all the time in work, never crossed my mind...



    CrystalDiskInfo 7.0.5 (C) 2008-2016 hiyohiyo
    Crystal Dew World : http://crystalmark.info/

    OS : Windows 10 Professional [10.0 Build 15063] (x64)
    Date : 2017/10/31 23:21:10

    -- Controller Map
    + Standard SATA AHCI Controller [ATA]
    - Samsung SSD 840 EVO 250GB
    - Microsoft Storage Spaces Controller [SCSI]
    + Virtual CloneDrive [SCSI]
    - ELBY CLONEDRIVE SCSI CdRom Device

    -- Disk List
    (1) Samsung SSD 840 EVO 250GB : 250.0 GB [0/0/0, pd1] - sg

    (1) Samsung SSD 840 EVO 250GB
    Model : Samsung SSD 840 EVO 250GB
    Firmware : EXT0DB6Q
    Serial Number : [ snip ]
    Disk Size : 250.0 GB (8.4/137.4/250.0/250.0)
    Buffer Size : Unknown
    Queue Depth : 32
    # of Sectors : 488397168
    Rotation Rate : ---- (SSD)
    Interface : Serial ATA
    Major Version : ACS-2
    Minor Version : ATA8-ACS version 4c
    Transfer Mode : SATA/600 | SATA/600
    Power On Hours : 15582 hours
    Power On Count : 2126 count
    Host Writes : 18085 GB
    Wear Level Count : 79
    Temperature : 40 C (104 F)
    Health Status : Good (100 %)
    Features : S.M.A.R.T., 48bit LBA, NCQ, TRIM
    APM Level : ----
    AAM Level : ----
    Drive Letter : C:

    -- S.M.A.R.T.
    ID Cur Wor Thr RawValues(6) Attribute Name
    05 100 100 _10 000000000000 Reallocated Sector Count
    09 _96 _96 __0 000000003CDE Power-on Hours
    0C _97 _97 __0 00000000084E Power-on Count
    B1 _93 _93 __0 00000000004F Wear Leveling Count
    B3 100 100 _10 000000000000 Used Reserved Block Count (Total)
    B5 100 100 _10 000000000000 Program Fail Count (Total)
    B6 100 100 _10 000000000000 Erase Fail Count (Total)
    B7 100 100 _10 000000000000 Runtime Bad Block (Total)
    BB 100 100 __0 000000000000 Uncorrectable Error Count
    BE _60 _45 __0 000000000028 Airflow Temperature
    C3 200 200 __0 000000000000 ECC Error Rate
    C7 100 100 __0 000000000000 CRC Error Count
    EB _99 _99 __0 000000000254 POR Recovery Count
    F1 _99 _99 __0 0008D4A4F1D1 Total LBAs Written
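
The "about 3TB" figure can be cross-checked against the Total LBAs Written raw values from the two dumps in this thread. A quick Python sketch, again assuming 512-byte LBAs (an assumption, not something the dumps state):

```python
SECTOR_BYTES = 512  # assumption: written LBAs are counted in 512-byte units

# Raw values copied from the two dumps (before and after the write test).
lbas_before = int("000771314D21", 16)
lbas_after = int("0008D4A4F1D1", 16)
wear_before = int("000000000044", 16)
wear_after = int("00000000004F", 16)
realloc_before = int("000000000000", 16)
realloc_after = int("000000000000", 16)

written_bytes = (lbas_after - lbas_before) * SECTOR_BYTES
written_tb = written_bytes / 1e12
wear_delta = wear_after - wear_before
realloc_delta = realloc_after - realloc_before

print("Written during test: %.2f TB" % written_tb)
print("Wear level count change:", wear_delta)
print("Reallocated sector change:", realloc_delta)
```

On these numbers the test pushed a little over 3 TB through the drive, the wear level count rose from 68 to 79, and the reallocated sector raw value stayed at zero.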

