work project need ideas/help

  • 27-10-2006 9:25pm
    #1
    Registered Users, Registered Users 2 Posts: 9,604 ✭✭✭


    OK guys.

    I have been assigned to a new project at work. We build EPOS systems and we put out a huge number of PCs and EPOS bases, all of which use hard drives.

    We are experiencing a huge number of hard drive failures every week and I have been assigned to get to the bottom of this.

    We also have a number of PCs which fail regularly and I have been assigned to solve that as well.

    Basically i have to figure out the root cause of these problems.

    I am looking for utilities to test hard drives that come back to the office and give me some idea of how or why each one failed.

    I am also looking for software which will test components of PCs for failure.

    I have only been given half a day a week for this, so I need software that is highly automated, so I can leave it running and go do other work.

    Each week I need to update a spreadsheet with a cause for each hard drive and PC failure.

    Is there also any software I can put on PCs and hard drives on a pilot basis to track how the drives are used and generate a daily usage report for each one? I have a feeling that our failing to run defrag, and the heavy writing of data to the drives, is causing the problems. This software has to be non-intrusive. The report can be retrieved from the site via dial-in.
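    If whatever agent you pilot can only record a cumulative "bytes written" counter once a day (an assumption; the snapshot source and the threshold below are hypothetical), the daily usage report is just the difference between consecutive readings. A minimal sketch:

```python
from datetime import date


def daily_writes(snapshots):
    """Convert (day, cumulative_bytes_written) snapshots into per-day totals.

    Assumes one snapshot per day from some logging agent (hypothetical) and a
    counter that only goes up; a drop is treated as a reset (e.g. a drive swap).
    """
    ordered = sorted(snapshots)
    report = []
    for (_, prev_total), (day, total) in zip(ordered, ordered[1:]):
        written = total - prev_total
        if written < 0:  # counter reset: only count what was written since
            written = total
        report.append((day, written))
    return report


def heavy_days(report, limit_bytes):
    """Days whose write volume exceeded limit_bytes (a threshold you choose)."""
    return [day for day, written in report if written > limit_bytes]


if __name__ == "__main__":
    snaps = [
        (date(2006, 10, 23), 10_000_000),
        (date(2006, 10, 24), 15_000_000),
        (date(2006, 10, 25), 40_000_000),
    ]
    report = daily_writes(snaps)
    for day, written in report:
        print(day.isoformat(), written)
    print("heavy:", heavy_days(report, 20_000_000))
```

    Keeping only the raw counter on site and doing the arithmetic back at the office keeps the on-site part as non-intrusive as possible.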

    I have already been looking at PassMark BurnInTest.

    If you have any ideas or suggestions I would be grateful. It will look very good for me if I can get this right, and this vibe came from one of the directors of the company.

    Thanks.


Comments

  • Registered Users, Registered Users 2 Posts: 6,762 ✭✭✭WizZard


    The main cause of HDD failure is usually heat, and every EPOS I've worked with has had very little proper ventilation.

    What OS do you run on the EPOS? That will determine what programs you could use...


  • Registered Users, Registered Users 2 Posts: 6,762 ✭✭✭WizZard


    If it's Windows, try DriveSitter: it can email you reports daily or on a schedule.


  • Registered Users, Registered Users 2 Posts: 9,604 ✭✭✭irishgeo


    WizZard wrote:
    If it's Windows try DriveSitter - it can email reports to you daily/scheduled

    It's Windows 2000 and DOS, but they are going for a 0% failure rate on the Windows sites.

    If I can, I'll suggest running this on one Windows site on a trial basis to monitor the drive temperature.


  • Registered Users, Registered Users 2 Posts: 4,142 ✭✭✭TempestSabre


    There's something seriously wrong there. Even in companies that had 2000-4000 machines I've never seen failure rates like that. Are these PCs or EPOS units all in the same kind of chassis or case? My hunch would also be heat. Are you fitting new HDs? They tend to run hotter than older drives.


  • Registered Users, Registered Users 2 Posts: 9,604 ✭✭✭irishgeo


    Most of the EPOS units are in the same cases. I have been told that heat is a problem with newer drives, but I can't just go suggesting heat as the problem without evidence from the hard drives that come back. Is there any way of testing for heat when a hard drive is returned faulty from a site? It could be floating around for 3 days before I get it.


  • Closed Accounts Posts: 12,401 ✭✭✭✭Anti


    Okay, so this is your job, yet you want us to do the work for you?

    When I worked in a PC shop we had applications that could determine, within reason, why a drive had failed. And like Wiz said, heat is the main cause.

    I have a copy of this, and for a fee you can have it. But I still think being given a job and then asking someone else how to do it is despicable. If you are unable to work it out yourself, maybe you shouldn't be in your job :p

    Nah, if you go to the websites of all the main HDD manufacturers, they all have their own tools on them. These can be great. They are all DOS-based, i.e. boot CD.

    Also ubcd has a few nice tools on it.

    oh yeah ubcd = ultimate boot cee dee :)


  • Closed Accounts Posts: 1,226 ✭✭✭hopeful


    You can use SpeedFan to produce a report on SMART-enabled drives.

    Produces a screen like this:

    [Screenshot: hdqj9.jpg]

    I've found it handy to give early warnings in the past.
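    SpeedFan is a GUI tool, so for the automated, leave-it-running side, a swapped-in option is smartmontools' `smartctl -A`, which prints the SMART attribute table as text. A minimal parsing sketch, assuming that table layout (it can differ between smartmontools versions and drive vendors); the sample rows are made up for illustration, not real readings:

```python
import re

# Made-up, abbreviated sample of `smartctl -A` output (smartmontools).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2634
194 Temperature_Celsius     0x0022   047   055   000    Old_age   Always       -       47 (Min/Max 20/55)
"""


def parse_smart_attributes(text):
    """Return {name: {"id", "value", "worst", "thresh", "raw"}} from the table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = re.match(r"\d+", fields[9])  # keep the leading number of RAW_VALUE
        attrs[fields[1]] = {
            "id": int(fields[0]),
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": int(raw.group()) if raw else None,
        }
    return attrs


def failing(attrs):
    """Attributes whose normalised value is at or below a non-zero threshold."""
    return [name for name, a in attrs.items()
            if a["thresh"] > 0 and a["value"] <= a["thresh"]]


if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    print("temp C:", attrs["Temperature_Celsius"]["raw"])
    print("reallocated:", attrs["Reallocated_Sector_Ct"]["raw"])
    print("failing:", failing(attrs))
```

    Logging the temperature and reallocated-sector figures daily gives you a history to look at when a drive comes back, rather than guessing after the fact.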


  • Registered Users, Registered Users 2 Posts: 6,762 ✭✭✭WizZard


    How are you defining "failed"? Did they just crap out and never work, or did they kinda work when you got them back?


  • Registered Users, Registered Users 2 Posts: 9,604 ✭✭✭irishgeo


    WizZard wrote:
    How are you defining "failed"? Did they just crap out and never work, or did they kinda work when you got them back?

    Failed as in not booting up. Some data can be recovered off some of them. An fdisk and reuse usually works on most, and anything that fails to fdisk gets dumped.

    Others just hang the PC when booted as a slave.
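    For the weekly spreadsheet, the symptoms above can be turned into a fixed set of causes so every returned drive gets logged the same way. A minimal sketch; the category wording, the serial/site fields and the rule order are all hypothetical, so adjust them to whatever the sheet actually uses:

```python
import csv
import io


def classify(boots, fdisk_ok, hangs_as_slave):
    """Map the bench-test symptoms of a returned drive to a (hypothetical) cause."""
    if hangs_as_slave:
        return "hardware: hangs PC when slaved"
    if not fdisk_ok:
        return "hardware: fails fdisk (scrapped)"
    if not boots:
        return "no boot, reusable after fdisk (check heat)"
    return "no fault found"


def write_report(rows, out):
    """rows: iterable of (serial, site, boots, fdisk_ok, hangs_as_slave)."""
    writer = csv.writer(out)
    writer.writerow(["serial", "site", "cause"])
    for serial, site, boots, fdisk_ok, hangs in rows:
        writer.writerow([serial, site, classify(boots, fdisk_ok, hangs)])


if __name__ == "__main__":
    buf = io.StringIO()
    write_report([("SN001", "site-a", False, True, False),
                  ("SN002", "site-b", False, False, False)], buf)
    print(buf.getvalue())
```

    The CSV output opens directly in a spreadsheet, so the weekly update becomes a paste rather than a retype.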


  • Registered Users, Registered Users 2 Posts: 6,762 ✭✭✭WizZard


    irishgeo wrote:
    failed in not booting up, some data can be recovered off some. a fdisk and reuse usually works on most and anything that fails to fdisk gets dumped.

    Then it's most probably a heat issue, since they are slightly recoverable/reusable afterwards.

    If you are reusing previously failed drives, a) you're nuts, and b) no wonder they fail again.


  • Registered Users, Registered Users 2 Posts: 4,142 ✭✭✭TempestSabre


    You can get a heat-sensitive strip that you can put in a case and/or on a hard disk, and it records the highest temperature it reaches. Put that in a few machines and see if they exceed the manufacturer's specs. Also fit one case with extra fans and put extra holes in the case for ventilation, then see if that one has fewer errors, stays up longer, etc. Or simply fit a cool-running drive (even a 2.5") and see whether the problem recurs, and if it does, whether it happens less often.

    You could do all that in one afternoon and check back the following week. I think setting up a remote system to check temps would be more complex, and you don't have the time.
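    Once the strips are read, the check against spec and the stock vs. extra-fan comparison are simple enough to script. A minimal sketch; the 55 °C limit is only a placeholder (take the real figure from each drive's datasheet) and the machine names are hypothetical:

```python
from statistics import mean

# Placeholder limit: use the maximum operating temperature from the drive
# model's datasheet; 55 here is only an example figure.
SPEC_MAX_C = 55


def over_spec(max_temps, limit=SPEC_MAX_C):
    """Machines whose recorded peak temperature (deg C) exceeded the limit."""
    return sorted(machine for machine, peak in max_temps.items() if peak > limit)


def compare_groups(stock, modified):
    """Average peak temperature for stock cases vs. cases with extra fans/vents."""
    return {"stock": mean(stock.values()), "modified": mean(modified.values())}


if __name__ == "__main__":
    stock = {"till-01": 58, "till-02": 51, "till-03": 61}
    modified = {"till-04": 44, "till-05": 47}
    print("over spec:", over_spec({**stock, **modified}))
    print(compare_groups(stock, modified))
```

    With only a handful of machines per group the averages are a rough signal rather than proof, but they are the kind of evidence you said you need before blaming heat.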

    http://www.dataclean.com/temperature_strips.htm
    http://www.t-m-c.com/our_products.html


  • Registered Users, Registered Users 2 Posts: 771 ✭✭✭Sir Random


    It's almost impossible to know what caused hd failure after the fact. You would need to have monitoring software running on all hds and generating logs of temp etc, and that only works if the data is retrievable after the crash.

    This reminds me of a problem with excessive hardware failures in a certain civil-service training department years ago. PCs were suddenly running very slow or not starting up at all. It was discovered that an employee had been taking ram and swapping cpus from the PCs :D


  • Moderators, Recreation & Hobbies Moderators, Science, Health & Environment Moderators, Technology & Internet Moderators Posts: 93,596 Mod ✭✭✭✭Capt'n Midnight


    Get statistics on the drives, in case of a bad batch or one type of motherboard/HDD.

    Could be power supplies too.

    Could also be environmental: vibration, kicking, etc.
    2.5" laptop drives are slower and more expensive than 3.5" drives, but generally withstand higher g-forces and should run a lot cooler in a roomy desktop case than cocooned in a plastic laptop.
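    A bad batch or a bad board/PSU combination shows up as one group failing far more often than the rest. A minimal sketch, assuming you can list every installed drive with its model, batch and whatever else you record (the field names here are hypothetical), plus whether it has failed:

```python
from collections import defaultdict


def failure_rates(drives, key=("model", "batch")):
    """Failure rate per group, highest first.

    drives: list of dicts with the fields named in `key` plus a bool "failed".
    Grouping by ("motherboard",) or ("psu",) works the same way.
    """
    installed = defaultdict(int)
    failed = defaultdict(int)
    for d in drives:
        group = tuple(d[k] for k in key)
        installed[group] += 1
        if d["failed"]:
            failed[group] += 1
    rates = {g: failed[g] / installed[g] for g in installed}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    fleet = [
        {"model": "A", "batch": "1", "motherboard": "X", "failed": True},
        {"model": "A", "batch": "1", "motherboard": "Y", "failed": False},
        {"model": "B", "batch": "7", "motherboard": "X", "failed": False},
    ]
    print(failure_rates(fleet))
    print(failure_rates(fleet, key=("motherboard",)))
```

    Rates rather than raw counts matter here: a model you ship ten times as often will have more failures even if it is no worse.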

    As for reusing drives:
    if it's a software crash, that's fine,
    but no live data on a drive with hardware errors.

