
Statistics and non-normal distribution

  • 01-10-2014 1:37pm
    #1
    Registered Users, Registered Users 2 Posts: 86,729 ✭✭✭✭


    I am trying to analyze some lab data. We were told to assume a normal distribution, but I'm not too enthused by that.

    The problem is the data are not normally distributed: we are timing production cycles for a device. The average is about 24 seconds, but once in a while we simulate a 'false start', and those runs can take 30-40 seconds at worst. They represent a tenth of the data or less, so they drive the mean up, leaving roughly 90% of the data to the left of the mean; hence, not a normal distribution.

    I'm not a math wizard and I was always bad at statistics: aside from assuming a normal distribution, how do I summarise this data properly to report the time that 99% of the runs will come in under?
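For skewed data like this, one option that needs no normality assumption is an empirical percentile: sort the observations and read off the value 99% of the way in. A minimal sketch, using made-up cycle times (the values and the single 38.5 s 'false start' are assumptions, not the OP's data):

```python
import statistics

# Hypothetical cycle times (seconds): mostly ~24 s, with one
# simulated 'false start' up near 40 s pulling the mean to the right.
times = [23.8, 24.1, 23.9, 24.3, 24.0, 23.7, 24.2, 24.1, 23.9, 38.5]

mean = statistics.mean(times)

# Empirical 99th percentile: sort, then index 99% of the way along.
times_sorted = sorted(times)
idx = min(int(0.99 * len(times)), len(times) - 1)
p99 = times_sorted[idx]

print(f"mean = {mean:.2f} s, empirical 99th percentile = {p99} s")
```

With this toy sample the mean (25.45 s) sits above 9 of the 10 observations, reproducing the "90% of the data left of the mean" effect described above, while the percentile reports a usable worst-case figure directly.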


Comments

  • Registered Users, Registered Users 2 Posts: 86,729 ✭✭✭✭Overheal


    Or my test plan sucked, and I shouldn't be simulating failures on an uncharacterised test system. Good job, Heals.

    Deleted the data values that timed the erroneous runs, then divided the mean + 3 standard deviations by the total production time to get a margin of safety instead. Seems like more appropriate data.
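One reading of that calculation, as a minimal sketch: the cycle times and the 30-second time budget below are assumptions for illustration, not the actual lab figures.

```python
import statistics

# Hypothetical clean data: runs with the simulated false starts removed.
clean_times = [23.8, 24.1, 23.9, 24.3, 24.0, 23.7, 24.2, 24.1]

mean = statistics.mean(clean_times)
sdev = statistics.stdev(clean_times)   # sample standard deviation
worst_case = mean + 3 * sdev           # mean + 3 sigma per run

total_time = 30.0                      # assumed time allowance per run (s)
margin = worst_case / total_time       # fraction of the allowance used

print(f"mean + 3 sigma = {worst_case:.2f} s ({margin:.0%} of budget)")
```

Mean + 3 sigma covers about 99.87% of runs under a normality assumption, so comparing it against the available time gives a conservative margin.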


  • Registered Users, Registered Users 2 Posts: 5,141 ✭✭✭Yakuza


    Yeah, you should exclude the data that aren't random (i.e. the observations whose start times you deliberately tweaked); only then can you work from the assumption that you've got a normally-distributed set of data. To get what's called a two-tailed 99% interval for your run times, calculate the mean and the sample standard deviation of your data, then subtract and add 2.576 times the standard deviation to the mean to get the lower and upper bounds of the range.

    In other words, there is a 0.5% chance of getting a value more than 2.576 standard deviations above the mean, and a 0.5% chance of one more than 2.576 standard deviations below it, which adds up to a 1% chance overall; conversely, you can be 99% confident that your run times will fall within the range you calculate.
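The calculation above can be sketched in a few lines; the cycle times here are hypothetical stand-ins with the deliberately-tweaked runs already excluded.

```python
import statistics

# Hypothetical clean cycle times (seconds).
times = [23.8, 24.1, 23.9, 24.3, 24.0, 23.7, 24.2, 24.1]

mean = statistics.mean(times)
s = statistics.stdev(times)    # sample standard deviation

z = 2.576                      # two-tailed z-value for 99% coverage
lower = mean - z * s
upper = mean + z * s

print(f"99% of run times expected in [{lower:.2f}, {upper:.2f}] seconds")
```

If only the upper bound matters (how slow can a run be?), a one-tailed version uses z = 2.326 and reports just mean + z * s.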

