Statistics question

  • 15-01-2004 4:09pm
    #1
    Registered Users, Registered Users 2 Posts: 78,577 ✭✭✭✭


    How do I "prove" that the blip (Injury 2002 @ midnight) on the attached graph is unnatural? I suspect a procedural or similar error.

    The graph shows the number of injury accidents in Ireland over the last few years. The Y-axis is the number of injury accidents (accidents, not injuries; fatal accidents, which may also involve injuries, are excluded). The X-axis is time of day, running from 5pm to 5pm (starting at the peak to make reading easier). The minor peak at 8am coincides with the morning rush hour. The low point is 5am.


Comments

  • Registered Users, Registered Users 2 Posts: 78,577 ✭✭✭✭Victor


    Graph


  • Closed Accounts Posts: 9,700 ✭✭✭tricky D


    could it even be a typo

    dunno bout methodology or the metrics - would need to be told more about that: sources, changes over years if any etc.

    ok that doesn't help much but maybe something happened late in 2001 (9/11 excepted). pre 2001 have v similar shapes and the blue accidents seem to start a bit earlier in the night and be higher at 12. it just looks like there is a bit of a possibility that some pattern shift might have started in autumn 01. any new pub laws start around then, other legislation??

    any data for 03, even partial??? or info on the type of accidents at the time in question

    whatever the cause or error, given the scale of change there just has to be some explanation


  • Registered Users, Registered Users 2 Posts: 78,577 ✭✭✭✭Victor


    I don't think it was legislation; I think late-night pub opening came in well before that.
    Originally posted by tricky D
    any data for 03, even partial??? or info on the type of accidents at the time in question
    No - it takes them 10-11 months to present the (presumably already computerised) data. A very short summary for deaths is available on the Garda Site.

    http://www.garda.ie/angarda/statistics98/nroadstats.html


  • Closed Accounts Posts: 2,155 ✭✭✭ykt0di9url7bc3


    would the number of cars on the road reflect anything to the results....?


  • Registered Users, Registered Users 2 Posts: 78,577 ✭✭✭✭Victor


    Originally posted by SearrarD
    would the number of cars on the road reflect anything to the results....?
    Possibly; however, the blip seems to be well outside the norm.

    Did midnight street racing suddenly become popular in 2002?


  • Registered Users, Registered Users 2 Posts: 31 robroy


    The blip shown is clearly unexpected. The only way I have ever found to understand anomalies is to dig deeper into the raw data.
    My hunch on this one would be that
    1. As the hour is 0 and
    2. As the stats for 2002 are lower overall

    there is a strong possibility that for year 2002 unentered times have been entered as zero, hence the distortion.

    I would be tempted to take the base data, remove the 200 excess at 0:00 and distribute this 200 across the other data points in proportion to the existing incidents at each time. Then look at the data to see if it makes better sense, and if so go and look for the entered data to back up the hypothesis.
    Good Luck
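
    The redistribution suggested above could be sketched in Python roughly as follows. The counts are made-up placeholders (six time slots rather than 24), not the real figures, and `redistribute_excess` is a hypothetical helper name used only for illustration.

```python
def redistribute_excess(counts, hour, excess):
    """Remove `excess` from counts[hour] and spread it over the other
    time slots in proportion to their existing counts."""
    adjusted = list(counts)
    adjusted[hour] = counts[hour] - excess
    # Total of all slots except the suspect one; this is the weighting base.
    others_total = sum(c for h, c in enumerate(counts) if h != hour)
    for h, c in enumerate(counts):
        if h != hour:
            adjusted[h] = c + excess * c / others_total
    return adjusted


# Hypothetical counts; index 0 stands for the midnight slot.
counts = [450, 200, 150, 100, 50, 100]
adjusted = redistribute_excess(counts, hour=0, excess=200)
print(adjusted)
```

    The overall total is unchanged, so the adjusted curve can be compared directly with the other years to see whether it makes better sense.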


  • Registered Users, Registered Users 2 Posts: 78,577 ✭✭✭✭Victor


    Originally posted by robroy
    The blip shown is clearly unexpected. The only way I have ever found to understand anomalies is to dig deeper into the raw data.
    My hunch on this one would be that
    1. As the hour is 0 and
    2. As the stats for 2002 are lower overall

    there is a strong possibility that for year 2002 unentered times have been entered as zero, hence the distortion.
    I thought about that, but unknown / unentered times seem to be indicated separately. Another factor may be PULSE terminals that aren't adjusted for summer / winter time, recording something as 0.23 instead of 11.23, etc.


  • Registered Users, Registered Users 2 Posts: 31 robroy


    I would have thought that a summertime error would only cause a sideways distortion for those terminals. The excess of 200 is far greater than that effect would produce, unless terminals frequently reverted to zero.
    My instinct would be to challenge the 'time not specified' claim before you accept it as true. The pointer here is sufficient for a detective to pick out a 'prime suspect'. In either case the evidence is that the base data is incorrect. The task is to find out how. It gets back to the need to, in computer programming language, debug the reporting system and the data.

    PS Is there any quantity for 'unentered or unknown' that you could compare across the years? Perhaps looking for a drop of 200 for 2002?


  • Moderators, Recreation & Hobbies Moderators, Science, Health & Environment Moderators, Technology & Internet Moderators Posts: 93,563 Mod ✭✭✭✭Capt'n Midnight


    Redraw the graph with each figure expressed as a % of the 1998 figure for the same time.

    This will produce a large spike - not statistical but visually obvious.

    What you would like is a figure for the standard deviation; then you could show that the figure is so many standard deviations away from where you would expect it to be. This can then be expressed as a % likelihood of it happening by chance.

    Is the raw data available ?
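
    A minimal Python sketch of both ideas, assuming the midnight counts for each year are to hand. All numbers below are invented placeholders, the function names are hypothetical, and treating the other years as a roughly normal reference sample is itself an assumption (four data points give only a crude estimate of the standard deviation).

```python
import math
import statistics


def percent_of_baseline(count, baseline):
    """Express a count as a percentage of the baseline year's count."""
    return 100.0 * count / baseline


def z_score(value, reference):
    """Number of sample standard deviations `value` lies from the
    mean of the reference values."""
    mu = statistics.mean(reference)
    sigma = statistics.stdev(reference)  # sample standard deviation
    return (value - mu) / sigma


def two_sided_normal_p(z):
    """Chance of a deviation at least this large (in either direction)
    under a normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical midnight counts: 1998-2001 as reference, 2002 as the suspect.
reference = [310, 295, 305, 290]
suspect = 500

pct = percent_of_baseline(suspect, reference[0])
z = z_score(suspect, reference)
p = two_sided_normal_p(z)
print(f"{pct:.0f}% of the 1998 figure, z = {z:.1f}, p = {p:.3g}")
```

    The suspect year is deliberately left out of the reference sample so that it does not inflate the standard deviation it is being tested against.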


  • Registered Users, Registered Users 2 Posts: 78,577 ✭✭✭✭Victor


    Originally posted by Capt'n Midnight
    Is the raw data available ?
    Yes. It's my graph. I have a tendency to present graphs in atypical ways to bring out the most pertinent data, and that's how I spotted the blip.


  • Registered Users, Registered Users 2 Posts: 78,577 ✭✭✭✭Victor


    Right, I took the minimum for each hour over the five years (using any one year as a starting point wasn't suitable), divided the number of injury accidents for that hour by that minimum, and came up with this graph. It's quite obvious that we have a "blip".

    http://members.boards.ie/victor/RoadInjuriesStatistics.gif

    The number of accidents with an unknown time was actually well up, so they weren't being erroneously attributed to a default midnight.
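
    The normalisation described above could look something like this in Python. The data is an invented three-slot, three-year placeholder, not the real figures, and `ratio_to_hourly_minimum` is a hypothetical name. The sketch assumes every time slot has a non-zero minimum.

```python
def ratio_to_hourly_minimum(counts_by_year):
    """For each time slot, divide every year's count by the smallest
    count seen in that slot across all years."""
    years = list(counts_by_year)
    n_slots = len(counts_by_year[years[0]])
    # Per-slot minimum over all years (assumed to be non-zero).
    minima = [min(counts_by_year[y][h] for y in years) for h in range(n_slots)]
    return {
        y: [counts_by_year[y][h] / minima[h] for h in range(n_slots)]
        for y in years
    }


# Hypothetical counts; index 0 stands for the midnight slot.
counts_by_year = {
    1998: [300, 120, 80],
    1999: [290, 110, 85],
    2002: [500, 100, 80],
}
ratios = ratio_to_hourly_minimum(counts_by_year)
for year, row in ratios.items():
    print(year, [round(r, 2) for r in row])
```

    Every slot then has a floor of 1.0, so a year that sits far above 1.0 in a single slot stands out regardless of how busy that time of day normally is.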


  • Registered Users, Registered Users 2 Posts: 2,648 ✭✭✭smiles


    You need to use the data to find the mean, the variance and the standard deviation.

    A large number of trials in the measurement of a particular quantity will result in a "normal" distribution. About 68% of the values will be within one standard deviation of the average, 95% within two standard deviations, and 99.7% within three. As a rule of thumb for scientific significance, if a measurement is more than 3 standard deviations from the mean (or from another measurement), the difference is significant. If the values are within 2 standard deviations, the difference is not significant.
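
    Those percentages can be checked with a few lines of Python using only the standard library. This is just an illustration of the normal-distribution figures; `normal_coverage` is a hypothetical helper name.

```python
import math


def normal_coverage(k):
    """Fraction of a normal distribution lying within k standard
    deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sd: {100 * normal_coverage(k):.1f}%")
# within 1 sd: 68.3%
# within 2 sd: 95.4%
# within 3 sd: 99.7%
```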

    I don't trust myself to reproduce all the formulas exactly right now (it's late), but I can write them up and try to explain them. Google is your friend too! :)

    [edit: A quick look gives Statistics at Square One which might be helpful to you ]

    << Fio >>

