Some interesting stats from the London Marathon

  • 04-05-2011 8:50am
    #1
    Registered Users, Registered Users 2 Posts: 15,704 ✭✭✭✭


    excerpt
    The often-cited average marathon time of 4:30 is based on the point at which exactly 50 per cent of the field has passed the finish line (while the remaining 50 per cent has yet to pass the line). In mathematics, this ‘average’ is known as the median. ...
    The really interesting figure is the mode – in this instance, the minute during which the highest number of people passed the finish line. Look back at the chart and you’ll see a sharp spike that’s completely out of keeping with traditionally smooth edges of a bell curve. In 2010, the mode finishing time was 3:59 when 347 runners surged past the finish line (or 0.95% of the entire field of runners).
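To make the median/mode distinction in the excerpt concrete, here is a minimal Python sketch. The finishing times in `sample_minutes` are made up purely for illustration (they are not London Marathon data), and the helper names `fmt` and `median_and_mode` are hypothetical.

```python
from collections import Counter
from statistics import median


def fmt(minutes: int) -> str:
    """Format whole minutes as h:mm, e.g. 239 -> '3:59'."""
    return f"{minutes // 60}:{minutes % 60:02d}"


def median_and_mode(finish_minutes: list[int]) -> tuple[float, int]:
    """Return (median, mode) of finishing times given in whole minutes.

    The median is the time by which half the field has finished; the
    mode is the single minute in which the most runners finished.
    """
    busiest_minute, _count = Counter(finish_minutes).most_common(1)[0]
    return median(finish_minutes), busiest_minute


# Made-up finishing times (minutes), NOT real race results.
sample_minutes = [180, 210, 239, 239, 239, 260, 275, 290, 310, 330, 345]
mid, busiest = median_and_mode(sample_minutes)
print("median:", fmt(int(mid)), "mode:", fmt(busiest))
```

In this toy field the median lands at 4:20 while the busiest minute is 3:59, so the two "averages" answer different questions and need not sit close together.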


Comments

  • Closed Accounts Posts: 2,120 ✭✭✭Gringo78


    RayCun wrote: »
    excerpt

    Link to full article?


  • Registered Users, Registered Users 2 Posts: 15,704 ✭✭✭✭RayCun


    Doh!
    article
    :rolleyes:


  • Registered Users, Registered Users 2 Posts: 1,140 ✭✭✭snailsong


    There is no reason to expect a smooth edged bell curve as this is not random data ('stochastic process' I think is the technical term). As mentioned in the article, people are looking at their watches, and the pacers, and putting a spurt on for the 4 hr mark.

    Kinda sad to see the 55-59 age group had a modal time of 4.01. Close but no....:).


  • Closed Accounts Posts: 208 ✭✭airscotty


    Very interesting stuff! As a younger runner :) I'd put the slower times of the 18-39 yrs down to the fact that when people are young they think they can just get through a marathon, but maybe as people get older they feel 'oh I'm getting on, I better actually train' and you get the more serious older runners tackling the distance?


  • Registered Users, Registered Users 2 Posts: 137 ✭✭danburke


    finish-time-bell-curve.jpg

    This looks like a Poisson process... which would make sense as each runner's time is individual to themselves; each runner has a different fitness level (imagine a bunch of photons travelling together, they all have different energies).

    Interesting though


  • Registered Users, Registered Users 2 Posts: 137 ✭✭danburke


    so the mode of this distribution should be the variance of the distribution minus 1
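As a rough check on that claim, here is a small sketch. It assumes the per-minute counts really are Poisson with rate λ (which later posts dispute), in which case the variance equals λ. The helper name `poisson_modes` is hypothetical. For a Poisson distribution the mode is the whole-number part of λ, and when λ is itself a whole number both λ − 1 and λ are modes, so "variance minus 1" only holds in that integer case.

```python
from fractions import Fraction
from math import factorial


def poisson_modes(lam: Fraction) -> list[int]:
    """Return every k that maximises the Poisson(lam) probability.

    P(k) is proportional to lam**k / k!, so the shared factor exp(-lam)
    can be dropped and the comparison done exactly with Fractions.
    Assumes lam > 0.
    """
    k_max = int(lam) + 2  # the maximum never lies above floor(lam)
    weights = [lam ** k / factorial(k) for k in range(k_max + 1)]
    best = max(weights)
    return [k for k, w in enumerate(weights) if w == best]


print(poisson_modes(Fraction(4)))     # whole-number rate: two modes, 3 and 4
print(poisson_modes(Fraction(9, 2)))  # rate 4.5: a single mode, 4
```

Exact fractions are used instead of floats so that the tie between λ − 1 and λ is detected reliably.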


  • Registered Users, Registered Users 2 Posts: 214 ✭✭tyler71


    I'd say having pacers at 3:30, 3:45, 4:00, etc. probably contributed to the peaks as well - you can pick them out. Interesting analysis on the age groups, but it falls down a bit in not splitting up men and women. Most age groups seem to have at least two modes; it'd be interesting to see the proportion of men and women in each.


  • Closed Accounts Posts: 4,608 ✭✭✭donothoponpop


    The events (finishing times) aren't independent of each other: the probability of an event being slightly under 4 hours is greater than that of being slightly over. The author has pointed out the spike before 4 hours as completely out of keeping with a normal bell curve, without understanding that the spike (being non-independent) is the very reason why the data shouldn't fit a normal bell curve.

    If you take repeated random samples from all the finishing times and plot the sample means, then you'll have a normal bell curve.


  • Registered Users, Registered Users 2 Posts: 137 ✭✭danburke


    The events (finishing times) aren't independent of each other

    How are runners' finishing times correlated?


  • Closed Accounts Posts: 4,608 ✭✭✭donothoponpop


    danburke wrote: »
    How are runners' finishing times correlated?

    For data to fit a pattern, it needs to be independent. There's a far greater probability of runners finishing slightly under 4 hours, than slightly over, for example. If you took away watches and clocks, and let everyone run blind, you'd have a far closer approximation of a smooth bell curve.
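The "run blind" thought experiment can be sketched as a toy simulation. Everything here is an assumption for illustration: "natural" finishing times are drawn from a normal distribution (mean 270 minutes, standard deviation 45), and half of the runners who would otherwise finish up to three minutes over 4:00 are assumed to speed up and sneak in during the 3:59 minute. The function name `simulate` and all of its parameters are hypothetical; none of this is fitted to real race data.

```python
import math
import random
from collections import Counter


def simulate(n_runners=20000, seed=1, chase_prob=0.5):
    """Return per-minute finish counts (blind, watched) as two Counters.

    blind:   every runner finishes at their 'natural' time.
    watched: runners just over 4:00 may push to finish inside 3:59.
    """
    rng = random.Random(seed)
    blind, watched = Counter(), Counter()
    for _ in range(n_runners):
        natural = rng.gauss(270.0, 45.0)  # assumed natural time, minutes
        blind[math.floor(natural)] += 1
        if 240.0 <= natural < 243.0 and rng.random() < chase_prob:
            # Chasing the 4-hour mark: land somewhere inside minute 239.
            finish = 239.0 + 0.99 * rng.random()
        else:
            finish = natural
        watched[math.floor(finish)] += 1
    return blind, watched


blind, watched = simulate()
for minute in range(237, 244):
    print(minute, blind[minute], watched[minute])
```

In the printed rows, the "watched" column piles up at minute 239 (3:59) and thins out just after it, while the "blind" column stays smooth across the 4-hour mark.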


  • Registered Users, Registered Users 2 Posts: 1,514 ✭✭✭Dermo


    Not everyone is finished the marathon yet
    http://www.bbc.co.uk/news/uk-england-london-13253116


  • Registered Users, Registered Users 2 Posts: 137 ✭✭danburke


    For data to fit a pattern, it needs to be independent. There's a far greater probability of runners finishing slightly under 4 hours, than slightly over, for example. If you took away watches and clocks, and let everyone run blind, you'd have a far closer approximation of a smooth bell curve.

    Okay I think I don't understand....

    patterns fit data, data doesn't have to fit to anything.
    There's a far greater probability of runners finishing slightly under 4 hours, than slightly over, for example.

    That is just a prior; it doesn't imply a normal distribution of running times about this point.


  • Registered Users, Registered Users 2 Posts: 214 ✭✭tyler71


    Yeah, all the runner-runner interaction is what makes the difference. The fact that people are actually racing each other has a huge effect on each other's performance, making each other go faster (or slower). Add the advantages of running in a group and adjusting pace (usually upwards) to match, and the encouragement from other runners - in fact all the reasons that make a race enjoyable are what make it non-normal :)


  • Closed Accounts Posts: 4,608 ✭✭✭donothoponpop


    @dan, it could be that I'm reading a different slant in the article than you, or of course that I'm just wrong! I find it slightly surprising that the author would comment on data spikes as deviating from a normal distribution curve, and suggest that the mode gives a more accurate picture of the "average". Sometimes this is the case, but not so here. The mode (most frequently occurring finishing time) is skewed in this instance because it is an arbitrarily popular yardstick: if there was a £100 bonus for anyone who finished in 3:52, this would be the mode.

    Applying statistics to sporting events usually does one of two things: takes an incorrect analysis from the statistics (as here), or bores the tears off whoever is listening (I lived in NY and endured many obese Yankees fans quoting team stats to me during drinking time :D)


  • Registered Users, Registered Users 2 Posts: 970 ✭✭✭mithril


    It's an interesting article but I would quibble with the way in which the numbers are analyzed and interpreted.
    Comparing median and mode in the way he has done is not good practice, and unnecessarily complicates the analysis. They are apples and oranges.
    It's much clearer if you compare the median of one group directly with another, without introducing the concept of a mode, and you reach the same conclusions.

    As others have pointed out, the choppiness in the data, which has a local peak every quarter hour, is accounted for by pace groups and the fact that these are natural target times for a runner.


  • Closed Accounts Posts: 4,742 ✭✭✭ultraman1


    I've read this thread 4.5 times and I still don't know wtf it's about :o


  • Registered Users, Registered Users 2 Posts: 4,454 ✭✭✭Clearlier


    ultraman1 wrote: »
    I've read this thread 4.5 times and I still don't know wtf it's about :o

    Some guy got some numbers from the London marathon and made a few graphs. He then tried to analyse them but unfortunately did a pretty poor job of it. If he had done it in a better way it probably would have been easier to understand.

    For my money he mixes up median and average, which, to borrow from a previous poster, is like comparing apples to oranges. Why he thought that the mode was interesting in the context in which he tried to discuss it is beyond me.


  • Moderators, Science, Health & Environment Moderators, Sports Moderators Posts: 24,144 Mod ✭✭✭✭robinph


    What I did get from it is that the 45 year old blokes are generally better than the youngsters. Something I noticed in the BHAA races as well, where there tended to be a lot more of them having faster times than the young whippersnappers.


  • Registered Users, Registered Users 2 Posts: 1,170 ✭✭✭Hard Worker


    On a similar note, the "busiest finishing minute" in Dublin 2010 was from 3:59 to 4:00.


  • Registered Users, Registered Users 2 Posts: 7,550 ✭✭✭plodder


    ... and if I've looked it up right, the median finishing time for Dublin in 2010 was 4:09, which is quite a bit faster than London (more serious runners and fewer gorilla costumes maybe :pac:). I agree with the previous poster's quibbles about the stats in the article. Being pedantic maybe, but you wouldn't really expect a normal distribution of results in a marathon. Aside from the spikes, the tail of the graph should be larger because there are far fewer fast marathon runners compared to slower ones.


  • Registered Users, Registered Users 2 Posts: 80 ✭✭firemouth


    plodder wrote: »
    ... and if I've looked it up right, the median finishing time for Dublin in 2010 was 4:09, which is quite a bit faster than London (more serious runners and fewer gorilla costumes maybe :pac:). I agree with the previous poster's quibbles about the stats in the article. Being pedantic maybe, but you wouldn't really expect a normal distribution of results in a marathon. Aside from the spikes, the tail of the graph should be larger because there are far fewer fast marathon runners compared to slower ones.
    what's wrong with gorilla costumes??


  • Registered Users, Registered Users 2 Posts: 7,550 ✭✭✭plodder


    firemouth wrote: »
    what's wrong with gorilla costumes??
    nothing - just you probably won't get a PB running in one ...


  • Closed Accounts Posts: 4,608 ✭✭✭donothoponpop


    Here's a very interesting series of intro vids about statistics, for anyone interested. The link is to the brilliant Khan academy, and this one explaining the Central Limit Theorem, which would be applicable in this London marathon data instance.


  • Registered Users, Registered Users 2 Posts: 6,340 ✭✭✭TFBubendorfer


    Here's a very interesting series of intro vids about statistics, for anyone interested. The link is to the brilliant Khan academy, and this one explaining the Central Limit Theorem, which would be applicable in this London marathon data instance.

    From playing around with a few numbers to a university course in statistics in one single thread. Impressive.


  • Registered Users, Registered Users 2 Posts: 137 ✭✭danburke


    Here's a very interesting series of intro vids about statistics, for anyone interested. The link is to the brilliant Khan academy, and this one explaining the Central Limit Theorem, which would be applicable in this London marathon data instance.

    One must always be careful when using the central limit theorem. Given enough events (i.e. runners/photon arrival times), the average of samples from most distributions will approximate to a Gaussian (normal bell-shaped curve). That averaging destroys information: it would be like averaging your pace over an hour, which would destroy all the info you had on instantaneous running speed.

    I think there are a load of factors here that cannot be explained by a simple graph/mean/mode analysis...

    It all depends what you want to get out of the data...

    If you had marathon times where the runners were kept in the dark about their own times and those of everybody else, each runner's arrival time would become a random independent variable and you could then happily apply a lot of stationary statistics.
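A minimal sketch of the central limit theorem point discussed in the last few posts: the theorem concerns averages of random samples, not the raw finishing times. The "population" here is made up (a right-skewed set of times built from an exponential distribution, chosen only for illustration), and the helper name `sample_means` is hypothetical.

```python
import random
from statistics import mean, stdev


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        mean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(42)
# Made-up, right-skewed "finishing times" in minutes, NOT real race data.
population = [150.0 + rng.expovariate(1 / 90.0) for _ in range(10000)]

means = sample_means(population, sample_size=50, n_samples=2000)
print("population: mean", round(mean(population), 1), "sd", round(stdev(population), 1))
print("sample means: mean", round(mean(means), 1), "sd", round(stdev(means), 1))
```

The sample means cluster tightly and roughly symmetrically around the population mean even though the individual times stay skewed. That is also why the averaging hides features such as a spike at 3:59: the bell curve describes the means, not the runners.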

