
Statistics/Correlation question

  • 02-09-2014 12:34pm
    #1
    Registered Users, Registered Users 2 Posts: 1,490 ✭✭✭


    I've never done statistics or mathematics properly, so apologies for any silliness.

    Suppose I have a hunch that, in a certain country, the number of businesses in a city is correlated with the total revenue the city makes. I have a bunch of data on the number of businesses in lots of cities, and on the revenue of those cities. Let's say I scatterplot the two for one country and they look perfectly correlated. I do it for many countries and always get the same result: r² is always close to 1.
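For reference, the r² of a simple scatterplot like this is the square of Pearson's correlation coefficient r. A minimal sketch in plain Python; the city figures below are made up purely for illustration.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of co-deviations and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data for one country: businesses per city and city revenue (EUR).
businesses = [120, 450, 800, 1500, 3100]
revenue = [6.1e6, 22.0e6, 41.5e6, 74.0e6, 156.0e6]

r = pearson_r(businesses, revenue)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```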

    Firstly, assuming I haven't gotten something drastically wrong already, might it be suspicious that they're always so highly correlated, even if my hunch makes intuitive sense?
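One reason an r² near 1 is not, on its own, strong evidence: if both totals simply grow with city size, they will correlate almost perfectly even when neither depends on the other. Here is a small simulation sketch under made-up assumptions (random populations, about 1 business per 100 people, about €50 of revenue per person, each with independent ±5% noise). Revenue is never computed from the business count, yet the totals still correlate strongly; dividing both by population removes the shared size effect.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical cities: both quantities scale with population only.
population = [rng.uniform(10_000, 1_000_000) for _ in range(200)]
businesses = [p * 0.01 * (1 + rng.uniform(-0.05, 0.05)) for p in population]
revenue = [p * 50 * (1 + rng.uniform(-0.05, 0.05)) for p in population]

r_totals = pearson_r(businesses, revenue)

# Per-capita versions: the common "city size" factor is divided out.
businesses_pc = [b / p for b, p in zip(businesses, population)]
revenue_pc = [v / p for v, p in zip(revenue, population)]
r_per_capita = pearson_r(businesses_pc, revenue_pc)

print(f"totals:     r^2 = {r_totals ** 2:.3f}")
print(f"per capita: r   = {r_per_capita:.3f}")
```

Comparing per-capita (or per-business) figures is one common way to check whether the relationship survives once city size is taken out.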

    Secondly, if it's not suspicious, and I see the same result time and time again, and I start analysing a new country for which I have no data except for one city, can I simply 'construct' a regression line through (0, 0) and that one city, and then know the revenue of other cities in that country once I find out how many businesses they have?
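What's being described is a regression through the origin, y = b·x. Its least-squares slope is b = Σxy / Σx², which with a single city reduces to that city's revenue divided by its number of businesses. A minimal sketch; the city numbers are hypothetical.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope for the model y = b * x (no intercept).

    b = sum(x*y) / sum(x*x); with a single point this is just y / x.
    """
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("need at least one non-zero x")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


# Hypothetical: the only known city in the new country.
b = slope_through_origin([1_200], [60_000_000])  # EUR of revenue per business
predicted = b * 3_000  # another city with 3,000 businesses
print(b, predicted)
```

This only holds if the line really passes through (0, 0) and the new country's cities follow the same proportional rule. A single point gives no way to check either assumption or to estimate how far off the prediction might be.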


Comments

  • Registered Users, Registered Users 2 Posts: 5,141 ✭✭✭Yakuza


    It stands to reason that, within a particular country, the more businesses there are, the more revenue will be generated by those businesses.

    However, different countries will have different economic factors: wages, rents, rates, taxes, etc. Even different cities have different costs of living, so I don't think you can infer the number of businesses in a particular city from revenue data for only one city in that country.

    For example, a city in Norway or Sweden might have the same revenue as a large city in Spain, but (and I'm going on semi makey-uppey figures here) the number of businesses might be lower in the Scandinavian city, while each business charges more (McDonald's in Sweden charges about €9 for a meal, whereas it's €6 in Spain).
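To put rough numbers on this, using the approximate meal prices above: if every business sells the same number of meals, city revenue = businesses × price × meals per business, so the revenue-per-business slope differs between the two countries. The meal count and city revenue below are assumptions for illustration only.

```python
# Assumed: every business sells the same number of meals in both countries.
MEALS_PER_BUSINESS = 1_000
PRICE = {"Sweden": 9.0, "Spain": 6.0}  # approximate EUR per meal


def businesses_needed(revenue, country):
    """Number of businesses that would produce this revenue in `country`."""
    return revenue / (PRICE[country] * MEALS_PER_BUSINESS)


city_revenue = 9_000_000.0  # same (made-up) revenue in both cities
n_se = businesses_needed(city_revenue, "Sweden")
n_es = businesses_needed(city_revenue, "Spain")

# Borrowing Spain's revenue-per-business slope for the Swedish city:
spain_slope = PRICE["Spain"] * MEALS_PER_BUSINESS
predicted_se = spain_slope * n_se

print(n_se, n_es, predicted_se)
```

Here the Swedish city needs 1,000 businesses against Spain's 1,500, and reusing Spain's slope would predict €6m instead of €9m for the Swedish city, a one-third underestimate.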


  • Registered Users, Registered Users 2 Posts: 1,490 ✭✭✭floorpie


    It's a made-up example, and probably a bad one, but you're right: there do seem to be many confounding factors. But if we assume that they're negligible... let's say that every business sells the same number of products at the same price.

    (I was thinking about it the other way around, by the way: inferring revenue from the number of businesses.) So it probably does stand to reason: the independent variable is the number of businesses in a city, all of which sell the same number of products at the same price, and the dependent variable is the city's revenue. So my hunch is that they're correlated, because, as you say, it does stand to reason.
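Under that idealisation, revenue = number of businesses × price × quantity exactly, so every city sits on the same straight line through the origin. An ordinary least-squares fit with an intercept then recovers an intercept of 0, a slope of price × quantity, and r² = 1. A sketch with hypothetical price, quantity and city sizes:

```python
def ols_fit(xs, ys):
    """Ordinary least squares y = a + b*x; returns (intercept a, slope b, r^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)
    return intercept, slope, r2


PRICE = 6.0       # EUR per product (hypothetical)
QUANTITY = 1_000  # products sold per business (hypothetical)

businesses = [150, 420, 900, 2_300, 5_000]
revenue = [n * PRICE * QUANTITY for n in businesses]  # exact, no noise

intercept, slope, r2 = ols_fit(businesses, revenue)
print(intercept, slope, r2)
```

In this noise-free case any single city already gives the exact slope; the question is how well that holds once real-world noise comes back in.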

    Let's assume that the country we want to apply this to is economically similar. The slope of the regression line might be different... but apart from that, do you think one could construct the line from limited data and infer revenue, if we assume that the variables are still linearly related?
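One way to see the cost of limited data: simulate noisy cities, fit the through-origin slope from each city on its own, and compare with the slope pooled over all of them. The pooled slope Σxy / Σx² is a weighted average of the single-city ratios y/x, so it always lies between them, and with two or more cities you also get a standard error, s / √(Σx²) with s² = Σ(y − b·x)² / (n − 1). With one city there are no residuals, so no error estimate at all. The true slope, city sizes and constant-variance noise below are assumptions for the sketch.

```python
import random


def fit_through_origin(xs, ys):
    """Slope b of y = b * x and its standard error (None if fewer than 2 points)."""
    sxx = sum(x * x for x in xs)
    b = sum(x * y for x, y in zip(xs, ys)) / sxx
    n = len(xs)
    if n < 2:
        return b, None  # one point: no residuals, so no error estimate
    ssr = sum((y - b * x) ** 2 for x, y in zip(xs, ys))
    se = (ssr / (n - 1) / sxx) ** 0.5  # assumes constant noise variance
    return b, se


rng = random.Random(0)  # fixed seed for reproducibility
TRUE_SLOPE = 50_000.0   # hypothetical EUR per business

businesses = [rng.randint(100, 5_000) for _ in range(30)]
revenue = [TRUE_SLOPE * n + rng.gauss(0, 2_000_000) for n in businesses]

one_city = [fit_through_origin([x], [y])[0] for x, y in zip(businesses, revenue)]
pooled, pooled_se = fit_through_origin(businesses, revenue)

print(f"single-city slopes range {min(one_city):.0f} to {max(one_city):.0f}")
print(f"pooled slope {pooled:.0f} +/- {pooled_se:.0f}")
```

Small cities tend to give the most erratic single-city slopes here, since the same absolute noise is a larger share of their revenue.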

