Randomness, clumping and Stochastic

DropTables · 18-08-2009 10:30AM #1

Here is the problem...

You have a very large square and scatter randomly 100s of 1000s (the little toppings). There are maybe 3 million little grains and it's a big enough square that they all fit with less than 10% of surface covered.

One sixth of them are red.

Some will clump. That is what a random distribution does.

Now we imagine a grid of approx 32 x 31 squares. to give about 1000 subdivisions.

1) so ON AVERAGE the square has 3,000,000/1000 grains, ie 3,000

1a: What is likely hood of an empty square?
1b: What is likely hood of a square with 6,000 grains?
1c: What is likely hood of a square with 12,000 grains?
1d: What is likely hood of 500 red ones?

2) Now we overlay with 100 x100 grid so on average we have 3,000,000/10,000 = 300 per square

2a: What is likely hood of an empty square?
2b: What is likely hood of a square with 600 grains?
2c: What is likely hood of a square with 1,200 grains?
2d: What is likely hood of 50 red ones?

3) Now we overlay with about 316 x316 such that there are about 100,000 squares. We now have ON AVERAGE 30 per square.
3a: What is likely hood of an empty square?
3b: What is likely hood of a square with 60 grains?
3c: What is likely hood of a square with 120 grains?
3d: What is likely hood of 5 red ones?

As you have smaller squares it's more likely that some squares are empty or have many more than the average. With one square, it's impossible for it to be empty. With 1000 squares presumably a smaller proportion of squares are empty or twice as full compared to 100,000 squares.

What is the mathematics to predict the distribution curve of numbers grains per square as the squares are smaller, % total coverage of surface vs grains (grains can even lie on top of each other, up to 6 deep) and the % of grains that are "activated" (i.e. red coloured?).

Remember Random distribution (the same distribution no matter which Grid Overlay we use after scattering), does not mean totally even distribution like floor tiles.

Fremen · 18-08-2009 12:08PM

My initial approach would have been to model the grains as uniformly distributed points on a plane.

In general I think you need the dimensions of a grain to be able to answer that problem. Given that the grains are allowed to stack, this looks difficult. It reminds me of the buffon's needle problem (though there's only one needle there).

Are you asking out of curiosity, or preparing for an exam or MSC thesis? Depending on how much time you're prepared to invest, you could take a look in Peter Jaeckel or Paul Glasserman's books on monte carlo methods.

gerry87 · 18-08-2009 12:13PM

I'll give it a lash,

1)

Probability of a grain landing on one particular square is 1/1000
Prob of a grain not landing on one particular square is 999/1000

Probability of 3,000,000 grains not landing on 1 particular square is (999/1000)^3,000,000. So if that’s the probability of it not landing on 1 specific one, then allowing the one specific one to be any of the 1000, 1a : [LATEX]1000*(999/1000)^{3000000} = 2.915 *10^{-1301}[/LATEX]

For 1b and 1c you can use a binomial

1b:[LATEX]
1000 * (3000000 choose 6000)*(\frac{1}{1000})^{6000}*(\frac{999}{1000})^{2994000} = 5.8*10^{-504} [/LATEX]

1c: [LATEX]
1000 * (3000000 choose 12000)*(\frac{1}{1000})^{12000}*(\frac{999}{1000})^{2988000} = 4.15*10^{-3322} [/LATEX]

2)

Same deal, but this time p=1/10,000 and (1-p) = 9,999/10,000, so:

2a: [LATEX] 1000*(9999/10,000)^{3000000} = 5.07 *10^{-128}[/LATEX]

2b: [LATEX]
1000 * (3000000 choose 6000)*(\frac{1}{10000})^{6000}*(\frac{9999}{10000})^{2994000} = 1.31*10^{-532} [/LATEX]

2c: [LATEX]
1000 * (3000000 choose 1200)*(\frac{1}{10000})^{1200}*(\frac{9999}{10000})^{2998800} = 4.12*10^{-1414} [/LATEX]

3)

3a: [LATEX] 1000*(99,999/100,000)^{3000000} = 9.36 *10^{-11}[/LATEX]

3b: [LATEX]
1000 * (3000000 choose 60)*(\frac{1}{10000})^{60}*(\frac{9999}{10000})^{2999940} = 0.000476 [/LATEX]

3c: [LATEX]
1000 * (3000000 choose 120)*(\frac{1}{10000})^{120}*(\frac{9999}{10000})^{2999880} = 2.5*10^{-32} [/LATEX]

The part D's are just the same, but instead of just miltiply the p's by (1/6).

I could be way of the mark here. Someone let me know if I went wrong.

DropTables · 18-08-2009 12:58PM

We can ignore the stacking for now. Pretend they are such a shape that two landing on same location will be side by side if you like.

Assume the grain is 1/30,000,000,000 of area of the initial single square, ie. 3M and they cover 10% of area.

We only scatter the grains once. Then look at the distribution via the different size grids. Maybe we should consider a grid of 30,000,000 sub squares, randomly filled in, then consider 100,000, 10,000 and 1,000 ever larger grid squares that exactly cover the 30,000,000 subsquares, of which exactly only 3,000,000 have a grain/populated subsquare and exactly only 500,000 of those are a red coloured (active) grain/subsquare.?

My Interest?
I have a suspicion that given
a total Resource R.
a total Population P
and an average contention K, where P only get full access to R if only P/K grains are active (red). That providing a fraction of R to a population P has a higher contention assuming there is R/(K x P) average contention when you subdivide R to ever smaller geographic chunks, because the "grains" are only 10% of the area and will "clump", they will not be all equidistant, that would not be random.

In other words of the Resource is divided too much, some bits are not consumed at all, and other bits have excessive contention, compared with a situation of a less divided resource.

It's not for an exam or a thesis. It's personal curiosity. I have seen counters, erlangs and Stochastic theory mentioned in the past.

I'll explain the practical application later.

gerry87: we don't care which grains are in which square, so does that make the numbers different?

Maybe someone can guess what my application is? I'll post what the specific levels of square size relate to tomorrow, or maybe later today depending on the posts.

maybe a distribution graph of Resource utilisation is a better idea for each scenario, where 3x as many vs average = 3 and empty resource, no usage = 0 and resource usage is average of total resource /total active population =1?

Fremen · 18-08-2009 03:11PM

Well, this is certainly related to Quasi-monte carlo methods. Regular monte-carlo works by scattering grains over a surface, then sampling points of some function wherever the grains fall, and averaging over the samples.

The trouble is, you get dense areas and sparse areas in your sample. There are specially constructed patterns which avoid dense or sparse patches, allowing a uniform sampling. The maths behind those constructions is quite sophisticated.

DropTables · 18-08-2009 05:16PM

Yes, yes, real life (TM) has denser and sparser areas. This is what I want to quantify.

as you average larger areas the population is more even.

As you look at smaller squares, the population is more uneven, thus if the resource is spread perfectly evenly, then the resource is under used or even unused in empty squares, and in denser squares there is not enough resource.

A Stocastic model I suppose can be solved iteratively using Monte Carlo techniques.

Of course a real population is not even randomly distributed. But if we do not have the whole population as consumers, but a proportion, randomly distributed, and provide the resource density according to Urban/Suburban/Rural population density, then we can start with the simpler model above to understand the loss of efficiency and increase in contention when we chop the resource into many more smaller pieces than the same total resource as fewer larger chunks covering a larger area.

(BTW this is not small local health centres versus central large centres of Excellence, though no doubt that is a related problem).

DropTables · 20-08-2009 03:10PM

I still can't see how to have a probability curve for population vs resource usage for each of the three scenarios.

More tomorrow.

DropTables · 20-08-2009 03:10PM

I still can't see how to have a probability curve for population vs resource usage for each of the three scenarios.

More tomorrow.

Fremen · 20-08-2009 07:06PM

If you're trying to do this yourself without any references, it seems to me you're starting out too ambitiously. Start with two squares, and see if you can generalise to three.

Randomness, clumping and Stochastic

Comments