Multi-Variate Analysis help!

sternn · 08-03-2010 05:37PM #1

I am very confused about this MVA course question. We understand how the linkages relate to the dissimilarity measures but don't know where to begin with the proofs.

Question:

A dissimilarity measure d(x, y) for two data points x and y typically
satisfies the following three properties:

1. d(x, y) <= 0 and d(x, y) = 0 if and only if x = y
2. d(x, y) = d(y, x)
3. d(x, z)<= d(x, y) + d(y, z)

The following have also been proposed as methods for measuring the dissimilarity between two
sets of data points A = {xa1 , xa2 , . . . , xam} and B = {xb1 , xb2 , . . . , xbn}:
• Single Linkage: d(A, B) = minx2A,y2Bd(x, y)
• Complete Linkage: d(A, B) = maxx2A,y2Bd(x, y)
• Average Linkage: d(A, B) = 1|A||B| Px2APy2B d(x, y)

For each of the proposed linkage methods and dissimilarity properties, show that the linkage method satisfies that property or provide a counterexample (a visual/diagram representation of any counterexample is sufficient if appropriate).

ray giraffe · 12-03-2010 02:55AM

I understand the question to be the following.

If the sets A and B are regarded as two 'points', and we use a definition of linkage (e.g. 'single linkage'), are each of the properties 1,2,3 satisfied?

E.G. Is it true that d(A,B) <= 0 ? Experiment if you are not sure.
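
One way to run such an experiment is the short Python sketch below. It assumes the data points are real numbers and takes d(x, y) = |x - y| as the point dissimilarity; that choice, the function names and the sample sets are illustrative and not part of the original question.

```python
from itertools import product


def d(x, y):
    # Assumed point dissimilarity: absolute difference on the real line.
    return abs(x - y)


def single_linkage(A, B):
    # Smallest pairwise dissimilarity between a point of A and a point of B.
    return min(d(x, y) for x, y in product(A, B))


def complete_linkage(A, B):
    # Largest pairwise dissimilarity between a point of A and a point of B.
    return max(d(x, y) for x, y in product(A, B))


def average_linkage(A, B):
    # Mean of all |A||B| pairwise dissimilarities.
    return sum(d(x, y) for x, y in product(A, B)) / (len(A) * len(B))


# Hypothetical sample sets, chosen only for illustration.
A = {1, 2}
B = {4, 6}
for name, linkage in [
    ("single", single_linkage),
    ("complete", complete_linkage),
    ("average", average_linkage),
]:
    print(name, linkage(A, B))
```

Swapping in other sets (overlapping ones, or A = B) is a quick way to see which of the three properties each linkage can violate.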

equivariant · 12-03-2010 02:01PM

Your notation is confusing. The properties of a dissimilarity measure seem clear (it's like a metric, except that the values are negative). However, I don't understand your notation for the examples.

e.g. You say

• Single Linkage: d(A, B) = minx2A,y2Bd(x, y)

what do you mean by x2A? or by y2B? or what are x and y here? If I could understand this notation, maybe I could say something about how to prove/disprove that it satisfies the given properties.

sternn · 16-03-2010 11:32AM

Sorry, the question I posted was a bit of a mess.

• Single Linkage: d(A, B) = min (x an element of A),(y an element of B) d(x, y)
• Complete Linkage: d(A, B) = max (x an element of A),(y an element of B) d(x, y)
• Average Linkage: d(A, B) = 1/(|A||B|) Sum (x an element of A) Sum (y an element of B) d(x, y)

equivariant · 18-03-2010 02:58PM

sternn wrote: »

Sorry, the question I posted was a bit of a mess.

• Single Linkage: d(A, B) = min (x an element of A),(y an element of B) d(x, y)
• Complete Linkage: d(A, B) = max (x an element of A),(y an element of B) d(x, y)
• Average Linkage: d(A, B) = 1/(|A||B|) Sum (x an element of A) Sum (y an element of B) d(x, y)

OK, that makes more sense. Also, I suspect that in your original post, property 1. should be

1. d(x, y) >= 0 and d(x, y) = 0 if and only if x = y

and not 1. d(x, y) <= 0 and d(x, y) = 0 if and only if x = y as otherwise, nothing works.

Assuming that is true, then the properties you have listed are those that are characteristic of a "metric". If you google for "metric space" you will find lots of info about these properties. For example http://en.wikipedia.org/wiki/Metric_space

Back to your original question. In the case of single linkage: this does not satisfy property 1. For example, consider the sets A={1,2} and B={2,3}. Clearly A and B are different, but d(A,B) = 0.
It does have property 2 and it does not have property 3. Prop 2 is easy to see in this case. To see that it does not have property 3, consider the following example: A={1}, B={2,4}, C={5}. If you compute the quantities d(A,C), d(A,B) and d(B,C) (taking d(x,y) = |x-y|), you get d(A,C) = 4 but d(A,B) + d(B,C) = 1 + 1 = 2, so prop. 3 does not hold in this example.

You can analyse the other examples in a similar way.
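
As a rough check of the two single-linkage counterexamples, here is a minimal Python sketch. It assumes points on the real line with d(x, y) = |x - y|; the helper names are made up for illustration.

```python
def d(x, y):
    # Assumed point dissimilarity: absolute difference on the real line.
    return abs(x - y)


def single_linkage(A, B):
    # Minimum of d(x, y) over all x in A and y in B.
    return min(d(x, y) for x in A for y in B)


# Property 1: the sets P and Q differ, yet their single-linkage value is 0.
P, Q = {1, 2}, {2, 3}
print(P != Q, single_linkage(P, Q))

# Property 3: compare d(A,C) with d(A,B) + d(B,C).
A, B, C = {1}, {2, 4}, {5}
lhs = single_linkage(A, C)
rhs = single_linkage(A, B) + single_linkage(B, C)
print(lhs, rhs, lhs <= rhs)
```

Replacing min with max, or with the mean over all pairs, gives the same kind of check for complete and average linkage.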

Hopefully this helps

ray giraffe · 27-03-2010 05:56PM

equivariant wrote: »

OK, that makes more sense. Also, I suspect that in your original post, property 1. should be

1. d(x, y) >= 0 and d(x, y) = 0 if and only if x = y

and not 1. d(x, y) <= 0 and d(x, y) = 0 if and only if x = y as otherwise, nothing works.

You're right. The definition given by the OP implies that the space has at most one point, which is fairly pointless

Concretely: if we have 2 distinct points x, y, then d(x,y) < 0 by rule 1. Putting z = x in rule 3 gives d(x,x) <= d(x,y) + d(y,x), and so by rules 1 and 2 we have 0 <= 2d(x,y), a contradiction.

Michael Collins · 27-03-2010 06:01PM

ray giraffe wrote: »

You're right. The definition given by the OP implies that the space has at most one point, which is fairly pointless

Nice.
