Assigning weights to observed data

El Siglo · 27-12-2011 06:14PM #1

Hi folks,

I have a geological core that I've done some mineralogical analysis on. I've taken 80 or so samples, and the minerals present in each sample are grouped into clusters; there are six clusters in the core. I was hoping to interpolate between sample points, as the samples are about 8 cm apart. Is it possible to assign weights to each sample point based on the cluster present, so that I can then estimate the likely mineral cluster between points?

Fremen · 31-12-2011 12:42AM

You might be able to use Gaussian process regression (known as Kriging in geology).

In the simplest case, if you know the value of a function at points

y1 = f(x1)
y2 = f(x2)
.
.
.
yn = f(xn),

then once you make some assumptions about the function f (how smooth it is, for example), Gaussian process regression allows you to come up with a probability distribution for possible values of f(x), for unknown x.

This works for vector-valued x's and y's, too. I guess your x's would be the location in the core here. Your y could possibly be some sort of vector of mineral concentrations, but without knowing more about the task I wouldn't be able to say.
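To make the idea concrete, here is a rough sketch of Gaussian process regression in one dimension using only the Python standard library. It assumes a zero-mean prior and a squared-exponential kernel, which is just one possible way of encoding "how smooth f is". The function names (rbf, solve, gp_predict), the depths and the values are all made up for illustration and are not from the actual core.

```python
import math

def rbf(a, b, length=1.0, signal_var=1.0):
    """Squared-exponential kernel: one way to encode a smoothness assumption on f."""
    return signal_var * math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def gp_predict(xs, ys, x_new, length=1.0, noise=1e-6):
    """Posterior mean and variance of f(x_new) under a zero-mean GP prior."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(x_new, a, length) for a in xs]
    alpha = solve(K, ys)       # K^{-1} y
    v = solve(K, k_star)       # K^{-1} k*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    variance = rbf(x_new, x_new, length) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(variance, 0.0)

# Hypothetical depths (cm) and made-up measured values, purely for illustration.
depths = [0.0, 8.0, 16.0, 24.0]
values = [1.0, 2.0, 1.5, 0.5]
mean_at_sample, var_at_sample = gp_predict(depths, values, 8.0, length=8.0)
mean_between, var_between = gp_predict(depths, values, 12.0, length=8.0)
```

At a sampled depth the prediction reproduces the observed value with almost no uncertainty, while halfway between two samples the variance is larger. That variance is the "probability distribution for possible values of f(x)" part.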

El Siglo · 31-12-2011 05:23PM

I know about kriging, and I have been able to produce semivariograms of the data using weighted least squares regression. The problem is that for kriging to work your data need to have values, e.g. zinc is 10 ppm for sample 1, 15 ppm for sample 2, 20 ppm for sample 3, and so on, and the samples are equally distributed, etc. My data points are dimensionless: they just correspond to the mineral cluster, and I don't have data on specific mineral concentrations (not possible with XRD, it's a qualitative or semi-quantitative technique you see).

Is there a way that I could assign a rank or something to each cluster and use that then as a means of interpolating between points? So take each mineral present and work out the percentage proportion of each mineral and assign these percentages for each sample. So for example, quartz is in every sample, feldspar is in 25% of samples, etc... Essentially in order to interpolate I need to try and give qualitative data quantitative values based on some kind of proportion abundance or something like that.
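As a rough illustration of the proportion idea, something like the following would count how often each mineral turns up across the samples. The sample contents here are invented placeholders, not real XRD results.

```python
from collections import Counter

# Hypothetical presence/absence data: minerals identified in each sample.
samples = [
    {"quartz", "feldspar"},
    {"quartz", "calcite"},
    {"quartz"},
    {"quartz", "calcite", "pyrite"},
]

# Number of samples in which each mineral appears.
counts = Counter(mineral for sample in samples for mineral in sample)

# Fraction of samples containing each mineral.
proportions = {mineral: counts[mineral] / len(samples) for mineral in counts}
```

With this made-up data, quartz comes out at 1.0 (every sample) and feldspar at 0.25.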

Fremen · 31-12-2011 06:35PM

Let me see if I can abstract away some of this geology stuff.

So you've got a bunch of locations

{x1,x2,...,xn}, and a bunch of categories {c1,c2,...,ck} corresponding to quartz, feldspar, etc.

You've got some observed data that consists of a subset of the categories found at each location. Say

x1 -> {c1,c2}

x2 -> {c1,c5}

x3 -> {c2,c4,c6}

etc...

Now given some other location x, you want to know what other categories you might find. Is that right?

I'm almost certain that problem will be well known in machine learning, but I'd have to think about how to approach it.
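For what it's worth, one common way to hand that kind of data to a learning algorithm is to turn each subset into a vector of 0/1 indicators, one per category. A minimal sketch using the three example locations above, assuming k = 6 categories:

```python
# The category list and observations mirror the toy example above;
# six categories is an assumption made for illustration.
categories = ["c1", "c2", "c3", "c4", "c5", "c6"]
observed = {
    "x1": {"c1", "c2"},
    "x2": {"c1", "c5"},
    "x3": {"c2", "c4", "c6"},
}

# One row per location, one 0/1 entry per category (1 = category found there).
indicator = {
    loc: [1 if c in found else 0 for c in categories]
    for loc, found in observed.items()
}
```

Each column of that table can then be treated as its own present/absent prediction problem.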

Fremen · 31-12-2011 06:37PM

Ah, something just clicked. This might be a good candidate for logistic regression.

Edit: I linked to the Wikipedia article, but that doesn't really explain categorical logistic regression at all. I'll try to dig up a good reference. Maybe try Andrew Ng's "Stanford Engineering Everywhere" machine learning lectures.

Edit #2: actually it's known as a multinomial logistic regression problem.
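Here is a bare-bones sketch of what that looks like when each sample has a single cluster label and the only input is depth. It fits p(cluster | depth) = softmax(w_k * depth + b_k) by plain batch gradient descent. The function names, the learning rate, the number of iterations and the data are all assumptions for illustration; in practice you would use a proper library rather than this.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_softmax(xs, labels, n_classes, lr=0.5, epochs=5000):
    """Fit one weight and one bias per class by batch gradient descent."""
    w = [0.0] * n_classes
    b = [0.0] * n_classes
    n = len(xs)
    for _ in range(epochs):
        gw = [0.0] * n_classes
        gb = [0.0] * n_classes
        for x, y in zip(xs, labels):
            p = softmax([w[k] * x + b[k] for k in range(n_classes)])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)  # gradient of the log-loss
                gw[k] += err * x / n
                gb[k] += err / n
        for k in range(n_classes):
            w[k] -= lr * gw[k]
            b[k] -= lr * gb[k]
    return w, b

def predict_proba(w, b, x):
    """Estimated probability of each class at location x."""
    return softmax([wk * x + bk for wk, bk in zip(w, b)])

# Hypothetical data: depth rescaled to [0, 1], cluster index 0..2.
depths = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
clusters = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
w, b = fit_softmax(depths, clusters, n_classes=3)
probs = predict_proba(w, b, 0.05)  # a depth between two sampled points
```

One caveat: with scores that are linear in depth, each cluster can only be the most likely one over a single contiguous stretch of the core, so a cluster that recurs at several depths would need extra features beyond depth alone.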

El Siglo · 03-01-2012 02:09PM

Fremen wrote: »

Ah, something just clicked. This might be a good candidate for logistic regression.

Edit: I linked to the Wikipedia article, but that doesn't really explain categorical logistic regression at all. I'll try to dig up a good reference. Maybe try Andrew Ng's "Stanford Engineering Everywhere" machine learning lectures.

Edit #2: actually it's known as a multinomial logistic regression problem.

I've looked up a few articles on that there now, it looks really good so it does and suits my kind of analysis. Essentially the data I have are non-parametric so this kind of technique is definitely worth following up.

Cheers for getting back to me so soon, I really appreciate it!

Fremen · 04-01-2012 12:07AM

If it's for college work, be sure to credit "some dude on the internet" in your bibliography.

El Siglo · 04-01-2012 05:19PM

Fremen wrote: »

If it's for college work, be sure to credit "some dude on the internet" in your bibliography.

I'm doing it for a PhD so I'm not entirely sure how I approach the subject with the supervisor!

Still, way better than a wikipedia article.
