
Assigning weights to observed data

  • 27-12-2011 6:14pm
    #1
    Registered Users Posts: 3,803 ✭✭✭


    Hi folks,

    I have a geological core that I've done some mineralogical analysis on. I've taken 80 or so samples, and the minerals present in each sample have been grouped into clusters, giving six clusters in the core. The samples are about 8 cm apart, so I was hoping to interpolate between sample points. Is it possible to assign weights to each sample point based on the cluster present, so that I can then estimate the likely mineral cluster between points?


Comments

  • Registered Users Posts: 2,481 ✭✭✭Fremen


    You might be able to use Gaussian process regression (known as Kriging in geology).

    In the simplest case, if you know the value of a function at points

    y1 = f(x1)
    y2 = f(x2)
    .
    .
    .
    yn = f(xn),

    then once you make some assumptions about the function f (how smooth it is, for example), Gaussian process regression allows you to come up with a probability distribution for possible values of f(x), for unknown x.
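
    A minimal sketch of that idea for scalar x and y, assuming a zero-mean prior and a squared-exponential covariance (one common way of encoding "f is smooth"). The function names, the length scale and the sample values are made up for illustration and do not come from the core data.

```python
import math

def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar locations."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict(xs, ys, x_new, length_scale=1.0, noise=1e-6):
    """Posterior mean and variance of f(x_new) under a zero-mean GP prior."""
    n = len(xs)
    # Covariance of the observations, with a small noise term on the diagonal.
    K = [[rbf(xs[i], xs[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x, x_new, length_scale) for x in xs]
    alpha = solve(K, ys)      # K^{-1} y
    v = solve(K, k_star)      # K^{-1} k*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new, length_scale) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)

# Hypothetical depths (cm) and measured values at those depths.
xs = [0.0, 8.0, 16.0, 24.0]
ys = [1.0, 1.5, 0.7, 0.2]

mean_mid, var_mid = gp_predict(xs, ys, 12.0, length_scale=8.0)  # between samples
mean_at, var_at = gp_predict(xs, ys, 8.0, length_scale=8.0)     # at a sample
```

    The prediction at a sampled depth reproduces the observation with almost no uncertainty, while the prediction halfway between two samples comes with a larger variance. That variance is the "probability distribution for possible values of f(x)" part.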

    This works for vector-valued x's and y's, too. I guess your x's would be the location in the core here. Your y could possibly be some sort of vector of mineral concentrations, but without knowing more about the task I wouldn't be able to say.


  • Registered Users Posts: 3,803 ✭✭✭El Siglo


    I know about kriging, and I have been able to produce semivariograms of the data using weighted least squares regression. The problem is that for kriging to work your data needs to have numeric values, e.g. zinc is 10ppm for sample 1, 15ppm for sample 2, 20ppm for sample 3, etc., and the samples are equally distributed. My data points are dimensionless: they just correspond to the mineral cluster, and I don't have data on specific mineral concentrations (not possible with XRD, it's a qualitative or semi-quantitative technique you see).

    Is there a way that I could assign a rank or something to each cluster and use that then as a means of interpolating between points? So take each mineral present and work out the percentage proportion of each mineral and assign these percentages for each sample. So for example, quartz is in every sample, feldspar is in 25% of samples, etc... Essentially in order to interpolate I need to try and give qualitative data quantitative values based on some kind of proportion abundance or something like that.
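
    One way to picture this, as a rough sketch only: code each mineral as a 0/1 indicator per sample and compute the share of samples it appears in. The depths and mineral sets below are invented for illustration, and whether such indicators are a sound input for interpolation is an assumption, not something established in this thread.

```python
from collections import Counter

# Hypothetical data: depth in cm -> set of minerals detected by XRD.
samples = {
    0: {"quartz", "feldspar"},
    8: {"quartz"},
    16: {"quartz", "calcite"},
    24: {"quartz", "calcite"},
}

# Fixed, sorted list of every mineral seen anywhere in the core.
minerals = sorted(set().union(*samples.values()))

def indicator_row(present):
    """Turn a set of minerals into a 0/1 vector aligned with `minerals`."""
    return [1 if m in present else 0 for m in minerals]

# Numeric (0/1) values per sample, one column per mineral.
indicators = {depth: indicator_row(present) for depth, present in samples.items()}

# Fraction of samples in which each mineral occurs.
counts = Counter(m for present in samples.values() for m in present)
proportions = {m: counts[m] / len(samples) for m in minerals}
```

    With this toy data quartz has proportion 1.0 and feldspar 0.25, matching the kind of figures described above. Each column of 0/1 values is numeric, so it can at least be fed to methods that expect numbers.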


  • Registered Users Posts: 2,481 ✭✭✭Fremen


    Let me see if I can abstract away some of this geology stuff :)

    So you've got a bunch of locations

    {x1,x2,...,xn}, and a bunch of categories {c1,c2,...,ck} corresponding to quartz, feldspar, etc...

    You've got some observed data that consists of a subset of the categories found at each location. Say

    x1 -> {c1,c2}

    x2 -> {c1,c5}

    x3 -> {c2,c4,c6}

    etc...

    Now given some other location x, you want to know what other categories you might find. Is that right?

    I'm almost certain that problem will be well known in machine learning, but I'd have to think about how to approach it.
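
    To make the setup concrete, here is a small sketch of that data layout together with a deliberately naive baseline: inverse-distance weighting of category presence. This is a stand-in chosen for illustration, not the machine learning method alluded to above, and the numeric locations (8 cm spacing) and the `presence_scores` name are assumptions.

```python
def presence_scores(observations, x_new, power=2.0):
    """Inverse-distance-weighted share of samples containing each category."""
    categories = sorted(set().union(*(cats for _, cats in observations)))
    # Exact hit on a sampled location: return the observed set as 0/1 scores.
    for x, cats in observations:
        if x == x_new:
            return {c: 1.0 if c in cats else 0.0 for c in categories}
    # Closer samples get larger weights.
    weights = [1.0 / abs(x - x_new) ** power for x, _ in observations]
    total = sum(weights)
    return {
        c: sum(w for w, (_, cats) in zip(weights, observations) if c in cats) / total
        for c in categories
    }

# x1 -> {c1,c2}, x2 -> {c1,c5}, x3 -> {c2,c4,c6}, with assumed locations in cm.
observations = [
    (0.0, {"c1", "c2"}),
    (8.0, {"c1", "c5"}),
    (16.0, {"c2", "c4", "c6"}),
]

scores = presence_scores(observations, 4.0)  # halfway between x1 and x2
```

    Halfway between x1 and x2, c1 scores highest because both neighbours contain it, while c4 and c6 score lowest because they only occur at the more distant x3. The scores are weighted frequencies, not calibrated probabilities.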


  • Registered Users Posts: 2,481 ✭✭✭Fremen


    Ah, something just clicked. This might be a good candidate for logistic regression.

    Edit: I linked to the wikipedia article, but that doesn't really explain categorical logistic regression at all. I'll try to dig up a good reference. Maybe try Andrew Ng's "stanford engineering everywhere" machine learning lectures.

    Edit #2: actually it's known as a multinomial logistic regression problem.
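
    A bare-bones sketch of multinomial (softmax) logistic regression, assuming one cluster label per sample and depth as the only predictor. The data, the `fit_softmax` name, the learning rate and the step count are all illustrative assumptions; a real analysis would more likely use an established statistics package.

```python
import math

def softmax(logits):
    """Convert a list of scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fit_softmax(depths, labels, n_classes, lr=0.5, steps=2000):
    """Fit class scores w[k] * z + b[k] by gradient descent on cross-entropy."""
    mean = sum(depths) / len(depths)
    std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    zs = [(d - mean) / std for d in depths]  # standardised depth
    w = [0.0] * n_classes
    b = [0.0] * n_classes
    n = len(zs)
    for _ in range(steps):
        grad_w = [0.0] * n_classes
        grad_b = [0.0] * n_classes
        for z, y in zip(zs, labels):
            p = softmax([w[k] * z + b[k] for k in range(n_classes)])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)
                grad_w[k] += err * z / n
                grad_b[k] += err / n
        for k in range(n_classes):
            w[k] -= lr * grad_w[k]
            b[k] -= lr * grad_b[k]

    def predict_proba(depth):
        """Estimated probability of each cluster at an unsampled depth."""
        z = (depth - mean) / std
        return softmax([w[k] * z + b[k] for k in range(n_classes)])

    return predict_proba

# Hypothetical core: depth in cm and the cluster (0, 1 or 2) seen at each sample.
depths = [0.0, 8.0, 16.0, 24.0, 32.0, 40.0, 48.0, 56.0, 64.0]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

predict_proba = fit_softmax(depths, labels, n_classes=3)
probs_shallow = predict_proba(4.0)   # between the first two samples
probs_deep = predict_proba(60.0)     # between the last two samples
```

    The output at an unsampled depth is a probability for each cluster rather than a single interpolated value. One caveat: with depth as the only linear predictor, each cluster can only dominate over a single contiguous depth range, so a core where clusters recur at several depths would need extra features.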


  • Registered Users Posts: 3,803 ✭✭✭El Siglo


    Fremen wrote: »
    Ah, something just clicked. This might be a good candidate for logistic regression.

    Edit: I linked to the wikipedia article, but that doesn't really explain categorical logistic regression at all. I'll try to dig up a good reference. Maybe try Andrew Ng's "stanford engineering everywhere" machine learning lectures.

    Edit #2: actually it's known as a multinomial logistic regression problem.

    I've looked up a few articles on that there now, it looks really good so it does and suits my kind of analysis. Essentially the data I have are categorical rather than numeric, so this kind of technique is definitely worth following up.

    Cheers for getting back to me so soon, I really appreciate it!


  • Registered Users Posts: 2,481 ✭✭✭Fremen


    If it's for college work, be sure to credit "some dude on the internet" in your bibliography.


  • Registered Users Posts: 3,803 ✭✭✭El Siglo


    Fremen wrote: »
    If it's for college work, be sure to credit "some dude on the internet" in your bibliography.

    I'm doing it for a PhD so I'm not entirely sure how I'd approach the subject with the supervisor! ;) Still, way better than a wikipedia article.

