R: Compositional Data Analysis

  • 19-02-2010 11:12am
    #1
    Registered Users Posts: 1,205 ✭✭✭


    Hi, if anyone can help with this I'd be most appreciative.

    I'm currently trying to analyse different components of tree biomass (root, stem, branch etc.). I'm trying to use the R Statistics package, in particular the StatConn Excel add-on, in order to carry out Compositional Data Analysis of all the components.

    I'm having trouble not only with the installation of said package but also with data entry into R. Have rtfm but am not the greatest with programming so any help/pointers/suggestions would be greatly appreciated.


Comments

  • Registered Users Posts: 8,452 ✭✭✭Time Magazine


    Firstly, obviously, you'll need to install the package correctly. This should be done easily enough.

    I don't know what StatConn does but if you have an Excel file, it's very easy to get that open in R.

    R isn't easy to master. It's a steep learning curve, so if you can do a Compositional Data Analysis (I'm sorry, I don't know what this is either) in another package then I wouldn't recommend R unless you're very enthusiastic. What is Compositional Data Analysis - are you looking to do anything much more complicated than means and the like? If not, Excel is probably best?

    Anyway if you have an Excel file and you want to open it in R, here are the steps:
    1. Allow the first row of the Excel file to be the variable names, so it would look something like:
    tree_id | height | mass | age | root_mass | stem_mass
    001 | 100 | 150 | 8 | 2 | 8
    002 | 120 | 200 | 12 | 4 | 12
    ... | | | | |
    150 | 80 | 100 | 7 | 1 | 1

    2. Save it as a Text (tab-delimited) file and call it "myfile.txt"
    3. Open up R
    4. Set the working directory to the folder where you've saved "myfile.txt" (File > Change dir... in the Windows R GUI, or the setwd() command)
    5. Enter the command mydata = read.delim("myfile.txt",header=TRUE); this will import your Excel file as a data frame (named "mydata") that you can play with
    6. Take out the specific variables from the data frame. For example the command tree_age = mydata$age will create a vector, named "tree_age", that's taken from the column in your data frame with the title "age", i.e. the 4th column from the Excel sheet above
    7. Better yet, when you've figured out all the commands you want, paste them into Notepad and save them in a file called "hello.R"
    8. You can then tell R to run all these commands in one go with the command source("hello.R")
    9. Add as many commands to "hello.R" as you wish and let R do the sums. For example you might have mean_age = mean(tree_age) which will create a variable with the mean age of all the trees.
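
The steps above can be sketched as one short script. This is a minimal illustration, not a definitive recipe: it writes the three example rows from the table to a temporary file so it runs on its own, whereas in practice you would point read.delim() at your own "myfile.txt". Note that read.delim() returns a data frame, and a tree_id such as 001 is read as the number 1.

```r
# Recreate a small tab-delimited file like the one described above
# (the three illustrative rows from the example table, written to a temp folder).
path <- file.path(tempdir(), "myfile.txt")
writeLines(c(
  "tree_id\theight\tmass\tage\troot_mass\tstem_mass",
  "001\t100\t150\t8\t2\t8",
  "002\t120\t200\t12\t4\t12",
  "150\t80\t100\t7\t1\t1"
), path)

# read.delim() returns a data frame; header = TRUE uses row 1 as column names
mydata <- read.delim(path, header = TRUE)

tree_age <- mydata$age       # pull one column out as a vector
mean_age <- mean(tree_age)   # mean age of the trees in the file

str(mydata)                  # show column names and types
print(mean_age)
```

Saved as "hello.R", the same lines could be run in one go with source("hello.R").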


    Hope this helps.


  • Registered Users Posts: 3,483 ✭✭✭Ostrom


    Sorry to butt in - does this procedure work for stata also?


  • Registered Users Posts: 1,205 ✭✭✭Yi Harr


    Thanks for that, there's definitely a steep learning curve with R but I'm finding it slightly easier every day. Found some decent tutorial videos on youtube.


  • Registered Users Posts: 8,452 ✭✭✭Time Magazine


    efla wrote: »
    Sorry to butt in - does this procedure work for stata also?

    It's a good bit easier in Stata, mostly because if the first row is dedicated to variable names (as above), then Stata should automatically recognise the columns as distinct variables.

    If you have a tab-separated Excel file called "myfile.txt", then the command insheet using myfile.txt will have you flying in Stata. To get summary statistics of all the variables, all you have to do is enter the command summ *
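
For comparison with the R steps earlier in the thread, a rough R counterpart of summ * is summary() applied to the imported data. A minimal sketch, using a small hand-typed data frame (values taken from the example table) as a stand-in for the imported file:

```r
# Stand-in for the imported data; in practice this would come from read.delim()
mydata <- data.frame(
  height = c(100, 120, 80),
  mass   = c(150, 200, 100),
  age    = c(8, 12, 7)
)

# summary() reports min, quartiles, median, mean and max for every numeric column
stats <- summary(mydata)
print(stats)
```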
    Yi Harr wrote: »
    Thanks for that, there's definitely a steep learning curve with R but I'm finding it slightly easier every day. Found some decent tutorial videos on youtube.

    You're welcome :). Good luck with R, it's an incredible tool/skill if you can master it.


  • Registered Users Posts: 2 ClaireIoana


    Hi, we are doing a project using R. We were given two spreadsheets containing horse racing results and we were asked to predict the future outcomes using the spreadsheets. Any ideas or help would be greatly appreciated.
    Thanks!


  • Registered Users Posts: 3,803 ✭✭✭El Siglo


    Use the packages 'compositions' or 'robCompositions' in R, they'll sort you out and there's loads published using both packages (particularly in earth sciences). I've been working on compositional data analysis now for the last few years, what sort of stuff are you looking to do, i.e., in terms of transformation (e.g., centred log-ratio, additive log-ratio, or isometric log-ratio)?
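
To make the transformations mentioned above concrete, here is a minimal base-R sketch of the centred and additive log-ratio transforms. The biomass figures are made up purely for illustration, and the variable names are assumptions rather than anything from the thread. The 'compositions' package provides its own clr(), alr() and ilr() functions, which would normally be preferred; the isometric log-ratio is left out here because it needs an orthonormal basis.

```r
# Hypothetical biomass components (kg) for three trees; not real data
biomass <- data.frame(
  root   = c(2, 4, 1),
  stem   = c(8, 12, 1),
  branch = c(3, 5, 2)
)

# Closure: rescale each row so the parts sum to 1
comp <- as.matrix(biomass) / rowSums(biomass)

# Centred log-ratio: log of each part minus the row mean of the logs
log_comp <- log(comp)
clr_vals <- log_comp - rowMeans(log_comp)

# Additive log-ratio: log of each part relative to the last part (branch)
alr_vals <- log(comp[, -ncol(comp)] / comp[, ncol(comp)])

print(round(clr_vals, 3))
print(round(alr_vals, 3))
```

Each row of the centred log-ratio result sums to zero, which is a quick sanity check on the calculation.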

