
R language - info for beginners

  • 14-08-2015 9:26am
    #1
    Moderators, Sports Moderators Posts: 25,523 Mod ✭✭✭✭CramCycle


    I did some very limited programming during my degree. I used Python quite a bit and wrote a program in it to predict protein motions from a certain file type. It was basic but did the job, and I had help from someone who understood the physics angle, so I knew what I was telling it to do. Before that I had very limited exposure to C, and even further back, as a kid, I used BASIC on a very limited PC to get Q&A games running. While I might pick up Python again with a bit of work if I needed to, I don't think I remember enough to sit down and do anything at all now.

    I want to get into using R as it is used a lot in my current employment. While it is not directly used in my position, I work a lot with others who use it, and I would like not only a deeper understanding but also to be able to read and possibly write R programs, or at least interpret what is written in front of me. It would certainly make discussions with some of my co-workers easier.

    Does anyone have any recommendations for where to start? My work would mainly be bioinformatics based. Should I jump in at that level and work backwards to understanding, or is it the type of language where understanding the very basics is essential, or else it will be complete gobbledegook?

    Any recommendations for books or online resources would be most welcome.

    Thanks


Comments



  • RStudio is a free IDE that helps to tame the learning curve imo

    Also, free Udemy course here - 3 hours long, I've not taken a look at it yet but could well be a good place to dip your toes in. https://www.udemy.com/r-basics/?dtcode=Cxp7RmJ3s4c2
    Another here - https://www.udemy.com/machlearn1/?dtcode=tN3BXkQ3s4ne
    - https://www.udemy.com/machlearn3/?dtcode=kPO2Esz3s4pK

    How familiar with Python are you? Would you be better off learning more Python (SciPy & NumPy) through the Anaconda / Continuum scientific Python stack (Spyder being the IDE) and then hoping to translate the syntax differences to R?


  • Registered Users, Registered Users 2 Posts: 962 ✭✭✭darjeeling


    I think coding is a good thing to learn for any biologist, and is essential for genomics work due to the increasingly large and abundant datasets that new technologies are generating.

    Coding separates data from analysis. Coding lets you trace every step in the process so you can see exactly what has been done and repeat the analysis if required. It also lets you reuse code in different projects. The result is faster, less error-prone data handling & analysis.
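
    As a minimal sketch of that idea (in Python, since it came up earlier in the thread; the data values and the summarise function are made up for illustration): the analysis lives in one function, the data sits apart from it, and the same code can be rerun unchanged on a second data set.

```python
from statistics import mean


def summarise(measurements):
    """Return count, mean, min and max for a list of numbers."""
    return {
        "n": len(measurements),
        "mean": mean(measurements),
        "min": min(measurements),
        "max": max(measurements),
    }


# Hypothetical data sets, invented purely for illustration.
experiment_a = [2.0, 4.0, 6.0]
experiment_b = [10.0, 20.0]

# The same analysis step is applied to both data sets, so every
# result can be traced back to one piece of code and repeated.
summary_a = summarise(experiment_a)
summary_b = summarise(experiment_b)

print(summary_a)
print(summary_b)
```

    If the analysis needs to change, only summarise is edited and everything is rerun, rather than redoing each data set by hand.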

    There's a bit of a debate as to what to learn, or at least where to start: for bioinformatics, R is widely used, but python is becoming increasingly popular as an alternative.

    R

    The core language has powerful functions for data manipulation, summarising, statistical analysis and plotting. These are enhanced by some key packages that are very widely used - eg ggplot2 for plotting, plyr & reshape2 for data manipulation and summarising. If you're learning R for bioinformatics, I think I'd still start with learning the core language: familiarising yourself with the data structures and applying functions to data to get answers of interest.

    The bioinformatics packages (mostly under the Bioconductor umbrella) are a set of individual packages contributed by academic labs that are tailored for particular areas - eg transcriptomics, next gen sequencing, systems biology, GWAS & whole genome selection & more. They often use complex custom data structures and functions that are less intuitive to work with and interrogate than the core language, so there can be a bit of a 'black box' feel. Because the packages are contributed by different labs, there's variability in the way they work and in the accompanying explanatory documentation.

    I'd second the suggestion to use RStudio. It's a very helpful environment for keeping track of code, data objects & plots, and getting help on various functions. I don't have any recommendations for tutorials or books because I've picked up the code over a while and so I don't know where is best to begin at the moment. For getting quick solutions to problems, I find stackexchange.com usually gets me an answer (usually via Google searching).

    Python

    I've not used python much at all, though it's on my list to learn. From reading blog posts I understand that in recent years, development & extension of python using modules such as the data analysis module pandas and the plotting module matplotlib has meant it can now be used for many (all?) of the applications previously seen as being the province of R.
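
    To give a rough flavour of the kind of data summarising mentioned above, here is a sketch of a per-group mean using only the Python standard library. The gene names and values are invented for illustration, and group_means is a made-up helper; pandas and plyr provide far more convenient tools for this sort of operation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (gene, expression level), invented for illustration.
records = [
    ("geneA", 1.0),
    ("geneB", 4.0),
    ("geneA", 3.0),
    ("geneB", 6.0),
]


def group_means(rows):
    """Split rows by key, then take the mean of the values in each group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}


means = group_means(records)
print(means)
```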


    Linux shell scripting

    In addition to learning R/Python/both, I think it's useful to be able to work in the linux command line environment. This allows you to build pipelines incorporating command-line linux tools for e.g. NGS analysis, custom python / perl / R scripts, and basic file manipulation operations using linux commands. If you use linux-compatible applications, R / python / perl code and linux commands, your code will run on linux clusters, cloud servers and can also be run on Windows machines using linux virtual machine software or - in many cases - Cygwin.


  • Registered Users, Registered Users 2 Posts: 172 ✭✭aidanathome


    https://www.coursera.org/course/rprog

    A new course starts every month with Coursera and it's only 4 weeks long, so a good starting point.


  • Registered Users, Registered Users 2 Posts: 1,757 ✭✭✭Deliverance XXV


    Below is a nice little online R tutorial (allows you to insert R code on the fly with instant feedback as you go along) that gives you a good insight into the very basics of R. It will literally have you building graphs within the first half hour. If you like what you see, you can then download RStudio and follow a more intensive guide into the language.

    http://tryr.codeschool.com/levels/1/challenges/1

    R is something that I will definitely come back to in the near future.


  • Moderators, Sports Moderators Posts: 25,523 Mod ✭✭✭✭CramCycle


    Thanks everyone,
    How familiar with Python are you? Would you be better off learning more Python (SciPy & NumPy) through the Anaconda / Continuum scientific Python stack (Spyder being the IDE) and then hoping to translate the syntax differences to R?
    It's been a few years. I wrote a program for a project and would have been comfortable with the syntax/language. I even came up with a few basic tricks for tidying up code for others in the research group. I imagine I will be able to pick it up again over a few weeks.
    darjeeling wrote: »
    I think coding is a good thing to learn for any biologist, and is essential for genomics work due to the increasingly large and abundant datasets that new technologies are generating.
    Completely agree. I trust the people doing the large dataset work, and my own datasets are small enough to be manipulated manually, but in my opinion it would be better if I could read what others are doing rather than just understanding the general idea, so that I fully comprehend it and am comfortable that we are coming from the same viewpoint.
    Coding separates data from analysis. Coding lets you trace every step in the process so you can see exactly what has been done and repeat the analysis if required. It also lets you reuse code in different projects. The result is faster, less error-prone data handling & analysis.
    My feeling exactly. One of our bioinformatics guys sees things so much quicker than I do. We nearly always come to the same results and conclusions, but his approach is quicker and gives far more in-depth views, pulling in things that I just can't see with my own techniques.
    There's a bit of a debate as to what to learn, or at least where to start: for bioinformatics, R is widely used, but python is becoming increasingly popular as an alternative.
    In college it was Python, but in the area I am in now, R is used in industry and in academia, and a lot of nifty freeware tools ready for adaptation or integration are out there for use.

    I will probably start up with Python again out of interest but R would be my immediate focus.
    Linux shell scripting

    In addition to learning R/Python/both, I think it's useful to be able to work in the linux command line environment.
    This is what I worked in when I was in college but, again, not for a while.

    Thanks to everyone else for the tutorials, will start having a dig around the basics this week and see how it feels, and if I still have the head for it.


  • Moderators, Society & Culture Moderators Posts: 9,768 Mod ✭✭✭✭Manach


    As well as the suggestions above, I've signed up for the http://www.r-bloggers.com/ digest, which is helpful for trending topics.


  • Registered Users, Registered Users 2 Posts: 19 PIKOMAN


    Hi,

    Anyone know where I could get some grinds in R? I started a course on data analytics and I need some help to get my head around the R coding part of the course. Any advice greatly appreciated.

    Thanks

