
How much theory does one really have to know

  • 05-02-2019 12:52pm
    #1
    Registered Users Posts: 722 ✭✭✭ illdoit2morrow


    I'm looking for some feedback from people working in the data analysis/science arena.

    I completed an MSc in Data Analytics at Blanchardstown IT in 2017. The parts of the course that focused on how the different algorithms worked were very maths-based, and I struggled with them. I kind of understand what the algorithms are trying to achieve (linear regression, logistic regression, etc.), but I never got the maths element.

    Recently, to try to understand this a bit more, I started the Machine Learning course on Coursera by Andrew Ng. It started off great, but by the end of Week 2 and into Week 3 it has gone very mathematical again, discussing how the algorithms work, and I'm often left staring at the screen wondering what Andrew is on about.

    My question I suppose is, does one really need to know how the algorithms work, or is a good understanding of linear regression, clustering, classification etc and their associated algorithms good enough?
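    For anyone in the same position: much of the maths in the early weeks of that course is about fitting linear regression with gradient descent. The sketch below is a minimal plain-Python illustration under that assumption; the function names, learning rate `alpha`, iteration count and data are all made up for the example. It repeatedly nudges the intercept and slope against the gradient of the mean squared error, then checks the answer against the ordinary least-squares formula.

```python
# Minimal sketch: fit y ~ theta0 + theta1 * x by batch gradient descent on the
# mean squared error cost J = (1 / 2m) * sum((theta0 + theta1 * x_i - y_i) ** 2).


def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Return (theta0, theta1) after running batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors for the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


def closed_form(xs, ys):
    """Ordinary least squares for one feature: slope = cov(x, y) / var(x)."""
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # made-up data, roughly y = 1 + 2x
    print("gradient descent:", gradient_descent(xs, ys))
    print("closed form:     ", closed_form(xs, ys))
```

    Both approaches should land on an intercept of about 1.09 and a slope of about 1.97 for this data. Gradient descent is the one worth understanding, because the same "follow the slope of the cost downhill" idea carries over to logistic regression and neural networks, where no simple formula exists.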


Comments

  • Registered Users Posts: 23,427 ✭✭✭✭ Sleepy


    In my experience, the most-used skill in data work is still SQL.


  • Registered Users Posts: 1,304 ✭✭✭ FastFullBack


    My question I suppose is, does one really need to know how the algorithms work, or is a good understanding of linear regression, clustering, classification etc and their associated algorithms good enough?

    I find the best way to move to and learn new technologies is to apply them to real-world examples that you understand. Once you can apply the basics of the algorithms and understand them in real working examples, the more technical details of the algorithms become a lot easier to understand.


  • Registered Users Posts: 1,131 ✭✭✭ Nelbert


    As with most things it depends!

    If you present to someone with stats knowledge, or even just an inquisitive nature and a head for maths, you may lose credibility if you can’t explain sampling, distributions and, for example, how boosting and bagging work in algorithms like XGBoost and random forest (let alone neural networks and the joys of explaining how they work to a non-maths audience...).
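    Bagging (bootstrap aggregating) is simple enough to sketch in a few lines. The example below is an illustration only, not how XGBoost or any random forest library is implemented: it fits depth-1 regression trees ("stumps") to bootstrap samples, i.e. rows drawn with replacement, and averages their predictions. A random forest adds random feature selection on top of this idea, while boosting methods such as XGBoost instead fit trees one after another to the errors of the previous ones. The helper names, number of models and seed are assumptions chosen for the example.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump (a depth-1 decision tree).

    Tries a threshold between each pair of distinct sorted x values, keeps the
    split with the lowest total squared error, and returns a prediction function.
    """
    points = sorted(zip(xs, ys))
    best_sse, best_rule = None, None
    for (x_lo, _), (x_hi, _) in zip(points, points[1:]):
        if x_lo == x_hi:
            continue  # duplicate x values (common in bootstrap samples): no gap to split
        threshold = (x_lo + x_hi) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if best_sse is None or sse < best_sse:
            best_sse, best_rule = sse, (threshold, left_mean, right_mean)
    if best_rule is None:
        constant = mean(ys)  # every x identical: fall back to a constant prediction
        return lambda x: constant
    threshold, left_mean, right_mean = best_rule
    return lambda x: left_mean if x <= threshold else right_mean


def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Bagging: fit each stump on a bootstrap sample and average the predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n row indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(model(x) for model in models)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [0] * 5 + [10] * 5  # made-up step-shaped data
    model = bagged_stumps(xs, ys)
    print([round(model(x), 2) for x in xs])
```

    Each stump on its own is a crude, high-variance model; averaging many of them, each trained on a slightly different resample, is what smooths the predictions out. That variance reduction is the short answer to "how does bagging work?".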

    Working with sample datasets and testing (repeatedly) different parameters will get you a better understanding of how things work and their impact.
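    As a small illustration of that kind of repeated testing, the sketch below sweeps a single parameter, the number of neighbours `k` in a k-nearest-neighbours classifier, and scores each setting with leave-one-out accuracy. The toy data, the helper names and the choice of k-NN are assumptions for the example; the point is only that the same algorithm can go from perfect to useless as one parameter changes.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (1-D feature)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all of the other points."""
    correct = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, x, k) == label:
            correct += 1
    return correct / len(data)


# Toy data: two well-separated clusters of three points each.
data = [(0, "a"), (1, "a"), (2, "a"), (10, "b"), (11, "b"), (12, "b")]

for k in (1, 3, 5):
    print(f"k={k}: leave-one-out accuracy {loo_accuracy(data, k):.2f}")
```

    Here `k = 1` and `k = 3` both score 1.00, but `k = 5` scores 0.00: once a point is held out, only two of its own class remain, so the three points of the other cluster always outvote them.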

    Decision trees are great because, when visualised, they become fairly self-evident to most business audiences, but you don’t want good work ruined by a probing question you can’t answer.

    I managed to explain (broadly) how a support vector machine works to someone who “hates maths” with two simple scatterplot-type diagrams I scrawled on a whiteboard. I could see at least three other people in the room who looked relieved after the 30-second-or-so explanation.
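    The whiteboard version of that explanation can also be put into numbers. A support vector machine picks the separating line that leaves the widest gap (the margin) to the nearest points of either class. The sketch below does not train an SVM; it only measures the geometric margin of two hand-picked candidate lines on a made-up set of points labelled +1 and -1, to show the quantity an SVM maximises. For these particular points, the diagonal line x + y = 5.5 happens to be the maximum-margin line.

```python
import math


def margin(w, b, points):
    """Geometric margin of the line w[0]*x + w[1]*y + b = 0 over labelled points.

    Labels are +1 / -1. The result is the distance from the line to the closest
    point; a negative value means some point is on the wrong side.
    """
    norm = math.hypot(w[0], w[1])
    return min(label * (w[0] * x + w[1] * y + b) / norm for (x, y), label in points)


# Made-up data: a cluster of -1 points near (1, 1) and +1 points near (4, 4).
points = [((1, 1), -1), ((2, 1), -1), ((1, 2), -1),
          ((4, 4), +1), ((5, 4), +1), ((4, 5), +1)]

candidates = {
    "vertical line x = 2.5": ((1, 0), -2.5),
    "diagonal line x + y = 5.5": ((1, 1), -5.5),
}
for name, (w, b) in candidates.items():
    print(f"{name}: margin {margin(w, b, points):.3f}")
```

    Both lines separate the two classes, but the vertical one passes within 0.5 of the point (2, 1), whereas the diagonal keeps every point at least about 1.768 away, which is why an SVM would prefer it.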

    Once you’ve built up credibility, some audiences will ask fewer questions because you know your stuff; others will ask more for the exact same reason (they are curious about the “how”).


  • Registered Users Posts: 534 ✭✭✭ rgmmg


    Nelbert wrote:
    Once you’ve built up credibility, some audiences will ask fewer questions because you know your stuff; others will ask more for the exact same reason (they are curious about the “how”).

    A simple "Sorry, I said at the outset I wasn't taking any comments or questions" might get round this :cool:


  • Registered Users Posts: 5,378 ✭✭✭ jmcc


    My question I suppose is, does one really need to know how the algorithms work, or is a good understanding of linear regression, clustering, classification etc and their associated algorithms good enough?
    All of those are important, but there is one practical area missing. It is often necessary to evaluate a Big Data problem in terms of its computational complexity and whether it can be parallelised.
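    One practical test of whether a calculation can be parallelised is whether it splits into independent per-chunk results that can be merged afterwards. The sketch below assumes the data is a plain Python list of numbers and computes a mean and population variance that way: each chunk is summarised as (count, mean, sum of squared deviations), and the summaries are merged with the standard pairwise update formula for variance (often attributed to Chan et al.). The chunks are processed sequentially here; because each summary depends only on its own chunk, that step could in principle be handed to separate workers (for example with `concurrent.futures.ProcessPoolExecutor.map`), but that is not shown. Function names and the chunk size are illustrative.

```python
from functools import reduce


def summarise(chunk):
    """Summarise one chunk independently as (count, mean, M2).

    M2 is the sum of squared deviations from the chunk's own mean.
    """
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((v - mean) ** 2 for v in chunk)
    return n, mean, m2


def merge(a, b):
    """Combine two chunk summaries into the summary of their union."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def chunked_mean_variance(values, chunk_size=1000):
    """Mean and population variance computed chunk by chunk."""
    if not values:
        raise ValueError("need at least one value")
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Map step: independent per chunk, so this is the part that could be parallelised.
    summaries = map(summarise, chunks)
    # Reduce step: merge the summaries pairwise into one.
    n, mean, m2 = reduce(merge, summaries)
    return mean, m2 / n


if __name__ == "__main__":
    print(chunked_mean_variance(list(range(1, 10001))))
```

    For the numbers 1 to 10,000 this should give a mean of 5000.5 and a variance of about 8333333.25 (up to floating-point rounding), the same as a single pass over the whole list, whatever chunk size is used.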

    While there is a solid theoretical basis for this (computational complexity theory), there is also a practical side that requires some understanding of the capabilities of the technology (hard drives vs SSDs, CPU, RAM, etc.) and the software when coming up with a workable solution. Sometimes the textbook approach is too theoretical for dealing with extremely large sets of data.

    This is where in-core and out-of-core processing and memory handling come into play (external memory algorithms). It might be quicker to process lots of small chunks of data in RAM than to read and swap large blocks of data to and from a hard drive. Reads from an SSD may be up to ten times faster than from a mechanical hard drive, and a RAM-based temporary drive would be even faster in some situations. It also helps to know the practical limits of the technology you are using to process data. If you are using database software to handle data, learn about indexing and the different types of index.
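    To see what an index changes in practice, the sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database and asks the query planner, via `EXPLAIN QUERY PLAN`, how it would run the same query before and after creating an index. The table, column and index names and the data are made up for the example; the exact wording of the plan text varies between SQLite versions, and other databases report plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)")
# 10,000 made-up rows spread across 500 users.
conn.executemany(
    "INSERT INTO events (user_id, amount) VALUES (?, ?)",
    ((i % 500, i * 0.5) for i in range(10_000)),
)

QUERY = "SELECT SUM(amount) FROM events WHERE user_id = ?"


def query_plan(sql, params):
    """Return the human-readable detail column (the last one) of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = query_plan(QUERY, (42,))  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")
after = query_plan(QUERY, (42,))   # expect a search using idx_events_user_id

print("before:", before)
print("after: ", after)
print("total for user 42:", conn.execute(QUERY, (42,)).fetchone()[0])
```

    The answer to the query is the same either way; what changes is that the planner goes from reading every row to jumping straight to the matching ones, which is the difference that starts to matter once tables no longer fit comfortably in memory.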

    When working with data, there are two constants: one can never know enough, and one can never stop learning.

    Regards...jmcc

