How much theory does one really have to know

illdoit2morrow · 05-02-2019 1:52pm #1

I'm looking for some feedback from people working in the data analysis/science arena.

I completed an MSc in Data Analytics at Blanchardstown IT in 2017. The parts of the course that focused on how the different algorithms worked were very maths-based, and I struggled with them. I more or less understand what the algorithms are trying to achieve (linear regression, logistic regression etc.), but I never got the maths element.

Recently, to try to understand this a bit more, I started the Machine Learning course on Coursera by Andrew Ng. It started off great, but at the end of Week 2 and into Week 3 it's gone very mathematical again, discussing how the algorithms work, and I'm often left staring at the screen wondering what Andrew is on about.

My question I suppose is, does one really need to know how the algorithms work, or is a good understanding of linear regression, clustering, classification etc and their associated algorithms good enough?

Sleepy · 05-02-2019 1:57pm

The most used skill in data work remains SQL, IME.

FastFullBack · 18-05-2019 8:24am

illdoit2morrow wrote: »

My question I suppose is, does one really need to know how the algorithms work, or is a good understanding of linear regression, clustering, classification etc and their associated algorithms good enough?

I find the best way to learn new technologies is to aim to apply them to real-world examples that you understand. Once you can apply the algorithms and understand their basics in real working examples, understanding the more technical details becomes a lot easier.

Nelbert · 24-05-2019 7:22pm

As with most things it depends!

If you present to someone with stats knowledge, or even an inquisitive nature and a head for maths, you may lose credibility if you can’t explain sampling, distributions and, for example, how boosting and bagging work with algorithms like XGBoost and random forest (let alone neural networks and the joys of explaining how they work to a non-maths audience...).

Working with sample datasets and testing (repeatedly) different parameters will get you a better understanding of how things work and their impact.
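As a rough illustration of that kind of experiment, here is a minimal sketch that fits a straight line by gradient descent and reruns it with a few different learning rates. The toy data, the function names (`fit_line`, `mse`) and the chosen rates are all made up for this example; it uses plain Python rather than any particular library.

```python
def fit_line(xs, ys, learning_rate, steps=1000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Same data, same number of steps, different learning rates.
results = {}
for lr in (0.001, 0.01, 0.05):
    w, b = fit_line(xs, ys, lr)
    results[lr] = (w, b, mse(xs, ys, w, b))
    print(f"lr={lr}: w={w:.3f}, b={b:.3f}, mse={results[lr][2]:.6f}")
```

With the step count fixed, the smallest rate is still some way from the true line while the largest has effectively converged, which is the sort of impact you only really see by changing one parameter at a time and rerunning.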

Decision trees are great because, when visualised, they become fairly self-evident to most business audiences, but you don’t want good work ruined by a probing question you can’t answer.

I managed to explain (broadly) how a support vector machine worked to someone who “hates maths” with two simple scatterplot type diagrams I scrawled on a whiteboard. I could see at least 3 other people in the room who looked relieved after the 30ish second explanation.

Once you’ve built up the credibility, some audiences will ask fewer questions because you know your stuff; others will ask more for the exact same reason (and because they are curious as to the “how”).

rgmmg · 20-06-2019 8:51am

Nelbert wrote: »

As with most things it depends!

If you present to someone with stats knowledge, or even an inquisitive nature and a head for maths, you may lose credibility if you can’t explain sampling, distributions and, for example, how boosting and bagging work with algorithms like XGBoost and random forest (let alone neural networks and the joys of explaining how they work to a non-maths audience...).

Working with sample datasets and testing (repeatedly) different parameters will get you a better understanding of how things work and their impact.

Decision trees are great because, when visualised, they become fairly self-evident to most business audiences, but you don’t want good work ruined by a probing question you can’t answer.

I managed to explain (broadly) how a support vector machine worked to someone who “hates maths” with two simple scatterplot type diagrams I scrawled on a whiteboard. I could see at least 3 other people in the room who looked relieved after the 30ish second explanation.

Once you’ve built up the credibility, some audiences will ask fewer questions because you know your stuff; others will ask more for the exact same reason (and because they are curious as to the “how”).

A simple "Sorry, I said at the outset I wasn't taking any comments or questions" might get round this :cool:

jmcc · 20-06-2019 10:28am

illdoit2morrow wrote: »

My question I suppose is, does one really need to know how the algorithms work, or is a good understanding of linear regression, clustering, classification etc and their associated algorithms good enough?

All those are important but there is one practical area that is missing. It is often necessary to evaluate a Big Data problem in terms of computational complexity and whether the problem can be parallelised.

While there is a solid theoretical basis (computational complexity theory) for this, there is also a practical side that requires some understanding of the capability of the technology (hard drives vs SSDs, CPU, RAM etc.) and software in coming up with a workable solution. Sometimes, the textbook approach is too theoretical when it comes to dealing with extremely large sets of data.

This is where in-core and out-of-core processing and memory handling come into play (external memory algorithms). It might be quicker to process a lot of small chunks of data in RAM than to read and swap large blocks of data to and from a hard drive. The read times on an SSD may be up to ten times faster than on a mechanical hard drive. A RAM-based temporary drive would be even faster in some situations. It also helps to know the practical limits of the technology you are using for processing data. If you are using database software for handling data, learn about indexing and the different types of index.
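To make the chunking idea concrete, here is a minimal sketch of an out-of-core calculation: the mean of a file of numbers, computed while holding only a small chunk in RAM at a time. The file layout (one number per line), the function name `chunked_mean` and the chunk size are assumptions for the example, not anything specific to a particular tool.

```python
import os
import tempfile


def chunked_mean(path, chunk_lines=1000):
    """Mean of a one-number-per-line file, keeping at most chunk_lines values in RAM."""
    total, count = 0.0, 0
    chunk = []
    with open(path) as f:
        for line in f:
            chunk.append(float(line))
            if len(chunk) == chunk_lines:
                # Fold the full chunk into the running totals, then free it.
                total += sum(chunk)
                count += len(chunk)
                chunk.clear()
    # Fold in whatever is left in the final, partial chunk.
    total += sum(chunk)
    count += len(chunk)
    return total / count if count else float("nan")


# Demo on a small temporary file standing in for a dataset too big for RAM.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(1, 10001):
        tmp.write(f"{i}\n")
    demo_path = tmp.name

result = chunked_mean(demo_path, chunk_lines=256)
os.remove(demo_path)
print(result)
```

Only running totals survive between chunks, so memory use depends on the chunk size rather than the file size. The same pattern works for any statistic that can be built up from partial results (sums, counts, minima, maxima); ones that need all the data at once, such as an exact median, need a different approach.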

When working with data, there are two constants: one can never know enough, and one can never stop learning.

Regards...jmcc
