DIY Data Science: The Alphabet

A few weeks ago, I decided to properly commit to learning some of the many open-source tools, skills and concepts that help you get your hands dirty with data: digging out valuable insights, communicating them successfully and making them look pretty.

Since data science can be an overwhelming field and there’s an abundance of free online resources on offer – some amazing, some of questionable quality – I realised that I would need to structure this somehow. This was great news, because I love to structure things.

The result – my DIY Data Science alphabet – is far from perfect. Turns out, it’s pretty restrictive to only use each letter once, so you have to get a little creative. Plus, there are just so many things to choose from – unless you’re looking for something that starts with a Y or Z, since I will not dedicate any time to learning about zettabytes.

But apart from these challenges, I believe the alphabet will serve its purpose and keep my efforts focused. It’s a healthy mix of a) concepts that I want to take apart and simplify in my mind (like machine learning and algorithms) and b) stuff I already know pretty well but want to refresh and expand on (like Python’s Pandas library, basic stats concepts, and network science – here under G for graph analysis because, well, N is a popular letter). I’ve also included c) practical skills that I am keen to develop further or learn from scratch (like D3.js, dealing with databases and NLP) and d) machine learning concepts that I want to wrap my head around and learn how to use in practice (like decision trees and neural networks).

My alphabet also includes E for Excel, because I tend to avoid Excel whenever I can and use Pandas instead, which is not a very grown-up thing to do. It’s not because I think I’m too cool for Excel; I just never learned how to use it very well.

Why I’m doing this

I know that this is an exciting area that I can gain a lot from, both personally and professionally. Plus, I want to prove to myself that it’s possible to learn all these things even if you don’t have the right kind of technical background and your brain is naturally more creative than analytical, as long as you’re determined, enjoy what you’re doing and know why you’re doing it.

How is this going to work?

Every week, I will cover one letter of the alphabet. I’m going to spend some time on my morning commute and in the evenings after work understanding how that week’s topic works and what I need to install and prepare to put it into practice. On weekends, I’m going to set aside a big chunk of time (that would otherwise be spent binge-watching YouTube videos) to get to work and apply what I’ve learned. For W (web scraping), I want to scrape a fun data set off the web; for G (graph analysis), I will analyse a network and its properties; and so on.

I’m going to document my journey here, posting the results of my efforts and a little bit on how I got there and the resources that I found helpful on my way. To keep myself (and whoever might read this) entertained, I will try to pick current or intriguing data sets and topics that I am curious about, visualise as much as I can, and have some fun. I don’t want to run code just for the sake of it – I want to see the value in this, ideally every step of the way.

Making this a little easier

This is a pretty ambitious project for me. It might get tricky and frustrating at times, so I will try to make things as easy and pleasant as possible for myself.

Python is the programming language that I’m comfortable with, so I won’t try to learn new frameworks unless the topic explicitly requires it. For interactive visualisation with D3, for example, I will need JavaScript. But most of the time, I will make use of the many Python packages and libraries out there, building on what I already know.

I’ve also included quite a few topics that I’m already comfortable with, but want to refresh and document somewhere, so those weeks should be a little easier.

And last but not least, I won’t work through this in alphabetical order. Instead, I’ll pick whatever makes most sense to do next, whatever I feel excited about and have a project in mind for.

Let’s get this started

Luckily, I am genuinely expecting this to be a lot of fun, or I don’t think I would commit to doing what is essentially one small data project per week. I love learning new things, I love the satisfaction of figuring out how to make things work, and I’m excited to see where I can be a few months from now.
