We made a big deal yesterday about GPUs, so I'm going to take a moment to talk about why.

You might vaguely remember, from the days before we all switched to MacBooks and stopped paying attention to the technical specs of a computer, that a GPU is a graphics processing unit. For most of its existence, the role of the GPU was confined to, well, computer graphics.

In 2001, Nvidia introduced the first programmable GPU, which meant GPUs could be programmed to do simple things like addition and multiplication. This might not sound like much, but addition and multiplication play surprisingly large roles in computer graphics, as well as - and this is where it gets interesting - in tasks like encryption, data mining, computer vision, machine learning, and so on.

It turns out the type of work a GPU is good at even has its own cute name. An embarrassingly parallel task is one that can easily be broken down into lots of smaller pieces of work, each of which can be done independently of the others.
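To make that concrete, here's a minimal sketch (mine, not the course's) of an embarrassingly parallel job in Python: squaring a list of numbers. Each number can be squared without knowing anything about the others, so the work splits cleanly across however many workers you can throw at it.

```python
from multiprocessing import Pool

def square(n):
    # Every input is handled independently - no shared state, no coordination -
    # which is exactly what makes the job embarrassingly parallel.
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:  # roughly one worker per CPU core
        results = pool.map(square, range(100000))
        print(results[:5])  # [0, 1, 4, 9, 16]
```

Swap squaring a number for "compute the color of one pixel" or "multiply this row by that column" and you have the kind of work GPUs are built around: thousands of tiny, independent calculations.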

We normally think of the CPU as the brain of a computer. This is because a CPU can be programmed to handle all kinds of complex work, but the tradeoff of that complexity is that most consumer CPUs have only 2 or 4 cores, so only a handful of calculations can run at the same time.

The Nvidia K80 packs 4,992 CUDA cores, and a single P2 instance (like the one I created yesterday) can provide up to 16 GPUs - 8 K80 boards - for a total of 39,936 cores. That's the equivalent of running almost 40,000 calculations at once, and the reason why having a computer with a modern GPU is so important when it comes to deep learning.
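To give a flavor of what that buys you in practice, here's a tiny sketch using PyTorch (my choice for the illustration, not necessarily what the course uses): multiplying two big matrices element by element is really millions of independent little calculations, and shipping the data to the GPU lets thousands of cores chew through them simultaneously.

```python
import torch

# Two 5,000 x 5,000 matrices: their element-wise product is 25 million
# independent multiplications - ideal work for thousands of simple cores.
x = torch.rand(5000, 5000)
y = torch.rand(5000, 5000)

if torch.cuda.is_available():  # fall back to the CPU if no GPU is around
    x, y = x.cuda(), y.cuda()

z = x * y  # one line of code, millions of calculations in parallel
```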

By the way, this bit of information about general purpose GPUs is one of my favorite things I've learned all year. Why? Because it is a wonderful reminder that the limitations and expectations we place on objects are mostly artificial and usually arbitrary. The GPU was never built for this.

But the GPU doesn't care that it wasn't built for this - it does it anyway.

Okay, back to the course.

I watched the 30-minute course overview while rushing from work to dance today. If you're at all interested in deep learning, or technical education in general, watch this video. It explains what's different about this particular course and, more importantly, the creators' approach to technical teaching and learning, which can be summarized as:

  • Code-centric vs math-centric

    Ask Hacker News, Quora, or Reddit about the best way to pick up deep learning (or any data science-related skill), and you'll end up with an inbox full of people recommending several years' worth of graduate-level math. The course creators compare this approach to studying the entirety of music notation and theory before ever being allowed to sing a song or touch an instrument, and I have to agree.

    Going code-first means that you can start hacking away immediately. Even if you don't completely understand what your program is doing under the surface, that's okay! You'll get there, and it's better to learn by doing and making mistakes along the way than by reading the first half of a textbook on discrete mathematics and giving up.

  • Big picture vs elemental

    I took a data science course at General Assembly last year, which taught me a lot about various algorithms and their use cases, but very little about how to apply a data science solution to a real-world problem. Remember, yesterday was the first time I'd ever used a cloud computing service, and it felt good to mimic the steps a real data scientist might take to set up a production environment - even if it was just to do the Jupyter Notebook equivalent of a hello world.

  • Motivated by state-of-the-art results vs good-enough results

    This one is pretty self-explanatory: the whole point of deep learning is to deliver state-of-the-art results, and the course promises to enable you to do just that in its first lesson.

Does that sound good to you? It sounds pretty good to me!