I had the opportunity to attend Martin Thompson’s Writing Concurrent Code with Lock Free Algorithms Course ( http://skillsmatter.com/course/java-jee/martin-thompsons-writing-concurrent-code-with-lock-free-algorithms ) at SkillsMatter last week. Here is a brief summary of my experience.

The course starts off with Martin talking about mechanical sympathy, and describing how the innards of modern processors work from a conceptual point of view. This covers the various levels of caches, buffers, memory controllers, processor architecture, cache lines and so on. This might sound strange considering the course is about writing algorithms, however the discussions proved invaluable for the rest of the course, both for understanding and for performance. Having studied microprocessor design in my undergraduate days, a lot of it seemed familiar. However, there was enough covered on optimisations and approaches in modern processors (think Sandy Bridge, and even Haswell) to give me completely new material. If you don’t have any previous knowledge of microprocessors, this part will be even more useful.
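To make the cache-line discussion concrete, here is a small sketch of my own (it is not course material, and the class and field names are mine). It compares two threads updating counters that probably sit on the same cache line with counters separated by padding; the exact field layout and timings depend on the JVM and hardware, and the JVM is free to reorder fields, so treat it as an illustration only.

```java
// Illustrative sketch (not from the course): false sharing vs padded counters.
public class FalseSharingDemo {
    // Adjacent counters: likely to share a 64-byte cache line.
    static class Shared {
        volatile long a;
        volatile long b;
    }

    // Filler fields intended to push a and b onto different cache lines
    // (the JVM may reorder fields, so this is not guaranteed).
    static class Padded {
        volatile long a;
        long p1, p2, p3, p4, p5, p6, p7;
        volatile long b;
    }

    static final long ITERATIONS = 50_000_000L;

    static long timeMillis(Runnable first, Runnable second) throws InterruptedException {
        Thread t1 = new Thread(first);
        Thread t2 = new Thread(second);
        long start = System.nanoTime();
        t1.start(); t2.start();
        t1.join(); t2.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        Shared s = new Shared();
        Padded p = new Padded();
        long shared = timeMillis(() -> { for (long i = 0; i < ITERATIONS; i++) s.a++; },
                                 () -> { for (long i = 0; i < ITERATIONS; i++) s.b++; });
        long padded = timeMillis(() -> { for (long i = 0; i < ITERATIONS; i++) p.a++; },
                                 () -> { for (long i = 0; i < ITERATIONS; i++) p.b++; });
        System.out.println("same cache line: " + shared + " ms, padded: " + padded + " ms");
    }
}
```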

The rest of the course alternated between some theory and a lot of practical exercises. The language used was Java, and although I’m from a .NET background, I could relate to most of the examples and almost all of the theory (some constructs are easier to implement in C#, while others benefit from various Java approaches and libraries). In no time, we were doing inter-thread messaging in the range of millions of messages per second. With optimisations, this increased to over 250 million messages per second. We also managed inter-process messaging on the same machine at a few million messages per second. These numbers may sound incredible, yet they were achieved within the course itself, using fairly simple approaches that take advantage of the power of modern processors (and avoid the things that hurt performance).
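I won’t reproduce the course exercises here, but as a rough illustration of the style of structure that makes numbers like these possible, here is a minimal single-producer, single-consumer ring buffer sketch of my own (the class name and details are assumptions of mine, not the course’s code). The key ideas are a pre-allocated power-of-two array, separate head and tail counters each owned by exactly one thread, and ordered writes instead of locks; production-grade versions go further with cache-line padding and batching.

```java
// Illustrative sketch (not from the course): a single-producer, single-consumer
// ring buffer. Capacity must be a power of two.
import java.util.concurrent.atomic.AtomicLong;

public final class SpscQueue<E> {
    private final Object[] buffer;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read (consumer only)
    private final AtomicLong tail = new AtomicLong(); // next slot to write (producer only)

    public SpscQueue(int capacityPowerOfTwo) {
        buffer = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Called only by the single producer thread.
    public boolean offer(E e) {
        long t = tail.get();
        if (t - head.get() == buffer.length) {
            return false;               // queue full
        }
        buffer[(int) (t & mask)] = e;
        tail.lazySet(t + 1);            // ordered write; publishes the element without a full fence
        return true;
    }

    // Called only by the single consumer thread.
    @SuppressWarnings("unchecked")
    public E poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;                // queue empty
        }
        int index = (int) (h & mask);
        E e = (E) buffer[index];
        buffer[index] = null;           // allow the element to be garbage collected
        head.lazySet(h + 1);
        return e;
    }
}
```

Because each counter has exactly one writer, no compare-and-swap or lock is needed: the producer spins on offer() and the consumer spins on poll(), which is what makes the per-message cost so low.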

For compute-intensive problems, these approaches can be immensely powerful – instead of having to run hundreds of servers, you may be able to reduce your hardware requirements to a handful of servers (if not a single one). For IO-intensive operations, they will obviously not perform as well, as you will be bound by your IO. However, these approaches can virtually eliminate the cost of inter-thread and inter-process communication and ensure your system runs pretty much as fast as your IO will allow. They let you make the best use of what’s available to you, eliminate waste, and make it easier to monitor what your throughput is, when you might need to increase resources, and where.

Quite often, the problems these approaches solve are instead handled through expensive message-oriented middleware, which is often bloated and slow – so slow, in fact, that exchanging a few hundred messages per second is considered “good”. That leads to so much waste that it isn’t even funny, yet that seems to be the prevailing norm. So much of this waste can be eliminated, resulting in high-performance systems that take advantage of modern hardware. This course is a very good introduction to those approaches.

You are required to do quite a bit of programming in this course. This is not a purely theoretical course, and you will need a laptop with at least two processor cores (having Hyper-Threading would be a bonus). While you can do all the exercises on Windows (which I was using), a few Linux-only tools are discussed that give you better performance reporting. Using a Mac is NOT recommended for this course, as Mac OS doesn’t give you the necessary options for pinning tasks to particular processor cores.

If you deal with messaging systems, or even things that need to churn through a “backlog of tasks”, this course will equip you with the tools and concepts to tackle them in a highly efficient manner. It has definitely been of great benefit to me. I would like to thank Martin, Wendy and all the other people at SkillsMatter for an excellent three days.