Machine learning is a world filled with an assortment of truly wonderful tricks - appearing as mathematical simplifications, cute insights and clever approximations - that make modern data analysis, algorithms and computation easier and more usable. As machine learning becomes more widespread, these tricks will continue to find their way into all aspects of our work. These tricks come to us from across the statistical sciences, whether from statistical physics, probability theory, numerical analysis or random matrix theory, to list just a few, and point to the rich connections between machine learning and the other computational sciences.
This series is an attempt at collecting these tricks, describing them and their more formal names, pointing out why they were sought out, and how they are used in machine learning today. I'll update the links and summaries on this page as I go along.
A Smorgasbord of Tricks
[This first set of tricks was written between July-December 2015]
- Replica trick. A useful trick for computing averages of log-normalising constants (log-partition functions). With this trick we can provide theoretical insights into many of the models we find today, and predictions for the outcomes that we should see in experiments. The identity at the heart of the trick is written out just after this list.
- Gaussian Integral trick. An instance of a class of variable-augmentation strategies in which we introduce auxiliary variables that make inference easier. This particular trick allows quadratic functions of discrete variables to be represented using an underlying continuous representation for which Monte Carlo analysis is easier; a small numerical check appears below the list.
- Hutchinson's trick. Hutchinson's estimator gives a stochastic, unbiased estimate of the trace of a matrix using only matrix-vector products. It is one instance of a diverse set of randomised algorithms for matrix algebra that we can use to scale up our machine learning systems. A short sketch of the estimator follows the list.
- Reparameterisation tricks. We can often reparameterise the random variables in our problems in terms of the mechanism by which they are generated. This is especially useful for deriving unbiased gradient estimators for the stochastic optimisation problems that appear throughout machine learning; the Gaussian case is sketched below.
- Log derivative trick. An ability to flexibly manipulate probabilities is essential in machine learning. We can do this using the score function, which gives us alternative gradient estimators for the same stochastic optimisation problems that we met when using reparameterisation methods; this estimator is also sketched below.
- Tricks with sticks. The analogy of breaking a stick is a powerful tool that helps us to reason about how probability can be assigned to a set of discrete categories. Using this tool, we shall develop new sampling methods, loss functions for optimisation, and ways to specify highly flexible models; the basic stick-breaking construction is sketched after the list.
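For the replica trick, the starting point is the identity below, which trades the awkward average of a logarithm for moments of the normalising constant Z. This is only the defining identity; in practice it is combined with an analytic continuation in n, which is where the real work lies.

```latex
\log Z = \lim_{n \to 0} \frac{Z^{n} - 1}{n}
\quad\Longrightarrow\quad
\mathbb{E}[\log Z]
  = \lim_{n \to 0} \frac{\mathbb{E}[Z^{n}] - 1}{n}
  = \lim_{n \to 0} \frac{\partial}{\partial n} \log \mathbb{E}[Z^{n}] .
```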
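For the Gaussian integral trick, the workhorse identity comes from completing the square: if z is an auxiliary Gaussian variable with z ~ N(0, A) and A is positive definite, then E[exp(xᵀz)] = exp(½ xᵀAx), which lets a quadratic term be traded for a linear one under a Gaussian average. Here is a minimal numerical check of that identity; the matrix A and vector x below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary positive-definite matrix A and fixed vector x, chosen only for illustration.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x = np.array([0.3, -0.7])

# Left-hand side: the quadratic term exp(0.5 * x^T A x) that we want to linearise.
lhs = np.exp(0.5 * x @ A @ x)

# Right-hand side: Monte Carlo estimate of E[exp(x^T z)] with the auxiliary z ~ N(0, A).
z = rng.multivariate_normal(mean=np.zeros(2), cov=A, size=200_000)
rhs = np.exp(z @ x).mean()

print(lhs, rhs)  # the two values should agree to a few decimal places
```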
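Hutchinson's estimator is short enough to state in full: for probe vectors z with independent ±1 (Rademacher) entries, E[zᵀAz] = tr(A), so averaging zᵀAz over a handful of probes gives an unbiased trace estimate that needs only matrix-vector products. A minimal sketch, with an illustrative test matrix:

```python
import numpy as np

def hutchinson_trace(matvec, dim, num_probes=100, rng=None):
    """Unbiased estimate of tr(A), given only the matrix-vector product A @ v."""
    if rng is None:
        rng = np.random.default_rng()
    estimates = []
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=dim)   # Rademacher probe vector
        estimates.append(z @ matvec(z))          # z^T A z has expectation tr(A)
    return float(np.mean(estimates))

# Illustrative check against the exact trace of a random symmetric matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = B @ B.T
print(np.trace(A), hutchinson_trace(lambda v: A @ v, dim=50, num_probes=2000, rng=rng))
```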
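For the reparameterisation trick in its simplest Gaussian form, we write z = μ + σε with ε ~ N(0, 1), so that the randomness no longer depends on the parameters and gradients can be pushed through the sampling path. A sketch estimating the gradient of E[z²] with respect to μ, whose exact value is 2μ; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8          # illustrative parameter values
num_samples = 100_000

# Reparameterise: z = mu + sigma * eps, with eps drawn from a fixed N(0, 1).
eps = rng.standard_normal(num_samples)
z = mu + sigma * eps

# For f(z) = z^2 the pathwise gradient is f'(z) * dz/dmu = 2 * z * 1,
# so its average is an unbiased estimate of d/dmu E[z^2] = 2 * mu.
grad_estimate = np.mean(2.0 * z)
print(grad_estimate, 2.0 * mu)
```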
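The log derivative trick gives a second estimator for the same kind of gradient through the score function: ∇θ E[f(z)] = E[f(z) ∇θ log p(z; θ)]. For a Gaussian mean the score is (z − μ)/σ², so a sketch of the estimator for the same target d/dμ E[z²] = 2μ, with the same illustrative parameter values, is:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8          # illustrative parameter values
num_samples = 200_000

# Sample directly from p(z; mu) = N(mu, sigma^2); no reparameterisation is needed here.
z = rng.normal(loc=mu, scale=sigma, size=num_samples)

# Score function of a Gaussian with respect to its mean: d/dmu log p(z; mu) = (z - mu) / sigma^2.
score = (z - mu) / sigma**2

# Score-function (REINFORCE) estimator of d/dmu E[z^2]; the exact value is 2 * mu.
grad_estimate = np.mean(z**2 * score)
print(grad_estimate, 2.0 * mu)
```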
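And for tricks with sticks, the basic stick-breaking construction draws a Beta-distributed fraction at each step and breaks it off whatever stick remains, giving a set of probabilities over categories. A truncated sketch; the concentration parameter and truncation level are illustrative.

```python
import numpy as np

def stick_breaking(alpha, num_pieces, rng=None):
    """Truncated stick-breaking: Beta fractions of the remaining stick become the weights."""
    if rng is None:
        rng = np.random.default_rng()
    fractions = rng.beta(1.0, alpha, size=num_pieces)        # v_k ~ Beta(1, alpha)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - fractions[:-1])])
    return fractions * remaining                              # pi_k = v_k * prod_{j<k} (1 - v_j)

weights = stick_breaking(alpha=5.0, num_pieces=20, rng=np.random.default_rng(0))
print(weights.sum())  # close to 1; the leftover mass sits in the unbroken remainder of the stick
```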
[This second set of tricks was written from January 2018 onwards]
- Density ratio trick. Ratios of probability densities are widespread in machine learning. Using this trick we can turn density-ratio estimation into the familiar problem of probabilistic classification, and then use the estimated ratios for learning and model comparison; a small classification-based sketch follows.
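As a small illustration of the density ratio trick, label samples from p with y = 1 and samples from q with y = 0 and fit any probabilistic classifier; with balanced classes, p(x)/q(x) is approximately π(y = 1 | x) / (1 − π(y = 1 | x)). The sketch below uses scikit-learn's logistic regression and two one-dimensional Gaussians purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Two illustrative densities: p = N(1, 1) and q = N(0, 1).
x_p = rng.normal(1.0, 1.0, size=n)
x_q = rng.normal(0.0, 1.0, size=n)

# Binary classification problem: y = 1 for samples from p, y = 0 for samples from q.
X = np.concatenate([x_p, x_q]).reshape(-1, 1)
y = np.concatenate([np.ones(n), np.zeros(n)])
clf = LogisticRegression().fit(X, y)

# With balanced classes, p(x)/q(x) is approximated by pi / (1 - pi).
x_test = np.array([[1.0]])
pi = clf.predict_proba(x_test)[0, 1]
estimated_ratio = pi / (1.0 - pi)
true_ratio = np.exp(1.0 - 0.5)   # log p(x) - log q(x) = x - 0.5 for these two Gaussians
print(estimated_ratio, true_ratio)
```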
Other tricks of interest for the future:
- Kernel tricks, identity trick, Gumbel-max trick, log-sum-exp trick.
- And many others to come.