Memory, the ways in which we remember and recall past experiences and data to reason about future events, is a term used frequently in current literature. All models in machine learning consist of a memory that is central to their usage. We … Continue reading A Statistical View of Deep Learning (III): Memory and Kernels
With the success of discriminative modelling using deep feedforward neural networks (or, through an alternative statistical lens, recursive generalised linear models) in numerous industrial applications, there is an increased drive to produce similar outcomes with unsupervised learning. In this post, I'd like to explore the connections between denoising auto-encoders, a leading approach for unsupervised learning in deep learning, and density estimation in statistics. The statistical view I'll explore casts learning in denoising auto-encoders as inference in latent factor (density) models. Such a connection has a number of benefits and implications for our machine learning practice.
Deep learning and the use of deep neural networks [cite key="bishop1995neural"] are now established as a key tool for practical machine learning. Neural networks have an equivalence with many existing statistical and machine learning approaches and I would like to explore one of these views in this post. In particular, I'll look at the view of deep neural networks as recursive generalised linear models (RGLMs). Generalised linear models form one of the cornerstones of probabilistic modelling and are used in almost every field of experimental science, so this connection is an extremely useful one to have in mind. I'll focus here on what are called feedforward neural networks and leave a discussion of the statistical connections to recurrent networks to another post.
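To make the recursive view concrete, here is a minimal NumPy sketch rather than a definitive implementation. It assumes each layer is a generalised linear model (a linear predictor passed through an inverse link function) and that a feedforward network is obtained by feeding the output of one such model into the next. The function names (`glm_layer`, `rglm_forward`), the layer sizes and the choice of tanh and logistic links are illustrative assumptions, not taken from the post.

```python
import numpy as np


def sigmoid(eta):
    """Logistic inverse link: maps a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))


def glm_layer(x, W, b, inverse_link):
    """One GLM: linear predictor eta = x W + b, passed through an inverse link."""
    eta = x @ W + b
    return inverse_link(eta)


def rglm_forward(x, params, hidden_link=np.tanh, output_link=sigmoid):
    """Apply GLMs recursively: each layer's mean becomes the next layer's input.

    params is a list of (W, b) pairs, one per layer (hypothetical layout).
    """
    h = x
    for W, b in params[:-1]:
        h = glm_layer(h, W, b, hidden_link)
    W, b = params[-1]
    return glm_layer(h, W, b, output_link)


# Illustrative setup: 4 inputs, two hidden layers of 8 units, 1 Bernoulli output.
rng = np.random.default_rng(0)
sizes = [4, 8, 8, 1]
params = [
    (rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
    for m, n in zip(sizes[:-1], sizes[1:])
]
x = rng.normal(size=(5, 4))
probs = rglm_forward(x, params)  # predicted Bernoulli means, shape (5, 1)
```

With a single `(W, b)` pair the recursion is empty and `rglm_forward` reduces to ordinary logistic regression, which is the sense in which the network generalises the GLM.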
The NIPS 2014 Workshop on Advances in Variational Inference was abuzz with new methods and ideas for scalable approximate inference. The concluding event of the workshop was a lively debate with David Blei, Neil Lawrence, Zoubin Ghahramani, Shinichi Nakajima and Matthias Seeger on the history, trends and open questions in variational inference. One of the questions posed to our panel and audience was: 'what are your variational inference tricks-of-the-trade?'
My current best practice includes stochastic approximation, Monte Carlo estimation, amortised inference and powerful software tools. But this is a thought-provoking question that has motivated me to think in more detail through my current variational inference tricks-of-the-trade, which are:
Continue reading "Variational Inference: Tricks of the Trade"
I recently received some queries on our paper: S. Mohamed, K. Heller and Z. Ghahramani. Bayesian and L1 Approaches for Sparse Unsupervised Learning. International Conference on Machine Learning (ICML), June 2012 [cite key="mohamed2012sparse"]. The questions were very good and I thought it would be useful to post these for future reference. The paper looked at Bayesian and optimisation approaches for learning sparse models. For Bayesian models, we advocated the use of spike-and-slab sparse models and specified an adapted latent Gaussian model with an additional set of discrete latent variables to specify when a latent dimension is sparse or not. This … Continue reading Bayesian sparsity using spike-and-slab priors
Marr's three levels of analysis [cite key="marr1982"] promote the idea that complex systems such as the brain, a computer or human behaviour should be understood at different levels. Marr's framework proved to be an elegant and popular way of reasoning about complex systems and, in the context of machine learning and statistics, remains an intuitive framework that is often used when describing probabilistic models of cognition and perceptual systems.
The marginal likelihood of a model is one of the key quantities appearing throughout machine learning and statistics, since it provides an intuitive and natural objective function for model selection and parameter estimation. I recently read a new paper by Sumio Watanabe, A Widely Applicable Bayesian Information Criterion (WBIC) [cite key="watanabe2012widely"] (to appear in JMLR soon), that provides a new, theoretically grounded and easy-to-implement method of approximating the marginal likelihood, which I will briefly describe in this post. I'll summarise some of the important aspects of the marginal likelihood, then describe the WBIC and share some thoughts and questions on its use.
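For orientation, the quantities involved can be written down as follows. The notation is an assumption made here for illustration (data $x_1, \dots, x_n$, parameters $w$, prior $\varphi(w)$) and may differ from the symbols used in the paper.

```latex
% Marginal likelihood and its negative log (the Bayes free energy)
p(x_{1:n}) = \int \prod_{i=1}^{n} p(x_i \mid w)\, \varphi(w)\, dw,
\qquad
F = -\log p(x_{1:n})

% Empirical negative log-likelihood per data point
L_n(w) = -\frac{1}{n} \sum_{i=1}^{n} \log p(x_i \mid w)

% WBIC: posterior expectation of n L_n(w) at inverse temperature beta = 1 / log n
\mathrm{WBIC} = \mathbb{E}_w^{\beta}\!\left[ n L_n(w) \right]
= \frac{\int n L_n(w) \prod_{i=1}^{n} p(x_i \mid w)^{\beta}\, \varphi(w)\, dw}
       {\int \prod_{i=1}^{n} p(x_i \mid w)^{\beta}\, \varphi(w)\, dw},
\qquad
\beta = \frac{1}{\log n}
```

Read this way, the WBIC approximates $F$ using a single expectation under a tempered posterior, which in practice can be estimated from samples drawn at the one temperature $\beta = 1/\log n$.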
Hello blogging world! All the pieces in this blog will be ways for me to explore my current understanding of concepts and topics in Machine Learning. Putting thoughts into writing really does force us to think clearly about what we think and why, and I put this online in the hope that some of my explorations and thought experiments might be useful to others, and that I might learn other perspectives from those that read it and share their feedback. Plus, I do enjoy writing, so maintaining a blog seems to be a good way to keep writing regularly. Continue reading Hello World!