A Statistical View of Deep Learning (V): Generalisation and Regularisation
We now routinely build complex, highly parameterised models in an effort to address the complexities of modern data sets. We design our models so that they have enough 'capacity', and doing so is now second nature to us using the layer-wise design principles of deep learning. But some problems continue to affect us, those that we encountered even in the low-data ...
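
The statistical view the title promises rests on a standard correspondence: an L2 (weight-decay) regulariser is, from the probabilistic viewpoint, a Gaussian prior on the weights, so that penalised maximum likelihood is MAP estimation. A minimal statement of this in generic notation (the symbols here are not taken from the post):

\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) - \frac{\lambda}{2}\lVert \theta \rVert_2^2
             = \arg\max_{\theta} \log p(\mathcal{D} \mid \theta) + \log \mathcal{N}(\theta \mid 0, \lambda^{-1} I),

where the second equality holds up to an additive constant that does not affect the maximiser.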

A Statistical View of Deep Learning (IV): Recurrent Nets and Dynamical Systems
Recurrent neural networks (RNNs) are now established as one of the key tools in the machine learning toolbox for handling large-scale sequence data. The ability to specify highly powerful models, advances in stochastic gradient descent, the availability of large volumes of data, and large-scale computing infrastructure now allow us to apply RNNs in the most creative ...
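
The dynamical-systems reading in the title can be stated in two lines: an RNN is a deterministic nonlinear state-space model. In standard (generic, not the post's) notation:

h_t = \tanh\left(W h_{t-1} + U x_t + b\right), \qquad y_t = g\left(V h_t\right),

and letting the hidden state h_t be stochastic rather than deterministic recovers the classical state-space models of time-series analysis.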

A Statistical View of Deep Learning (III): Memory and Kernels
Memory, the ways in which we remember and recall past experiences and data to reason about future events, is a term that appears frequently in the current literature. Every model in machine learning contains a memory that is central to its use. We have two principal types of memory mechanism, most often discussed under the types of models ...
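
The contrast the title points to is the standard primal/dual one: a parametric model compresses its training data into a fixed set of weights, while a kernel (memory-based) model retains the data and recalls it at prediction time. In generic notation (not necessarily the post's):

f(x^*) = w^\top \phi(x^*) \quad \text{(memory held in the parameters)}
\qquad \text{vs.} \qquad
f(x^*) = \sum_{i=1}^{n} \alpha_i\, k(x_i, x^*) \quad \text{(memory held in the data)}.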

A Statistical View of Deep Learning (II): Auto-encoders and Free Energy
With the success of discriminative modelling using deep feedforward neural networks (or, through an alternative statistical lens, recursive generalised linear models) in numerous industrial applications, there is an increased drive to produce similar outcomes with unsupervised learning. In this post, I'd like to explore the connections between denoising auto-encoders, a leading approach for unsupervised learning in deep learning, and density estimation ...
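
One standard way to make the link to density estimation concrete is through the variational free energy, with the encoder in the role of an approximate posterior q(z|x) and the decoder in the role of the likelihood p(x|z) (generic notation, not the post's):

\mathcal{F}(x) = -\mathbb{E}_{q(z \mid x)}\left[\log p(x \mid z)\right] + \mathrm{KL}\left[q(z \mid x) \,\Vert\, p(z)\right],

where the first term is the familiar reconstruction error of an auto-encoder and the KL term regularises the code.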

A Statistical View of Deep Learning (I): Recursive GLMs
Deep learning and the use of deep neural networks are now established as key tools for practical machine learning. Neural networks have an equivalence with many existing statistical and machine learning approaches, and I would like to explore one of these views in this post. In particular, I'll look at the view of deep ...
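
For readers meeting this view for the first time, the idea can be sketched in a few lines of code: a generalised linear model maps a linear predictor through an inverse link function, and composing such maps recursively yields a feedforward network. The sketch below is illustrative only; the function names and parameters are hypothetical, not taken from the post.

import numpy as np

def glm_layer(x, W, b, inv_link):
    # One GLM: linear predictor eta = W x + b, mean mu = inverse-link(eta).
    return inv_link(W @ x + b)

def recursive_glm(x, layers):
    # Compose GLMs; the final inverse link matches the output likelihood
    # (e.g. a sigmoid for Bernoulli outputs, the identity for Gaussian).
    h = x
    for W, b, inv_link in layers:
        h = glm_layer(h, W, b, inv_link)
    return h

# Hypothetical two-layer example with a Bernoulli output.
rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
layers = [(rng.normal(size=(4, 3)), np.zeros(4), np.tanh),
          (rng.normal(size=(1, 4)), np.zeros(1), sigmoid)]
print(recursive_glm(rng.normal(size=3), layers))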

Variational Inference: Tricks of the Trade
The NIPS 2014 Workshop on Advances in Variational Inference was abuzz with new methods and ideas for scalable approximate inference. The concluding event of the workshop was a lively debate with David Blei, Neil Lawrence, Zoubin Ghahramani, Shinichi Nakajima and Matthias Seeger on the history, trends and open questions in variational inference. One of the questions posed to our panel ...

Bayesian sparsity using spike-and-slab priors

I recently received some queries on our paper: S. Mohamed, K. Heller and Z. Ghahramani. Bayesian and L1 Approaches for Sparse Unsupervised Learning. International Conference on Machine Learning (ICML), June 2012. The questions were very good, and I thought it would be useful to post them here for future reference. The paper looked at Bayesian ...
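
For reference, the spike-and-slab prior at the heart of the paper combines a point mass at zero (the 'spike') with a broad continuous density (the 'slab'). In one common parameterisation (generic notation, not necessarily the paper's):

p(\theta_j) = (1 - \pi)\,\delta_0(\theta_j) + \pi\,\mathcal{N}(\theta_j \mid 0, \sigma^2),

where \pi is the prior inclusion probability. Unlike the Laplace prior underlying L1 methods, this prior assigns non-zero posterior probability to exact zeros, which is what makes the resulting model genuinely sparse in the Bayesian sense.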

Marr's Levels of Analysis
Marr's three levels of analysis promote the idea that complex systems, such as the brain, a computer or human behaviour, should be understood at different levels. Marr's framework proved to be an elegant and popular way of reasoning about complex systems and, in the context of machine learning and statistics, remains an intuitive framework that ...

On Marginal Likelihoods and Widely Applicable BIC

The marginal likelihood of a model is one of the key quantities appearing throughout machine learning and statistics, since it provides an intuitive and natural objective function for model selection and parameter estimation. I recently read a new paper by Sumio Watanabe, 'A Widely Applicable Bayesian Information Criterion' (WBIC, to appear in JMLR soon) ...
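
For context, the marginal likelihood integrates the likelihood against the prior, and Watanabe's criterion approximates its logarithm with a single posterior average taken at a specific temperature. Sketching the definitions from memory (so treat the details as indicative rather than exact):

p(x_{1:n}) = \int p(x_{1:n} \mid w)\, p(w)\, dw,
\qquad
\mathrm{WBIC} = \mathbb{E}^{\beta}_{w}\left[-\sum_{i=1}^{n} \log p(x_i \mid w)\right], \quad \beta = \frac{1}{\log n},

with the expectation taken under the posterior tempered by \beta. WBIC then approximates -\log p(x_{1:n}) even for singular models, where the usual BIC does not apply.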