Through the Eyes of Birds and Frogs: Writing and Surveys in Machine Learning Research

I savoured the opportunity to talk about writing surveys and reviews at the NeurIPS 2020 Workshop on ML Retrospectives, Surveys & Meta-Analyses (ML-RSA). This is the text of the talk.
🎞Watch the video here and 🖥 slides here

Hello! My name is Shakir Mohamed, and I am humbled and, at the same time, extremely excited to give this brief talk as part of the NeurIPS 2020 workshop on Retrospectives, Surveys & Meta-Analyses in Machine Learning. I’m also very grateful to all the organisers for the honour of being part of this year’s workshop programme. And thank you for giving me some of your watch time -- let’s take a short journey together for the next 25 minutes or so, where I hope we can look through the eyes of birds and frogs, and explore some of the ideas and tools I have used in my own writing, surveys, and work in machine learning research.

In his 2009 paper, Freeman Dyson wrote about mathematicians and scientists as inhabited by two types of spirit animals: the birds and the frogs.

Birds fly high in the air and survey broad vistas of mathematics out to the far horizon. They delight in concepts that unify our thinking.

Frogs live in the mud below and see only the flowers that grow nearby. They delight in the details of particular objects.

As we begin, I’d like you to reflect on how you transform and transfigure between bird and frog in your own research.

One of the greatest writers of English, Toni Morrison, left us with a profound piece of wisdom: that the aim of our writing, whatever we might be writing, is to familiarise the strange and to mystify the familiar. This expression captures what I consider to be the core of all writing, but especially the writing of surveys and reviews. To familiarise the strange and to mystify the familiar.

For me, these first two ideas – of birds and frogs, and familiarising and mystifying – capture the essence of a survey or a review. They are a gentle guide to the art of transfiguration. The best surveys give us the gift of a broad holistic understanding of a field, but also an understanding of the details that are so important. These types of surveys take the concepts and tools that we take for granted and make us see them in a new way - making them strange to us; and they take the things we struggle to understand and that are truly strange to us and make them seem commonplace. These types of surveys give us the gift of wisdom.

We can recognise that all our work -- whether we are writing papers, blogs, essays, reviews, theses, perspectives or books -- in some way contains an element of survey or review. For this reason, I think it is particularly instructive to master this art of transfiguration, because we write reviews all the time, from the smallest section of a paper all the way to an entire paper.

I don’t claim to have mastered this art of familiarising and mystifying myself, but because I have tried to write surveys, and because surveys appear in all our writing, I took the opportunity to reflect on my process of writing surveys using some of my own papers.
Some surveys are meant to connect many different fields, helping to bridge notation, intuition and applications. Other surveys are fundamentally interdisciplinary, connecting previously unconnected technical fields to explore new opportunities for mutual exchange.
Other papers need to do this translational work in just a section, explaining a complex theory in the plainest language possible.
And other surveys are meant purely for us as a field of machine learning to savour and critique.

I’ll use these first two papers to illustrate some other considerations for survey writing.

This is a paper that was recently published in the Journal of Machine Learning Research, with an amazing set of co-authors. It covers the problem of Monte Carlo gradient estimation in machine learning, one of the problems I consider to be amongst the most fundamental in the computational sciences.

The problem can be summarised by a single equation, which asks us to compute a seemingly benign gradient. This equation will turn out to be important for controlling traffic lights and logistics management, in photo-realistic image generation, and in state-of-the-art control systems. As we dug deeper into this area, we found it hidden in plain sight and tucked away in interesting corners of statistical research, and used far more than we previously knew.

Mathematically, the problem of sensitivity analysis, as this field is known, asks us to compute the gradient of an expectation of a function with respect to distributional parameters phi. This problem has been studied for over 50 years in computational finance, operations research, stochastic optimisation, and machine learning. This is a problem of mathematical statistics, and there is a lot to read, to learn, and to get easily confused by.
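
Written out, the question takes roughly the following form. The symbols here are assumptions chosen for illustration, since the talk only names the parameters phi: x is a random variable, p(x; \phi) is its distribution, f is the function whose expectation we care about, and \eta is simply a name for the gradient.

```latex
\eta \;=\; \nabla_{\phi}\, \mathbb{E}_{p(x;\phi)}\big[ f(x) \big]
     \;=\; \nabla_{\phi} \int p(x;\phi)\, f(x)\, \mathrm{d}x
```

The difficulty is that the parameters sit inside the distribution we average over, so the gradient cannot simply be moved onto f.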

I won’t say much more about this problem technically. Instead, I want to take us through the structure of the paper, since this will expose two useful tools.

The review begins, like all papers, by introducing the general question of sensitivity analysis, as I just did.
To do more we need to know some basics about the tools of Monte Carlo estimation, so we do a brief review of Monte Carlo methods.
We don’t need to know the deep mathematical intricacies of the types of gradients the paper is concerned with to build an understanding of their behaviour. So we use a set of simple examples for illustrative purposes, and use them to pave the way to going deeper into the theory.
The next sections cover three different types of estimators: the score-function, pathwise, and measure-valued gradient estimators.
- Each of these sections has the same structure: beginning with basic tools, then a derivation of the estimator, a discussion on bias and variance, other properties, and notes on computation.
With all these details we can then support the intuition and theory with experiments that better match how they are used in realistic applications.
Finally, we expand the universe of estimators and things that need to be considered, and summarise with a list of rules of thumb for choosing such estimators.
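
To make the first two estimator families in this outline concrete, here is a minimal sketch on a toy problem. The toy problem is an assumption chosen for illustration and is not taken from the paper: x is Gaussian with mean mu and standard deviation sigma, f(x) = x^2, and we want the gradient of the expectation with respect to mu, which is exactly 2 * mu. The function names are hypothetical, and the measure-valued estimator is left out to keep the sketch short.

```python
import numpy as np


def score_function_grad(mu, sigma, n, rng):
    """Per-sample score-function estimates of d/dmu E[x^2], x ~ N(mu, sigma^2).

    Uses the identity grad E[f(x)] = E[f(x) * grad log p(x; mu)];
    for a Gaussian, d/dmu log p(x; mu) = (x - mu) / sigma^2.
    """
    x = rng.normal(mu, sigma, size=n)
    score = (x - mu) / sigma**2
    return x**2 * score


def pathwise_grad(mu, sigma, n, rng):
    """Per-sample pathwise estimates of the same gradient.

    Rewrites the sample as x = mu + sigma * eps with eps ~ N(0, 1),
    then differentiates through it: d/dmu f(x) = f'(x) * dx/dmu = 2 * x.
    """
    eps = rng.standard_normal(n)
    x = mu + sigma * eps
    return 2.0 * x


rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 1.0, 200_000

sf = score_function_grad(mu, sigma, n, rng)
pw = pathwise_grad(mu, sigma, n, rng)

print("exact gradient        :", 2 * mu)
print("score-function (mean) :", sf.mean(), " variance:", sf.var())
print("pathwise (mean)       :", pw.mean(), " variance:", pw.var())
```

Both averages should land close to the exact value. On this particular toy problem the pathwise estimates have much lower variance, but that ordering is a property of the example and should not be read as a general rule.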

Keep this structure in mind as we study it, not for its technical content, but for its narrative structure.

We can actually study writing form as a subject, and one way to understand narrative is to think of writing structure as geometric forms. The geometric form you’ll be familiar with, from creative writing in school and from many novels, is the arc. The paper structure I just described instead uses two of my favourite forms of narrative writing: the inverted pyramid, and the spiral.

William Zinsser has an amazing section on writing about science and technology in his book On Writing Well. His advice is to write using the idea of an inverted pyramid.

Start writing with one simple fact that readers must know.
Then add more by broadening this first point. The third adds to the second and broadens the reader’s set of facts and connections.
Our aim is to build knowledge in our readers. To move beyond fact into significance and speculation: to familiarise and mystify, leading our readers to understand how our exposition alters what is known, to show what new avenues of research it might open, and where that research might be applied.
If we have done our job well, the very top of our inverted pyramid is a new state of wisdom.
We were guided by this type of progression in creating the structure of the MC gradients paper I described earlier.
As Zinsser writes, “There is no limit to how wide the pyramid can become, but your readers will understand the broad implications only if they start with one narrow fact.”

The second geometric form is the spiral narrative. With the spiral, we create flow and insight by using a structure that we repeat throughout the text.

In sections 3 to 5 of the MC gradients paper, whose structure I described earlier, we used the same structure for every section:

We used the same headings, same proof questions, same types of properties, same presentation of conclusions.
This structure allows us to read these sections in parallel, (hopefully) allowing us to learn the tools they develop at the same pace, and to reveal a hidden and connected logic in the analysis.

I’d quickly like to use an example of a different paper. This is a paper about the intersection of environmental science and machine learning. There is an interesting difficulty in writing across two fields that have very strong traditions of modelling, language and testing that in many cases don’t align.

We structured this paper:

By setting out the economic needs for better weather predictions and the important computational questions in this area.
We then reviewed the physics of precipitation.
Followed by modelling based on simulating physical equations.
We then looked at modelling based on data and statistical representations.
We added a glossary here, since some phrases are used in completely opposite ways in these fields and to help us as writers also be clear with ourselves.

I’ll confess that this paper may not have turned out as clear as it could have. But the important lesson is not to let the search for unattainable perfection prevent us from learning as we are writing and sharing what we have learnt with others.

Beyond these considerations, there are several other factors that are worthwhile to consider in our reviews.

Like so many things, the best reviews, of course, thrive on collaboration. Beyond the practical benefit of being able to read more quickly and widely, the decision to write a survey creates one of those special projects with a very clear mission and deliverable: it gives clarity to your planning, an internal set of critics to have on hand, and the best mechanism for peer teaching.
We all feel the pressures of time, but giving ourselves the time to work through our thoughts and thinking is what often transforms the boring review that is simply a brain dump of recent references into the review that familiarises and mystifies.
We do need to be very judicious in considering what is in the scope of our reviews and what is not. This is amongst the hardest things to decide, but this decision is why readers come to our reviews and surveys to begin with.
Finally, you’ll often hear me say and write about readers, as opposed to the reader. There are many types of readers of our papers: the person completely new to our field, and the person who is amongst the experts. We must aim to uplift our readers and to give something to all of them. And the focus on readers will help us remember that there are shared social processes involved in reading that we can use to decide amongst competing ways of explaining our work.

As an aside, I’d like to make a quick comment about blogging. Our field has a strong culture of blogging, and I myself maintain a blog and love blogging.

I think blogging frees us to explore new tools and modes of communication that we wouldn’t use in the more formal setting of a paper - again allowing us to explore new ways to familiarise and mystify.

To explain technical concepts and problems, I’ve experimented with using imagery and art as a way of explaining ideas, used poetry to change the flow of language, and experimented with writing “I” rather than “we”.

As a final note, if part of your inheritance is to think and speak and write in another language, then please do consider doing that. There is no reason that machine learning research should only be communicated in English. Machine learning research in Zulu or Sepedi or Twi or Amharic or Arabic or German, or Gujarati or Maori is a resource that you are uniquely placed to make available to others like you - a way to familiarise and mystify in a global and plural way.

As we reach the end of our journey together, my enjoyment of reviews, and of considering them as an artform, means I do have some favourites of my own. I have chosen a small set here for one reason, which is to explore the places where surveys get reviewed and published, which is one of our important outcomes as survey writers.

This paper by Helen Nissenbaum appeared in the IEEE Computer magazine. It’s just three pages, but from this, you gain a wonderful insight into the field of Values and Technology, which is as relevant today as when it was first written almost 20 years ago.
Perhaps my favourite paper is this one called A Unifying Review of Linear Gaussian Models by Roweis and Ghahramani. It appeared in the journal Neural Computation. The impact of this paper is why, amongst so many fields working with such models, we in machine learning have a highly flexible and unique way of thinking about Gaussian probabilistic models.
This paper on spectral clustering appeared in Statistics and Computing, and is still so important when thinking about unsupervised and representation learning.
Although I have spent much of my research career thus far thinking about variational inference, this paper, which is a review for statisticians that appeared in the Journal of the American Statistical Association, gave me so many new insights and is a pleasure to read.
Finally, these two monographs in the Foundations and Trends series are two that gave me the structure and insight I needed when studying these two areas of machine learning.
Other important venues to keep in mind are: our flagship Journal of Machine Learning Research, the popular ACM Computing Surveys, and, importantly, contributions to this excellent workshop.

Thank you for allowing me to share some of my thoughts and experience in writing reviews with you. I’d like to leave you with two final insights. The first is from the great Anne Lamott and is called Shitty First Drafts. Her advice for you, dear writer, is clear. In her own words: “The only way I can get anything written at all is to write really, really shitty first drafts.”

Finally, I’ll end with the words of William Zinsser from his chapter on writing about science and technology. Our aim as writers must be this: to

“…come across as people: men and women finding a common thread of humanity between themselves and their speciality and their readers.”

Thank you again. I’d love to learn about the tools you use in thinking about your own writing. And I can’t wait to read the reviews and surveys you will go on to write next.

As a post-script - a quick summary of some resources I listed throughout this video:

Probably my favourite book ever: Fowler’s Dictionary of Modern English Usage (now in its 4th edition).
William Zinsser’s book, On Writing Well.
Jane Alison’s Meander, Spiral, Explode: Design and Pattern in Narrative.
Freeman Dyson’s great paper on Birds and Frogs.
And Anne Lamott’s writing guide, conveniently for this talk, called Bird by Bird.

2 thoughts on “Through the Eyes of Birds and Frogs: Writing and Surveys in Machine Learning Research”

Qinsheng says:

11 January 2021 at 3:28 pm

Hi Shakir, you may never understand how much I benefit from your talks and blogs, from statistics to writing. I am amazed by the high quality of your talk. I am wondering how do you prepare your talk. The blog here seems the transcripts of your talk. Do you polish the sentence words by words before giving talks?
