Creating slide decks in Jupyter notebook

My job involves organising and analysing data, and training and evaluating machine learning models. Customers also like to see demonstrations or progress updates for their models, so every now and then I have to put together a slide deck. For better or worse, I use Jupyter notebooks as the main tool for the data management part, so I thought it would be convenient to prepare my slides in the same environment. Jupyter has built-in support for generating reveal.js presentations, so I decided to give that a go. This post first goes through the steps I took to make the slides, in case anyone is looking for cookbook-style instructions. At the end, there are a few remarks on my experience with this way of creating slide decks.

Guide

Slide metadata

A reveal.js slide deck can be generated from a regular Jupyter notebook, with static content written in Markdown cells and dynamic content generated by Python cells. The slideshow metadata is controlled from a cell toolbar that is enabled through the menu: View > Cell Toolbar > Slideshow. It amounts to a single property, Slide Type. When set to Slide, it indicates that the cell starts a new slide in the deck. The Fragment and - (hyphen, or empty) slide types indicate that the cell forms part of the slide that started earlier. Fragment causes the content of the cell to be hidden initially and revealed on pressing “next” during the presentation. The - type attaches the content to the current slide without requiring additional key presses. It is useful for combining static and dynamic content in a section that should be shown together. Skip cells do not feature in the deck at all; this type is useful for cells containing code that does not generate any output we want included in the deck. Notes cells are for speaker notes: they are not shown in the main deck, but are displayed in the presenter view offered by reveal.js. There is also a Sub-Slide type, which I have not used, but which I assume provides the vertical branches in reveal.js.
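The Slide Type chosen in the toolbar is stored in each cell's metadata inside the .ipynb file, under slideshow.slide_type. The following is a minimal sketch for listing the slide types of a notebook, which can help to spot cells that were left unset. The notebook content below is a hand-written stand-in rather than a real file, and slide_types is a hypothetical helper name.

```python
import json

# A hand-written stand-in for the contents of an .ipynb file. A real notebook
# would be loaded with json.load() from the file on disk instead.
NOTEBOOK_JSON = """
{
  "cells": [
    {"cell_type": "code", "source": ["!jupyter nbconvert ..."],
     "metadata": {"slideshow": {"slide_type": "skip"}}},
    {"cell_type": "markdown", "source": ["# Progress update"],
     "metadata": {"slideshow": {"slide_type": "slide"}}},
    {"cell_type": "code", "source": ["plot_progress()"],
     "metadata": {"slideshow": {"slide_type": "-"}}},
    {"cell_type": "markdown", "source": ["Next steps"],
     "metadata": {}}
  ]
}
"""


def slide_types(notebook):
    """Return the slide type of every cell, treating unset cells as '-'."""
    return [
        cell.get("metadata", {}).get("slideshow", {}).get("slide_type", "-")
        for cell in notebook["cells"]
    ]


notebook = json.loads(NOTEBOOK_JSON)
print(slide_types(notebook))  # ['skip', 'slide', '-', '-']
```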

Slide deck export

Plain Jupyter cannot display slide decks in their final format. There are extensions that provide that functionality, but I have yet to try them out, so in this tutorial we will stick to what is available out of the box, which is HTML export. The HTML file requires the reveal.js library to be present, so we will need to download it. Assume we have the following directory structure in our project:

/project-root
    /demos
       /1
       /lib
    /notebooks

We will store the notebook in the notebooks directory, and keep the files for the current slide deck in the demos/1 directory. Let us first make sure reveal.js is available: download the current release from the GitHub releases page and unzip it to demos/lib. As a result we should end up with the library in a directory such as demos/lib/reveal.js-3.7.0. With that in place, we can export the notebook to a slide deck using jupyter nbconvert. I do it by placing the following cell (Slide Type=Skip) at the top of the notebook:

!jupyter nbconvert demo1-slides.ipynb \
    --to slides \
    --output-dir ../demos/1 \
    --reveal-prefix=../lib/reveal.js-3.7.0

The output-dir and reveal-prefix arguments make the export aware of our directory structure. Once run, you should end up with the file demos/1/demo1-slides.slides.html, which can be opened in a browser and should immediately work as a reveal.js presentation.
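One detail that is easy to get wrong: as far as I can tell, the reveal-prefix value ends up in the exported HTML as the location of the library, so the browser resolves it relative to the HTML file, while output-dir is relative to the directory where the command runs. A small sketch of how the two values relate for the layout above; reveal_prefix and exported_html are hypothetical helper names, and the paths are given relative to the project root.

```python
import posixpath


def reveal_prefix(output_dir, reveal_dir):
    """Path from the directory holding the exported HTML to the reveal.js copy."""
    return posixpath.relpath(reveal_dir, start=output_dir)


def exported_html(output_dir, notebook_name):
    """Name of the file that the slides export writes for a given notebook."""
    stem = posixpath.splitext(posixpath.basename(notebook_name))[0]
    return posixpath.join(output_dir, stem + ".slides.html")


print(reveal_prefix("demos/1", "demos/lib/reveal.js-3.7.0"))  # ../lib/reveal.js-3.7.0
print(exported_html("demos/1", "demo1-slides.ipynb"))  # demos/1/demo1-slides.slides.html
```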

Styling

By default, the Jupyter reveal.js export includes both the code and the output of Python cells. The audience of my demos is non-technical and does not care about the code. nbconvert provides configuration options to exclude the code, as well as the output prompts. We can modify the export cell to include them:

!jupyter nbconvert demo1-slides.ipynb \
    --to slides \
    --output-dir ../demos/1 \
    --TemplateExporter.exclude_input=True \
    --TemplateExporter.exclude_output_prompt=True \
    --reveal-prefix=../lib/reveal.js-3.7.0
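If the export needs to run outside the notebook, for example from a script, the same command can be assembled in Python. This is only a sketch: nbconvert_command is a hypothetical helper, it uses no flags beyond the ones shown above, and it assumes jupyter is on the PATH when the command is eventually run.

```python
import shlex


def nbconvert_command(notebook, output_dir, reveal_prefix, hide_code=True):
    """Build the argument list for the slides export used in this post."""
    cmd = [
        "jupyter", "nbconvert", notebook,
        "--to", "slides",
        "--output-dir", output_dir,
        f"--reveal-prefix={reveal_prefix}",
    ]
    if hide_code:
        # Drop the code cells and the output prompts from the exported deck.
        cmd += [
            "--TemplateExporter.exclude_input=True",
            "--TemplateExporter.exclude_output_prompt=True",
        ]
    return cmd


cmd = nbconvert_command("demo1-slides.ipynb", "../demos/1", "../lib/reveal.js-3.7.0")
print(shlex.join(cmd))
# To actually run the export: subprocess.run(cmd, check=True)
```

Keeping the arguments in a list avoids shell quoting problems if a notebook name ever contains spaces.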

Another less than ideal feature of the export is that text output from print calls is shown as preformatted, inside an HTML <pre> tag. This can be addressed by importing the display and HTML functions (from IPython.display import display, HTML) and then using display(HTML("...")) in place of print. The static content also appears more indented than the output of code cells, because an empty prompt area is shown to the left of the static content. To eliminate this, include the following in the first markdown Slide cell:

<style>
.prompt { min-width: 0ex; }
div.prompt { width: 0ex; }
</style>

In general, all styling can be configured comprehensively in a theme stylesheet, and I will probably look into that if I have to make more slide decks in Jupyter, but for the time being I found it expedient to modify the styles directly in the notebook. Another CSS modification I found useful was the elimination of table borders, which makes Pandas dataframes render nicely:

.rendered_html table, .rendered_html tr, .rendered_html td, .reveal table th, .reveal table td {
    border: 0px;
}
.reveal thead {
    border-bottom: 2px solid #8c8c8c;
}
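Returning to the display(HTML("...")) approach mentioned earlier in this section, here is a minimal sketch of what replacing print can look like. summary_html is a hypothetical helper and the figure passed to it is made up for illustration. The fallback to print is only there so that the snippet also runs outside Jupyter.

```python
from html import escape


def summary_html(label, value):
    """Render one label/value pair as a paragraph instead of preformatted text."""
    return f"<p><strong>{escape(label)}:</strong> {escape(str(value))}</p>"


snippet = summary_html("Documents labelled", 1250)  # made-up number

try:
    from IPython.display import HTML, display
    display(HTML(snippet))  # rendered as regular slide text, not inside <pre>
except ImportError:
    print(snippet)  # outside Jupyter, just show the markup
```

Escaping the values matters when they come from data, since a stray < or & would otherwise be interpreted as markup.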

Plots

I used matplotlib for plotting, in the same way I would in a regular notebook. I prefer to output SVG for nice scaling, and also wanted the plots to take the entire slide, so I did the following:

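import matplotlib.pyplot as plt
# Render figures inline so that they end up in the exported slides
%matplotlib inline
# Emit SVG so that the plots scale cleanly on any screen
%config InlineBackend.figure_formats = ['svg']
# A wide 2:1 figure takes up most of a slide
plt.figure(figsize=(12, 6))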

PDF export

It is good practice to share the slides with the audience after the presentation, ideally in PDF format, which is easy to open on most computers. Reveal.js’s support for PDF export relies on a custom stylesheet enabled through a query parameter, and on the browser’s print to PDF functionality. For the custom stylesheet to work, the presentation has to be delivered by a web server, not just opened from a file on disk. Fortunately, it is easy to start up an ad-hoc web server using Python. In the demos directory, run:

python3 -m http.server

You should now be able to open http://0.0.0.0:8000/1/demo1-slides.slides.html?print-pdf in a web browser (annoyingly, reveal.js only fully supports Chrome/Chromium for PDF export) and “print” the page to PDF.
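The same server can also be started from Python, for instance from a Skip cell when Jupyter runs on the local machine. A rough sketch using only the standard library; serve_directory and print_pdf_url are hypothetical helper names, and the server is bound to the loopback address only, which is an assumption on my part rather than what the command above does.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory, port=8000):
    """Serve `directory` on the loopback interface from a background thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server  # call server.shutdown() when finished


def print_pdf_url(port, path, host="127.0.0.1"):
    """URL of an exported deck with the print-pdf query parameter appended."""
    return f"http://{host}:{port}/{path.lstrip('/')}?print-pdf"


print(print_pdf_url(8000, "1/demo1-slides.slides.html"))
```

Calling serve_directory("../demos") from the notebooks directory and opening the printed URL in Chrome should then give the printable version of the deck.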

If you use fragments to gradually reveal portions of a slide, each fragment will result in a separate slide in the PDF version of the deck. This is not something I wanted. The behaviour can be configured through the reveal.js pdfSeparateFragments option, but jupyter nbconvert does not provide convenient access to it, so I ended up hacking the setting into the HTML file by adding the following line to the export cell, directly after the jupyter nbconvert call:

!sed -i 's/Reveal\.initialize({/Reveal.initialize({pdfSeparateFragments: false,/' ../demos/1/demo1-slides.slides.html
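The sed -i flag behaves differently across platforms, so here is a sketch of the same patch in plain Python. disable_separate_fragments and patch_file are hypothetical names, and the sample string is a made-up stand-in for the exported HTML. Unlike the sed one-liner, this version changes only the first match in the file, leaves an already patched file alone, and fails loudly when the marker is missing.

```python
from pathlib import Path

MARKER = "Reveal.initialize({"


def disable_separate_fragments(html):
    """Insert pdfSeparateFragments: false into the Reveal.initialize call."""
    if "pdfSeparateFragments" in html:
        return html  # already patched, nothing to do
    if MARKER not in html:
        raise ValueError("Reveal.initialize({ not found; the export may have changed")
    return html.replace(MARKER, MARKER + "pdfSeparateFragments: false,", 1)


def patch_file(path):
    """Apply the patch to an exported slides file in place."""
    path = Path(path)
    html = path.read_text(encoding="utf-8")
    path.write_text(disable_separate_fragments(html), encoding="utf-8")


sample = "<script>Reveal.initialize({controls: true});</script>"
print(disable_separate_fragments(sample))
```

In the export cell this would be called as patch_file("../demos/1/demo1-slides.slides.html") after the export has finished.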

Observations

The great thing about using Jupyter for creating slide decks is that I could easily access all the data in the environment, in the way I am used to. For example, I wanted to plot the progress that subject matter experts had made on labelling the data over time. I could use pyspark SQL to query the relevant data and then matplotlib to output the plot. In theory, tables are also easy to render with Pandas, but I found that less useful in practice: in most cases I wanted to highlight particular cells to draw attention to certain aspects or to better explain the data, which meant that I had to include the table in the static content and style it by hand. I only ended up with one case where I could use the dynamic output directly; in all others I ran the calculations in Skip cells and then used the results to hand-design a slide. It is still nice to have the calculations available inline, but copying the data and formatting it separately takes away one of the most compelling arguments for Jupyter as a slide deck construction tool.

And there are disadvantages. For one thing, Jupyter does not support collaborative editing of a notebook, so there is no straightforward way for more than one person to work on the slide deck; if you try, you risk overwriting each other’s changes. Using separate instances of Jupyter and merging the changes to the notebook file with git might work, but I suspect merges of IPython notebook JSON files can be problematic. Secondly, previewing the result requires going through the export and, when working with a remote Jupyter, downloading the output HTML file, which is cumbersome and annoying. Previously I used RStudio with RMarkdown, where just pressing Ctrl+K rendered the current version of the slides in the preview window. The move to Jupyter feels like a step back. Finally, the default settings, with code and In/Out prompts included, suggest that the slideshow functionality was designed as a way to present notebooks in technical talks rather than to design bespoke slide decks that include data-driven elements.

In summary, for my slideshow use case, Jupyter does not provide much advantage over RStudio. I get direct access to the data, but then I have to go through an export cycle to look at the slides. Conversely, with RStudio I first have to export the data in order to use it in the slides, but then viewing the rendered slides is easy, and I tend to do the latter much more frequently than the former. Plus, I get to use ggplot2, probably the best plotting library out there!

31/03/2019