I’m freshly back from JupyterCon in NY and still feeling the bubbly optimism that comes with bringing everything you’ve learned at a conference back to your office. In that spirit, I wanted to share with you all some of the coolest and most interesting things I learned.
One quick note before we dive in: I was able to attend JupyterCon thanks to a very generous scholarship awarded jointly by JupyterCon and Capital One. I would not have been able to attend otherwise, and I’m very grateful to both groups for their commitment to diversity and inclusion in the tech community.
In no particular order, here are some of the most interesting things I learned at JupyterCon:
Jupyter Notebooks, Generally
- You can add a table of contents to a notebook(!) using nbextensions. (h/t Catherine Ordun)
- You can parameterize notebooks, create notebook templates for analysis, and schedule them to run automatically with Papermill. (h/t Matthew Seal)
- There are a few cons to teaching and learning with Jupyter notebooks that are worth knowing and acknowledging. Joel Grus’s ‘I Don’t Like Notebooks’ talk was a cautionary tale about using Jupyter notebooks for teaching, and while I don’t agree with all of his points, I do think it’s worth the time to go through his deck.
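Papermill’s parameterization trick is conceptually simple: a notebook is a JSON document, and Papermill injects a new cell of caller-supplied values right after the cell tagged `parameters`, overriding the defaults when the notebook runs top to bottom. Here’s a minimal stdlib sketch of that injection step (the notebook contents and parameter names are made up for illustration; real runs would use the Papermill library itself):

```python
import json

def make_template():
    """Build a minimal notebook (a plain JSON document) with a
    cell tagged 'parameters' holding default values."""
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "metadata": {"tags": ["parameters"]},
                "source": "region = 'us-east'\ndays = 7",
                "outputs": [],
                "execution_count": None,
            },
            {
                "cell_type": "code",
                "metadata": {},
                "source": "print(region, days)",
                "outputs": [],
                "execution_count": None,
            },
        ],
    }

def inject_parameters(nb, params):
    """Mimic Papermill's core move: insert a cell of caller-supplied
    values right after the 'parameters'-tagged cell, so they override
    the defaults when the notebook is executed top to bottom."""
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": "\n".join(f"{k} = {v!r}" for k, v in params.items()),
        "outputs": [],
        "execution_count": None,
    }
    cells = []
    for cell in nb["cells"]:
        cells.append(cell)
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            cells.append(injected)
    return {**nb, "cells": cells}

nb = inject_parameters(make_template(), {"region": "eu-west", "days": 30})
print(nb["cells"][1]["source"])
```

Because the result is still ordinary notebook JSON, the hydrated copy can be saved as its own `.ipynb`, which is what makes scheduled, templated runs practical.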
Notebooks in Production (!)
- Netflix is going all-in on notebooks in production by migrating over 10k workflows to notebooks and using them as a way to bridge the chasm between technical and non-technical users. (h/t Michelle Ufford)
- “Notebooks are, in essence, managed JSON documents with a simple interface to execute code within”. Netflix is putting notebooks into production by combining the JSON properties of notebooks with the open-source library Papermill. (h/t Matthew Seal)
- On debugging: by running the notebook on a notebook server against the same image, you can debug locally and fix issues without having to mock the execution environment or the code. (h/t Matthew Seal)
- On testing: templated notebooks are easy to test with Papermill. Schedule test runs that pass in the same kinds of parameters a user would, let them hydrate and execute the notebook, and check the output for errors. (h/t Matthew Seal)
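That testing idea falls straight out of the “notebooks are managed JSON documents” point: after a scheduled run, a test harness only has to scan the executed notebook’s JSON for cells that produced an `error` output. A small stdlib sketch of such a check (the example notebook document is fabricated for illustration):

```python
import json

def collect_errors(notebook_json: str):
    """Scan an executed notebook (just JSON) for cells whose outputs
    include an 'error' output -- the kind of check a scheduled test
    run of a templated notebook can make."""
    nb = json.loads(notebook_json)
    failures = []
    for i, cell in enumerate(nb.get("cells", [])):
        for out in cell.get("outputs", []):
            if out.get("output_type") == "error":
                failures.append((i, out.get("ename"), out.get("evalue")))
    return failures

# A tiny executed-notebook document with one failing cell:
executed = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": "1 + 1", "outputs": []},
        {"cell_type": "code", "metadata": {}, "execution_count": 2,
         "source": "1 / 0",
         "outputs": [{"output_type": "error",
                      "ename": "ZeroDivisionError",
                      "evalue": "division by zero",
                      "traceback": []}]},
    ],
})

print(collect_errors(executed))  # [(1, 'ZeroDivisionError', 'division by zero')]
```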
Data Science in Jupyter Notebooks
- One of my favorite new-to-me ideas is to build your own Kaggle-style leaderboard to make iterating on and judging the performance of internal models more fun (and provide an incentive to track them better!). (h/t Catherine Ordun)
- In graph/network analysis, you can connect a pair of nodes with multiple edges (one per characteristic) by using a multigraph. (h/t Noemi Derzsy’s great tutorial on graph/network analysis, which I learned a lot from)
- There is a ton of research out there around visualization, including on human perception, that can and should be leveraged for creating impactful data visualizations. I highly recommend Bruno Gonçalves’s slide deck as a tour de force of what we know about perception and how to apply it to data.
- For a very cool use of Jupyter notebook widgets that lets you “see” the impact missing data can have on an analysis (in this case, a linear regression), check out the interactive-plot notebook from Matthew Brems’ missing data repo, which also contains reading materials and a great slide deck.
- I finally figured out how all of the parts of a matplotlib figure go together thanks to this nifty visualization from the matplotlib documentation. (h/t Bruno Gonçalves)
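On the multigraph point: the idea is that two nodes can be joined by several parallel edges, one per shared characteristic. NetworkX provides this out of the box as `MultiGraph`, but the concept fits in a few lines of stdlib Python. A minimal sketch (node names and edge labels are invented for illustration):

```python
from collections import defaultdict

class MultiGraph:
    """A minimal undirected multigraph: each pair of nodes can be
    joined by several edges, one per shared characteristic
    (e.g. 'same team', 'same city')."""

    def __init__(self):
        # (u, v) -> list of edge labels; a plain graph would hold one.
        self._edges = defaultdict(list)

    def _key(self, u, v):
        # Undirected: normalize so (u, v) and (v, u) hit the same slot.
        return (u, v) if u <= v else (v, u)

    def add_edge(self, u, v, label):
        self._edges[self._key(u, v)].append(label)

    def edges_between(self, u, v):
        return self._edges[self._key(u, v)]

g = MultiGraph()
g.add_edge("alice", "bob", "same_team")
g.add_edge("bob", "alice", "same_city")
print(g.edges_between("alice", "bob"))  # ['same_team', 'same_city']
```

In practice you’d reach for `networkx.MultiGraph`, which adds edge keys, attributes, and the full algorithm suite on top of this same idea.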
… So yeah, I learned a lot of really cool things from some very talented people at JupyterCon this year. I’m excited to build new data products, apply network/graph analysis to IoT data, play with widgets, and maybe put a notebook or two into production.
If you’re doing cool things with notebooks at your company, I’d ❤ to hear about them. Feel free to leave a comment here or ping me on Twitter.
[Title image credit: Jason Williams]