N=1: My Experience with Motherhood in Tech

Over the past year, I’ve gone through pregnancy, experienced childbirth, taken maternity leave, gone back to work (while pumping), and worked from home while taking care of a baby during a pandemic. It’s been quite a ride, and while I still love tech and data science and don’t have any plans to change the general theme of this blog, it’s important that I acknowledge that my life looks a lot different than it did a year ago.

I have a few reasons for wanting to discuss my experience with motherhood so far. Mostly, there was so much I didn’t know before it started, and I would have benefited greatly from hearing someone walk through their whole journey from planning through birth and beyond. In particular, I would have loved to hear about this journey from a person similar to me — a woman in tech who cares deeply about her career and community as well as her family. Influencers can be great, but sometimes you just want to know how a regular person told their boss they were pregnant, and maybe what stroller they picked and why. (I’ll cover both.)

Folks, this is a long one. Before we dive in, I want to take a second to acknowledge that creating a family is a profoundly individual, unique, not always easy thing, full of hard decisions. Whatever stage you are in or family you have chosen, I wish you the best.

No perfect time

Before writing this post, I asked on Twitter if anyone had aspects of this journey that they’d be interested in reading about, and I got several DMs about timing — how to decide when is a good time, if I waited for a particular milestone, etc., and so I want to start by sharing my experience with timing. 

My husband and I knew early on that we wanted to start a family, and checked in about timing a few times each year. We’d heard that there’s never a perfect time to have a baby, so we set our sights on feeling “ready”. We wanted to both be in good places with our careers, to own a house, and to be ready for a big lifestyle shift. We pushed back our timeline several times when we realized we weren’t there yet.

At the beginning of 2018, my husband was chasing a promotion, and I was at a company without paid parental leave. We realized that saving to buy a house in Austin was going to take longer than we expected and wasn’t a must-have before baby. The timing wasn’t right, but we set our sights on the end of the year — “Q4!”, we joked — and hoped things would come together before then. In early fall, things were starting to fall into place. My husband got the promotion, and I got a new job earlier in the year that I loved.

Then a month before we were planning to start trying, I got laid off from my still-pretty-new startup job and started in a new position at another company I’d been contracting for. We debated delaying our timeline, but everything else was lining up. Plus, we had no idea how long it would take to get pregnant, and just didn’t want to put it off any more.  

There’s no perfect time, right? 

The joy of oatmeal (or, the first trimester)

I didn’t tell anyone at work that I was pregnant until toward the end of my first trimester. I’m going to pause here: this was tougher than I thought it would be. 

I didn’t have bad morning sickness but the only foods I wanted to eat for about four months were oatmeal and an occasional (very plain) rice noodle soup. This was noticeably weird. One of my still-new coworkers commented that I was “very into oatmeal”, and beyond chuckling, I had no idea what to say. 

The timing of my first trimester also coincided with the company holiday party. When I asked the waiter for a seltzer water with lime, I was served a large red plastic Coke cup that didn’t exactly blend in with the other mixed drinks. By the time I reached the 12-week mark, I felt like my pregnancy was painfully obvious (though it probably wasn’t), and I was very ready to talk about it.

The first person I told was a coworker who had just returned from maternity leave. I wanted to ask what her experience was like and was relieved to hear her talk about how supportive everyone at our office had been. My company is small (about 40 people at that time), and I was adamant that my boss hear the news from me directly (and before I told my coworkers), so I made a point of telling him as soon as I felt ready, just after I hit the 12-week milestone.

Telling my boss

It’s possible that I was more nervous to tell my boss I was pregnant than I was for actually giving birth. This is no reflection of my boss, who was wonderful throughout the whole experience, but is seemingly very common: a search for “tell boss pregnant” yields 62 million results. I spent hours preparing for this conversation by reading articles, looking for personal stories from Reddit strangers, planning what I was going to say, etc.

I brought it up during our regularly scheduled 1:1. I don’t remember exactly what I said, but I will never forget that the first thing that my boss said was “Congratulations!” with a big smile. (Note: this is the best possible reaction that you can give, especially as a boss, and I will be adopting it for future use.) We had a friendly conversation about due dates and daycares. We’d figure out the details when my parental leave was closer, and I walked out of his office feeling majorly relieved.

Telling the rest of the office was still a little awkward. I would have felt weird making any kind of formal announcement, but I didn’t want people to feel like it was a secret, so I tried to slide it into conversation whenever it fit. (Would love to hear how other folks have approached this one!) Eventually it seemed to be common knowledge, and it was a relief to be able to talk about it openly. 

Preparing for liftoff

Gearing up to go out on parental leave felt like wading through six waterfall projects at once, all with the same due date.

Before I was pregnant, I committed to three talks, the last of which involved me flying to Ann Arbor while 7 months pregnant to discuss data infrastructure (worth it). Giving talks while pregnant felt badass. I found out I was having a girl about an hour after getting off stage at rstudio::conf, and felt her kicking me during a panel at SXSW. On a less badass note, I had to turn down my first keynote opportunity (at a conference I love!) because it was too close to my due date.

At home, we were doing everything from booking daycare to attending classes and learning about the birth process to squeezing in time with friends and building furniture for baby’s room. During the first and third trimesters especially, it was not uncommon for me to come home from work, take a nap, and still go to bed early.

At work, about a month before my due date, I started training a coworker on taking over my responsibilities, and creating a plan for while I was out. As a community organizer for two groups (R-Ladies Austin and the ALL the Ladies in Tech Happy Hour), I recruited helpers and passed projects on to co-organizers. Documentation was key to all of the above, and something I probably would have started sooner in hindsight.

Author, pregnant at work

One of the weirdest things about being pregnant, at work and in the world, is that your personal life is so physically on display. In some ways, this could be a little awkward (like being the only person sitting during hours-long, stand-up-only event storming sessions); in other ways, it made for an easy conversation starter; but mostly it felt weirdly normal. I didn’t really experience the “pregnancy brain” that I’ve heard referenced, and generally I walked slower than usual, but it was fine.

My office happened to have five people whose families were expecting at once (!), so we did a big, casual office-wide baby shower for everyone, an approach that I loved. Our gifts were gift cards to Amazon, which was great; it avoided the awkwardness of opening gifts in front of others, getting a single gift card was super useful, and the celebration was a great way to acknowledge the growing families of our employees.

I ended up working until the day before I went into labor (choosing to save my parental leave for spending time with baby), at which point I was very ready, mentally and physically, to be done working.

On birth and baby blues

This part has nothing to do with tech (but everything to do with motherhood) and I feel compelled to share it, so I will.

I delivered at 11:44pm, and barely slept the first night. The hospital counted this as our first of two mandatory nights, and about 36 hours after our daughter entered the world, we left the round-the-clock care of the hospital for home. 

In addition to the physical exhaustion, I felt mentally and emotionally exhausted. For the first week or so, whenever afternoon faded into evening and dinnertime, I’d find myself crying for no reason. Enter baby blues. I always thought that baby blues was a euphemism for post-partum depression (which I was watchful for signs of!), but was surprised to learn in the midst of all this that they’re actually two separate things. I felt like hormone soup for a couple of weeks, and eventually my baby blues went away as quickly as they had come on.

Beyond baby blues, I experienced more anxiety than I ever thought I would. I’m generally an even-keeled, roll-with-the-punches kind of person, but the first few weeks were so, so stressful. There were words I wished I’d never heard (SIDS, failure to thrive), and more anxiety than I’ve ever felt before or since. I learned pretty quickly that Googling made things worse. A mother-of-two friend confirmed this and recommended I consult a single source before Googling anything: Baby 411. She gave us her copy, and looking there first when we had one of our “is this normal?” questions proved to be very helpful and majorly cut down on the number of late-night Google searches.

The anxiety got better with time, experience, and a dedicated effort to avoid Googling things. Before I was pregnant, I hadn’t heard many people talk about that particular part of the experience, so I wanted to share it now. If you feel this way, you’re not the only one. 

Parental leave (is not a vacation)

I took eight weeks of paid parental leave, and my husband took seven. One of the biggest questions I asked other moms before delivering was how much time they were able to take, and how that amount of time felt to them. This is a personal decision, and obviously depends on what options you have available to you, but knowing what I know now, I’ll likely take more time if we do have another baby.

For the first few weeks, newborns have to be fed about every two hours during the day and three hours at night. When you’re nursing, “every two hours” means from the beginning of one session until the beginning of the next — so if a session takes 45 minutes (because you’re both learning!), you have an hour and 15 minutes until the next session. At night, in the “spare time” between sessions, we were soothing the baby and doing the whole dance that is getting a newborn to fall asleep, and then trying to sleep ourselves. 

In general, having both of us home made a world of difference. We could take turns napping to catch up on sleep, and we felt connected and supported generally by family staying with us and helping out, friends who made us food, and folks who came over (for brief visits!) to meet the baby and catch up. I made a point of getting out of the house by going to our local coffee shop every day — some days, not until 2 or 3pm, but I still made it — and spending time outside of the house helped me to feel more connected. I also bumped into my boss there, and just catching up on office stuff made me so happy. I did check Slack occasionally while on leave, but mostly just to keep in the loop on big stuff as it was happening.

Toward the end of week seven, my husband and I visited our daughter’s very-near-future daycare, baby in tow, for a new parent orientation. The sight of a classroom cubby with her name on it made me teary. When friends asked if I was excited to come back to work, my honest answer was “a little”. I love my job, and find it immensely gratifying, but I loved the time I was getting to spend with my daughter and didn’t want it to end. 

Returning to work

Going back to work felt good, but it was tough (especially at first). As a lifetime lover of fall and school supplies, in some ways, I had major back-to-school vibes, except instead of a new backpack, I started carrying an extra tote bag for my pumping stuff. On my first day back, it was great catching up with my coworkers, but I definitely did the first-time-mom thing and snuck out to call daycare and check in.

Returning to work meant pumping three times a day (about every 3 hours). Thankfully, we have a great mother’s room where I could keep my pumping stuff and avoid any sort of community refrigerator. (I know that others are not always so lucky.) Even so, the context switch of having to think through bottle logistics and then transition back to work is a mental load worth mentioning. Breastfeeding is, hands down, the hardest thing I have ever done — and I mean that mentally and physically. During this time, my life felt very dictated by the clock, and lived in 2-3 hour increments. I put this time on my calendar so that I didn’t accidentally get scheduled for meetings when I’d need to pump. That worked great, and during longer half-day kind of meetings, I just mentioned up front that I’d need a long break a couple of hours in.

The first couple of months back were really a grind. I found that I was just as interested in work as ever, but taking care of a baby, even with full-time help in the form of daycare, takes a lot of time. The make-up of my days changed a lot during the first six months, and after trying to explain these transitions (and how dropping a feeding somehow magically gave me multiple hours of free time!), eventually I settled on a visualization to describe these transitions, which you can see below.

[Figure: visualization of how the make-up of my days changed during the first six months]

Even with a baby who was a Good Sleeper, I didn’t really have free time until 5 or 6 months in — and by “free time”, I mean time to do anything not baby- or work-related, like cooking, cleaning, laundry, catching up with friends, etc.

Wrapping Up

I wrote this post because it’s the kind of thing I wish I could have read before I started my own experience with motherhood in tech. I had so many questions, and even in this 2700+ word post, I haven’t come close to answering or covering them all. I learned by asking a lot of questions of a lot of women I knew who had children while working in tech, and I’m very grateful for all of the tips, big and small, that they gave me. This post is my attempt to pass it on.

If you’re someone who is on this journey, or thinking about this journey, this post really just scratches the surface of motherhood in tech (n=1), and I’d love to hear about the kinds of questions you have (or, advice, if you’ve been through it!). I also wrote up a sort of companion piece on my “baby stack”, for anyone interested in more specific details on things like pregnancy wardrobes, what I packed in my hospital bag, and how we decided which stroller to get. You can comment here or find me on Twitter. Thank you for reading!

**

I owe a giant thank you to Alex Ensch, who has been a sounding board for ten years and running, for her encouragement and reading of many drafts of this post. Thank you, Alex!

 

Git Your SQL Together (with a Query Library)

If I could teach SQL to analysts who plan to work in industry data science, I’d start by sharing a few SQL Truths I’ve learned, and why I recommend tracking SQL queries in git. Here goes:

  1. You will *always* need that query again
  2. Queries are living artifacts that change over time
  3. If it’s useful to you, it’s useful to others (and vice versa)

Focusing on these points has led to my continuous adoption of a query library — a git repository for saving and sharing commonly (and uncommonly) used queries, all while tracking any changes made to these queries over time. 

Anyone who is following my personal journey might know that I’m in the midst of building data science infrastructure at a start-up. In addition to setting up a data dictionary, setting up a query library is one of the first things I did to start documenting the institutional knowledge we have about our data.

Let’s talk more about these SQL Truths and why they’ve led me to git my SQL together with query libraries (and how to build one yourself!). 

SQL Truth #1: You will always need that query again.

Have you ever written a query for a “one-off” analysis, deleted it (maybe even on purpose! *shudders*), and lost all memory of how to create said query — just before being asked to re-run or tweak that analysis? Happens to the best of us.


Usually, this is a job for reproducibility. But, even if we take reproducibility seriously (and we do!), it’s easy for queries from “one-off” analyses to slip through the cracks because they can live outside of our normal project workflow. After all, if you’re only doing something once, there’s no need for it to be reproducible, right?

The sooner we can accept that it’s never just once, the sooner we can hit CTRL+S, put those queries in a query library, and move on to more interesting problems.

SQL Truth #2: Queries are living artifacts that change over time

Here’s a short list of reasons why your queries might change over time:

  • You’ve become more familiar with database(s)
  • You’ve gained a deeper understanding of your data
  • You’ve had to add filters and caveats over time
  • You’ve changed the way you calculate a metric
  • You’re answering slightly different questions
  • You’ve started collecting new data
  • You’ve found discrepancies or issues with tables or fields
  • Your business model has changed
  • You’ve gotten better at writing SQL

This list isn’t all-inclusive, but hopefully it gives you an idea of how and why your queries might change. This isn’t inherently good or bad; it just means that you’ll want to capture the way that you’re doing things both as documentation for the future and to ensure that changes or updates are applied as needed to keep things up-to-date.

SQL Truth #3: If it’s useful to you, it’s useful to others

Have you ever asked a coworker where some data lives and had them respond with a beautiful, hand-curated SQL query? It’s the best. You get to focus more on the fun analysis, or starting new projects, and as a bonus, your company isn’t paying two people to repeat the same process.

SQL queries contain built up domain knowledge — the way you filter, join, and aggregate tables reflects knowledge you’ve collected about how the data is collected, stored, and should be used for practical application. Sharing queries with others is a nice thing to do, and a good way to spread institutional knowledge about your data (as is building a good data dictionary!).

When you start to compile and compare SQL queries, you might find discrepancies in the way that different people pull the same data. (This happened on a team I was on: if you asked three of us to pull all live clients, we’d all do it with slightly different caveats that reflected our individual understanding of the data. Not a bad thing, but no wonder our numbers didn’t always match!) Creating and reviewing a query library is also a good way to get everyone on the same page.
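To make the "same question, different numbers" problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The clients table, its columns, the rows, and both definitions of "live" are hypothetical, invented purely for illustration; they are not taken from any real team's data.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER, status TEXT, is_test INTEGER)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?)",
    [(1, "live", 0), (2, "live", 1), (3, "paused", 0), (4, "live", 0)],
)

# Two analysts' versions of "all live clients", each with different caveats.
queries = {
    "analyst_a": "SELECT id FROM clients WHERE status = 'live'",
    "analyst_b": "SELECT id FROM clients WHERE status = 'live' AND is_test = 0",
}

# Run each version and collect the client ids it returns.
results = {
    name: {row[0] for row in conn.execute(sql)} for name, sql in queries.items()
}

# Clients that one version includes and the other leaves out.
disputed = results["analyst_a"] ^ results["analyst_b"]
for name, ids in results.items():
    print(name, sorted(ids))
print("disputed ids:", sorted(disputed))
```

Looking at the disputed rows, rather than only the mismatched totals, tends to surface the caveat that one person knew about and another didn't.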

Building your own query library

Let’s focus on how you can build your own query library in a few simple steps:

  1. After writing a query that you’ll want to use again*, save it(!!) as a .sql file with a descriptive name. 

    I tend to name my files based on what the query is accomplishing as output, and since naming things is cheap (and I never have to hand-type the names of these files), I use very descriptive (long) names, like activity_on_mobile_since_june2018_by_client.sql.

    *This was a test: every query is a query you might want to use again!

  2. Create a git repository in a shared location and upload your queries to it. Encourage your team to do the same, and discuss how to best organize queries with them. 

    This is also a good time for a mini blameless post-mortem of existing queries (especially if they come from different analysts) by looking for discrepancies and using them as an opportunity to level-up everyone’s understanding of the data.

  3. Whenever you create a new query, take a few minutes to clean it up so that others can understand how and when they might want to use it, and upload it to the query repository. 

    Whenever you update an existing query locally, make sure to commit those changes to your query library (ideally along with an explanation of why you made them).
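The steps above can be sketched in a few lines of Python. This is only one possible approach, not a prescribed workflow: the helper names save_query and commit_query are hypothetical, the header-comment convention is an assumption, and commit_query assumes the library folder is already a git repository with git installed and configured.

```python
from pathlib import Path
import subprocess


def save_query(library_dir, name, sql, description):
    """Write a query to <library_dir>/<name>.sql with a short header comment."""
    library = Path(library_dir)
    library.mkdir(parents=True, exist_ok=True)
    path = library / f"{name}.sql"
    # A one-line SQL comment explains how and when the query is meant to be used.
    header = f"-- {description}\n"
    path.write_text(header + sql.strip() + "\n", encoding="utf-8")
    return path


def commit_query(library_dir, path, message):
    """Stage and commit one query file.

    Assumes library_dir is already a git repository (hypothetical setup).
    """
    relative = Path(path).relative_to(library_dir)
    subprocess.run(["git", "-C", str(library_dir), "add", str(relative)], check=True)
    subprocess.run(
        ["git", "-C", str(library_dir), "commit", "-m", message], check=True
    )
```

For example, you might call save_query with a descriptive name such as activity_on_mobile_since_june2018_by_client, then call commit_query with a message explaining why the query was added or changed, so the reasoning travels with the history.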

Final Notes

The idea of the query library was introduced to me by Tim, who implemented one for our product analytics team while we were both at Web.com. (You might remember Tim from this post on the career “tour of duty” concept.) It was helpful for our team there, and I’ve happily implemented one on every team since.

I hope this post is helpful, and I’d love to hear from you if you have a data infrastructure tool or idea to share (or a question!); please write a comment below or ping me on Twitter. Thanks for reading!

 

Field Notes: Building Data Dictionaries

The scariest ghost stories I know take place when the history of data — how it’s collected, how it’s used, and what it’s meant to represent — becomes an oral history, passed down as campfire stories from one generation of analysts to another like a spooky game of telephone.

These stories include eerie phrases like “I’m not sure where that comes from”, “I think that broke a few years ago and I’m not sure if it was fixed”, and the ever-ominous “the guy who did that left”. When hearing these stories, one can imagine that a written history of the data has never existed — or if it has, it’s overgrown with ivy and tech-debt in an isolated statuary, never to be used again.


The best defense I’ve found against relying on an oral history is creating a written one.

Enter the data dictionary. A data dictionary is a “centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format”, and provides us with a framework to store and share all of the institutional knowledge we have about our data.

As part of my role as a lead data scientist at a start-up, building a data dictionary was one of the first tasks I took on (started during my first week on the job). Learning about our data is a crucial part of onboarding for data-focused roles, and documenting that journey in the form of a data dictionary provides a useful data asset for the company (which helps to preserve institutional knowledge) and simultaneously provides a good resource for analyzing the data. 

My data dictionary is a Google Sheets workbook that looks something like this:

[Screenshot: example data dictionary in Google Sheets]

I use one sheet for each database, and the same fields throughout:

  • Table: the table name, exactly the way it appears in the database
  • Table Notes: general notes on the table, like the theme of the data in the table, how often it gets updated, and where it comes from
  • Field: the field name, exactly as it appears in the database
  • Definition: a user-friendly (often long-form) definition of the field
  • Example value: used to show what data in that field actually looks like
  • Field notes: general notes on the field, sometimes including values, caveats or notes of interest, and places (like tables) to find more information about that field
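If you'd rather keep the same six fields in a plain-text file than in a spreadsheet, here is a rough sketch of one way to do it in Python. The DictionaryEntry class, the to_csv helper, and the example entry are all hypothetical and invented for illustration; only the six field names come from the list above.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DictionaryEntry:
    table: str
    table_notes: str
    field: str
    definition: str
    example_value: str
    field_notes: str


# Hypothetical entry, invented for illustration.
entries = [
    DictionaryEntry(
        table="clients",
        table_notes="One row per client; refreshed nightly (assumed).",
        field="status",
        definition="Current lifecycle state of the client account.",
        example_value="live",
        field_notes="Exclude test accounts when counting live clients.",
    )
]


def to_csv(rows):
    """Serialize entries to CSV text, one row per documented field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(DictionaryEntry)]
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


print(to_csv(entries))
```

A CSV like this opens fine in a spreadsheet tool, and because it is plain text, changes to it show up as readable line-by-line diffs.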

Yours doesn’t have to look like this, and I’ve seen data dictionaries with other fields and structures, but feel free to borrow mine if you’re looking for a format to start with and riff on — it’s worked well for me so far.

Lessons Learned and Best Practices

I’ve built at least half a dozen data dictionaries for various companies, and through that experience, some personal best practices have shaken out:

1. Start small and iterate

A couple of times while building a data dictionary, I tried to document every piece of data I found while spelunking around my company’s database, which was very painful and not a great use of time. Don’t do this.

Focus on starting with the data that’s important and useful to you and documenting those fields or tables. Then, as you incorporate more data from other tables or databases, focus on documenting those incrementally. Avoid shaving the yak. 

2. Answer your own frequently asked questions

Over time, I’ve found myself asking the same questions over and over again about data lineage and usage, so those are the questions I try to answer when building and filling in a data dictionary. Here’s a basic list of questions to consider:

[Screenshot: list of questions to consider]

3. Consider who will be using your data dictionary and how

The structure, content, and location of your data dictionary should be very different if your context is for analyst use in the trenches vs. for business stakeholder understanding. It could also look different depending on the people or groups who will be using it as a reference.

For example, my data dictionaries tend to contain notes pertaining to analyzing the data that others might not need — things I might edit or remove if I were creating a shared cross-functional resource. I’ve also created versions of data dictionaries that are structured as documents rather than spreadsheets, a format that lends itself well to going very in-depth about fields (if you can’t fit that info in a spreadsheet cell) or sharing with less-technical folks alongside a deliverable (like an analysis). These also make a great first draft to be turned into a more shareable version later.

4. Plan for a living document

For a data dictionary to be useful, it has to be kept up-to-date. This is a challenge that straddles both technical and cultural realms.

Technically, to keep a data dictionary up-to-date, it should be straightforward for collaborators to access and update. It’s also helpful to be able to see when these updates are made to track important changes. Culturally, the importance of a data dictionary should be acknowledged, and upkeep should be incentivized. It’s easy to skip documentation if it’s not required or seen as important, and this is how documentation grows stale (and eventually dies when deemed worthless).
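On the technical side, one hedged idea for spotting a stale dictionary is to compare it against the database schema and list any fields nobody has documented yet. The sketch below assumes a SQLite database (other databases usually expose similar metadata, for example through information_schema); the undocumented_fields helper, the schema, and the dictionary contents are hypothetical.

```python
import sqlite3


def undocumented_fields(conn, documented):
    """Return (table, field) pairs present in the database but not in the dictionary."""
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    in_database = set()
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        for column in conn.execute(f'PRAGMA table_info("{table}")'):
            in_database.add((table, column[1]))
    return sorted(in_database - set(documented))


# Hypothetical schema and dictionary contents, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER, status TEXT, signup_date TEXT)")
documented = {("clients", "id"), ("clients", "status")}

print(undocumented_fields(conn, documented))
```

A check like this only covers the technical half of the problem. Someone still has to care enough to write the missing definitions.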

I’ll freely admit that I have yet to implement a data dictionary that perfectly addresses both of these challenges, but I’m working on it, and others have made good suggestions on the “how” below that I plan to incorporate in my own work.

Improvements + More Discussion

There are a few things about my own data dictionaries that could be improved. Namely, I plan to make my data dictionary more “internally public” (as something like a GitHub wiki or a Confluence page), and add it to source / version control (git) to track changes.

Some of my biggest concerns are making sure that others know about, have access to, and can use and update any data dictionaries I’m building. Some of this is cultural and some of this is technical, so I’m doing my best to tackle these concerns as I’m building, and asking others for advice along the way.

For more ideas and best practices around data dictionaries, check out these two Twitter threads which are full of great suggestions. If you’ve built a data dictionary and have some best practices to share, or if you have questions about how to get started, please feel free to chime in on Twitter or as a comment here.

 

A Month in the Life of a Data Scientist

“What does a data scientist actually *do*?”
“What kinds of projects do you usually work on?”
“What does a typical day look like?”

These are questions I get asked a lot both by aspiring data scientists and the folks who want to hire them. My answer, in true data scientist fashion, is usually something along the lines of “it depends” — and it’s true! Most of my work involves juggling multiple projects that might have different stakeholders or touch different parts of the company, and the lifecycles of these projects can vary greatly depending on the complexity involved. In the eight years I’ve been doing applied analytics, no two weeks have looked the same. Furthermore, data science is such a growing and varied field that it’s rare two data scientists would give the same answer (even at the same company!).

To help others get a feel for the types of projects a data scientist might do, and a bit of the day-to-day work, I used the 1 Second Everyday app to take a series of one second videos of what my work as a data scientist at an IoT startup looked like during the month of August. Check it out:

For context, my startup installed hardware into retail stores in order to track cardboard displays (the ones you see in CVS, for example, that are stocked with sunscreen or allergy medication) that go in and out of each store. We used sales data from those stores to calculate the incremental sales gained as a result of having these displays up, in addition to tracking other things like the supply chain process, and reported all of this back to the stores and brands whose products were on the displays. Lots of fun data to play with!

August consisted of three main projects for me (with lots of smaller projects thrown in):

  1. Testing of IoT device updates: as an IoT company, we periodically rolled out firmware updates to our hardware, and August was a big update month. We used data to decide which units to update, how to space out the updates, and to monitor updates as they were being rolled out. After the updates, we performed more analysis to see whether the updates were making our RSSI signals stronger than they were before.
  2. Improving our data pipeline: as a startup, we were constantly working to improve our data pipeline — this meant incorporating new data, QA-ing our data inputs and pipeline outputs, chasing down bugs, updating to account for new logical cases and products, and building better documentation to describe what various pieces of the data pipeline were doing. As you can see, I used lots of data visualizations along the way to help us diagnose and improve the pipeline.
  3. Professional development: I was fortunate enough to attend the JupyterCon tutorials and conference in August in NYC (check out my recap here). For me, conferences serve as a place to learn, meet great people, and get inspired by all of the cool things that folks are doing. Also, it’s a lot of fun to do a local R-Ladies dinner, and I had a great time hanging out with NYC R-Ladies.

It’s worth noting that had I taken these videos in a different month, the set of projects I was working on would have been very different. (For example, one month was focused heavily on a classifier algorithm, and another on creating and evaluating new metrics and ways of matching test and control stores.)

It’s hard to distill the variance of a data scientist’s job into a single video (or set of videos), but I hope this helps to give some insight into the types of projects a data scientist might be tasked with. If you’re interested in reading more about what data scientists really do, I highly recommend Hugo Bowne-Anderson’s HBR article, which is the result of his conversations with 35+ data scientists on the excellent DataFramed podcast (which I also recommend!). One of the coolest things about being a data scientist right now is how much can vary day-to-day and week-to-week (even at the same job!). There’s always more to learn and something new to try.

**

PS: Here’s a full description of each snippet, in case you’re curious:

  1. Reviewing a design for a test of our hardware to be run in stores.
  2. Working through my daily to-do list. This one includes incorporating and QA-ing a new set of data into our workflow.
  3. Getting the word out about a panel that a few fellow data scientists and I are pitching for SXSW.
  4. Visualizing test results with violin plots(!). A great way to combine and display data from a test on the distribution of signal strength.
  5. Updating SQL case statements in our data ETL pipeline to account for a new case.
  6. Writing pseudo-code documentation for a classifier so that others can understand the data that goes into it and the logic behind it, and can explain it in simpler terms to customers.
  7. A quick shot of a “lab” we use to test equipment before it goes in the field. This was a test-heavy month.
  8. This is the face I make approximately a dozen times per day when I’m questioning something I see in the data. I’m getting wrinkles from making this face.
  9. This was a SQL-heavy month, since we were spending lots of time QA-ing our data.
  10. Using Jupyter to spin up some quick exploratory data visualizations to answer a question my team had.
  11. Playing with a cool way to visualize the impact of missing data at JupyterCon.
  12. Dinner with R-Ladies NYC! I’ve had a lot of fun meeting R-Ladies when visiting various cities, and this was no exception — it’s nice to have a built-in group of people to hang out with in cities all around the world.
  13. Swag from the Netflix booth at JupyterCon (read about all of the cool things they’re doing in my JupyterCon recap).
  14. Building and visualizing an ad-hoc analysis request from a client.
  15. After making some changes to our data pipeline, monitoring the DAGs in Apache Airflow to make sure everything updates smoothly.
  16. More data visualization while bug-hunting to spot the source of an error in our data pipeline.

The Coolest Things I Learned at JupyterCon

I’m freshly back from JupyterCon in NY and still feeling the bubbly optimism that comes with bringing all you’ve learned at a conference back to your office. In that spirit, I wanted to share some of the coolest and most interesting things I learned with you all.

One quick note before we dive in: I was able to attend JupyterCon because of a very generous scholarship awarded jointly by JupyterCon and Capital One. I would not have been able to attend otherwise and I’m very grateful to these two groups for their commitment to diversity and inclusion in the tech community, so a big thank you to both groups.

In no particular order, here are some of the most interesting things I learned at JupyterCon:

Jupyter Notebooks, Generally

  • You can add a table of contents to a notebook(!) using nbextensions. (h/t Catherine Ordun)
  • You can parameterize notebooks, create notebook templates for analysis, and schedule them to run automatically with Papermill. (h/t Matthew Seal)
  • There are a few cons to teaching and learning with Jupyter notebooks that are worth knowing and acknowledging. Joel Grus’s ‘I Don’t Like Notebooks.’ was a cautionary tale on the use of Jupyter notebooks for teaching, and while I don’t agree with all of his points, I do think it’s worth the time to go through his deck.
Table of contents via nbextensions

Notebooks in Production (!)

  • Netflix is going all-in on notebooks in production by migrating over 10k workflows to notebooks and using them as a way to bridge the chasm between technical and non-technical users. (h/t Michelle Ufford)
  • “Notebooks are, in essence, managed JSON documents with a simple interface to execute code within.” Netflix is putting notebooks into production by combining the JSON properties of notebooks with the open-source library Papermill. (h/t Matthew Seal)
  • On debugging: by running a notebook on a notebook server against the same image, you can fix issues without needing to mock the execution environment or code, allowing you to debug locally. (h/t Matthew Seal)
  • On testing: templated notebooks are easy to test with Papermill. Just have a scheduler run them with the kinds of parameters a user would input to hydrate and run the notebook (and look for errors). (h/t Matthew Seal)
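
To make the “managed JSON documents” idea concrete, here is a minimal sketch (standard library only) of what hydrating a templated notebook with parameters could look like. It is an illustration, not Papermill’s actual implementation: the notebook contents, the parameter names (region, n_days), and the inject_parameters helper are all made up for this example. The one convention borrowed from Papermill is tagging the defaults cell with "parameters" and adding the overrides in a new cell tagged "injected-parameters" right after it.

```python
import copy
import json

# A minimal notebook in the nbformat v4 JSON layout. The cell contents and
# parameter names (region, n_days) are hypothetical, for illustration only.
TEMPLATE = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},  # default parameter values
            "source": "region = 'all'\nn_days = 7",
            "outputs": [],
            "execution_count": None,
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": "print(region, n_days)",
            "outputs": [],
            "execution_count": None,
        },
    ],
}


def inject_parameters(notebook, parameters):
    """Return a copy of `notebook` with an override cell after the defaults."""
    hydrated = copy.deepcopy(notebook)  # leave the template untouched
    new_cell = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": "\n".join(
            f"{name} = {value!r}" for name, value in parameters.items()
        ),
        "outputs": [],
        "execution_count": None,
    }
    for index, cell in enumerate(hydrated["cells"]):
        if "parameters" in cell["metadata"].get("tags", []):
            hydrated["cells"].insert(index + 1, new_cell)
            break
    else:
        # No tagged cell: put the overrides first so later cells can use them.
        hydrated["cells"].insert(0, new_cell)
    return hydrated


hydrated = inject_parameters(TEMPLATE, {"region": "NYC", "n_days": 30})
notebook_json = json.dumps(hydrated, indent=2)  # what would be saved as .ipynb
```

With Papermill itself, the hydrate-and-run step is a single call, papermill.execute_notebook(input_path, output_path, parameters={...}), which also executes the notebook and saves the outputs. A scheduled test can then call it with user-like parameters and fail if any cell raises an error.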

Data Science in Jupyter Notebooks

  • One of my favorite new-to-me ideas is to build your own Kaggle-style board to make iterating and judging performance of internal models more fun (and provide incentive to track them better!). (h/t Catherine Ordun)
  • In graph/network analysis, you can connect nodes using multiple edges (characteristics) using a multigraph. (h/t Noemi Derzsy’s great tutorial on graph/network analysis, which I learned a lot from)
  • There is a ton of research out there around visualization, including on human perception, that can and should be leveraged for creating impactful data visualizations. I highly recommend Bruno Gonçalves’s slide deck as a tour de force of what we know about perception and how to apply it to data.
  • For a very cool use of Jupyter notebook widgets that lets you “see” the impact that missing data can have on an analysis (in this case, a linear regression), check out the interactive-plot notebook from Matthew Brems’s missing data repo, which also contains reading materials and a great slide deck.
  • I finally figured out how all of the parts of a matplotlib figure go together thanks to this nifty visualization from the matplotlib documentation. (h/t Bruno Gonçalves)
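
For anyone new to the multigraph idea mentioned above, here is a small standard-library sketch of a graph that allows more than one edge between the same pair of nodes. The MultiGraph class, the node names, and the edge labels are all hypothetical and only meant to show the data structure; they are not taken from the tutorial.

```python
from collections import defaultdict


class MultiGraph:
    """Undirected graph that allows several labeled edges per node pair."""

    def __init__(self):
        # Maps an unordered node pair to the list of edge labels between them.
        self._edges = defaultdict(list)

    def add_edge(self, u, v, label):
        """Add one more edge between u and v, described by `label`."""
        self._edges[frozenset((u, v))].append(label)

    def edges_between(self, u, v):
        """Return every edge label connecting u and v (order of insertion)."""
        return list(self._edges.get(frozenset((u, v)), []))

    def neighbors(self, node):
        """Return the sorted nodes that share at least one edge with `node`."""
        return sorted(
            other
            for pair in self._edges
            for other in pair
            if node in pair and other != node
        )


# Hypothetical IoT example: two devices can be related in more than one way.
graph = MultiGraph()
graph.add_edge("sensor_a", "gateway_1", "wifi_link")
graph.add_edge("sensor_a", "gateway_1", "same_store")
graph.add_edge("sensor_b", "gateway_1", "wifi_link")

links = graph.edges_between("gateway_1", "sensor_a")
nearby = graph.neighbors("gateway_1")
```

Here links holds both relationships between the sensor and the gateway, which a simple graph would collapse into one edge. For real analysis, NetworkX ships a MultiGraph class that supports the same idea along with its graph algorithms.
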
Netflix definitely won the prize for best conference swag.

… So yeah, I learned a lot of really cool things from some very talented people at JupyterCon this year. I’m excited to build new data products, apply network/graph analysis to IoT data, play with widgets, and maybe put a notebook or two into production.

If you’re doing cool things with notebooks at your company, I’d ❤ to hear about them. Feel free to leave a comment here or ping me on Twitter.

[Title image credit: Jason Williams]