A Month in the Life of a Data Scientist

“What does a data scientist actually *do*?”
“What kinds of projects do you usually work on?”
“What does a typical day look like?”

These are questions I get asked a lot both by aspiring data scientists and the folks who want to hire them. My answer, in true data scientist fashion, is usually something along the lines of “it depends” — and it’s true! Most of my work involves juggling multiple projects that might have different stakeholders or touch different parts of the company, and the lifecycles of these projects can vary greatly depending on the complexity involved. In the eight years I’ve been doing applied analytics, no two weeks have looked the same. Furthermore, data science is such a growing and varied field that it’s rare two data scientists would give the same answer (even at the same company!).

To help others get a feel for the types of projects a data scientist might do, and a bit of the day-to-day work, I used the 1 Second Everyday app to take a series of one second videos of what my work as a data scientist at an IoT startup looked like during the month of August. Check it out:

For context, my startup installed hardware into retail stores in order to track cardboard displays (the ones you see in CVS, for example, that are stocked with sunscreen or allergy medication) that go in and out of each store. We used sales data from those stores to calculate the incremental sales gained as a result of having these displays up, in addition to tracking other things like the supply chain process, and reported all of this back to the stores and brands whose products were on the displays. Lots of fun data to play with!

August consisted of three main projects for me (with lots of smaller projects thrown in):

  1. Testing of IoT device updates: as an IoT company, we periodically rolled out firmware updates to our hardware, and August was a big update month. We used data to decide which units to update, how to space out the updates, and to monitor updates as they were being rolled out. After the updates, we performed more analysis to see whether the updates were making our RSSI signals stronger than they were before.
  2. Improving our data pipeline: as a startup, we were constantly working to improve our data pipeline — this meant incorporating new data, QA-ing our data inputs and pipeline outputs, chasing down bugs, updating to account for new logical cases and products, and building better documentation to describe what various pieces of the data pipeline were doing. As you can see, I used lots of data visualizations along the way to help us diagnose and improve the pipeline.
  3. Professional development: I was fortunate enough to attend the JupyterCon tutorials and conference in August in NYC (check out my recap here). For me, conferences serve as a place to learn, meet great people, and get inspired by all of the cool things that folks are doing. Also, it’s a lot of fun to do a local R-Ladies dinner, and I had a great time hanging out with NYC R-Ladies.

It’s worth noting again that I had I taken these videos in July, or August, the set of projects I was working on would have been very different. (For example, one month was focused heavily on a classifier algorithm, and the other on creating and evaluating new metrics and ways of matching test and control stores.)

It’s hard to distill the variance of a data scientist’s job into a single video (or set of videos), but I hope this helps to give some insight into the types of projects a data scientist might be tasked with. If you’re interested in reading more about what data scientists really do, I highly recommend Hugo Bowne-Anderson‘s HBR article, which is the result of his conversations with 35+ data scientists on the excellent DataFramed podcast (which I also recommend!). One of the coolest things about being a data scientist right now is how much can vary day-to-day and week-to-week (even at the same job!) — there’s always more to learn and something new to try.


PS: Here’s a full description of each snippet, in case you’re curious:

  1. Reviewing a design for a test of our hardware to be run in stores.
  2. Working through my daily to-do list. This one includes incorporating and QA-ing a new set of data into our workflow.
  3. Getting the word out about a panel that a few fellow data scientists and I are pitching for SXSW.
  4. Visualizing test results with violin plots(!). A great way to combine and display data from a test on the distribution of signal strength.
  5. Updating SQL case statements in our data ETL pipeline to account for a new case.
  6. Writing pseudo-code documentation for a classifier so that others can understand the data that goes into it, the logic behind it, and are able to explain it in more simple terms to customers.
  7. A quick shot of a “lab” we use to test equipment before it goes in the field. This was a test-heavy month.
  8. This is the face I make approximately a dozen times per day when I’m questioning something I see in the data. I’m getting wrinkles from making this face.
  9. This was a SQL-heavy month, since we were spending lots of time QA-ing our data.
  10. Using Jupyter to spin up some quick exploratory data visualizations to present to answer a question my team had.
  11. Playing with a cool way to visualize the impact of missing data at JupyterCon.
  12. Dinner with R-Ladies NYC! I’ve had a lot of fun meeting R-Ladies when visiting various cities, and this was no exception — it’s nice to have a built-in group of people to hang out with in cities all around the world.
  13. Swag from the Netflix booth at JupyterCon — read about all of the cool things they’re doing in my Jupytercon recap.
  14. Building and visualizing an ad-hoc analysis request from a client.
  15. After making some changes to our data pipeline, monitoring the DAGs in Apache Airflow to make sure everything updates smoothly.
  16. More data visualization while bug-hunting to spot the source of an error in our data pipeline.

The Coolest Things I Learned at JupyterCon

I’m freshly back from JupyterCon in NY and still feeling the bubbly optimism that comes with bringing all you’ve learned at a conference back to your office. In that spirit, I wanted to share some of the coolest and most interesting things I learned with you all.

One quick note before we dive in: I was able to attend JupyterCon because of a very generous scholarship awarded jointly by JupyterCon and Capital One. I would not have been able to attend otherwise and I’m very grateful to these two groups for their commitment to diversity and inclusion in the tech community, so a big thank you to both groups.

In no particular order, here are some of the most interesting things I learned at JupyterCon:

Jupyter Notebooks, Generally

  • You can add a table of contents to a notebook(!) using nbextensions. (h/t Catherine Ordun)
  • You can parameterize notebooks, create notebook templates for analysis, and schedule them to run automatically with Papermill. (h/t Matthew Seal)
  • There are a few cons to teaching and learning with Jupyter notebooks that are worth knowing and acknowledging. Joel Grus’s ‘I Don’t Like Notebooks.’ was a cautionary tale on the use of Jupyter notebooks for teaching, and while I don’t agree with all of his points, I do think it’s worth the time to go through his deck.
Table of contents via nbextensions

Notebooks in Production (!)

  • Netflix is going all-in on notebooks in production by migrating over 10k workflows to notebooks and using them as a way to bridge the chasm between technical and non-technical users. (h/t Michelle Ufford)
  • Notebooks are, in essence, managed JSON documents with a simple interface to execute code within”. Netflix is putting notebooks into production by combining the JSON properties of notebooks with open-source library Papermill. (h/t Matthew Seal)
  • On debugging: by running a notebook on a notebook server against the same image, you can fix issues without needing to mock the execution environment or code, allowing you to debug locally. (h/t Matthew Seal)
  • On testing: templated notebooks are easy to test with Papermill — just run tests with a scheduler using parameters like what a user would be inputting to hydrate and run the notebook (and look for errors). (h/t Matthew Seal)

Screen Shot 2018-08-27 at 1.26.53 PM

Data Science in Jupyter Notebooks

  • One of my favorite new-to-me ideas is to build your own Kaggle-style board to make iterating and judging performance of internal models more fun (and provide incentive to track them better!). (h/t Catherine Ordun)
  • In graph/network analysis, you can connect nodes using multiple edges (characteristics) using a multigraph. (h/t Noemi Derzsy’s great tutorial on graph/network analysis, which I learned a lot from)
  • There is a ton of research out there around visualization, including on human perception, that can and should be leveraged for creating impactful data visualizations. I highly recommend Bruno Gonçalves’s slide deck as tour de force of what we know about perception and how to apply it to data.
  • In a very cool use of Jupyter notebook widgets, “see” the impact that missing data can have on an analysis (in this case, a linear regression), check out the interactive-plot notebook from Matthew Bremsmissing data repo, which also contains reading materials and a great slide deck.
  • I finally figured out how all of the parts of a matplotlib figure go together thanks to this nifty visualization from the matplotlib documentation. (h/t Bruno Gonçalves)
Netflix definitely won the prize for best conference swag.

… So yeah, I learned a lot of really cool things from some very talented people at JupyterCon this year. I’m excited to build new data products, apply network/graph analysis to IoT data, play with widgets, and maybe put a notebook or two into production.

If you’re doing cool things with notebooks at your company, I’d ❤ to hear about them. Feel free to leave a comment here or ping me on Twitter.

[Title image credit: Jason Williams]

Better Allies

During my 7+ years in tech, I’ve encountered sexism, but I’ve also had the good fortune to work with some really great guys. These guys cared about me and my work, and lots of the small things they did to help me (sometimes without realizing they were doing anything!) added up to big things for my career over time.

In hopes that others can follow their example, I’d like to share some of the things that “good guys in tech” have done to help my career:

Sharing positive feedback (with my boss)
Over the years, some of my consulting clients made an effort to praise my work *in front of my boss*, which directly helped me to get promoted.

Each promotion has meant more experience, more skills, and higher compensation — in addition to helping to set me up for the next step(s) in my career.

Recommending me
I got my last job because a guy on my future team recommended that his boss take a look at my resume. My resume got screened out through their recruiting process, so I know that I literally would not have gotten the interview if it weren’t for his recommendation. It’s hard to understate how important that recommendation was for my career.

Similarly, many former coworkers and clients have also written formal recommendations for me on LinkedIn, which helps to build trust in my work and establish credibility beyond a single job or boss.

Sharing salary information
I had a male coworker who started in the same position as me, was promoted to a manager role, then switched to a new company. Before he left, he shared his salary in my position, after being promoted, and at his new company.

This helped me to navigate salary negotiations (both internally and externally), ensure that I was being paid fairly, and to evaluate new job offers. Knowledge is power when it comes to salaries and a little bit of “extra” information can go along way.

Sponsoring me
My last company had a highly selective quarterly awards program. My boss put time and effort into my nomination, then went to his boss to ask that he throw his weight behind the nomination as well to give me a better shot at getting the award. I got the award, and I’m positive that having the extra backing had an impact.

This is the difference between mentorship and sponsorship — a mentor might help you gain the skills to earn an award, but a sponsor will nominate you, then go to bat to personally advocate for you. A sponsor has skin in the game. Women are over-mentored and under-sponsored, and we need people with social and political capital to promote us and help us to advance.

Asking about parental leave policies publicly (and lobbying for better ones!)
During an HR / benefits session, a male coworker asked about parental leave so that I wouldn’t have to. It can be quite awkward to have people assume that you’re pregnant (or soon to be) if you’re talking about parental leave policies, and this saved me from having those uncomfortable conversations.

The same guy also wrote a letter to the CEO citing his experiences and how a flexible schedule helped him and his family during pregnancy and beyond. Parental leave helps everyone!

Promoting my work (even Twitter helps!)
When someone references things I’ve written or retweets the things I’m working on, it helps to amplify my message and build my network.

This also provides potential for new opportunities and conversations with people I may not have reached otherwise. (Plus, it never hurts to have people supporting you and talking about your work!)

Supporting women-focused groups
Before launching R-Ladies Austin, we had several men reach out to see how they could help us grow. They offered time, training materials, books for raffles, advice, meeting places, sponsorships, speaking opportunities, and more. This helped a lot as we were getting established.

Similarly, our local Austin R User Group goes out of their way to promote R-Ladies events without me even asking. This helps us to expand, reach new members, and makes us feel supported and welcome in the tech community.

Empathizing (and humor doesn’t hurt!)
I’ve been lucky to have male coworkers who at least try to “get it” when it comes to gender in tech, and who have had my back during tough moments. Some of my favorite coworkers have made laugh out loud after being frustrated by something casually sexist that a client said. That stuff is the worst, and a little bit of empathy and humor can go a long way.

Working to improve gender ratio at tech events
Quarterly, our R-Ladies group teams up with the larger user group for a joint meetup, and recently we asked for volunteers to give lightning talks and ended up with more speakers than slots (a great problem to have for an organizer).

I was prepared to step down to give my slot to another woman in our group. Instead, the male organizer chose to give up his slot so that the event would have a higher women-to-men gender balance. To me, this small action speaks volumes about the type of inclusive tech community we’re working to build here in Austin.

Being a 50/50 partner at home
My husband has picked up domestic “slack” while I study, organize meetups, attend workshops, and travel to conferences (among other personal pursuits). I’m all about being a hashtag-independent-woman, but the dogs still need to go out even if I’m doing back to back events after work.

The balance of responsibilities shifts from week to week, but in the end, we’re partners and teamwork is what makes it all work. The fact that my husband supports my career and is happy to help out at home makes more things possible.

Asking what men can do (better)
A former boss tries his hardest to promote women in tech, and one of the things he’s done best is ask for specific ways that he can be most helpful. This has lead to lots of productive conversations around things like hiring and mentoring.

The act of asking also lets me know that he’s open to feedback and questions. This list started in large part because he’s constantly asking for concrete, actionable ways that he can help women in tech, and I appreciate that.


But wait, there’s more!
All of the above are things that good guys have done for me personally. I asked my network on Twitter whether they had any personal experiences and got lots of feedback. Here are more great (concrete and actionable) ways that men in tech have helped women in my personal network:

  • Helping brainstorm talk proposals (via Stephanie)
  • Offering training to help women build their skillsets (via Susan)
  • Literally just saying “well done”, “that was great”, “I’m impressed”, etc. (via Alexis)
  • Fixing a salary discrepancy after investigating and realizing a woman is underpaid (via Angela)
  • Letting women know that you have confidence in them (via Mara)
  • Supporting women while they are out on maternity leave (via Elana)
  • Asking point-blank, “What is holding you back?” and helping (via Alison)

One more thing
Ladies, if there’s a concrete action that someone has taken to help your career, I’d love to hear about it in the comments or on Twitter [I’ll update this post with any new suggestions].

Also, it’s worth mentioning that none of the above actions are gender specific — plenty of women have helped my career as well (in similar and different ways) — or specific to tech. Anyone can make a big difference on someone’s career — these are just a few ways that some good guys  have helped mine, and I hope they help illustrate small, concrete ways that we can all be better allies to one another.


A Data Science Tour of Duty

In September 2015, I was looking for a job in Austin and started interviewing at Yodle, a marketing tech company. Yodle was looking for someone with experience in data and predictive analytics, and I was looking for a company where I could learn how to code and work on new, interesting problems. It was a good match, but one thing that made this opportunity stand out was the way that my soon-to-be boss described what my time there would be like — a “tour of duty”. Tim was building a data science-y team and was testing out a management framework he had discovered called “The Alliance”. I was intrigued.

A tour of duty?

I’ll pause here for a quick explanation of The Alliance and the tour of duty concept.

The Alliance was created by LinkedIn cofounder Reid Hoffman to address a lack of trust and alignment between employers and employees in a networked age. The gist is that since pensions aren’t really a thing in tech and most people no longer spend a lifetime at the same company, employees behave more like free agents — they are willing to leave a position when the next good thing comes along without concerns about loyalty to their employer.

The Alliance outlines a new employer-employee compact where employers can retain employees better by being open and honest about this situation and focusing on how they can add mutual value to each other. One way to do this is to establish a tour of duty for each employee — a commitment by both parties to a specific and mutually beneficial mission (with explicit terms) to be accomplished over a realistic period of time. For a more thorough explanation, check out the visual summary below.

There are some other key components of The Alliance beyond the tour of duty that I’m not going to outline here. The framework for having open and honest conversations about career goals and timelines was also interesting and impactful for me, and worth a read if you’re interested.

My tour of duty

My transformational tour of duty started with making some goals to be incorporated into a formal growth plan. (One crucial piece of The Alliance is that the mutual agreement between employers and employees should be written down, which includes the things that each party hopes to achieve during the tour of duty.) My initial goals included learning to code, doing innovative data analysis, and learning to automate things, all of which would be put to use on a project to build in some automation around our A/B testing.

Building trust incrementally is another facet of The Alliance — the relationship deepens as each side proves itself. I wanted to learn how to code, and Tim gave me about two months of paid development time to ramp up on company practices and learn to code before diving into analyses. This built trust for me immediately, and because my boss was willing to invest in me, I was happy to invest in his mission and doing great work for our team.

The timeline we set for this first tour was about two years, and especially at the end of my ramp-up period, I was feeling really good about it. We did weekly 1:1’s to check in, and I was able to freely talk about how my goals were changing as we accomplished things along the way.

…Okay, multiple tours of duty

Four months after I started, Yodle was acquired by Web.com. If you ever want to throw a wrench into long-term job plans, an acquisition really is a great way to go. Due to several shakeups that were beyond my boss’s control, I actually ended up completing three distinct tours of duty — one in marketing analytics and automation, one in product analytics (including feature research and user behavior) and a final tour in production machine learning and data science.

During these times, Tim let me know when he was having doubts about projects or when tectonic shifts in our organization’s structure were coming. His openness and honesty empowered me to be open and honest. At one point, I told him that I didn’t want to do product analytics work — my job at the time — anymore. (Note: I actually like product analytics, but I really wanted to learn how to build machine learning models and put them into production.) My goals had grown with my skillset, and he added me to a team where I could pick up these new skills while continuing to add value to the business.

Multiple shorter tours was definitely was not what we initially planned, but we were able to be agile and adjust as needed, and I’m grateful to have gained valuable experience in multiple arenas. My last tour in particular was exactly the kind of transformational launchpad that we talked about when I first joined.

The end of the road

Tim had always been very up-front that if my dream job came along, I should take it. In turn, he let me know when he was contacted by dream-job-level prospects, and kept me up-to-date on how he was feeling about his role, his missions, and his career path. When I had an inkling of when it would be time for me to move on, I told him.

Another goal of The Alliance is to extend the relationship between employer and employee to be a lifetime relationship that exists beyond the scope of a single job. I feel really good about this. I’d love to work with Tim again because I know he cares about my career beyond a single job, and he demonstrated this by giving me the opportunity to work on projects that would grow my skillset and enable me to move on to the next thing.

Post-tour thoughts

If I had the chance to work within The Alliance framework again, I’d take it.

The Alliance is rewarding, but it’s also tough. It takes commitment from both parties and a lot of gradual trust to get the point where you can talk openly about your career beyond the scope of a single job (but once you get there, it’s worth it). When plans changed or gave way to new ones, being able to talk openly and honestly about what was and wasn’t working allowed me to build my skills while helping the company with its goals — a win-win.

My tour of duty was a transformative step in my career, exactly as it was designed to be. With a roadmap, a reasonable amount of time to dedicate, and a clear explanation of how my projects would be mutually beneficial to me and the company, I was enthusiastic about my work. I was given the opportunities I needed to learn new skills, build cool things, and work with great people. As my time at Web.com comes to an end, I’m happy, well-equipped, and ready to start my next tour of duty.

If you’ve worked under The Alliance framework, I’d love to hear about your experience and whether there’s anything you might add or change — feel free to ping me on Twitter.

One Year of R-Ladies Austin 🎉

Today marks one year of R-Ladies Austin! The Austin chapter started when Victoria Valencia and I emailed R-Ladies global (on the same day!) to ask about starting a local chapter. It was meant to be — the ladies from R-Ladies global introduced us, and the rest is history.

For anyone who hasn’t heard of R-Ladies, we are a global organization whose mission is to promote gender diversity in the R community by encouraging, inspiring, and empowering underrepresented minorities. We are doing this by building a collaborative global network of R leaders, mentors, learners, and developers to facilitate individual and collective progress worldwide. There are over 60 R-Ladies chapters around the world and we continue to grow!

Here in Austin, it’s been a busy year. So far, we’ve hosted 16 meetups — including seven workshops, two book club meetings, two rounds of lightning talks, a handful of happy hours, a movie night, and a visit from NASA.


To celebrate our first R-Ladies anniversary, I thought it would be fun to answer some questions with Victoria around our journey so far:

What has been the best part of working with R-Ladies?

Victoria: The best part has been connecting with women in our community that share similar passions and interest in data! It has been so fun. Also, the R Ladies hex stickers are pretty awesome. 🙂

Caitlin: It’s been great to be a part of such a supportive community and to meet so many brilliant women, both here in Austin and in other cities. Since joining R-Ladies, I’ve built a great network, learned cool things, and had a lot of fun along the way.

Have you had a favorite meetup so far?

Victoria: My favorite meetup by far was our book club for Dear Data by Giorgia Lupi and Stefanie Posavec. We started by discussing the book and the types of visualizations and data Giorgia and Stefanie shared with each other. Some were funny and some were sad but all of them were inspiring! We followed by creating our own visualizations in a postcard format of the beer list at Thunderbird Coffee. Who knew a beer list could be so fun to visualize and that each of us would think to do it in such different ways! It was a blast.

Caitlin: I love the book club meetups too — it’s a great space because we can do anything from have deep discussions on the ethical impacts of algorithms in society (I’m looking at you, Weapons of Math Destruction) to getting really creative and using colored pencils to dream up artistic ways of visualizing data. I also loved having David Meza come down from NASA in Houston to talk about knowledge architecture. It would be an understatement to say that he’s been supportive since day one, because he actually reached out to us long before our first meeting. (I guess “supportive since day -75” doesn’t have quite the same ring to it, but it’s true.)

What’s the biggest thing you’ve learned after one year of organizing R-Ladies?

Victoria: That managing a meetup is a fair amount of work, but certainly worth the effort! I have also learned that the R Ladies community is strong and close knit and super supportive! It has been great connecting and learning from them.

Caitlin: I agree with Victoria’s take — managing is a lot of work but also *very* worth it. I’ve learned a lot about building community through collaboration. Working with other local meetups has helped us to expand our reach and provide more opportunities for the women in our group. It’s also been very cool to learn more about the tech community on Austin. We’ve been fortunate to receive lots of support from local companies and other tech groups, and it’s been nice to get more plugged in that way while building a distinct community that adds something new to the mix.

How has R-Ladies helped you (personally or professionally)?

Victoria: R-Ladies has helped me by allowing myself time to learn about cool R stuff I did not know before! It has helped me to learn more efficient ways of coding by going through all of the chapters of R For Data Science, how to relax with colored pencils, data, and beer, and that opened my mind to different perspectives from fellow R-ladies about the continually evolving and expanding world of data that surrounds us.

Caitlin: I can’t say enough good things about the R-Ladies community. The individual chapters help to build local communities and strong networks of highly-skilled women, and the global chapter works hard to promote the work of R-Ladies to the larger global community, including people who might not see that work otherwise. Especially since a lot of women are one of few women on their team (or the only woman on their team), it’s great to have a network who can relate and provide feedback and advice (on all sorts of things) when you need it. On a personal level, I’ve built relationships with amazing women (both in real life and virtually) through R-Ladies, and it’s opened up some opportunities that would have taken a lot longer to find on my own.


The next 12 months

We’ve grown a lot this first year (we’re over 275-strong!), and we’re hoping to grow even more in the next 12 months. If you’re in Austin and haven’t made it out to a meetup yet, we’d love to meet you! We’re beginner friendly, positive, and dedicated to promoting gender diversity in the R Community (and tech in Austin more generally). And even if you are just interested in data and maybe learning more about R we want you to join us as well!

If you’re not in Austin, but want to support R-Ladies, I’d encourage you to check out R-Ladies directory the next time you’re looking for speakers or for local women to reach out to — there are lots of women out there doing amazing things, and R-Ladies is making it easier and easier to find and connect with them.

The two biggest things that we’ll need in the next 12 months are speakers and space. If you use R and have learned a cool thing, discovered a neat package, done an interesting analysis, or have anything else you want to share, we’d love to hear from you. And if you have space available, we’re always looking for new spaces to host the various types of meetups we put on. Please get in touch with us; we’d love to hear from you!

Thanks for a fantastic year, and looking forward to the next 12 months!

Caitlin and Victoria