Many of us data enthusiasts are trying to launch data science careers without any prior work experience. In fact, we are looking for a first data science role precisely so we can check off the work experience requirement listed in most data science job postings. To escape this catch-22, we need to complete projects and build a data science portfolio.
A data science portfolio is a great way to showcase your skillset in lieu of work experience. It also demonstrates your passion for data science, and assuming that passion is genuine, you will also have a lot of fun completing your own projects and learning new data science skills through them. This article will provide some tips to help jumpstart your data science portfolio.
Talking to Data Scientists
There are two ways to better understand the skills you need to showcase in your data science portfolio: talking to data scientists and analyzing data science job postings.
It may sound simple, but many of us should spend more time talking to other data scientists. In data science hubs like New York and San Francisco, there are many events where data professionals and "amateurs" alike meet and discuss the data science projects they are working on. Meetup and Eventbrite are great resources for finding these gatherings.
For those of us living in areas where data science meetups aren’t as common, there are still ways to find other data scientists. My preferred method is to read the Towards Data Science blog. When I read an article I really enjoy, I often find and connect with the author on LinkedIn.
My naive assumption is that people who dedicate their time to writing data science blog posts love talking about data science, and they would likely enjoy talking to me about it. Below is an example of my interaction with one such author on LinkedIn:
The data science community is an incredible resource, and tapping into the expertise of others in the field will accelerate your growth.
Reading Job Postings of Your Dream Job
Another way to identify skills to showcase in your data science portfolio is to analyze job postings. Hiring managers will include the skills they are looking for in the job posting. Reading these descriptions will help you understand what skills you need to showcase.
LinkedIn and Glassdoor are great websites for finding data science job postings. However, an even better resource would be the network you’ve formed by talking to other data scientists. Many job opportunities aren’t even posted online, and the only way to find out about them would be through referrals.
While looking at job postings, make sure to find multiple options you are interested in. Just as you don’t want to build a machine learning model that overfits to a small and narrow dataset, you don’t want to build a portfolio that is based on limited insight from only one job posting.
In addition, some organizations that are newer to data science may not have a clear idea of the type of data scientist they’re looking for, and the job postings they create may be overwhelming.
Below are some quotes from data analyst and data science job postings I've found through LinkedIn jobs:
“Strong Microsoft Excel skills. Must have working knowledge of pivot tables, formula creation, conditional formatting, VLOOKUP, and Index Matching.”
“Working knowledge of database structures (SQL, Access, etc.)”
“Ability to independently produce high-standard, presentation-ready deliverables”
“Strong Knowledge in Data Science, Data Analytics, R, Python, Etc”
“Strong Knowledge in Statistics, Mathematics and Machine Learning”
“Use data visualization tools and programming languages like Tableau, Hive, Oracle, R, Python, Excel, Workday, Vizier and many other internal tools to work efficiently at scale”
This is just a starting point, and depending on your desired industry and type of data science job, you may find different desired skills listed in the job postings you read.
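To avoid relying on a single posting, it can help to tally how often each skill appears across several of them. Below is a minimal sketch that does this for the excerpts quoted above. The `SKILLS` list and the `count_skills` helper are my own illustrative choices, not a standard tool, and the simple whole-word matching will miss synonyms and abbreviations.

```python
import re
from collections import Counter

# Hypothetical skill list; extend it for the roles you are targeting.
SKILLS = ["Excel", "SQL", "R", "Python", "Tableau", "Statistics", "Machine Learning"]

# The job posting excerpts quoted above.
POSTINGS = [
    "Strong Microsoft Excel skills. Must have working knowledge of pivot tables, "
    "formula creation, conditional formatting, VLOOKUP, and Index Matching.",
    "Working knowledge of database structures (SQL, Access, etc.)",
    "Ability to independently produce high-standard, presentation-ready deliverables",
    "Strong Knowledge in Data Science, Data Analytics, R, Python, Etc",
    "Strong Knowledge in Statistics, Mathematics and Machine Learning",
    "Use data visualization tools and programming languages like Tableau, Hive, "
    "Oracle, R, Python, Excel, Workday, Vizier and many other internal tools to "
    "work efficiently at scale",
]


def count_skills(postings, skills):
    """Count how many postings mention each skill (case-insensitive, whole words)."""
    counts = Counter()
    for text in postings:
        for skill in skills:
            # Lookarounds keep "R" from matching inside words like "Oracle".
            pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[skill] += 1
    return counts


# Print skills from most to least frequently requested.
for skill, n in count_skills(POSTINGS, SKILLS).most_common():
    print(f"{skill}: {n}")
```

With more postings pasted into `POSTINGS`, the skills at the top of the list are reasonable candidates to showcase first.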
Find a Dataset to Address a Problem You’re Curious About
Now that you’ve identified the skills you need to showcase, it’s time to generate project ideas. There are many other people already doing data science projects and sharing them online. Looking at other people’s projects might give you inspiration for your own project ideas. Below are two great places to see other people’s data science projects:
Another great way to generate project ideas is to find datasets that interest you. Below are some resources to help you find free datasets:
In my case, I wanted to do projects that showcase my interest in education. One project I found especially interesting was Predicting School Performance With Census Data.
After searching for education datasets in the Google Dataset Search Tool, I came across the College Scorecard, which includes data on U.S. higher education institutions. Someone in my network mentioned that she wanted to do work with community colleges, so I thought it would be cool to do a project exploring trends in U.S. community college enrollment.
“Complete” Your Project and Seek Feedback
Completion is a vague term because there is almost always additional work you can include on a given project. The key is to set clear milestones for yourself. For example, in my College Enrollment Exploration project, I wanted to showcase some of my data visualization skills. In this case, my milestone was a slide deck with visualizations explaining the data.
Once you have reached a milestone, make sure to seek feedback. Create a GitHub repository for your project, and share it with your network.
In the beginning, you will likely receive constructive criticism. Here is the GitHub repository for my College Enrollment Exploration project. Clearly, I have a lot more work to do, and below are some areas that need additional work:
- My GitHub repository does not contain a README file that explains how the repository is organized and describes each file.
- I did not include a PDF file for my slide deck, and I did not discuss the “business problem” I was trying to address.
- My Jupyter notebooks did not include comments on my overall thought process.
- Although I stuck to orange and blue for my visualizations, I alternated the representation of data (orange and blue were both used to represent both community colleges and other colleges). This could be confusing for my target audience.
- I did not regularly commit and push changes to my GitHub repository as I was working. Rather, I only started making commits towards the end of my project.
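One way to address the color issue in the list above is to define the category-to-color mapping once and reuse it in every chart. The sketch below assumes matplotlib is used for plotting. The category labels, the choice of which group gets orange or blue, and the helper names are all hypothetical, not taken from my actual notebooks.

```python
# Fixed category-to-color mapping, defined once and shared by every chart.
# Labels and color assignments are illustrative assumptions.
CATEGORY_COLORS = {
    "Community colleges": "tab:orange",
    "Other colleges": "tab:blue",
}


def colors_for(categories):
    """Return one color per category, failing loudly on an unmapped category."""
    missing = [c for c in categories if c not in CATEGORY_COLORS]
    if missing:
        raise KeyError(f"No color assigned for: {missing}")
    return [CATEGORY_COLORS[c] for c in categories]


def plot_bars(ax, categories, values):
    """Draw a bar chart on an existing matplotlib Axes using the shared mapping."""
    ax.bar(categories, values, color=colors_for(categories))
    return ax
```

Because every chart looks up its colors through `colors_for`, a category keeps the same color no matter what order it appears in, and a mistyped label raises an error instead of silently getting a default color.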
My project clearly isn’t ready for my portfolio yet, and that is okay. If I continually make progress with the guidance of my network, the project will eventually help me differentiate myself from other data science candidates. More importantly, continually cycling through the feedback loop will accelerate my learning and ensure that my work is aligned with hiring managers’ needs.
Brag About Your Project
Eventually your project will become portfolio-worthy, and people in your network will actually encourage you to share your work with others. At this point, you should add a link to your GitHub repository on your LinkedIn profile and resume.
In addition, you may choose to write a blog post to practice your written communication skills. Medium is a great platform for first-time bloggers to create posts.
You may feel trepidation when broadcasting your work in this way for the first time, and that is completely normal. The important point to keep in mind is that if the data science community has been a valuable resource for your growth, then by posting your work, you are helping others overcome their own challenges as they enter the field of data science.
This is also a good time to brag to your non-technical friends about your project. In business, data scientists often have to communicate with non-technical stakeholders, and this is a wonderful opportunity to practice that skill. In general, my friends are curious about my work, and they enjoy conversations about data science (provided that I communicate in a way that they can understand).
Data scientists do data science. Although you might not have a job as a data scientist yet, by completing data science projects, you are doing data science (and, dare I say, you are a data scientist).
Of course, many of us are starting at level zero, and our first few projects won’t match the sophistication of those by more experienced data scientists. By continually doing projects, we can level up our skills and eventually work on cooler projects.
After your first project, you may continue to expand its scope, or start on a new project with a new dataset. In my case, once I complete the visualization milestone for my College Enrollment Exploration project, I could try implementing machine learning algorithms, or I could shelve it and begin working with another dataset (such as NCES’s Common Core of Data). The key is to work with datasets and topics that interest you while continually expanding your capabilities.