How to Showcase Your Data Science Skills
If a tree falls in the forest and no one is around to hear it, does it make a sound?
– Philosophy students everywhere
Hiring managers have their own version of that question: If a data scientist doesn’t have a portfolio, do they really have the skills?
You need to have a portfolio. A portfolio is evidence that you have the technical skills and knowledge you claim to have. But what goes into a portfolio, and what does a portfolio look like in data science?
What Goes Into a Portfolio
Ultimately, what you put in your portfolio is up to you. The project at the end of our Interview Prep Skill Path will help you check that you’ve covered all of the major categories, but the choice of projects is yours.
We recommend that you demonstrate the ability to:
- Formulate questions that can be answered by real data
- Clean and transform a real dataset to prepare it for analysis
- Apply the right statistical techniques for the data to answer the question
- Apply the right machine learning models where applicable
- Support your analysis with visualizations
- Summarize the main takeaways from your project
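On a toy scale, the first few items in the list above might look like the sketch below. It uses only Python's standard library and covers just a question, a cleaning step, one simple statistic, and a printed takeaway; it does not attempt modeling or visualization. The data values, the field names, and the helper functions `clean_rows` and `summarize` are all hypothetical, invented for illustration. A real portfolio project would start from a real dataset.

```python
from statistics import mean

# Hypothetical raw records (invented for illustration): minutes played and
# points scored per game, with some messy entries that need cleaning.
RAW_ROWS = [
    {"minutes": "30", "points": "18"},
    {"minutes": "12", "points": "4"},
    {"minutes": "", "points": "10"},     # missing value
    {"minutes": "25", "points": "n/a"},  # non-numeric value
    {"minutes": "36", "points": "24"},
]


def clean_rows(rows):
    """Keep only the rows in which every field parses as a number."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({key: float(value) for key, value in row.items()})
        except ValueError:
            continue  # drop rows with missing or malformed values
    return cleaned


def summarize(rows):
    """Answer the question: how many points per minute, on average?"""
    rates = [row["points"] / row["minutes"] for row in rows]
    return {"games_used": len(rows), "mean_points_per_minute": mean(rates)}


if __name__ == "__main__":
    summary = summarize(clean_rows(RAW_ROWS))
    print(f"Games used: {summary['games_used']}")
    print(f"Mean points per minute: {summary['mean_points_per_minute']:.2f}")
```

Keeping the cleaning step and the summary step in separate functions makes it easy to explain each decision, such as why malformed rows were dropped, when you write up the project.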
Throughout this Career Path, you will encounter Capstone and Portfolio projects. These can absolutely be put into your portfolio. However, thousands of people take this Career Path. To really make yourself stand out, you will want to add projects about things you are interested in.
If you like sports, use sports data; if you like science, use scientific data. Showing your interests and what you are curious about will not only make the project more fun for you but also let you showcase your own domain knowledge. Interest and enthusiasm in your projects will help set you apart from other candidates when you are applying and interviewing.
There are a lot of places to look for datasets. We have an entire lesson on where to find data later in this Career Path. For right now, the best thing you can do is be curious about the world around you, and keep track of some things that you might want to explore as you build your portfolio.
What a Data Science Portfolio Looks Like
A data science portfolio can take many shapes: you could create your own professional website, upload documents to LinkedIn or another social media site, or create a repository on GitHub. We highly recommend GitHub. It is simple, lightweight, and by far the most popular place to house a portfolio.
You have a few choices in terms of what kinds of files you want to upload. A general rule is the simpler, the better. You can upload Jupyter notebooks of your projects, .Rmd or Python files of your code, PDFs, or presentations of summary reports.
The kinds of files you upload will depend on the kind of data scientist role you are applying for. For a more programming-focused role (e.g., a machine learning specialization), your portfolio should be more programming-heavy. For a role focused more on analysis and communication, it should include at least one summary report.
The Culture of Portfolios
There are a few unspoken rules about portfolios that you should keep in mind. Throughout the Career Path, we’ve tried to be explicit about when these things come up, and we have presented best practices throughout. However, we want to call out some of the potential pitfalls and make sure you avoid them.
Datasets
A few datasets are used heavily in teaching, most commonly the iris dataset and the Titanic dataset. We use them in the Path, and you will likely find them in the wild. They are so popular that almost everyone who works in data science has seen them. They do not make good portfolio projects because employers might read them as highly scaffolded student projects or as simple replications of a tutorial, and they won’t set you apart from other candidates.
Formatting
It is important to pay close attention to presentation. Using the right file types and presenting content in the right format are essential to getting that first interview.
For example, if you have written some code to analyze a dataset, it should be stored as a .py, .ipynb, .R, or .Rmd file. We teach you how to do this in the Career Paths, but it is worth mentioning again that code should almost never be stored in a text document (e.g., .txt or .doc) unless it is formatted as a code snippet and used as an example or evidence.
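One way to check a project folder against this guideline is a small script that sorts filenames by extension. The following is a minimal sketch, not an established tool: the function name `classify`, the three labels it returns, and the example filenames are hypothetical, and the extension sets simply mirror the file types named above.

```python
from pathlib import PurePath

# Extensions named in the text as appropriate for code (compared in lowercase).
CODE_EXTENSIONS = {".py", ".ipynb", ".r", ".rmd"}
# Document extensions that code should almost never be stored in.
DOCUMENT_EXTENSIONS = {".txt", ".doc"}


def classify(filename):
    """Label a file as 'code', 'review', or 'other' based on its extension."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in CODE_EXTENSIONS:
        return "code"
    if suffix in DOCUMENT_EXTENSIONS:
        return "review"  # open it and check whether it is really holding code
    return "other"


if __name__ == "__main__":
    # Hypothetical filenames, invented for illustration.
    for name in ["analysis.py", "cleaning.Rmd", "model_code.txt", "report.pdf"]:
        print(f"{name}: {classify(name)}")
```

A file flagged as "review" is not necessarily a problem, since a text document can legitimately contain a formatted snippet used as an example.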
Another example is Tableau (or Power BI) visualizations and dashboards. There are two ways to store these that demonstrate proficiency with these tools. First, both platforms offer repositories. Tableau Public is a great place both to show off your work and to see what other people are making. The other way is to embed them in a website with an iframe.
You should not take screenshots, download them as JPEGs or PDFs, or otherwise transform the rich, interactive dashboards you’ve made into static images.
You can (and should) embed static visualizations (made in Python, R, or even Excel or Tableau) into written reports, but the skill you are demonstrating there is communicating visually about data, not proficiency in Tableau.
Finally, reports and summaries can be woven into your Jupyter Notebooks, presented as written documents (and stored as PDFs), or compiled into slide decks (also stored as PDFs).
Code Hygiene
It takes a long time and a lot of practice to write clear, well-structured code. However, you can take massive steps in that direction by writing well-commented and organized code. Just like writing an essay, writing good code often requires editing. Not all of your projects need to be edited. However, if you choose to highlight one or two (which you can do by pinning them in GitHub), you’ll want to be sure that you’ve added comments and reorganized your code for clarity as much as possible.
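As an illustration of what that editing can look like, the sketch below shows the same logic before and after a cleanup pass. The scenario (temperature readings in which negative values signal a sensor fault) and all function names are hypothetical, chosen only to show the kind of changes involved: descriptive names, docstrings, and comments that explain why a step exists.

```python
# Before editing: the code works, but its intent is hard to follow.
def f(d):
    r = []
    for x in d:
        if x is not None and x >= 0:
            r.append(x * 1.8 + 32)
    return r


# After editing: same behavior, with descriptive names, docstrings,
# and the conversion pulled out into its own small function.
def celsius_to_fahrenheit(celsius):
    """Convert a single Celsius reading to Fahrenheit."""
    return celsius * 1.8 + 32


def convert_valid_readings(readings_celsius):
    """Convert sensor readings to Fahrenheit, skipping unusable ones.

    Missing readings (None) and negative readings are dropped because, in
    this hypothetical dataset, a negative value indicates a sensor fault.
    """
    return [
        celsius_to_fahrenheit(reading)
        for reading in readings_celsius
        if reading is not None and reading >= 0
    ]


if __name__ == "__main__":
    readings = [20, None, -5, 0]  # invented example values
    print(convert_valid_readings(readings))
```

Both versions return the same results; the second simply lets a reviewer understand the filtering rule without having to reverse-engineer it.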
Grammar
Just like in your resume and cover letter, you want to proofread your portfolio projects to be sure they are free from grammatical or spelling errors. Unlike in your resume or cover letter, a couple of typos probably won’t disqualify you. However, every data scientist needs to communicate about data, and you want to demonstrate that you can do that clearly. Even simply running any reports you write through a spellchecker will help in this respect.
Conclusion
Ultimately, your portfolio is how you verify that you have all of the skills you’ve worked so hard to acquire. Have fun with it by including some projects that excite you, and be ready to talk about those projects with recruiters when you get that first interview!
Author
The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.