Last month, Codecademy and Warby Parker came together to work on a special Learn SQL from Scratch Capstone Project. It was during this time when I met Ryan Tuck, a Data Engineer at Warby, who played a major part in this partnership. So when he decided to drop by our office for the final QA round, I had to break out my notebook and ask some questions. Enjoy.
Hey Ryan, let’s start off with a question I’ve had for a while — what is a Data Engineer? (Is it similar to a Data Analyst or a Software Engineer?)
At Warby Parker, data engineers are responsible for creating and maintaining the plumbing required to support the data and reporting needs of the business. We use software engineering practices to automate the work of data cleaning, normalizing, and model building so that data is always ready to be consumed by data analysts in every department.
What languages/frameworks do you use at Warby?
On data engineering, we use Python as our general purpose programming language, as do most of the other teams in our Technology department. When it comes to databases, we use PostgreSQL for the majority of our SQL needs, and are beginning to use Amazon Athena and Google BigQuery for some of our larger datasets. We use Looker as our exclusive business intelligence entry point to all of this data.
What are some of the projects you worked on?
I’ve had the privilege of working with a lot of of smart people in every department at our company to help them solve their varied data needs, from reconciling financial data with the Accounting team to automating and modeling standardized performance metrics for our team of over 200 customer experience advisors.
As part of a team of five supporting the data needs of a rapidly growing company, I’ve tried where possible to focus on helping our analysts solve their own problems. This includes helping people learn Python and commit to our codebase, guiding the creation of data models in SQL, and encouraging people to submit pull requests to add features in Looker, our BI tool.
Seeing dozens of otherwise "non-technical" colleagues opening up PRs on a daily basis, and consequently being part of the democratization of tech that we value at Warby Parker, is probably the most rewarding "project" I've been a part of.
One project finished recently during our first annual “Hackweek” is called Pipes, which allows anyone at the company to easily move large amounts of data from wherever to wherever (Looker, Google Sheets, PostgreSQL, BigQuery, etc) on a regular cadence, or manually through a simple one-line chatbot interface. The adoption has been overwhelmingly positive and we’re looking to grow this sort of tooling out even more.
"We use software engineering practices to automate the work of data cleaning, normalizing, and model building so that data is always ready to be consumed by data analysts in every department."
What got you into the data field?
I’ve always been drawn to analytical fields like math, and became pretty proficient in Excel during some internships in college. Once I had learned to program and learned more about data science and its applications in artificial intelligence, I knew that anything I could do to immerse myself in the world of data would be a step in the right direction.
Three and a half years ago, I landed a job as a junior software engineer at Warby Parker not fully knowing what I was in for, but am so glad I got the opportunity to help build tools to support an interesting and ever-changing data-driven culture here.
Where did you learn SQL and Python?
I had a background in C++, and was exposed to Python through an Intro to Data Science course. When Warby Parker hired me onto the Data team in 2015, I had never written a SQL query in my life, but picked it up quickly and within a few months started up internal SQL training classes, which I still teach on a monthly basis.
What does your tattoo say?
This is Bayes’ Theorem, which is an equation that describes how to update probabilities given new evidence. Two summers ago I worked on building a tool to help predict weekly fantasy football performance. Some colleagues suggested a Bayesian approach would be appropriate, since there aren’t really enough data points in an NFL season to be able to use statistical approaches that require larger datasets, and I’d want to regularly update my predictions after each player’s latest performance.
I did a deep dive into understanding the (simple) math underlying Bayes’ Theorem and came out of that experience with a whole new worldview, understanding my entire knowledge of the world as a big and intricate probabilistic model that I was continuously updating with every experience I ever have. It was pretty transformative, and I figured that was worth a tattoo.
What is a concept in SQL/Python that’s essential to your work?
Donald Knuth said, “Premature optimization is the root of all evil.” I’ve generally found this to be true, and try to live by it in my work. For example, I’ll generally prefer to keep a data model simple by rebuilding it for all time on a daily basis using a single SQL query instead of making a more complicated model that requires iteratively adding to a table, keeping track of state, updated timestamps, when something last ran, etc.
A wise man once said, “Duplicating data makes things go fast,” but databases are already impressively fast to begin with, without implementing anything to improve performance. Ultimately, I almost always approach a problem thinking about optimizing for my time over machine time, for readability over performance, and for introducing as little cognitive overhead as is required by the problem at hand. Only once performance issues or readability issues present themselves will some code be worth a rewrite.
Last question! Since you wrote Warby Parker’s internal SQL training courses, I know there gotta be some inner Curriculum Developer in you. Can you teach a SQL concept in 2 minutes?
Sure! Have you ever written a query that yields some result set and you think, “I’d love to query the stuff I just produced like it was a table?” Enter the
Suppose I have a mega query that gives the transaction summaries:
select transactions.date as transaction_date, sum(items.price) as total_cost, count(*) as number_of_items from transactions inner join customers on customers.id = transactions.customer_id inner join transaction_items on transactions.id = transaction_items.transaction_id inner join items on items.id = transaction_items.item_id
WITH, I can create a temporary table within my query that I can
SELECT from and treat it just like a regular old table.
I will put everything from the previous query in a parentheses and use
WITH to give it the name
Then I’ll apply the date and customer filtering down below for a more readable query, to separate out all the
JOIN logic from the actual
WHERE filters that I want to apply on that data.
with transaction_summaries as ( select transactions.date as transaction_date, sum(items.price) as total_cost, count(*) as number_of_items from transactions inner join customers on customers.id = transactions.customer_id inner join transaction_items on transactions.id = transaction_items.transaction_id inner join items on items.id = transaction_items.item_id ) select * from transaction_summaries where first_name = 'beyonce' and transaction_date > '2018–01–01' order by total_cost desc limit 5
If you’re familiar with subqueries, this does a similar thing but makes the SQL far more readable, even if your query isn’t quite as performant as it would have been. This is essentially an implementation of the mantra “Don’t Repeat Yourself” that’s common in the world of programming.
Incredible. And love the SQL styling! 😍
Huge shout out to Ryan and the whole Warby Parker team for making this partnership happen. Special hat tips for behind-the-scenes support from:
- Lon Binder, Chief Technology Officer, Warby Parker
- Maddie Tierney, Executive Assistant, Warby Parker
- Kayla Robbins, Executive Assistant, Warby Parker
- Kaki Read, Senior Communications Manager, Warby Parker
- Isabel Seely, Senior Brand Manager, Warby Parker
It’s been an absolute pleasure. And of course, the fam at Codecademy. You know who you are. Couldn’t do it without you.