Why building a data science team is deceptively hard

More and more startups are looking to hire data scientists who can work autonomously to derive valuable insights from data. In principle, this sounds great: engineers and designers build the product, while data scientists crunch the numbers to gain insights. In practice, finding these data scientists and enabling them to be productive are very challenging tasks.

Before diving further, it's useful to note a few trends in data and product development that have emerged over the past decade:
Companies such as Google, Amazon and Netflix have shown that proper storage and analysis of data can lead to tremendous competitive advantages.

It’s now feasible for startups to instrument and collect vast amounts of usage data. Mobile is ubiquitous, and apps are constantly emitting data. Big data infrastructure has matured, which means large-scale data storage and analysis are affordable.

The widely adopted lean startup philosophy has shifted product development to be much more data-driven. Startups now face the challenges of defining Key Performance Indicators (KPI), designing and implementing A/B tests, understanding growth and engagement funnel conversion, building machine learning models, etc.

Because of these trends, startups are eager to develop in-house data science capabilities. Unfortunately many of them have the wrong ideas about how to build such a team. Let me describe three popular misconceptions.

Misconception #1: It's okay to compromise the engineering bar for statistical skills.

For data scientists to work productively and independently, they must be able to navigate the entire technical stack and work effectively with existing systems to extract relevant data. The only exception is if a startup has already built out its data infrastructure. But in reality, very few startups have their infrastructure in place before building a data science team. In these cases, a data science team without strong engineering skills or engineering support will have a hard time doing their job. At best, they will produce suboptimal solution that will be rewritten by another team for production.

To illustrate this, take the example of building the KPI dashboard at Codecademy. Before visualizing the data in d3, I had to extract and join (a) user data from MongoDB, (b) cohort data from Redis, and (c) pageview data from Google Analytics. The data collection alone would've been near impossible without an engineering background, let alone making the dashboard real-time, modularized and reusable.

Misconception #2: It's okay to compromise the statistics bar for engineering skills.

Proper interpretation of data is not easy, and misinterpreted data can do more damage than data that's not interpreted at all (check out "Statistics done wrong"). Building useful machine learning (ML) models is trickier than most people expect. A popular but misguided view holds that ML problems can be solved either by applying some black box algorithm (a.k.a magic), or by hiring interns who are PhD students. In practice, hundreds of decisions and tradeoffs are made in solving such problems, and knowing which decisions to make requires a lot of experience. (I’ll expand upon this more in a future post titled “Machine learning done wrong”.) For a given ML problem, there are tens if not hundreds of way to solve it. Each solution makes different assumptions and it's not obvious how to navigate and identify which assumptions are reasonable and which model should be used. Some would argue: why not just try all different approaches, cross validate, and see which one works the best? In reality, you never have the bandwidth to do so. A strong data scientist might be able to produce a sensible model right off the bat, while a weak one might spend months optimizing the wrong model without knowing where the problem is.

Misconception #3: It's okay to hire data scientists who lack product thinking.

Imagine asking someone who doesn't have a holistic view of the product to optimize your business KPIs. They may prematurely optimize the sign up funnel before making sure the product has reasonable retention, which would lead to more unretained users. Some think data-driven product development is a local optimization. This criticism is only correct when those who drive product development with data fail to think about the product holistically.

To sum up, a productive data science team requires data scientists that are strong in engineering, statistics, and product thinking. It's hard. And it becomes even harder to look for the first data science hire who will be spearheading data efforts in a startup. For startups that don't have the luxury to wait and hire these rare data scientists, it's important to be aware of the compromises made especially in terms of the hiring bar. Before the data team is strong enough across all three areas, make sure they have strong support for the skills they lack, and don't expect them to work autonomously.

If you like the post, you can follow me (@chengtao_chu) on Twitter or subscribe to "ML in the Valley". Also, special thanks Ian Wong (@ihat), Leo Polovets, and Bob Ren (@bobrenjc93) for reading a draft of this.

Codecademy Reimagined

We’ve been working hard over the past four months trying to reimagine Codecademy and we couldn’t be happier to finally unveil it to the world. We have redefined every component under our brand, from a single button on our dashboard to our email template, business cards, slides and even apparel.

We had been discussing a design refresh for a while, but somehow it always ended up being pushed to the side. Finally, in October last year, after completing a user segmentation project that brought to live the main user archetypes of Codecademy.com, it quickly became apparent that if we wanted to grow and mature as a brand, we required a thorough redesign of our entire product.


Why a redesign?

Reason #1 – Start fresh

First, there was the obvious problem of design incoherence and variation. This happened primarily because we lacked a well-defined color and font palette, a uniform visual language for our badges, a unified layout scheme (page types), and a cohesive strategy for all print materials – business cards, postcards, posters, etc. After two and a half years of multiple nip and tuck design fixes and additions, it was time to clean up the house and start fresh. This meant we could finally create an extensible UI pattern library (used and shared by designers and developers) and optimize our new face across multiple platforms by embracing a responsive design layout.

Some of our old webpages
A random sampling of pages within our old web ecosystem, showcasing some visual design inconsistency.

Reason #2 – Brand matureness

Secondly, was the realization that our young startup look and feel was slowly becoming incompatible with our future goals and aspirations. In a time when we are engaged in several partnerships with schools, companies, and governments across the globe, while also continuing to fulfill the needs of our growing user base, our brand should feel a bit more mature, inviting, professional, and sophisticated.

Our old logo
Codecademy’s quirky and undistinguished old logo was created by one of our co-founders in a few minutes by browsing through various fonts in a word processor. The logo featured the giddy lobster font, which has become so popular that is at times compared with Microsoft’s Comic Sans.


Our new look

Phase 1

Back in early November last year, we partnered with our friend Eddie Opara, and his immensely talented team of designers at Pentagram, in order to create a new visual identity that could better reflect the company’s age, ambition, and main attributes.

The first thing we tackled was the logo, as the key centerpiece of our new look. We spent some time talking to our users, colleagues, and our founders Zach and Ryan, to have a solid grasp of Codecademy's perception and future aspirations. After this important research period, we went through several revisions, continuously narrowing down on the mark that best represented our main traits.

While putting the finishing touches to our new logo, we began creating a complementary color, font and iconography palette. It was important to handle all these components simultaneously, so we could delineate a consistent design thread through all of them. Phase 1 gave us the most critical building blocks of our new brand, through our partnership with Pentagram, and marked the beginning of an exclusive in-house development period.

Early directions of our logo
Various early directions for our new logo.

narrowing down on the logo
Narrowing down on a few favorite visual marks.

our final logo
Our final logo with its underlying grid.

our new graphical language
Our new graphical language used across the site to indicate different types of content (symbols), actions and controls (icons), and learning achievements (badges).

our new font
FF DIN Rounded - Our primary typeface.

our new color palette
Our new color palette.

Phase 2

After defining the main brand pieces with Pentagram (logo, typography, iconography, color), we started applying it internally to our entire web ecosystem by building a comprehensive number of reusable design patterns. For two weeks we built a sizable UI toolkit covering a variety of elements (see below).

our first UI toolkit
Our first attempt at the UI toolkit encompassing only a short amount of elements.

our final toolkit
Our growing UI toolkit covering every element, such as header, footer, form fields, button styles, sign up modules, grid, padding, typography, colors, and interactions.

Phase 3

This was the longest, and perhaps most exhausting, of all phases, where we redesigned 70+ webpages in tandem with other collateral material (email templates, slides, apparel, etc). Fortunately, we imposed ourselves a very well defined timeline, with multiple cycles and milestones, which helped us guide through this large task (see sitemap below).

sitemap
First, we created a comprehensive sitemap of Codecademy.com and then divided the sitemap into four groups, each representing a 2-week delivery cycle. As we redesigned the various pages in each cycle, our brilliant team of developers built our UI styleguide and constructed many of the pages based on the shared design patterns.

examples 1
Examples of our redesigned pages, from left to right: Enterprise, Stories, Profile.

examples 2
Examples of our redesigned pages, from left to right: Blog, About, Jobs.

examples 3
Examples of our redesigned pages, from left to right: Help Center, UK Curriculum, Hour of Code.

Phase 4

Our final phase was all about making sure we were building the thing right. We implemented and tested our new redesign, while in the process getting feedback from our community. We created a huge amount of redlines for all the new material, started experimenting with some versions live on the site, and listened to dozens of comments from our selected users and moderators.

redlines
An example of the various comps created to support the accurate implementation of all our redesigned pages.


The beginning

Even though we spent a long time rethinking Codecademy, this hefty work is still unfinished. It certainly provides our team and product with a much-needed fresh face, one that we can feel proud of, and most importantly, one that our users can thoroughly enjoy. But this is just the beginning. We would love to hear what you have to say about our redesign and how we can continue improving our product. We have dozens of ideas to continue pushing this brand foreword. Please keep coming back for more!

We're Learning Too: A New Codecademy

Two years ago, we started building a product that would help teach people the skills they needed to succeed in a digital world. As more than 24 million people took Codecademy courses on our web and iOS platforms, we too learned and grew. Now, we’re excited to show you our latest project — a new Codecademy designed from the ground up, aimed to help you learn skills hands-on, with real projects, and constant feedback. Better yet, the new Codecademy experience helps to connect you with the real skills you’ll need to succeed in today’s workplace.

Logo

Learning isn’t just about one exercise or “class,” but instead is a gateway to community, opportunities, ideas, and a better life. We’ve witnessed this through the millions of learners on Codecademy and through the thousands of inspiring teachers who have shared their knowledge with the world with our course creator. We listened to them while building what we think is the best learning experience — for anyone, anywhere — to learn the most important skills of today.

In two years, Codecademy has scaled to become larger than we had ever imagined. Our learners, spread across the globe in every country in the world, have:



Today, we’re proud to show off the results of all of that to a few friends and, within days, the rest of the world. The first fruits of this effort are an experience that gets you from knowing nothing to building a website — in this case, Airbnb’s homepage. Along the way, you’ll experiment with blocks of code, see the results of adding and subtracting different parts of a page, and use the real terminology that developers and designers all over the world over use to create websites just like Airbnb’s.

Build a professional website

Our new platform leaves you not just with new knowledge, but with a portfolio of projects you can share with your friends, enabling them to learn from you. We’ve even built the capability for you to share your work with future employers, and to demonstrate your new skills. We’ve been testing our new learning interface for weeks and we’ve seen it applied in an amazing number of ways — from designers at major firms winning new consulting work because of their ability to build their designs to students in high school making personal webpages for themselves.

Learning environment

Codecademy’s learning experience comes not just from the data behind 24 million learners and billions of lines of code, but also from the individual stories we’ve heard from our wealth of committed learners. Former book critic Juliet Waters, for instance, started learning with her 11-year old son as part of our Code Year program in 2012. Since then, she’s gone on to chronicle her journey in a book that’s coming soon, noting that programming helped her feel “more connected with others in our tech-driven society.” A parent named Shari told us that her 11 and 13-year old sons had a “reaction to what they are learning [that] beats their enthusiasm for the [video games].” We work hard everyday to deliver a similar experience for our users all around the world — with more than 65% of users outside the US, it’s important to us that we’re democratizing access to the fundamental blocks of knowledge that can improve peoples’ lives.

Panel

Tommy Nicholas’ story is just the sort we’re hoping our new learning environment will foster: he began with almost no programming knowledge at all, and gained enough skills to develop a website, Coffitivity, that was named one of TIME Magazine’s Top 50 websites in 2013.



Billions of lines of code, millions of users, and years after our founding, we’ve been astonished by what people can do when they can easily learn the fundamental skills that can transform their lives. Today, we’ve redesigned Codecademy to reflect that potential — and hopefully to help more people reach their goals and build the future they want to live in.

If you want to help us, we're hiring!

Codecademy's First iPhone App

Codecademy started to help anyone learn the skills they need in order to succeed in the twenty-first century. We want Codecademy to be with you everywhere - learning shouldn't be confined to a classroom or a desktop computer.

We launched our first iPhone app, Codecademy: Hour of Code, for anyone in the world to get started learning to code on the go. We've built an entirely new Codecademy experience for mobile that includes the same things that make Codecademy on the web great - interactivity, "snack" sized content, and fun lessons. Our first app gives you the basics of programming and should help absolutely anyone get started with programming - it's almost too easy not to try!

alt text

We'll send content updates to this app with more courses for you to complete as time goes on. You'll see more feature updates as well.

Perhaps best of all, this is just the beginning of Codecademy on the go. We want to help you learn the skills that can help you change your life - anywhere and anytime. Download our first app and let us know what you think!

Codecademy & English Computing Curriculum

Today, Codecademy is really pleased to announce our partnership with Computing at School (CAS), the leading authority on the new computing curriculum in England. England is the first country to mandate programing in schools. Starting in September 2014, students aged between 5-16 will learn HTML, CSS, Python and JavaScript. CAS were largely responsible for designing the curriculum objectives, and are leading the way in teacher training.

Teacher training is critically important and Codecademy is pleased to partner with CAS to be a part of the solution. Teachers can use Codecademy resources after they've attended training sessions to continue building their skills, and remote teachers can access the platform if they are unable to attend in person training sessions. Below is the official announcement:

From September 2014 schools in England will teach a new statutory computing curriculum, which aims to ensure all students can understand and apply the fundamental principles and concepts of computer science. This will make England the educational envy of almost every other country in the world, but it will also be a major step change from what schools currently teach. Not surprisingly this has left many teachers looking for support and further training.

CAS is running a national Network of Excellence for Teaching Computer Science, that aims to provide exactly this support and training. Codecademy, based in New York, will complement CAS’s in-person approach with a free online platform and interactive learning resources specifically designed to support the programming aspects of the new computing curriculum in England. Teachers can use it to learn programming themselves, or as a way to teach programming to their students.

Simon Peyton-Jones, Chair of CAS says “The UK has tens of thousands of teachers who need support and encouragement to deliver the new Computing curriculum with confidence and enthusiasm. Codecademy offers us the scalability of an online platform, and teachers can move smoothly from learning programming themselves to Codecademy to teach their students. I’m delighted to have this support.”

Codecademy Goes to Back to School

alt text

It's that time of the year again. The air is crisper, lattes are spiced with pumpkin-flavor extract, and Codecademy is coming to a campus near you!

Codecademy is taking to the road for a whirl-wind college tour up and down the eastern seaboard. We've had a great time sharing stores from our college fellowship program and spilling the technical details on how we evaluate up to 5,000 code submissions a second (over 25 million a day!)

alt text

So far, we've had a turnout at Brown that would scare any fire marshall and we've stayed up all night to hack with the best and brightest at MIT's blow-out HackMIT event. But we're just getting started, so come join us if we're visiting a campus near you!

Hope to see you soon!


MIT tech talk
10/8 at 7pm
Bldg 5, Room 233

Olin tech talk
10/9 at 7pm
Academic Center 126

Princeton talk
10/11 at 9pm
Code@Night

MIT On campus interviews
10/25
Apply on careerbridge, if interested

Codecademy at HackMIT

A few of us at Codecademy spent the weekend at MIT for HackMIT - a hackathon involving over 1000 college students from around the world. It's been an awesome experience seeing so many coders in one location hacking away at amazing projects.

The atmosphere inspired us to start hacking ourselves. For our hack, we decided to programmatically find the most active Codecademy users at HackMIT and give them some love and swag.

By querying our database for all of the emails of attendees at HackMIT, we were able to find all the Codecademy users in attendance. We then sorted the list of users by points and achievements to find the top 3 most active Codecademy users at HackMIT.

Here is a photo of us with our top HackMIT user - spiltpeasoup!

We were surprised to find out that over 25% of hackers at HackMIT have a Codecademy account! If you are at HackMIT this weekend, stop by and say hi!

Codecademy Site Downtime Friday 9/13

As some of you may have realized, Friday morning at about 10:00am, our site was not operable for 2 hours. We apologize for the inconvenience and wanted to explain to you why this happened.

Our hosting provider, Amazon Web Services (AWS), was having networking issues. This affected our app servers, app load balancer, and redis boxes. Some of you may have noticed a 503 error, which was thrown by our CDN (content delivery network). During these two hours, we are able to restore the site, but because of the networking issues, the site was very slow. At 12:07 Amazon Web Services restored the issue, and the site was back up and running as normal. Because this was a networking issue, no content or progress was lost.

Again we apologize for any inconvenience caused by this downtime. Unfortunately this particular issue was out of our control. We're investigating ways we can add greater redundancy to Codecademy to help ensure we're protected from similar issues in the future.

See what we posted on twitter Friday morning.

Glossary Redesign

We're always asking ourselves how we can help users when they get stuck while working through our courses. I had an idea at a company hackathon that turned into a big project.

For the past few weeks we've been running my idea as an experiment: when certain users write code that returns a common syntax error, we show them a snippet of code from the glossary that's an example of what they were trying to do.

The idea was that people need to see examples of good code before they can learn how to write good code.

In order to do this we had to redo the internals of the glossary to be more dynamic. With that done, another great result has been that we were able to hand over editing privileges to our moderators, who have been thrilled to take ownership of this part of the site and improve the content. They've been doing a good job of cleaning it up and adding to it, and this will be an on-going thing.

The best part of this is that as the moderators improve our glossary, the code suggestion feature will get smarter and show more examples! They come right from the glossary.

The overhaul was also aesthetic; the glossary is now much prettier and easier to read. The typography improved thanks to some expertise from our designer Jason and the code samples now match the color of our code editor.

Go take a look!

codecademy.com/glossary/javascript
codecademy.com/glossary/python
codecademy.com/glossary/ruby
codecademy.com/glossary/html
codecademy.com/glossary/css

Learn Page Redesign

We’ve recently launched our new Learn Page. In case you haven’t noticed, please see it in detail here.

alt text

This redesign was long overdue and we’ve collected many tips and ideas from our community over the past few months. The goals of the new layout were the following:

  • Provide a clearer order and grouping of content, divided between individual languages and goal-driven paths (Web Projects and APIs for now).
  • Suggest a clear starting point for newcomers (Web Fundamentals) placed at the very top of the page.
  • Deliver an indication of progress on all initiated tracks. Notice how the “Explore” button changes into a “Continue” button with corresponding completion percentage on all ongoing courses.
  • Offer a redesign more in line with our brand and existing color scheme.

We hope you like it, and as always please let us know if you have any thoughts or suggestions.