AI is everywhere. It influences which words we use in texts and emails, how we get our news on X (formerly Twitter), and what we watch on Netflix and YouTube. (It’s even built into the Codecademy platform you use to learn technical skills.) As AI becomes a seamless part of our lives and jobs, it’s crucial to consider how these technologies affect different demographics.
The consequences of racial biases in AI, for example, are well-documented. In healthcare, AI aids in diagnosing conditions and making treatment decisions, but biases stemming from incorrect assumptions about underrepresented patient groups can lead to inadequate care. Similarly, in law enforcement, predictive policing tools and facial recognition technology disproportionately target BIPOC communities, exacerbating racial inequities.
So, how do we prevent bias in AI in the first place? It’s a big question that all developers and people who interact with technology have a responsibility to think about.
There are avenues for bias to occur at every stage of the development process, explains Asmelash Teka Hadgu, a Research Fellow at the Distributed AI Research Institute (DAIR). From the very beginning, a developer could conceptualize a problem and identify a solution space that doesn’t align with the needs of a community or an affected group. Bias can also show up in the data that’s used to train AI systems, and it can be perpetuated through the machine-learning algorithms we employ.
With so much potential for bias to creep into AI, algorithmic discrimination can feel inevitable or insurmountable. And while undoing racial biases is not as simple as building a new feature for an app or fixing a bug, there are proactive measures we can all take to address possible risks and eliminate bias to the best of our abilities. Ahead, Asmelash breaks down how these biases manifest in AI and how to prevent bias when building and using AI systems.
How do racial biases manifest in AI, and what threats do they pose?
Asmelash: “If we zoom out a bit and look at a machine learning system or project, we have the builders or researchers who combine data and computing to create artifacts. Hopefully there’s also a community or group of people whom their systems and research are intended to help. And this is where bias can creep in. From a builder’s perspective, it’s always good to assess (and possibly document) any biases or assumptions when solving a technical problem.
The second component is biased data, which is the first thing that comes to mind for most people when we talk about bias in machine learning. For example, big tech companies build machine learning systems by scraping the web, but we know that the data you find on the web isn’t really representative of many races and other categorizations of people. So if people just amass this data and build systems on top of it, [those systems] will have biases encoded in them.
There are also biases that come from algorithm selection, which is less talked about. For example, if you have imbalanced data sets, you should strive to use the right kind of algorithms so you don’t misrepresent the data. Because, as we said, the underlying data might be skewed already.
The interplay between data and algorithms is difficult to tease apart, but in scenarios where you have class imbalance and you’re trying to do classification tasks, you should explore subsampling or upsampling of certain categories before blindly applying an algorithm. You could find an algorithm that was used in certain contexts and then, without assessing the scenarios where it works well, use it on a data set that doesn’t exhibit the same characteristics. That mismatch could exacerbate or cause racial bias.
Finally, there are the communities and people we’re targeting in machine learning work and research. The problem is, many projects don’t involve the communities they’re targeting. And if your target users aren’t involved, it’s very likely that you’ll introduce biases later on.”
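To make the subsampling and upsampling Asmelash mentions more concrete, here’s a minimal sketch of upsampling a minority class before training a classifier. It uses scikit-learn with a made-up toy dataset and hypothetical column names, not code from Lesan AI or DAIR:

```python
# Minimal sketch: balancing classes by upsampling before training.
# The toy data and column names here are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Toy dataset with a 9:1 class imbalance (90 samples of class 0, 10 of class 1)
df = pd.DataFrame({
    "feature": range(100),
    "label": [0] * 90 + [1] * 10,
})

majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Upsample the minority class so both classes are equally represented in training
minority_upsampled = resample(
    minority,
    replace=True,             # sample with replacement
    n_samples=len(majority),  # match the majority class size
    random_state=42,
)
balanced = pd.concat([majority, minority_upsampled])

model = LogisticRegression()
model.fit(balanced[["feature"]], balanced["label"])
```

Resampling is only one option: class weights or algorithms designed for imbalanced data may fit better, which is exactly why Asmelash stresses assessing an algorithm against the characteristics of the actual data rather than applying it blindly.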
How can AI developers and engineers help mitigate these biases?
Asmelash: “DAIR’s research philosophy is a great guide, and it’s been really helpful as I practice building machine learning systems in my startup, Lesan AI. They explain how, if we want to build something for a community, we have to get them involved early on — and not as data contributors, but as equal partners in the research that we’re doing. It takes time and trust to build this kind of community involvement, but I think it’s worth it.
There’s also accountability. When you’re building a machine learning system, it’s important to make sure that the output of that project isn’t misused or overhyped in contexts that it’s not designed for. It’s our responsibility; we should make sure that we’re accountable for whatever we’re building.”
What can organizations and companies building or employing AI tools do?
Asmelash: “There’s a push toward open sourcing AI models, and this is great for looking into what people are building. But in AI, data and computing power are the two key components. Take language technologies like automatic speech recognition or machine translation systems, for example. The companies building these systems will open source all of the data and algorithms they used, which is fantastic, but the one thing they’re not open sourcing is their computing resources. And they have tons of it.
Now, if you’re a startup or a researcher trying to do something meaningful, you can’t compete with them because you don’t have the computing resources that they have. And this leaves many people, especially in developing countries, at a disadvantage: we’re pushed to open source our data and algorithms, but we lack the computing component, so we can’t compete and end up getting left behind.”
How about the average person using these tools — what can individuals do to help mitigate racial bias in AI?
Asmelash: “Say a company creates a speech recognition system. As someone from Africa, if it doesn’t work for me, I should call it out. I shouldn’t feel ashamed that it doesn’t work because it’s not my problem. And the same goes for other Black people.
Research shows that automatic speech recognition systems fail most often for Black speakers. And when this happens, we should call them out as users. That’s our power. If we can call out systems and products and say ‘I’ve tried this, it doesn’t work for me’ — that’s a good way of signaling to other companies that there’s a gap to fill, or of letting policymakers know that these things don’t work for certain groups of people. It’s important to realize that we, as users, also have the power to shape this.
You can also contribute [your writing skills] to machine learning research. Research communication, for example, is such a big deal. When a researcher writes a technical research paper, they’re not always interested in communicating that research to the general public. If somebody’s interested in this space, but they’re not into coding and programming, this is a huge unfilled gap.”
This conversation has been edited for clarity and length.
Learn more about AI
Feeling empowered to pursue a career in AI or machine learning? Check out our AI courses to uncover more about its influence on the world. Start with the free course Intro to ChatGPT to get a primer on one of the most advanced AI systems available today and its limitations. Then explore how generative AI will impact our future in the free course Learn the Role and Impact of Generative AI and ChatGPT.