If you’ve been on social media this week, you’ve probably seen the bizarrely specific, melty images popping up in your feed: a court sketch of Godzilla on trial, George Costanza holding multiple cats, and a spaghetti parade, to name just a few. These are the product of DALL-E Mini, an AI application that generates nine (often unintentionally cursed) images from whatever combination of words a user types.
Here are the AI-generated images that DALL-E Mini created for “learn to code,” for example:
The mashups can be delightfully deranged, which is why DALL-E Mini has taken off on social media. Naturally, there’s a whole subreddit dedicated to weird DALL-E Mini creations.
If you’re learning how to code (or just thinking about it), you might be curious what goes into building this type of technology. Here’s how DALL-E Mini was made, plus the programming skills that can help you build a tool as viral (and nightmarish) as DALL-E Mini.
The story of how DALL-E Mini was built
Machine Learning Engineer Boris Dayma built DALL-E Mini last summer as part of a month-long competition hosted by the AI community Hugging Face and Google. Boris was inspired by a more sophisticated AI called DALL-E, which was created by the lab OpenAI.
Determined to build a model similar to the OG, Boris spent months digging into early DALL-E research, but “I still had no clue how I would do it,” Boris said in an interview with YouTuber Abhishek Thakur.
“As I went through the code, I got a bit scared,” Boris said. “There’s a lot of things that were implemented — it’s kind of complex.”
Per the competition guidelines, Boris and his team had to use Google’s framework JAX to build their version of DALL-E. None of them had experience with JAX, a Python library that mirrors the NumPy API and adds automatic differentiation and compilation for GPUs and TPUs, so they had to learn it from scratch. (If you’ve never heard of the Python library NumPy, our beginner-friendly course Learn Statistics with Python covers how to use it.)
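To get a feel for what “learning JAX” involves, here’s a minimal sketch (not code from DALL-E Mini) of a tiny linear model’s loss function. It uses `jax.numpy`, which looks almost exactly like NumPy, plus two core JAX transformations: `jax.grad`, which turns a function into one that computes its gradient, and `jax.jit`, which compiles it. The `loss` function and the sample numbers are made up for illustration.

```python
import numpy as np
import jax
import jax.numpy as jnp

def loss(w, x, y):
    """Mean squared error of a linear model's predictions x @ w against targets y."""
    pred = x @ w                      # jnp operators behave like NumPy's
    return jnp.mean((pred - y) ** 2)

# jax.grad builds a new function returning d(loss)/d(w), the gradient with
# respect to the first argument; jax.jit compiles it with XLA for CPU, GPU, or TPU.
grad_loss = jax.jit(jax.grad(loss))

# Plain NumPy arrays can be passed straight into JAX functions
x = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
y = np.array([1.0, 2.0], dtype=np.float32)
w = jnp.zeros(2)

g = grad_loss(w, x, y)   # gradient at w = 0 has values [-7.0, -10.0]

# One gradient-descent step returns a *new* weight array (JAX arrays are immutable)
w_next = w - 0.01 * g
```

Because `jax.numpy` feels so familiar, much of the learning curve is JAX’s functional style: instead of updating model weights in place, you compute gradients with transformations like `jax.grad` and produce new arrays.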
“We had to make the architecture as simple as possible, and we had to leverage a lot of existing code, leverage existing models, and try to write as little as possible,” Boris said. “It’s an approach I always have: When there’s a problem to solve, always try to see is there already an existing solution. If there is one, just use it, and if it works good enough — that’s it, you’re done.”
Judging by the enthusiastic response to DALL-E Mini, the internet thinks it works just fine. In fact, so many people have been trying to spawn their own DALL-E Mini images that the site can’t handle all of the requests due to too much traffic. “We want to keep the balance between people being able to access it while also being aware of costs,” Boris told the UK news outlet i. (Conjuring up DALL-E Mini’s images takes a lot of computing power, which costs money.)
How DALL-E Mini works
DALL-E is complicated, even for machine learning engineers like Boris and his team. But to put it as simply as possible, DALL-E Mini is trained on images that have been pre-encoded into a compact form the model can work with.
Training DALL-E Mini involves passing batches of images and their text descriptions through a system of encoders and decoders until the neural network (a type of model loosely inspired by the human brain) learns to correlate images with text.
“The model can only be as good as the data set,” Boris said in the YouTube interview. In this case, the team trained DALL-E Mini on 15 million image-text pairs, a relatively limited dataset that helps explain the warped and weird images, particularly of animals and faces. A bigger dataset and more training time could improve DALL-E Mini’s results in the future; the creators noted that the longer DALL-E Mini trained, the better its image quality got.
DALL-E Mini uses a seq2seq model, which is typically used in natural language processing (NLP) for things like translation and conversational modeling. (You can learn how to use seq2seq in our Text Generation course.) “The same idea can be transferred to computer vision once images have been encoded into discrete tokens,” according to an article written by DALL-E Mini’s creators.
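Here’s a toy sketch of what “encoding images into discrete tokens” can mean. In DALL-E Mini, that job is handled by a pre-trained image model called VQGAN, which learns a “codebook” of vectors. The version below is a simplified, hypothetical stand-in: a hand-made four-entry codebook and two-number “patches” instead of real image features. Each patch is replaced by the index of its nearest codebook vector, turning an image into a sequence of integers that a seq2seq model can treat like words in a sentence.

```python
import jax.numpy as jnp

def encode(patches, codebook):
    """Map each patch vector to the index of its nearest codebook vector."""
    # Squared Euclidean distance from every patch to every codebook entry,
    # shape (num_patches, codebook_size), computed via broadcasting
    dists = jnp.sum((patches[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    # The index of the closest entry becomes that patch's discrete token
    return jnp.argmin(dists, axis=-1)

def decode(tokens, codebook):
    """Turn tokens back into (approximate) patch vectors by codebook lookup."""
    return codebook[tokens]

# Hypothetical 4-entry codebook of 2-dimensional vectors
codebook = jnp.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Hypothetical "image": four patches, each summarized by two numbers
patches = jnp.array([[0.9, 0.1], [0.1, 0.2], [0.8, 0.9], [0.2, 0.7]])

tokens = encode(patches, codebook)
print(tokens.tolist())  # [1, 0, 3, 2]
approx = decode(tokens, codebook)
```

Roughly speaking, in the real model the seq2seq part learns to predict a sequence of image tokens like `tokens` from the tokens of a text prompt, and the image decoder turns those predicted tokens back into pixels. Decoding is lossy: `approx` only matches the original patches as closely as the codebook allows.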
How to get started in machine learning and AI
Want to learn more about the world of AI engineering, but don’t know where to start? Our beginner-friendly path Machine Learning Fundamentals will walk you through how machine learning models are created to find patterns in data. In Build Deep Learning Models with TensorFlow, you’ll get a taste of deep learning, which is a type of machine learning inspired by the architecture of the human brain.
Apply Natural Language Processing with Python will teach you how to create your own NLP tools and help you better understand how computers work with human language. And if you already feel comfortable with Python, head to our path Get Started with Machine Learning to have fun with AI and machine learning. You can also experiment with creating chatbots through our Build Chatbots with Python skill path.
If you’re fascinated by data science’s role in these models, our career path Data Scientist: Machine Learning Specialist will dive into how to apply machine learning to data and optimize algorithms. And if you’re really vibing with these courses, don’t write off the different types of careers that you can get in data science. Who knows? Maybe you’ll use AI to build the next viral sensation.