This AI Generates Photos Using Only Text Captions as a Guide

Researchers at the Allen Institute for Artificial Intelligence (AI2) have created a machine learning algorithm that can produce images using only text captions as its guide. The results are somewhat terrifying… but if you can look past the nightmare fuel, this creation represents an important step forward in the study of AI and imaging.

Unlike some of the genuinely mind-blowing machine learning algorithms we’ve shared in the past—see here, here, and here —this creation is more of a proof-of-concept experiment. The idea was to take a well-established computer vision model that can caption photos based on what it “sees” in the image, and reverse it: producing an AI that can generate images from captions, instead of the other way around.

This is a fascinating area of study and, as MIT Technology Review points out, it shows in real terms how limited these computer vision algorithms really are. While even a small child can do both of these things readily—describe an image in words, or conjure a mental picture of an image based on those words—when the Allen Institute researchers tried to generate a photo from a text caption using a model called LXMERT, it generated nonsense in return.

Blog