Meta Unveils CM3leon: A State-of-the-Art AI Tool for Text and Image Generation
Meta, the parent company of Instagram and Facebook, has introduced CM3leon, a new AI model designed to generate both text and images.
In a recent blog post, Meta announced CM3leon (pronounced "chameleon") and published a research paper describing the technical advances behind it. However, the company has not announced any plans to release CM3leon to the public.
Meta's research is a notable step toward multi-modal models that can generate both text and images. Today, most AI tools are either image generators or text generators, such as OpenAI's ChatGPT. Combining the two has proven difficult: although OpenAI released the multi-modal GPT-4 in March, it accepts images as input but produces only text, and AI developers have so far had limited success building models that do both.
CM3leon aims to bridge this gap. It accepts both text and images as input and can generate either, which lets it write captions for images (image-to-text generation) and produce higher-resolution images ("super-resolution").
Most AI image generators on the market use diffusion models, which are trained by gradually adding Gaussian noise to training images and learning to remove it, and which then create new images by denoising random noise. Meta's approach is different: CM3leon is a transformer model, the same family of architecture behind text generators, trained on a dataset of licensed images and captions from Shutterstock and then refined with a technique called supervised fine-tuning. According to Meta, this helps the model parse complex prompts and objects so that its output better matches what the user asked for.
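To make the diffusion idea above more concrete, here is a minimal Python sketch of the forward "noising" step used in DDPM-style diffusion models. It is a generic textbook illustration, not code from Meta or from any commercial image generator, and it does not show how CM3leon works. The linear noise schedule (1e-4 to 0.02 over 1,000 steps) and the function names are illustrative assumptions.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced per-step noise variances (an assumed, DDPM-style schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas, t):
    """Fraction of the original signal's variance kept after steps 0..t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def add_noise(x0, t, betas, rng):
    """Jump straight to step t: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    a_bar = alpha_bar(betas, t)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # fresh Gaussian noise per pixel
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]
    return xt, eps  # a diffusion model is trained to predict eps from xt and t


if __name__ == "__main__":
    betas = linear_betas(1000)
    rng = random.Random(0)
    pixels = [0.5, -0.2, 0.9]  # a toy three-pixel "image"
    print(add_noise(pixels, 10, betas, rng)[0])   # still close to the original
    print(add_noise(pixels, 999, betas, rng)[0])  # almost pure noise
```

Generating an image runs this process in reverse: a trained network repeatedly estimates and removes the noise, starting from pure noise. CM3leon, by contrast, does not rely on this denoising process.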
Meta researchers stated in their paper that "supervised fine-tuning is critical in training large language models like ChatGPT. Despite this, its application in multi-modal settings remains largely unexplored."
The result, according to Meta, is text-to-image generation that produces "more coherent imagery that better follows the input prompts." The company showcased complex, multi-element images generated by CM3leon from prompts such as "a small cactus wearing a straw hat and neon sunglasses in the Sahara desert."
Notably, the model generated a fairly realistic human hand with only a few glitches, a task that AI image generators have historically struggled with.