Hobbyists discover how to insert custom fonts into AI-generated images

An AI-generated example of the Cyberpunk2077 LoRA, rendered with Flux dev.

Last week, a hobbyist experimenting with the new Flux AI image synthesis model discovered that it is surprisingly good at rendering trained reproductions of fonts. While far more efficient methods of displaying computer fonts have existed for decades, the technique is useful for AI image tinkerers because Flux can render precise text, and users can now insert words in custom fonts directly into AI-generated images.

We've had the technology to precisely create smooth, computer-generated fonts in custom shapes since the 1980s (in research since the 1970s), so creating an AI-replicated font isn't big news in itself. But thanks to a new technique, you could see a specific font in AI-generated images, such as on a menu board in a photorealistic restaurant or on a printed business card held by a cyborg fox.

Shortly after the emergence of common AI image synthesis models such as Stable Diffusion in 2022, some people started asking: How can I insert my own product, garment, figure or style into an AI-generated image? An answer to this came in the form of LoRA (Low-Rank Adaptation), a technique discovered in 2021 that allows users to extend the knowledge in a base AI model with modular add-ons that have been specially trained.

These LoRAs, as the modules are called, allow image synthesis models to create new concepts that were not originally present in the base model's training data (or were poorly represented). In practice, they are used by hobbyist image synthesis developers to represent unique styles (say, anything in chalk art) or subjects (e.g., detailed images of Spider-Man). Each LoRA must be specifically trained using examples provided by the user.
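The core idea behind Low-Rank Adaptation can be sketched in a few lines of NumPy: instead of fine-tuning a large frozen weight matrix, you train two much smaller low-rank matrices and add their scaled product onto the original weights. This is an illustrative sketch of the underlying math only, not Flux's actual implementation; the dimensions, rank, and scaling value below are made-up example numbers.

```python
import numpy as np

# Frozen base weight of one layer (d_out x d_in), as in a pretrained model.
d_out, d_in, r = 64, 64, 4  # r is the low rank; r << d_in
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))

# A LoRA add-on trains only these two small matrices.
B = rng.standard_normal((d_out, r)) * 0.01
A = rng.standard_normal((r, d_in)) * 0.01
alpha = 8.0  # scaling hyperparameter

# At inference time, the adapter can be merged into the frozen weight.
W_adapted = W + (alpha / r) * B @ A

# The adapter holds far fewer trainable parameters than the full matrix:
lora_params = B.size + A.size  # 2 * 64 * 4 = 512
full_params = W.size           # 64 * 64 = 4096
```

Because only `A` and `B` are trained and shipped, a LoRA stays a small modular file that can be swapped in and out of a base model, which is why hobbyists can distribute individual styles and fonts as separate downloads.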

Until Flux, most AI image generators weren't very good at accurately rendering text within a scene. If you asked Stable Diffusion 1.5 to render a sign that said “Cheese,” it would return gibberish. OpenAI's DALL-E 3, released last year, was the first mainstream model that could render text reasonably well. Flux still sometimes makes mistakes with words and letters, but it's the most capable AI model we've seen yet at rendering what you might call “in-world text.”

Since Flux is an open model that is available for download and fine-tuning, last month was the first time that training a font as a LoRA might make sense. That is exactly what an AI enthusiast named Vadim Fedenko (who did not respond to an interview request by press time) recently did. “I'm really impressed with how this turned out,” Fedenko wrote in a Reddit post. “Flux recognizes what letters look like in a certain style/font, which makes it possible to train Loras with specific fonts, typefaces, etc. I'll be training more of these soon.”

For his first experiment, Fedenko chose a bubbly “Y2K” style font, reminiscent of the fonts popular in the late 1990s and early 2000s, and released the resulting model on the Civitai platform on August 20. Two days later, a Civitai user named “AggravatingScree7189” published a second LoRA font that reproduces a font similar to one used in the Cyberpunk2077 video game.

“The text was so bad before that it never occurred to me that you could do something like that,” wrote a Reddit user named eggs-benedryl in response to Fedenko's post about the Y2K font. Another Redditor wrote, “I didn't know the Y2K Journal was fake until I zoomed in on it.”

Is it overkill?

An example of the Cyberpunk2077 LoRA, rendered with Flux dev.

It's true that using a well-trained image synthesis neural network to render a plain old font on a plain background is probably overkill. You probably wouldn't want to use this method as a replacement for Adobe Illustrator when designing a document.

“This looks good, but it’s kind of funny how we’re reinventing the idea of fonts as 300MB LoRAs,” wrote one Reddit commenter in a thread about the Cyberpunk2077 font.

Generative AI is often criticized for its environmental impact, and that is a legitimate concern for huge cloud data centers. However, we find that Flux can inject these fonts into AI-generated scenes while running locally on an RTX 3060 in quantized (reduced-in-size) form, and the full Flux dev model can run on an RTX 3090. The power consumption is similar to playing a video game on the same PC. The same goes for creating a LoRA: the creator of the Cyberpunk2077 font trained it in three hours on an RTX 3090 GPU.

There are also ethical issues with using AI image generators, such as training them on scraped data without the consent of content owners. Although the technology remains controversial among some artists, a large community uses it daily and shares the results online through social media platforms like Reddit, leading to new applications of the technology like this one.

At the time of writing, there are only two custom Flux font LoRAs, but we've already heard that people have plans to create more. While it's still in the earliest stages, the technique for creating font LoRAs could become fundamental as AI image synthesis becomes more widespread in the future. Adobe, which has its own image synthesis models, is likely watching.
