
What it actually takes to train frontier models
Unlocking Visual Intelligence: Black Forest Labs’ Pioneering Path in Frontier AI
(This article was generated with AI and is based on an AI-generated transcription of a real talk on stage. While we strive for accuracy, we encourage readers to verify important information.)
Black Forest Labs (BFL) is dedicated to building the foundational layer for visual intelligence, enabling models to understand, generate, and perform actions in the physical and visual world. Co-founder and research scientist Tim Dockhorn explained that BFL serves diverse clients, from major companies like Adobe and Meta to startups. Despite fierce competition, BFL’s origins in resource-constrained university research fostered a unique, efficient approach, allowing its models to compete effectively.
While specific training data is a competitive secret, Mr. Dockhorn confirmed some data is licensed. Crucially, BFL actively releases open model weights and publishes research, fostering community trust and collaboration. Initially focused on image generation and editing, BFL is now shifting towards comprehensive visual intelligence, integrating text, images, video, and audio into its multimodal models.
Human input is vital in the post-training stage, where individuals select preferred images or videos. Mr. Dockhorn acknowledged that this introduces potential for bias. To mitigate it, BFL is actively seeking verifiable tasks for visual intelligence, similar to those used for language models. The company maintains a high release cadence, launching approximately three new models annually to incorporate feedback and ensure continuous improvement.
New GPUs offer improved capabilities but come with increased costs. To maintain margins, BFL maximizes efficiency by using advanced rack-based systems and prices its models in tiers, from “ultra pro” to more affordable options. Acknowledging the prevalence of low-quality AI-generated content, BFL is moving beyond mere image generation to focus on broader visual intelligence applications.
A particularly exciting application area for BFL’s models is robotics. They can serve as a foundation for understanding and as a cost-effective simulation tool for training robots in virtual environments before physical deployment. Visual intelligence encompasses understanding (e.g., image analysis), generation (e.g., instructional videos), and action (e.g., robots executing build instructions).
Black Forest Labs’ strong academic background, with many team members having met at universities, underpins its research-intensive culture. Openly releasing models and research builds trust and empowers the community to create specific style adapters, expanding the models’ utility. Headquartered in Freiburg, Germany, with a growing San Francisco office, BFL is well-known in the AI community and works with major customers like Meta and Adobe.

