LLM pretraining data-mixture sankey

Tags: Charts & Infographics, wuyoscar, GPT-Image2-Skill, charts-infographics


Category: Charts & Infographics
Model: GPT Image 2
Creator: wuyoscar
Source language: en
Source ID: 086

Full prompt

Landscape 16:9 sankey diagram of a pretraining data mixture, three stages with translucent colored ribbons.

LEFT (8 source blocks, heights proportional to tokens): "Common Crawl (web) 540B" (muted navy, largest), "arXiv papers 180B" (dusty teal), "GitHub code 160B" (slate gray), "Wikipedia 40B" (soft terracotta), "StackExchange QA 30B" (warm copper), "Books (public domain) 25B" (pale olive), "Patents 18B" (pale navy), "Curated news & forums 15B" (dusty teal).

MIDDLE (3 processing blocks, stacked): "Deduplicated (MinHash + exact)", "Quality-filtered (classifier + heuristics)", "PII-scrubbed (regex + NER)".

RIGHT (3 final splits): "Pretraining set 1.4T tokens" (largest), "Instruction-tune pool 12B tokens", "RLHF preference pool 3B tokens".

Flow ribbons inherit source color with mid-labels showing token counts ("85B", "320B", "44B"). Legend strip at bottom.

Title: "LLM pretraining data mixture and downstream splits". Subtitle: "token counts after deduplication and quality filtering; ribbon thickness ∝ token flow."
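
The prompt asks for block heights proportional to tokens and ribbon thickness proportional to token flow. A minimal sketch of that arithmetic follows, assuming the token counts are read straight from the block labels above. The names `SOURCES_B`, `SPLITS_B`, and `block_heights` are hypothetical and not part of the original prompt.

```python
# Token counts in billions, copied from the LEFT block labels of the prompt.
SOURCES_B = {
    "Common Crawl (web)": 540,
    "arXiv papers": 180,
    "GitHub code": 160,
    "Wikipedia": 40,
    "StackExchange QA": 30,
    "Books (public domain)": 25,
    "Patents": 18,
    "Curated news & forums": 15,
}

# Split sizes in billions, copied from the RIGHT block labels (1.4T = 1400B).
SPLITS_B = {
    "Pretraining set": 1400,
    "Instruction-tune pool": 12,
    "RLHF preference pool": 3,
}


def block_heights(tokens_b, total_height=1.0):
    """Hypothetical helper: scale each block so its height is proportional
    to its token count, with all blocks together filling total_height."""
    total = sum(tokens_b.values())
    return {name: total_height * count / total for name, count in tokens_b.items()}


source_total_b = sum(SOURCES_B.values())
split_total_b = sum(SPLITS_B.values())
heights = block_heights(SOURCES_B)

# Print each source with its share of the left column, largest first.
for name, height in sorted(heights.items(), key=lambda item: -item[1]):
    print(f"{name:<24} {SOURCES_B[name]:>5}B  {height:6.1%}")
print(f"sources total: {source_total_b}B, splits total: {split_total_b}B")
```

With these labels the eight sources sum to 1,008B tokens, while the three right-hand splits sum to 1,415B. A renderer that draws ribbons strictly to scale therefore cannot conserve flow between the left and right columns, so the counts are probably best treated as illustrative labels rather than a balanced dataset.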
