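Chinese-language study notes that read the MiniMind source code line by line and extend into the broader large-model technology stack: pretraining / SFT / DPO / PPO / GRPO, training mechanics, a MiniMind2→3 version comparison, and evidence from real experiments.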
Terminal score: 0–1000 raw, weighted across 4 dimensions. Public score: 0–10 normalized (shown in the 30-day stars chart above).