Today, we officially release ERNIE 5.1. While inheriting the pre-training foundation of ERNIE 5.0, it compresses total parameters to approximately one-third and active parameters to approximately one-half, achieving leading foundational performance at its model scale using only about 6% of the pre-training cost of comparable models.

To advance the evolution of large models toward autonomous decision-making agents, we built an entirely new disaggregated fully-asynchronous reinforcement learning infrastructure, specifically addressing the global optimization challenges posed by training-inference divergence, low resource utilization, and long-tail effects.

On this foundation, through scaled agentic post-training combined with an end-to-end synergy strategy across environment, expert, and integration stages, we achieved a dual leap in training efficiency and model capability, ensuring that the model maintains exceptional stability and outstanding performance even when handling complex long-tail tasks.

As one of the current cost-performance benchmarks among Chinese-developed large models, ERNIE 5.1 achieves a leap forward in parameter efficiency and training cost optimization while maintaining flagship-level intelligence. Its performance has been validated on internationally authoritative leaderboards: on May 9, ERNIE 5.1 scored 1,223 to claim 4th place globally and 1st among Chinese models on the Search Arena leaderboard.

ERNIE 5.1 on the Search Arena Leaderboard

Visit the official website at https://ernie.baidu.com to chat with the latest ERNIE 5.1 model and explore a new era of intelligence. Baidu AI Studio has also launched an ERNIE 5.1 Playground for hands-on experience: PaddlePaddle AI Studio Galaxy Community, an AI learning and hands-on practice community.

ERNIE 5.1: Outstanding Agent and Reasoning Capabilities, with World Knowledge Ranking Among Top-Tier Models

ERNIE 5.1 delivers strong results across multiple authoritative industry benchmarks, particularly in agentic capabilities, knowledge, reasoning, and deep search:

  1. Outstanding agentic capabilities on par with the world’s top models: On the τ³-bench and SpreadsheetBench-Verified agent evaluation tasks, ERNIE 5.1 surpasses DeepSeek-V4-Pro, with agentic capabilities approaching those of leading closed-source models. It also performs exceptionally well on the Search Arena leaderboard.
  2. Leading world knowledge and creative writing capabilities: On GPQA and MMLU-Pro evaluations, ERNIE 5.1 approaches the performance of leading closed-source models. In internal evaluations, ERNIE 5.1’s creative writing capabilities approach those of Gemini 3.1 Pro.
  3. Reasoning capabilities approaching leading closed-source models: On AIME26 (with tool use), a challenging mathematical competition benchmark, ERNIE 5.1 scores 99.6 — second only to Gemini 3.1 Pro.

ERNIE 5.1 Benchmark

Technical Features

Multi-Dimensional Elastic Pre-Training: Pre-training Compute Cost at Only 6% of Comparable Models

ERNIE 5.1 is derived from ERNIE 5.0: its architecture is the optimal sub-network extracted from ERNIE 5.0’s multi-dimensional elastic sub-model matrix, allowing it to inherit the knowledge and capabilities encoded in ERNIE 5.0 while significantly reducing pre-training cost. The R&D team proposed an innovative Once-For-All elastic training framework. Whereas traditional approaches require a separate pre-training run for each model scale, ERNIE 5.0 jointly optimizes a large number of sub-models with varying depths, expert capacities, and routing sparsity levels through a dynamic sampling mechanism within a single pre-training run, constructing a sub-model matrix that spans diverse parameter scales and computational budgets. Throughout this process, the model achieves elastic compression and expansion along three dimensions:

  • Elastic depth: During training, the number of active Transformer layers is randomly varied, enabling sub-models at different depths to share weights and adaptively learn a balance between deep and shallow representations.
  • Elastic width / expert capacity: The effective expert capacity in MoE layers is elastically controlled by varying the number of experts participating in routing. By dynamically sampling subsets of experts, the model learns to operate under both full and reduced expert-pool configurations, thereby improving expert utilization efficiency.
  • Elastic sparsity: Through a variable Top-k routing mechanism, the number of activated experts is flexibly adjusted. Activating fewer experts reduces inference cost and improves decoding efficiency, while activating more enhances model capability, achieving a dynamic trade-off between inference overhead and performance.
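The three elastic dimensions can be pictured as a per-step sampling loop over a shared-weight search space. The following is a minimal sketch of that idea; the configuration values and names are purely illustrative, since the actual ERNIE 5.0 search space has not been disclosed:

```python
import random

# Illustrative search space; the real ERNIE 5.0 elastic matrix is not public.
DEPTH_CHOICES = [24, 32, 48]          # elastic depth: active Transformer layers
EXPERT_POOL_CHOICES = [64, 96, 128]   # elastic width: experts visible to the router
TOP_K_CHOICES = [2, 4, 8]             # elastic sparsity: experts activated per token

def sample_submodel(rng: random.Random) -> dict:
    """Draw one sub-model configuration from the elastic matrix."""
    return {
        "num_layers": rng.choice(DEPTH_CHOICES),
        "num_experts": rng.choice(EXPERT_POOL_CHOICES),
        "top_k": rng.choice(TOP_K_CHOICES),
    }

# One pre-training run jointly optimizes many such configurations: every
# sampled sub-model reuses the same underlying weights, so gradients from
# each configuration update one parameter set, and a single run yields the
# whole depth x width x sparsity matrix.
rng = random.Random(0)
configs = [sample_submodel(rng) for _ in range(4)]
```

Under this scheme, "extracting" a model like ERNIE 5.1 amounts to fixing one point in the sampled matrix and exporting the corresponding weight slice, rather than launching a fresh pre-training run.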

Building on this breakthrough, ERNIE 5.1 compresses total parameters to approximately one-third and activated parameters to approximately half those of ERNIE 5.0, with pre-training compute cost at only 6% of comparable models at the same scale. Compared to ERNIE 5.0, inference cost is significantly reduced while still achieving leading performance among models of comparable scale.

Illustration of ERNIE 5.0 Elastic Training

Decoupled Fully-Asynchronous Reinforcement Learning Training: Greater Efficiency, Stability, and Cost Reduction

We built a disaggregated reinforcement learning infrastructure on PaddlePaddle to support the multi-stage reinforcement learning training of ERNIE 5.1. To achieve more efficient, stable, and cost-effective training for long-horizon reinforcement learning tasks, we focused our optimizations in three key areas:

  • Disaggregated fully-asynchronous architecture: We designed and developed a disaggregated architecture centered on an RL Controller, fully decoupling the control plane across four major subsystems — training, inference, reward, and agent loop. The subsystems are bridged and interact through high-performance network-based data components, achieving separation of the control plane from the data plane. Under this architecture, each subsystem can be independently deployed and independently scaled, matched to its optimal compute configuration. Meanwhile, inference, training, and reward naturally form a pipeline that can be fully overlapped, establishing a highly scalable foundation for long-horizon asynchronous agentic RL training.
  • FP8 training-inference consistency optimization: Built on PaddlePaddle’s unified training-inference framework, we implemented a unified FP8 low-precision operator library to minimize numerical divergence between training and inference in reinforcement learning. To address the routing divergence between training and inference in MoE models, we deeply optimized the Rollout Router Replay (R3) technique: two-stage computation-communication overlap, combined with dynamic bit-width communication compression and multi-level KV-Cache pooling, enables R3 at near-zero additional training-inference latency while cutting the k3 estimate of the training-inference KL divergence by 50%, a critical guarantee for stable long-horizon training of ERNIE 5.1.
  • Heterogeneous elastic resource scheduling: Thanks to the disaggregated architecture, we can flexibly assign optimal compute configurations on demand to each training, inference, and reward subsystem, fully leveraging the cluster’s elastic compute capacity to reduce end-to-end rollout latency. To address the widespread underutilization of CPU resources in AI clusters, we implemented an elastic CPU pooling strategy. This elastic mechanism fully utilizes idle CPU compute across the cluster to support logic-intensive computations such as code sandboxes and verifiers, improving resource utilization while reducing training iteration time.

A Multi-Stage Reinforcement Learning Training Pipeline Centered on OPD, Ensuring Comprehensive Capability Integration

The post-training of conventional large language models (LLMs) typically follows a sequential pipeline, progressing from supervised fine-tuning (SFT) to multi-stage mixed reinforcement learning (Mixed RL). However, as model capabilities continue to scale, this sequential training paradigm has increasingly become a bottleneck, severely hindering the efficiency of research, development, and iteration. Moreover, attempting to fuse all capabilities within a single training stage introduces severe multi-objective optimization conflicts, making it extremely difficult to balance performance across different domain tasks and achieve Pareto optimality — improvements in one capability often come at the cost of regressions in another (i.e., the “seesaw” effect).

To overcome these fundamental challenges, we propose a multi-stage reinforcement learning training pipeline centered on multi-teacher On-Policy Distillation (OPD). This pipeline significantly accelerates the R&D cycle through parallelized expert model training while ensuring comprehensive and conflict-free capability integration. Specifically, the post-training pipeline of ERNIE 5.1 is a four-stage process that decouples expert training from unified capability fusion:

  • Stage 1: Unified Supervised Fine-Tuning (SFT). High-quality multi-domain instruction data is leveraged for fine-tuning, establishing the model’s foundational capabilities in instruction following and tool invocation, which serve as the initialization checkpoint for subsequent capability expansion.
  • Stage 2: Domain Expert Model Training. Multiple domain-specific expert models (e.g., code, reasoning, agentic tasks) are trained in parallel. Each direction independently customizes its dedicated reward signals and training algorithms, fundamentally eliminating mutual interference across heterogeneous tasks.
  • Stage 3: On-Policy Distillation (OPD). With the unified SFT model as the student and multiple domain expert models as teachers, the student samples from its own policy distribution and concurrently learns from multiple teachers’ capabilities via token-level reverse KL divergence, efficiently consolidating the capabilities of diverse experts into a unified parameter space.
  • Stage 4: General Online Reinforcement Learning (General-RL). Following the initial OPD stage, we deliberately introduce an online RL phase tailored for general-purpose conversational scenarios. Our experiments reveal that not all tasks are amenable to capability fusion via token-level KL-based OPD. Specifically, tasks characterized by high-entropy distributions — such as open-ended chat or creative writing — tend to suffer from low distillation efficiency and may cause excessive smoothing of the output probability distribution. To address this, we forgo distillation for this domain and instead apply online RL on top of the post-OPD model. This stage ensures the model’s instruction-following capability, generation diversity, and improved alignment with human preferences, substantially enhancing general-purpose competence while preserving the expert capabilities acquired in earlier stages.
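Stage 3’s objective can be sketched concretely. Because tokens are sampled from the student’s own policy, averaging log p_student(y_t) - log p_teacher(y_t) over the sampled tokens is a Monte Carlo estimate of the reverse KL, KL(student ‖ teacher). The function and domain-routing scheme below are hypothetical illustrations, not the actual training code:

```python
def token_level_reverse_kl(student_logps, teacher_logps):
    """Monte Carlo estimate of KL(student || teacher) over one trajectory.

    Tokens are drawn from the student's own policy (on-policy), so averaging
    log p_student(y_t) - log p_teacher(y_t) over them estimates the reverse
    KL; minimizing it pulls the student toward the teacher exactly where the
    student itself puts probability mass.
    """
    n = len(student_logps)
    return sum(s - t for s, t in zip(student_logps, teacher_logps)) / n

def opd_loss(student_logps, teacher_logps_by_domain, domain):
    # Hypothetical multi-teacher routing: each trajectory is scored against
    # the expert teacher for its own domain, so one student can absorb many
    # teachers' capabilities without cross-domain gradient conflict.
    return token_level_reverse_kl(student_logps, teacher_logps_by_domain[domain])

# Toy example: two sampled tokens, scored against a code-domain teacher.
loss = opd_loss([-0.1, -0.2], {"code": [-0.3, -0.1]}, "code")
```

This sketch also makes Stage 4’s caveat visible: when the student’s distribution is high-entropy, as in open-ended chat, the per-token estimates are noisy and the reverse-KL pull can over-smooth the output distribution, which is why those domains are handled with online RL instead.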

Illustration of ERNIE 5.1 Post-Training Pipeline

Outstanding Creative Capabilities

Through iterative optimization of the technical architecture and targeted refinement of core technologies, ERNIE 5.1 delivers a comprehensive upgrade in foundational capabilities while also excelling in creative performance.

Whether it is the precise alignment of inspiration, emotion, and expression in creative writing, the coordinated control of logic, character, and pacing in long-form narrative, or the dual balance of knowledge accuracy and stylistic adaptability in professional content, ERNIE 5.1 consistently sees past users’ surface-level requests to capture their core intent, producing work with warmth, depth, and logical coherence that exceeds expectations. This closed loop from intent insight to content creation not only achieves precise synergy between comprehension and generation at the technical level, but has also earned wide recognition from creative enterprises, content platforms, and professional writers, who regard it as a benchmark creative model that understands users, content, and context.

ERNIE 5.1 Creative Capabilities

We are grateful for the evaluation feedback from leading content interaction enterprises, platforms, and writers/creators. In addition, starting today ERNIE 5.1 will be progressively rolled out on over ten creative production agent platforms, including ISEKAI ZERO (a leading global AI roleplay interactive platform), Mulan AI (a creative agent platform), Diting Huanliu (an AI-native creative canvas), and Storymaster (an AI short drama generation platform). Creators and users are welcome to try them out.


The continuous iteration and advancement of the ERNIE family of models would not be possible without our strong technical foundation and a long-term commitment to creating value together with our users.

We appreciate every developer and partner who has tested and used the model in our community — each of your suggestions drives model optimization forward. We appreciate the enterprises that have chosen to partner with us — your real-world use cases are what allow technology to truly take root. Above all, we appreciate every user who has been patient with the model’s imperfections and continued to place their trust in us — it is your trust that gives us the courage to push beyond boundaries.

The evolution of AI has no finish line, and every advance of the ERNIE family of models is driven by real-world needs. Going forward, we will continue to stay open, listen to every voice, and ensure that technology serves our users in the most grounded way possible.