ERNIE 5.0: A 2.4 Trillion-Parameter Unified Multimodal Foundation Model
We introduce ERNIE 5.0, a 2.4-trillion-parameter unified multimodal model trained from scratch. By integrating text, image, video, and audio into a single autoregressive framework, it overcomes the limitations of late-fusion architectures and achieves seamless cross-modal understanding and generation.
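To make the contrast with late fusion concrete, here is a minimal illustrative sketch (not ERNIE's actual implementation; all names, vocabulary sizes, and token ranges are hypothetical): in an early-fusion autoregressive design, every modality is mapped into one shared token vocabulary, so a single next-token predictor operates over one interleaved sequence rather than per-modality encoders fused at the end.

```python
# Hypothetical sketch of early-fusion autoregressive modeling: all modalities
# share ONE vocabulary and ONE sequence, so one predictor serves them all.
import random

# Disjoint token-ID ranges in a shared vocabulary (sizes are made up).
MODALITY_RANGES = {
    "text":  range(0, 1000),
    "image": range(1000, 2000),
    "audio": range(2000, 3000),
    "video": range(3000, 4000),
}

def modality_of(token_id: int) -> str:
    """Recover which modality a shared-vocabulary token ID belongs to."""
    for name, rng in MODALITY_RANGES.items():
        if token_id in rng:
            return name
    raise ValueError(f"unknown token id {token_id}")

def toy_next_token(context: list[int], rng: random.Random) -> int:
    # Stand-in for the transformer: a real model would compute logits over
    # the full shared vocabulary conditioned on `context`. Sampling uniformly
    # here just shows the predictor is modality-agnostic -- any token,
    # from any modality, can follow any other.
    return rng.randrange(4000)

def generate(prompt: list[int], n: int, seed: int = 0) -> list[int]:
    """Autoregressively extend one interleaved multimodal token sequence."""
    rng = random.Random(seed)
    seq = list(prompt)
    for _ in range(n):
        seq.append(toy_next_token(seq, rng))
    return seq

if __name__ == "__main__":
    # Interleaved prompt: two text tokens followed by an image token.
    prompt = [5, 17, 1042]
    out = generate(prompt, 4)
    print([(t, modality_of(t)) for t in out])
```

A late-fusion system, by contrast, would run separate per-modality models and merge their outputs afterward; the single-sequence formulation is what lets understanding and generation share one set of parameters.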