
SMALL SIZE, SUPER POWER

Edge Model for Everyone, Everyday, Everywhere
MiniCPM Inside Phones
MiniCPM Inside AI PCs
MiniCPM Inside Intelligent Cabins
MiniCPM Inside Embodied Robots
MiniCPM Inside Wearable Devices

Put ChatGPT and GPT-4V Level LLMs on Your Phone, Tablet, and PC

Learn More

The MiniCPM edge model series is a world-leading family of lightweight, high-performance LLMs. Since its release in February 2024, it has been widely tested and acclaimed by the global open-source community for its "achieving more with less" efficiency and outstanding on-device performance, repeatedly topping the GitHub and Hugging Face trending charts and becoming one of the most popular LLMs on Hugging Face in 2024. MiniCPM has partnered with industry benchmark leaders and has become an indispensable driver of innovation across sectors such as AI PCs, AI phones, intelligent cabins, and embodied robots.

High Efficiency, Low Cost, Achieving More with Less
Foundation Model MiniCPM
The On-Device ChatGPT Moment
4B | 2.4B | 1.2B
GitHub · Hugging Face
Unbelievably strong for a 4B edge model running on your device!
ChatGPT-Level Base Performance: Surpassing GPT-3.5, Qwen2-7B, and GLM4-9B


New Architecture, a New Benchmark for LLM Knowledge Density

Light! Fast! On-Device Friendly
Only 2 GB of memory after quantization
Versatile and Sharp as a Swiss Army Knife
Surpassing Kimi! Infinitely Long Text
32K, 128K, 256K, 512K... Unlimited Context Expansion

GPT-4o-level Function Calling
Surpassing GPT-3.5 and GLM4-9B, Approaching GPT-4o

Superior RAG Plug-in Suite: Number One in Chinese Retrieval, Generation Results Surpassing Llama3-8B
Learn More
View the detailed features of each version
GPT-4o-Level Omni Model Running On-Device
Multimodal Model MiniCPM-V
A New Era of On-Device GPT-4o
8B Full-Modal | 8B Live Video | 8B | 2.8B
GitHub · Hugging Face
Edge-Side GPT-4o
Real-time streaming, end-to-end
Full-modal, all SOTA
The best edge vision generalist model
The best edge audio generalist model
Continuous watching, real video: not a model pieced together from single frames
Real-time listening, truly smooth: hears clearly, understands distinctly
Natural speaking, real emotion: handles real-time interruptions without confusion
Full Capability, End-to-End
High performance, low latency
More natural, more coherent
Context understanding
Interruptible at any time
Noise resistance
Easy deployment and maintenance
Learn More
View the detailed features of each version
Compare the functionalities of various versions

Global Partners

AMD
Technical Blog

Efficiency First
We believe the best model is the one with superior power, faster speed, and lower cost. Efficiency comes from mastering the science of large language models (LLMs), with knowledge density as the key principle. As knowledge density grows, it becomes a core competitive advantage, unlocking vast potential for edge intelligence and applications.
Modelbest Law | Moore's Law
The capability density of LLMs increases exponentially over time. Since 2023, the maximum capability density of LLMs doubles approximately every 3.3 months.
Capability density: the ratio of effective parameter size to actual parameter size. Effective parameter size refers to the minimum number of parameters required for a reference model (e.g., MiniCPM) to achieve performance equivalent to the given target model.
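As an illustrative formalization of this definition (a sketch only; the numbers in the worked example below are hypothetical, not measured results):

\[
  \rho(\mathcal{M}) = \frac{N_{\mathrm{eff}}(\mathcal{M})}{N(\mathcal{M})},
  \qquad
  \rho_{\max}(t) \approx \rho_{\max}(t_0)\cdot 2^{(t-t_0)/T},
  \quad T \approx 3.3~\text{months}
\]

where $N(\mathcal{M})$ is the actual parameter count of the target model $\mathcal{M}$ and $N_{\mathrm{eff}}(\mathcal{M})$ is the minimum parameter count a reference model needs to match its performance. Hypothetical example: if a 2.4B-parameter model matches performance that would otherwise require a 7.2B-parameter reference model, its capability density is $\rho = 7.2\text{B}/2.4\text{B} = 3$.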
News

AGI FOR LIVES