MinerU2.5 Revolutionizes High-Resolution Document Parsing with a Lightweight 1.2B-Parameter Vision-Language Model
MinerU2.5 introduces an efficient 1.2B-parameter decoupled vision-language model that addresses the "resolution curse" in document parsing. Using a coarse-to-fine strategy, it reports state-of-the-art accuracy at a much lower compute cost.







