Codesota · Papers · Multimodal · 2024-04-25
Paper

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

arXiv ↗
§ 01 · Benchmark results

4 results reproduced from this paper.

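| # | Model | Vendor | Benchmark | Metric | Value | SOTA | Date | Source |
|---|-------|--------|-----------|--------|-------|------|------|--------|
| 01 | InternVL2-76B | Shanghai AI Lab | MMBench | accuracy | 86.5% | | 2024-04-25 | source ↗ |
| 02 | InternVL2-76B | Shanghai AI Lab | MMMU | accuracy | 67.4% | | 2024-04-25 | source ↗ |
| 03 | InternVL2-76B | Shanghai AI Lab | TextVQA | accuracy | 84.4% | | 2024-04-25 | source ↗ |
| 04 | InternVL2-76B | Shanghai AI Lab | VQA v2.0 | accuracy | 87.2% | | 2024-04-25 | source ↗ |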
§ 02 · Models

1 model from this paper.

Evaluates: InternVL2-76B (Shanghai AI Lab)
§ 04 · Related papers

Other Multimodal papers tracked on Codesota.

  1. 2025-02-19 · 3 results
    Qwen2.5-VL Technical Report
  2. 2025-01-22 · 2 results
    InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
  3. 2025-01-15 · 1 result
    Gemini 2.0 Flash Technical Report
  4. 2024-10-25 · 4 results
    SWE-bench Verified
  5. 2024-10-22 · 1 result
    Claude 3.5 Sonnet Model Card
  6. 2024-09-18 · 4 results
    Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution