arXiv:2605.18565
MINTEval: Evaluating Memory under Multi-Target Interference in Long-Horizon Agent Systems
Authors pending
Abstract
MINTEval comprises 15.6k QA pairs over long contexts averaging 138.8k tokens, targeting memory under multi-target interference.
Tasks
Results
No benchmark results recorded yet.
CodeSOTA extraction
Benchmark evidence
- Verify that MINTEval average accuracy across all systems is 27.9%.
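One way to carry out the check above, once per-system accuracies are available, is sketched below. It assumes that "average accuracy across all systems" means an unweighted mean over systems and that the figure is reported to one decimal place. Neither assumption is confirmed by this page, and the function names are hypothetical.

```python
from statistics import fmean

# Figure taken from the evidence item above (percent).
REPORTED_AVERAGE = 27.9


def average_accuracy(per_system):
    """Unweighted mean of per-system accuracies, in percent.

    Assumption: each system counts equally. A QA-pair-weighted average
    would need per-system question counts instead.
    """
    if not per_system:
        raise ValueError("no per-system accuracies supplied")
    return fmean(per_system.values())


def matches_reported(per_system, reported=REPORTED_AVERAGE, tolerance=0.05):
    """Return True if the recomputed mean agrees with the reported figure.

    tolerance=0.05 is an assumption matching a value rounded to one decimal.
    """
    return abs(average_accuracy(per_system) - reported) <= tolerance
```

Pass a mapping from system name to accuracy (in percent) taken from the paper's results table. No such values are recorded on this page, so none are shown here.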