MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use

¹University of North Carolina, Chapel Hill  ²Allen Institute for Artificial Intelligence (AI2)  ³Vercept AI
MutaGReP Overview: Given a user request that requires writing code against a specific codebase, MutaGReP searches for realizable plans to solve the user's request using LLM-guided tree search. The search procedure explores viable solutions by mutating plans while constraining them to symbols available in the codebase. The user request and detailed plan serve as an enriched query that provides necessary context from the repo in a structured form to downstream coding systems, improving repo-level code generation performance.

Here's an example of a plan created by MutaGReP for a user query in the DeepMind/ACME repository.


Abstract

When a human requests an LLM to complete a coding task using functionality from a large code repository, how do we provide context from the repo to the LLM?

One approach is to add the entire repo to the LLM's context window. However, most tasks involve only a fraction of symbols from a repo, longer contexts are detrimental to the LLM's reasoning ability, and context windows are not unlimited. Alternatively, we could emulate the human ability to navigate a large repo, pick out the right functionality, and form a plan to solve the task.

We propose MutaGReP (Mutation-Guided Grounded Repository Plan Search), an approach to search for plans that decompose a user request into natural language steps grounded in the codebase. MutaGReP performs neural tree search in plan space, exploring by mutating plans and using a symbol retriever for grounding. On the challenging LongCodeArena benchmark, our plans use less than 5% of a 128K context window for GPT-4o but rival the coding performance of GPT-4o with a context window filled with the repository. Plans produced by MutaGReP allow Qwen 2.5 Coder 32B and 72B to match the performance of GPT-4o with full repo context and enable progress on the hardest LongCodeArena tasks.

Plan Search

MutaGReP Plan Search Process

Each node in the tree is a repo-grounded plan. At every time step, a node is chosen for growing the tree and successors are created by mutating the chosen plan. We use an LLM to implement the successor function.
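The loop described above can be sketched as a small best-first search. This is a minimal illustration under stated assumptions, not the authors' implementation: `Step`, `plan_search`, `toy_successors`, `toy_score`, and the `repo.*` symbol names are all hypothetical, and the successor function here is a hand-written stand-in for the LLM call.

```python
import heapq
from dataclasses import dataclass
from itertools import count
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    intent: str                    # natural-language description of the step
    symbols: Tuple[str, ...] = ()  # repo symbols the step is grounded in


Plan = Tuple[Step, ...]


def plan_search(
    root: Plan,
    successors: Callable[[Plan], List[Plan]],  # stand-in for the LLM mutation call
    score: Callable[[Plan], float],            # higher means more promising
    budget: int,                               # number of node expansions allowed
) -> Plan:
    tie = count()  # unique tie-breaker so heapq never compares plans directly
    frontier = [(-score(root), next(tie), root)]
    best, best_score = root, score(root)
    for _ in range(budget):
        if not frontier:
            break
        _, _, plan = heapq.heappop(frontier)   # choose a node to grow
        for child in successors(plan):         # successors are mutated plans
            s = score(child)
            if s > best_score:
                best, best_score = child, s
            heapq.heappush(frontier, (-s, next(tie), child))
    return best


# Hypothetical toy problem: the symbol names below do not come from any real repo.
POOL = [
    Step("load the data", ("repo.load_data",)),
    Step("build the agent", ("repo.build_agent",)),
    Step("run the loop", ("repo.run_loop",)),
]


def toy_successors(plan: Plan) -> List[Plan]:
    # Mutate by appending one unused step (an LLM would propose these instead).
    return [plan + (step,) for step in POOL if step not in plan]


def toy_score(plan: Plan) -> float:
    wanted = {"repo.load_data", "repo.run_loop"}
    found = {sym for step in plan for sym in step.symbols}
    return len(found & wanted) - 0.01 * len(plan)  # small penalty for longer plans


best_plan = plan_search((), toy_successors, toy_score, budget=3)
```

With a budget of three expansions, the toy search settles on a two-step plan covering both wanted symbols. Swapping the scoring function or the node-selection rule changes the search strategy without touching the rest of the loop.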

Mutation and Grounding

MutaGReP Mutation and Grounding Process

The successor function mutates a plan (left-most column) to generate new plans (right-most column). For each modified intent, the grounding function maps the intent to symbols that might be used to implement the intent.
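To make the grounding step concrete, here is a toy sketch that maps an intent to candidate symbols by counting shared words. Word overlap is a swapped-in simplification chosen for brevity; it is not the symbol retriever used by MutaGReP. The functions `tokenize`, `ground`, and `reground`, and every entry in `symbol_docs`, are hypothetical.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    # Lowercase alphanumeric tokens; dots and underscores act as separators.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ground(intent: str, symbol_docs: Dict[str, str], k: int = 2) -> List[str]:
    """Return up to k symbols whose name or docstring shares words with the intent."""
    query = tokenize(intent)
    scored = []
    for name, doc in symbol_docs.items():
        overlap = len(query & tokenize(name + " " + doc))
        if overlap:
            scored.append((overlap, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most overlap first, then by name
    return [name for _, name in scored[:k]]


def reground(
    old: Dict[str, List[str]], new_intents: List[str], symbol_docs: Dict[str, str]
) -> Dict[str, List[str]]:
    # Only intents that changed in the mutation are grounded again.
    return {
        intent: old[intent] if intent in old else ground(intent, symbol_docs)
        for intent in new_intents
    }


# Hypothetical symbol index: names and descriptions are made up for illustration.
symbol_docs = {
    "replay.make_buffer": "Create a replay buffer for storing transitions.",
    "agents.DQN": "Deep Q-network agent.",
    "loop.run_episodes": "Run the agent in the environment for several episodes.",
}

old_plan = {"create a replay buffer": ["replay.make_buffer"]}
new_plan = reground(
    old_plan,
    ["create a replay buffer", "run the agent for several episodes"],
    symbol_docs,
)
```

The unchanged intent keeps its earlier grounding, while the newly added intent is mapped to symbols from the index.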

System-Level Comparison

System Level Comparison

Using a fraction of the context, Plan Search (driven by MutaGReP) is competitive with adding the entire codebase to the LLM context and significantly outperforms ReAct-based planning.

Enhancing other LLMs with Plans

Model Comparison

Plans produced by MutaGReP consistently improve performance across all models. Qwen 2.5 Coder 32B with our plans exceeds GPT-4o's full-repo performance despite conditioning on 120K fewer context tokens. Even models stronger than GPT-4o (e.g., o1) benefit from our GPT-4o-generated plans.

Making Progress on Hardest 10% of Tasks

Performance on Hard Tasks

Plans found by MutaGReP enable progress on hard tasks where even full-repo context performs poorly. Conditioning on these plans yields gains on the hardest 10% of tasks, where GPT-4o with a context window filled with the repository finds less than 20% of the symbols used in the reference code.

Test-time Scaling

Unconstrained Mutation

Successor Ablation Results

Unconstrained mutation outperforms monotonic mutation, especially at lower budgets. The graph shows the symbol recall of each mutation strategy using best-first search with the oracle scoring function and branching factor of 3.
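The sketch below illustrates the two ideas in this comparison. Symbol recall is read here as the fraction of symbols in the reference code that also appear in the plan, in line with the hard-task analysis above. The page does not define the two mutation strategies, so the code assumes that monotonic mutation may only append a step, while unconstrained mutation may also rewrite or delete existing steps. All function names are hypothetical, and the real system asks an LLM to propose mutations rather than enumerating them.

```python
from typing import Iterable, List


def symbol_recall(plan_symbols: Iterable[str], reference_symbols: Iterable[str]) -> float:
    """Fraction of reference symbols that the plan mentions."""
    reference = set(reference_symbols)
    if not reference:
        return 0.0
    return len(set(plan_symbols) & reference) / len(reference)


def monotonic_moves(plan: List[str], new_step: str) -> List[List[str]]:
    # Assumption: a monotonic mutation can only add a step to the end of the plan.
    return [plan + [new_step]]


def unconstrained_moves(plan: List[str], new_step: str) -> List[List[str]]:
    # Assumption: an unconstrained mutation may append, rewrite, or delete steps.
    moves = [plan + [new_step]]
    for i in range(len(plan)):
        moves.append(plan[:i] + [new_step] + plan[i + 1:])  # rewrite step i
        moves.append(plan[:i] + plan[i + 1:])               # delete step i
    return moves


recall = symbol_recall({"a", "b", "x"}, {"a", "b", "c", "d"})
n_monotonic = len(monotonic_moves(["s1", "s2"], "s3"))
n_unconstrained = len(unconstrained_moves(["s1", "s2"], "s3"))
```

Under these assumptions, the unconstrained variant can repair an early mistake in a plan, whereas the monotonic variant can only build on it. That is one plausible reading of why the gap would be largest at low budgets.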

Informed Search

Comparison of Search Strategies

Informed (best-first) search outperforms uninformed (depth-first) and linear search strategies, and performance improves with branching factor (BF), especially for informed search.
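The difference between informed and uninformed search comes down to which frontier node is expanded next. The following sketch shows only that selection rule. `Node` and `pick_next` are hypothetical names, the scores are made up, and depth-first is modelled as "deepest node, most recently added wins ties", which is an assumption rather than a detail given on this page.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    name: str
    depth: int    # distance from the root plan
    score: float  # value assigned by a scoring function


def pick_next(frontier: List[Node], strategy: str) -> Node:
    if strategy == "best_first":
        # Informed: use the score to choose the most promising plan.
        return max(frontier, key=lambda node: node.score)
    if strategy == "depth_first":
        # Uninformed: ignore the score; go deeper, preferring the newest node.
        return max(enumerate(frontier), key=lambda pair: (pair[1].depth, pair[0]))[1]
    raise ValueError(f"unknown strategy: {strategy}")


frontier = [Node("A", 1, 0.9), Node("B", 2, 0.3), Node("C", 2, 0.5)]
informed_choice = pick_next(frontier, "best_first")
uninformed_choice = pick_next(frontier, "depth_first")
```

Here the informed rule returns to the shallow but high-scoring node `A`, while the uninformed rule keeps descending through `C`. A larger branching factor gives the informed rule more scored candidates to choose from.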

Citation

@article{khan2025mutagrep,
  title={MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use},
  author={Khan, Zaid and Farhadi, Ali and Krishna, Ranjay and Weihs, Luca and Bansal, Mohit and Gupta, Tanmay},
  journal={arXiv preprint arXiv:xxxx.xxxx},
  year={2025}
}