
Add Support for indexing of large codebases

Open MovGP0 opened this issue 7 months ago • 5 comments


Describe the solution you'd like?

The "Codebase Index" feature should be able to index large codebases.


Is your feature request related to a problem? Please describe.

Most of my codebases are very large and the agent can't handle them. Having a vector index would be helpful.

Additional context

Running a local encoding model for indexing is likely required.
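The request above boils down to a familiar pipeline: split files into chunks, turn each chunk into a vector with an encoding model, and answer queries by ranking chunks by cosine similarity. The sketch below only illustrates that flow under stated assumptions: `embed` is a hashed bag-of-words stand-in for a real local encoding model, and `VectorIndex`, `chunk`, `DIM`, and the 40-line chunk size are hypothetical names and values, not part of Warp.

```python
import math
import re
import zlib

DIM = 256  # embedding width; arbitrary for this sketch


def embed(text: str) -> list[float]:
    """Stand-in for a local encoding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z_][a-z0-9_]*", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def chunk(text: str, lines_per_chunk: int = 40) -> list[str]:
    """Split a file into fixed-size windows of lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]


class VectorIndex:
    def __init__(self) -> None:
        # Each entry: (file path, chunk number, unit-length vector)
        self.entries: list[tuple[str, int, list[float]]] = []

    def add_file(self, path: str, text: str) -> None:
        for i, piece in enumerate(chunk(text)):
            self.entries.append((path, i, embed(piece)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, int]]:
        """Return the k best (cosine score, path, chunk number) matches."""
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), path, i)
                  for path, i, vec in self.entries]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]
```

A real setup would presumably swap `embed` for a local embedding model and replace the linear scan in `search` with an approximate nearest-neighbour index once the chunk count reaches the millions.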

Operating system (OS)

Windows

How important is this feature to you?

3


MovGP0 · Jun 11 '25 08:06

Thanks for this feature request!

To anyone else interested in this feature, please add a 👍 to the original post at the top to signal that you want this feature, and subscribe if you'd like to be notified.

dannyneira · Jun 11 '25 22:06

I just created a new project for backing up my Warp dev environment, and the "Codebase Index" reports that the codebase is too large.

I would be happy if it could be indexed. However, with a better understanding of how the feature works, I might be better able to avoid this in future projects.

Operating system (OS): Linux (Ubuntu 24.04)

How important is this feature to you? 3


sworley · Jun 13 '25 00:06

@sworley @MovGP0 We would appreciate the following information about the larger codebases to help us better understand the use case:

  • its depth
  • number of files
  • type of files it contains
  • how it's structured
  • etc., anything else you think is important for us to know

Also note that large files that aren't used for coding, such as virtual disk images or ISOs, may affect the ability to index the codebase as well, so we recommend keeping those separate.
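To follow that advice, it can help to find the large files first. A minimal sketch, assuming a plain directory walk; `find_large_files` is a hypothetical helper and the 50 MiB default threshold is arbitrary, not a Warp limit.

```python
import os

DEFAULT_MIN_BYTES = 50 * 1024 * 1024  # 50 MiB; arbitrary threshold for this sketch


def find_large_files(root: str, min_bytes: int = DEFAULT_MIN_BYTES) -> list[tuple[int, str]]:
    """Return (size in bytes, path relative to root) for big files, largest first."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip Git internals
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full)
            except OSError:
                continue  # e.g. broken symlink or permission error
            if size >= min_bytes:
                hits.append((size, os.path.relpath(full, root)))
    hits.sort(reverse=True)
    return hits
```

For example, `find_large_files(".")` returns `(size, path)` pairs, largest first, which are candidates for moving out of the repository or adding to an ignore file.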

dannyneira · Jun 13 '25 20:06

@sworley

I'll give you exact numbers next week. I think the most important part is that build artifacts and packages (which make up most of the folder size) need to be ignored, so I'd recommend respecting the .gitignore files in a given Git repository (including ignore files in subfolders).

Further, support for MCP and/or A2A might also mitigate the problem, since users would be able to provide their own indexes.

MovGP0 · Jun 13 '25 21:06

Some statistics about the codebase I am working on currently:

Languages:
- C# (99.6 %)
- PowerShell (0.2 %)
- Visual Basic (0.2 %)
Lines of code: 1,706,418
Number of files: 23,612
Size: 1.274 GiB

Structure

Depth: up to 6 directory sublevels
Structure: one subfolder per project (i.e., per *.csproj file)

File types

Most Common File Types:
- .cs (C# source files)
- .resx (Resource files)
- .png (PNG images)
- .xml (XML files)

Other Notable File Types:
- .svg (SVG images)
- .csproj (C# project files)
- .bmp (Bitmap images)
- .ps1 (PowerShell scripts)
- .config (Configuration files)

Development & Build Files:
- .nsi/.nsh (NSIS installer files)
- .settings (Settings files)
- .licx (License files)

Documents & Media:
- .txt (Text files)
- .ico (Icon files)
- .jpg (JPEG images)
- .xlsx (Excel files)
- .docx (Word documents)

Other file types:
- various CAD file formats (IGES, STP/STEP, DWG, etc.)

[!Note] The codebase is actually bigger, because it's distributed across multiple Git repositories; the agent might need to reference other indices to understand the full context.

[!Note] Build artifact directories (/obj and /bin) have been excluded from these statistics and size calculations.
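Statistics like the ones above (file count, total size, and file types, with /obj and /bin excluded) can be reproduced with a short script. A minimal sketch, assuming a plain directory walk; it does not apply .gitignore rules, and `codebase_stats` and the `EXCLUDED_DIRS` set are hypothetical, chosen to mirror the note above.

```python
import os
from collections import Counter

EXCLUDED_DIRS = {"bin", "obj", ".git"}  # build output and Git internals


def codebase_stats(root: str) -> tuple[int, int, Counter]:
    """Return (file count, total bytes, files per extension), skipping EXCLUDED_DIRS."""
    count, total = 0, 0
    by_ext: Counter = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d.lower() not in EXCLUDED_DIRS]
        for name in filenames:
            count += 1
            total += os.path.getsize(os.path.join(dirpath, name))
            by_ext[os.path.splitext(name)[1].lower() or "(none)"] += 1
    return count, total, by_ext
```

For example, `count, size, by_ext = codebase_stats(".")` followed by `by_ext.most_common(5)` lists the most common file types, and `size / 2**30` gives the size in GiB.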

MovGP0 · Jun 17 '25 12:06

I'm in the same boat as @MovGP0 except:

  • 8M lines of code, split: Java (5M), JavaScript + TypeScript (3M)
  • DB Migration (Flyway): 1925 migrations
  • Single repository, not mono
  • Atlassian toolchain
  • Multiple build artifacts distributed across 20+ servers (in-house, our own DC)

psalvitti · Jun 21 '25 00:06

The way I'm handling it currently is to use the Repomix CLI to create a local index and Repomix's MCP API for LLM queries. Unfortunately, that is something Warp does not support yet.

MovGP0 · Jun 21 '25 10:06

I don't even have a large codebase (I think). Cursor says it's only 475 files after applying .gitignore and .cursorignore, but Warp cannot index my codebase. Is it actually large, or does Warp not respect my .gitignore? Also, having an additional ignore file like Cursor's would be great.
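One way to tell whether a repository is genuinely large or whether ignore rules are not being applied is to ask Git which files it does not ignore. A minimal sketch, assuming Git is installed; `git_unignored_files` is a hypothetical helper around `git ls-files --cached --others --exclude-standard`, which applies nested .gitignore files but knows nothing about .cursorignore or Warp's own rules, so its count only approximates what either tool indexes.

```python
import subprocess


def git_unignored_files(repo: str) -> list[str]:
    """List tracked files plus untracked files not matched by any ignore rule.

    --exclude-standard applies .gitignore files in every directory, plus
    .git/info/exclude and the user's core.excludesFile.
    """
    result = subprocess.run(
        ["git", "ls-files", "--cached", "--others", "--exclude-standard", "-z"],
        cwd=repo, check=True, capture_output=True,
    )
    # -z separates paths with NUL bytes, so unusual file names survive intact.
    return [p for p in result.stdout.decode("utf-8", errors="replace").split("\0") if p]
```

Note that `--cached` still lists tracked files even if they match an ignore pattern, so `len(git_unignored_files("."))` can be higher than expected when generated files were committed.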

smeeklai · Jun 26 '25 05:06

Cursor can index my Unity project, but Warp can't.

densy07 · Jun 26 '25 07:06

I can't use it in my Unity project at present. The total number of files in my Asset folder exceeds 20,000. Even the Turbo plan can't handle this situation. It contains a large number of meta files that should be ignored.

densy07 · Jun 28 '25 09:06

What's missing is the ability to ignore certain paths besides .gitignore.

Also the "Codebase index" settings page could display the total number of files it wants to index, not just "too large".

utapyngo · Jun 28 '25 16:06

> What's missing is the ability to ignore certain paths besides .gitignore.
>
> Also the "Codebase index" settings page could display the total number of files it wants to index, not just "too large".

This! Game projects have a lot of asset files that can't be indexed. I'm not sure whether Warp already ignores binary files, but at least with Unity it's very common for these files to be all text.

Also, ignoring certain paths won't be enough, because there are meta files alongside the regular files. We need a .warpignore.
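Because Unity writes a `.meta` file next to nearly every asset, a file-name pattern catches what a path-only rule cannot. A minimal sketch of that filter, which also yields the kind of file count the settings page could display instead of just "too large"; the pattern list and `files_to_index` are hypothetical, and matching the file name with `fnmatch` is much simpler than full .gitignore semantics.

```python
import fnmatch
import os

# Hypothetical exclusion patterns for a Unity project; adjust to taste.
UNITY_EXCLUDES = ("*.meta", "*.asset", "*.prefab", "*.unity")


def files_to_index(root: str, patterns: tuple[str, ...] = UNITY_EXCLUDES) -> list[str]:
    """Walk root and keep files whose name matches none of the glob patterns."""
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            # fnmatchcase gives the same case-sensitive behaviour on every OS.
            if any(fnmatch.fnmatchcase(name, p) for p in patterns):
                continue
            kept.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(kept)
```

`len(files_to_index("."))` would then give the number of files left to index.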

robertocaldas · Jul 14 '25 02:07

There is a .warpindexingignore file that I've successfully used to index a Git repo that was reported as too large. I excluded some directories I didn't care about, and that reduced the file count to less than 10k, which is the limit for the Pro plan.


mateu · Jul 14 '25 04:07

> There is a .warpindexingignore that I've successfully used to index a git repo that was reported as too large. I excluded some directories I didn't care about and it reduced the file count to less than 10k which is the limit for the pro plan.

That didn't work for me. Can you please provide more information? I added the file to the repo root, pressed the sync button in Preferences, and closed and reopened Warp, but it doesn't seem to update.

robertocaldas · Jul 27 '25 23:07

What you did should work if you've excluded directories to reduce the file count. Here's what my .warpindexingignore looks like:

ansible/
schema/
web/
infrastructure/

That excludes those directories and brings the index count under the threshold. Make sure both the ignore filename and the directory names are spelled correctly.
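Before pressing the sync button, the effect of an ignore file like this one can be estimated locally. A minimal sketch, assuming the file lists only top-level directory names as in this example (full ignore-file syntax is not implemented); the helper names are hypothetical, and the 10,000 figure is the Pro-plan limit quoted earlier in this thread, which may differ by plan or change over time.

```python
import os

PRO_PLAN_LIMIT = 10_000  # figure quoted in this thread; may differ by plan or change


def read_ignored_dirs(ignore_file: str) -> set[str]:
    """Read one top-level directory name per line (e.g. 'ansible/'); '#' starts a comment."""
    dirs = set()
    with open(ignore_file, encoding="utf-8") as fh:
        for line in fh:
            entry = line.strip()
            if entry and not entry.startswith("#"):
                dirs.add(entry.strip("/"))
    return dirs


def count_after_exclusion(root: str, ignored: set[str]) -> int:
    """Count files under root, skipping the ignored top-level directories and .git."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Prune only at the top level, matching the simple format assumed above.
            dirnames[:] = [d for d in dirnames if d not in ignored | {".git"}]
        total += len(filenames)
    return total
```

For example, `count_after_exclusion(".", read_ignored_dirs(".warpindexingignore"))` returns the number of files that remain, to compare against `PRO_PLAN_LIMIT`.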

mateu · Jul 28 '25 01:07

There was a typo in my ignore filename: I was using .warpindexignore.

I changed the filename to .warpindexingignore and it works.

robertocaldas · Jul 28 '25 01:07

Yes, that's one of the documented ignore files.

mateu · Jul 28 '25 02:07

The "Ignore file" section further down this page lists the options: https://docs.warp.dev/code/codebase-context

mateu · Jul 28 '25 02:07

For game projects, I think there should be an option to specify only the files to include. Game engines produce too many other types of files, and there are actually only a few types of code files.
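An include-only rule inverts the usual ignore list: every file is skipped unless its extension is on an allowlist. A minimal sketch of the idea; the extension set, the skipped `Library` and `Temp` folders, and `allowlisted_files` are hypothetical choices for a Unity-style project, not an existing Warp option.

```python
import os

# Hypothetical allowlist: only these extensions count as "code" in a game project.
CODE_EXTENSIONS = {".cs", ".shader", ".hlsl", ".cginc", ".asmdef"}
SKIPPED_DIRS = {".git", "Library", "Temp"}  # hypothetical engine cache folders


def allowlisted_files(root: str, extensions: set[str] = CODE_EXTENSIONS) -> list[str]:
    """Keep only files whose final extension is allowlisted (so 'X.cs.meta' is dropped)."""
    keep = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in extensions:
                keep.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(keep)
```

With an allowlist, new asset types would be excluded by default instead of each needing its own ignore rule.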

The codebase index file limit is too low. Cursor shows a limit of 50,000 files, but in practice it can still run with more than 100,000 files.

eefan000 · Sep 08 '25 02:09