Repository map not working as expected with different programming languages

Open pf-tjung opened this issue 1 year ago • 2 comments

Relevant environment info

- OS: Windows 10
- Continue: v0.8.51
- IDE: VS Code 1.93.0

Description

When a C/C++ project is open and the repository map is added as context via the @Repository Map context provider, the map lists only files, with no function signatures. The project also contains many helper files, such as Python and assembler files (output shortened):

Below is a repository map. 
For each file in the codebase, this map contains the name of the file, and the signature for any classes, methods, or functions in the file.
[...]
src\submodules\romtable\test\CMakeLists.txt
src\submodules\romtable\test\romtable_test.cpp
src\submodules\romtable\base\CMakeLists.txt
src\submodules\romtable\base\romtable.h
src\submodules\romtable\base\romtablebase.h
src\submodules\romtable\base\romtableinterface.h
[...]

For a Python project, the repository map shows files as well as the function signatures in those files, but many lines repeat the exact same signature. Moreover, some files are listed without any signatures even though they do define functions (e.g. change_password.py, command.py):

Below is a repository map. 
For each file in the codebase, this map contains the name of the file, and the signature for any classes, methods, or functions in the file.

antennas.py:
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 
	...
	rfid_antennas (channel: int = 0xff, method: str = "GET", payload: str = "") 

count_lines.py:
	count_lines_in_directory (path) 
	...
	count_lines_in_directory (path) 
	...
	count_lines_in_directory (path) 
	...
	count_lines_in_directory (path) 
	...
	count_lines_in_directory (path) 
	...
	count_lines_in_directory (path) 
	...
	count_lines_in_directory (path) 
change_password.py
command.py
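
The repeated signatures in the output above could be collapsed with an order-preserving de-duplication step before the map is rendered. This is only a minimal sketch, assuming the map is built from a flat list of (file, signature) pairs; `dedupe_signatures`, `render_repo_map`, and that data shape are hypothetical and are not Continue's actual implementation:

```python
def dedupe_signatures(entries):
    """Group signatures by file, dropping exact repeats in first-seen order.

    `entries` is an iterable of (path, signature) pairs; a signature of None
    means the file was seen but no signature was extracted (hypothetical shape).
    """
    grouped = {}
    for path, signature in entries:
        signatures = grouped.setdefault(path, [])
        if signature is not None and signature not in signatures:
            signatures.append(signature)
    return grouped


def render_repo_map(grouped):
    """Render files with their unique signatures, one tab-indented line each."""
    lines = []
    for path, signatures in grouped.items():
        if signatures:
            lines.append(f"{path}:")
            lines.extend(f"\t{signature}" for signature in signatures)
        else:
            # No signatures extracted: list the bare file name.
            lines.append(path)
    return "\n".join(lines)
```

With this, a file such as antennas.py would show `rfid_antennas (...)` once instead of once per match. It does not explain why the duplicates appear in the first place.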

To reproduce

No response

Log output

No response

pf-tjung avatar Sep 09 '24 10:09 pf-tjung

Hi @tobiajung , thanks for calling this out, I was able to confirm the behavior on my end. Will plan to take a look shortly!

Patrick-Erichsen avatar Sep 12 '24 19:09 Patrick-Erichsen

Alright, it seems much better with version 0.8.52, but there are lots of duplicated function signatures in every source file.

pf-tjung avatar Sep 25 '24 06:09 pf-tjung

Same thing with GoLand in Go project

Only file names, no structures/methods

checorone avatar Oct 16 '24 12:10 checorone

Thanks for the +1 @checorone.

It looks like that is actually the intended behavior at the moment, i.e. only file names are included. I believe we made that decision because we haven't written the logic to selectively choose which signatures (methods, interfaces, etc.) to include. Even for small codebases, including all of them would take tens of thousands of tokens.

The near-term solution is to use PageRank to filter the repo map down to its most relevant portion, including signatures, based on the query. Will hopefully be circling back to that soon!
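
To illustrate the idea, here is a rough sketch of query-biased (personalized) PageRank over a file reference graph. It is not Continue's implementation: the graph shape (file -> files it references), the matching of query terms against file paths, and the names `pagerank` and `top_files` are all assumptions made for the example.

```python
def pagerank(graph, personalization=None, damping=0.85, iterations=50):
    """Power-iteration PageRank; `graph` maps each file to the files it references."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    weights = {n: (personalization or {}).get(n, 0.0) for n in nodes}
    total = sum(weights.values())
    if total > 0:
        teleport = {n: w / total for n, w in weights.items()}
    else:
        # No bias given (or nothing matched): fall back to a uniform distribution.
        teleport = {n: 1.0 / len(nodes) for n in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling file: spread its rank according to the teleport vector.
                for target in nodes:
                    new_rank[target] += damping * rank[node] * teleport[target]
        rank = new_rank
    return rank


def top_files(graph, query, k=5):
    """Return the k highest-ranked files, biased toward paths matching the query."""
    terms = query.lower().split()
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    bias = {n: 1.0 for n in nodes if any(term in n.lower() for term in terms)}
    rank = pagerank(graph, personalization=bias)
    return sorted(rank, key=rank.get, reverse=True)[:k]
```

Signatures would then be included only for the files returned by `top_files`, which keeps the token count bounded regardless of repository size.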

Patrick-Erichsen avatar Oct 17 '24 18:10 Patrick-Erichsen

Thanks for the answer, @Patrick-Erichsen. I will close the issue then, since it isn't a bug. File names are shown for the files.

pf-tjung avatar Oct 18 '24 04:10 pf-tjung