Discussion about Support for Multiple Programming Languages
@magaton @Major-wagh @sandeshchand @alikinir Adapted from issue #60 for continued discussion in this thread.
Background
Currently, the repository primarily supports Python: it relies on jedi and custom-built modules to construct the Repository Map. Developers with the necessary skills are encouraged to study, modify, and adapt this functionality for their own needs. Because of the substantial workload involved, the project does not plan to support additional languages in the short term. This functionality builds the Repository Map and saves it to .project_doc_record, which serves as the basis for the subsequent features of repoagent:
https://github.com/OpenBMB/RepoAgent/blob/825d988127d7bfd757237d9c4e8678d9104030f0/repo_agent/doc_meta_info.py#L296-L391
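To make "Repository Map" concrete for readers new to the codebase, here is a minimal sketch of the underlying idea. It uses only Python's standard-library `ast` module instead of jedi and is not the code in `doc_meta_info.py`. The function name `build_reference_map` and its output shape are hypothetical. Unlike jedi, it assumes names are unique across the repository and does not resolve imports, aliases, or scopes.

```python
import ast
from collections import defaultdict


def build_reference_map(files: dict[str, str]) -> dict[str, dict]:
    """Hypothetical helper: map each top-level function/class to where it is
    defined and which files reference it.

    `files` maps a file path to its source text. Names are assumed unique
    across the repository, and only bare-name uses (ast.Name in a load
    context) count as references; imports, aliases, attribute access and
    scoping are not resolved, which is what jedi handles properly.
    """
    # Pass 1: collect top-level definitions and the file each lives in.
    definitions: dict[str, str] = {}
    for path, source in files.items():
        for node in ast.parse(source, filename=path).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                definitions[node.name] = path

    # Pass 2: record every file that loads one of those names.
    referenced_by: dict[str, set[str]] = defaultdict(set)
    for path, source in files.items():
        for node in ast.walk(ast.parse(source, filename=path)):
            if (
                isinstance(node, ast.Name)
                and isinstance(node.ctx, ast.Load)
                and node.id in definitions
            ):
                referenced_by[node.id].add(path)

    return {
        name: {"defined_in": path, "referenced_by": sorted(referenced_by[name])}
        for name, path in definitions.items()
    }
```

For example, if `a.py` defines `helper` and `b.py` calls `helper()` inside `main`, then `helper` is reported as defined in `a.py` and referenced by `["b.py"]`, while `main` has no references.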
Community Interest
Multiple users have expressed interest in extending this capability to additional programming languages and have provided valuable feedback. We sincerely appreciate their contributions and ideas. Below are some of my thoughts on this topic.
First, the aider project uses tree-sitter to implement its Repository Map feature. However, this approach has limitations: because every programming language has its own distinctive features, supporting many languages at once requires developers with strong cross-language expertise.
I have personally attempted to use tree-sitter to mimic jedi for constructing reference relationships here.
However, this effort has been temporarily shelved.
Additionally, RepoGraph has made progress in this area and published a related paper.
The work introduces an effective plug-in, repo-level module that provides the desired context and significantly enhances LLM-based AI software engineering capabilities.
Another approach is the Language Server Protocol (LSP). Since language servers exist for many languages, LSP can assist with static analysis such as finding definitions and references.
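As a rough illustration of the LSP route: a language server answers "who references this symbol?" through the `textDocument/references` request. The sketch below only builds and parses the wire format of the LSP base protocol (a `Content-Length` header, a blank line, then a JSON-RPC body). Starting a real server process, the `initialize` handshake, and opening documents are omitted, and both helper names are hypothetical.

```python
import json


def encode_references_request(request_id: int, uri: str, line: int, character: int) -> bytes:
    """Frame an LSP `textDocument/references` request for a byte stream (e.g. stdio)."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/references",
        "params": {
            "textDocument": {"uri": uri},
            # Positions are zero-based; `character` counts UTF-16 code units by default.
            "position": {"line": line, "character": character},
            "context": {"includeDeclaration": False},
        },
    }).encode("utf-8")
    # The base protocol prefixes each message with its body length in bytes.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> dict:
    """Parse one framed LSP message back into a dict."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for field_line in header.decode("ascii").split("\r\n"):
        name, _, value = field_line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))
```

A server's reply to this request is a list of `Location` objects (a URI plus a range), which is roughly the reference information the Repository Map needs, regardless of the language behind the server.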
Proposal
Based on the above, I suggest adopting a different implementation for each programming language to assist with analysis, and storing the results in a unified format so that they integrate seamlessly with repoagent.
This is a preliminary idea intended to spark further discussion.
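One possible reading of this proposal, sketched under assumptions: each language gets its own backend (jedi, tree-sitter, an LSP client, and so on), and every backend emits the same record type, which repoagent then consumes. The `SymbolRecord` fields, the `BACKENDS` registry keyed by file extension, and the JSON output are all hypothetical and are not the current `.project_doc_record` schema. The Python backend uses the standard-library `ast` module purely as a stand-in.

```python
import ast
import json
from dataclasses import asdict, dataclass, field
from pathlib import PurePosixPath
from typing import Callable


@dataclass
class SymbolRecord:
    """Hypothetical language-neutral record that every backend would emit."""
    name: str
    kind: str  # e.g. "function" or "class"
    path: str
    start_line: int
    end_line: int
    referenced_by: list[str] = field(default_factory=list)


# A backend maps (path, source text) to records in the shared format.
Backend = Callable[[str, str], list[SymbolRecord]]


def python_backend(path: str, source: str) -> list[SymbolRecord]:
    """Stand-in Python backend; a real one might wrap jedi or tree-sitter."""
    kinds = {ast.FunctionDef: "function", ast.AsyncFunctionDef: "function", ast.ClassDef: "class"}
    records = []
    for node in ast.parse(source, filename=path).body:
        kind = kinds.get(type(node))
        if kind is not None:
            records.append(SymbolRecord(node.name, kind, path, node.lineno, node.end_lineno))
    return records


# Registry keyed by file extension; other languages would register here.
BACKENDS: dict[str, Backend] = {".py": python_backend}


def analyze(files: dict[str, str]) -> str:
    """Run the matching backend for each file and serialize all records as JSON."""
    records: list[SymbolRecord] = []
    for path, source in files.items():
        backend = BACKENDS.get(PurePosixPath(path).suffix)
        if backend is None:
            continue  # no backend registered for this language: skip the file
        records.extend(backend(path, source))
    return json.dumps([asdict(r) for r in records], indent=2)
```

Under this design, adding, say, a Java backend would mean registering one more function under `.java`, while the consumer side keeps reading the same JSON shape.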
Hello, you are referring to RepoGraph. How is it different from, or better than, Aider's repomap? This is what Aider brings:
- support for all the languages in py-tree-sitter-languages
- a max-token budget as a parameter
- construction of a networkx graph from the call references
- PageRank to detect the most important nodes
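To illustrate how the last three points can fit together, here is a dependency-free sketch: an unweighted caller-to-callee graph, PageRank by power iteration (standing in for networkx's `pagerank`, which aider itself relies on), and a greedy fill of the highest-ranked symbols up to a token budget. The graph, the per-symbol token costs, and the function names are hypothetical; this is not aider's implementation.

```python
def pagerank(edges: dict[str, set[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a directed graph given as {node: {targets}}."""
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = edges.get(node, set())
            if targets:
                # Split this node's rank evenly across the symbols it references.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node (references nothing): spread its rank over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


def select_within_budget(rank: dict[str, float], token_cost: dict[str, int], max_tokens: int) -> list[str]:
    """Greedily keep the highest-ranked symbols whose summed cost fits max_tokens."""
    chosen, used = [], 0
    for symbol in sorted(rank, key=lambda s: (-rank[s], s)):
        cost = token_cost[symbol]
        if used + cost <= max_tokens:
            chosen.append(symbol)
            used += cost
    return chosen
```

With `main` calling `parse` and `render`, and `render` also calling `parse`, `parse` ranks highest because every path leads to it; with a budget of 100 tokens and costs of 40 (`parse`), 70 (`render`) and 30 (`main`), the greedy fill keeps `parse` and `main` and skips `render`.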
Regarding @Umpire2018's two concerns in #87:
- I am planning to support more languages (C/C++, Java...) as the next step, which is why all of tree-sitter-languages is imported.
- I will submit another commit to replace jedi with tree-sitter as well.
Should there be a branch to complete this job?
@Umpire2018 Would you like to create another branch for the tree-sitter migration?
@st01cs You can continue working on your PR targeting main for developer convenience; users who install via pip will not be affected by this change. As a small suggestion, though, it would help to create a new branch in your repo for the tree-sitter migration while keeping your main branch in sync with repoagent/main.
Another commit replaces jedi with tree-sitter. It still handles only Python.