
fix(gitextractor): subtask Clone Git Repo ended unexpectedly

Open caioq opened this issue 1 year ago • 1 comments

Summary

  • According to the GitHub documentation, installation access tokens expire after 1 hour. This commit generates a new installation access token at the start of each gitextractor task, avoiding the auth error that occurs in pipelines that run for more than 1 hour.

Does this close any open issues?

Closes https://github.com/apache/incubator-devlake/issues/7958

Screenshots

Before the fix: (screenshot)

After the fix: the pipeline runs for more than 1 hour and the gitextractor tasks keep working. (screenshot)

Other Information

I understand that this solution is not the most efficient, since it generates a new token for every gitextractor task even when the current token is still valid. However, I believe it can serve as a temporary solution that lets the GitHub App be used without problems.

For a more efficient solution, we could generate a new token only when the current one has reached its expiration time. From my reading of the code, I believe the token's expiration time would need to be persisted in the database so that it can be read when preparing the task. What do you think? I may evolve this solution in a follow-up PR.
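To make the "refresh only on expiry" idea concrete, here is a minimal Go sketch. It is not DevLake code: `installationToken`, `tokenSource`, `fetch`, and `margin` are hypothetical names, the expiry is kept in memory rather than persisted in the database, and `fetch` stands in for whatever call actually requests a new installation access token.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// installationToken is a hypothetical value type: a token and its expiry time.
type installationToken struct {
	Value     string
	ExpiresAt time.Time
}

// tokenSource hands out a cached token and refreshes it only when it is
// about to expire. All field names are assumptions made for this sketch.
type tokenSource struct {
	mu      sync.Mutex
	current installationToken
	fetch   func() (installationToken, error) // stand-in for the real token request
	now     func() time.Time                  // injectable clock, so the demo is deterministic
	margin  time.Duration                     // refresh this long before the real expiry
}

// Token returns the cached token while it is still valid for at least
// `margin`, and fetches a new one otherwise.
func (s *tokenSource) Token() (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.current.Value != "" && s.now().Add(s.margin).Before(s.current.ExpiresAt) {
		return s.current.Value, nil
	}
	fresh, err := s.fetch()
	if err != nil {
		return "", err
	}
	s.current = fresh
	return fresh.Value, nil
}

// demo simulates three tasks starting at 0, 30 and 60 minutes into a pipeline.
func demo() (first, second, third string, calls int) {
	clock := time.Date(2024, 10, 9, 12, 0, 0, 0, time.UTC)
	src := &tokenSource{
		now:    func() time.Time { return clock },
		margin: 5 * time.Minute,
		fetch: func() (installationToken, error) {
			calls++
			return installationToken{
				Value:     fmt.Sprintf("token-%d", calls),
				ExpiresAt: clock.Add(time.Hour), // tokens live for 1 hour
			}, nil
		},
	}
	first, _ = src.Token()
	clock = clock.Add(30 * time.Minute)
	second, _ = src.Token() // still valid, so the cached token is reused
	clock = clock.Add(30 * time.Minute)
	third, _ = src.Token() // expired, so a new token is fetched
	return first, second, third, calls
}

func main() {
	fmt.Println(demo())
}
```

In this simulation only two tokens are requested for three tasks. Persisting `ExpiresAt` alongside the token, as proposed above, would let the same check survive across separate task processes.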

caioq avatar Oct 09 '24 12:10 caioq

The current implementation assumes all repositories belong to GitHub, which is incorrect. We need to decouple GitExtractor plugin from specific data source platforms like GitHub and GitLab.

Here's a suggested approach:

Define an interface: Create an interface named DynamicGitUrl (or a more descriptive name) within the gitextractor plugin. This interface should define a method to retrieve the latest Git URL based on a given connection ID and scope ID.

Implement PrepareTaskData: In the gitextractor.PrepareTaskData function, if a plugin, connection ID, and scope ID are provided, use core.GetPlugin to fetch the plugin instance and dynamically cast it to the DynamicGitUrl interface. Then, call the interface's method with the connection ID and scope ID to retrieve the latest Git URL.

Implement DynamicGitUrl in Data Source plugins: Each data source plugin (like GitHub) should implement the DynamicGitUrl interface, providing its own logic for determining the Git URL based on connection and scope information.

This approach allows for a more flexible and extensible design. The GitExtractor plugin remains agnostic to specific data sources, while each data source plugin is responsible for providing the appropriate Git URL retrieval logic.
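The three steps above could be sketched roughly as follows. This is a self-contained illustration, not the actual DevLake implementation: only the interface name `DynamicGitUrl` comes from the suggestion, while the method name `GetDynamicGitUrl`, its signature, the `registry` map (a stand-in for the `core.GetPlugin` lookup), `resolveGitUrl`, and the placeholder URL format are all assumptions.

```go
package main

import (
	"fmt"
)

// DynamicGitUrl is the interface named in the suggestion. The method name
// and signature are assumptions made for this sketch.
type DynamicGitUrl interface {
	GetDynamicGitUrl(connectionId uint64, scopeId string) (string, error)
}

// githubPlugin is a hypothetical data source plugin that implements DynamicGitUrl.
type githubPlugin struct{}

func (githubPlugin) GetDynamicGitUrl(connectionId uint64, scopeId string) (string, error) {
	// A real implementation would look up the connection and request a fresh
	// token here. FRESH_TOKEN is a placeholder, and scopeId is treated as
	// "owner/repo" purely for illustration.
	return "https://x-access-token:FRESH_TOKEN@github.com/" + scopeId + ".git", nil
}

// staticPlugin is a hypothetical plugin that does not implement the interface.
type staticPlugin struct{}

// registry is a stand-in for the plugin lookup done via core.GetPlugin.
var registry = map[string]any{
	"github": githubPlugin{},
	"other":  staticPlugin{},
}

// resolveGitUrl mirrors the proposed PrepareTaskData logic: if the named
// plugin implements DynamicGitUrl, ask it for the latest URL; otherwise keep
// the URL that was passed in with the task options.
func resolveGitUrl(pluginName string, connectionId uint64, scopeId, fallback string) (string, error) {
	if pluginName == "" {
		return fallback, nil
	}
	p, ok := registry[pluginName]
	if !ok {
		return "", fmt.Errorf("plugin %q not found", pluginName)
	}
	if dynamic, ok := p.(DynamicGitUrl); ok {
		return dynamic.GetDynamicGitUrl(connectionId, scopeId)
	}
	return fallback, nil
}

func main() {
	url, err := resolveGitUrl("github", 1, "apache/incubator-devlake", "https://example.com/static.git")
	fmt.Println(url, err)
}
```

Because the type assertion happens at run time, gitextractor never imports a data source plugin directly; plugins that do not implement the interface simply keep their static URL.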

Hi @klesh, thanks for the review! I agree with you and like your suggestion. I implemented it for the GitHub plugin in https://github.com/apache/incubator-devlake/pull/8136/commits/26b716f9880aff1ff58c1730f1c4260112cef110. Let me know if this is what you had in mind.

caioq avatar Oct 17 '24 20:10 caioq