
[Question][Module Name] Gitextractor can't clone repos from GitLab

Open claudious05 opened this issue 1 year ago • 7 comments

Question

I'm trying to collect data from GitLab. I'm running DevLake (v0.21.0@ac4f7fe) using Docker Compose. The data collection pipeline fails every time at the gitextractor job, and I get this error:

github.com/apache/incubator-devlake/core/runner.runSubtask | /app/core/runner/run_task.go:332
github.com/apache/incubator-devlake/core/runner.RunPluginSubTasks | /app/core/runner/run_task.go:273
github.com/apache/incubator-devlake/core/runner.RunPluginTask | /app/core/runner/run_task.go:159
github.com/apache/incubator-devlake/core/runner.RunTask | /app/core/runner/run_task.go:133
github.com/apache/incubator-devlake/server/services.runTaskStandalone | /app/server/services/task_runner.go:113
github.com/apache/incubator-devlake/server/services.RunTasksStandalone.func1 | /app/server/services/task.go:186
runtime.goexit | /usr/local/go/src/runtime/asm_amd64.s:1598
Wraps: (4) plain clone git error
Wraps: (5) plain clone git error
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString

I'd really appreciate some help.

Additional context

Our GitLab instance runs on-premises. We tried mounting the CA certificate into the devcontainer, but unfortunately nothing changed.

claudious05 avatar May 27 '24 14:05 claudious05

Hello @claudious05! I'm here to help you with your issue. Feel free to ask me anything.

To resolve the "plain clone git error" in the gitextractor job when collecting data from an on-premise GitLab using DevLake (v0.21.0@ac4f7fe) with Docker Compose, you can follow these steps:

  1. Check the Proxy Configuration: Ensure that the proxy settings are correctly configured if you are using a proxy. The code supports both HTTP/HTTPS and SSH proxies.

  2. Verify the Private Key: If you are using SSH with a private key, make sure the private key is correctly handled and the passphrase (if any) is correctly set.

  3. Disable Shallow Clone: If the error persists, try disabling the shallow clone by setting NoShallowClone to true in the CloneRepoConfig.

Here is an example of how you can modify the CloneRepoConfig to disable shallow cloning:

type CloneRepoConfig struct {
    UseGoGit        *bool
    SkipCommitStat  *bool
    SkipCommitFiles *bool
    NoShallowClone  bool
}

// Example usage
config := CloneRepoConfig{
    NoShallowClone: true,
}

  4. Check SSL Verification: If you are using HTTPS and facing SSL verification issues, you can disable SSL verification by setting the IN_SECURE_SKIP_VERIFY configuration to true.

  5. Repack and Deepen Commits: The code already includes a mechanism to repack and deepen commits to avoid certain errors. Ensure this part of the code is executed correctly.

Here is the relevant part of the code for repacking and deepening commits:

// deepen the commits by 1 more step to avoid https://github.com/apache/incubator-devlake/issues/7426
if since != nil {
    // fixes error described on https://stackoverflow.com/questions/63878612/git-fatal-error-in-object-unshallow-sha-1
    // It might be caused by the commit which being deepen has multiple parents (e.g., a merge commit), not sure.
    repackCmd := exec.CommandContext(ctx.GetContext(), "git", "-C", localDir, "repack", "-d")
    if err := repackCmd.Run(); err != nil {
        return errors.Default.Wrap(err, "failed to repack the repo")
    }
    deepenCmd := exec.CommandContext(ctx.GetContext(), "git", "-C", localDir, "fetch", "--deepen=1")
    if err := deepenCmd.Run(); err != nil {
        return errors.Default.Wrap(err, "failed to deepen the cloned repo")
    }
}

  6. Ensure Proper Cleanup: The CloneGitRepo function includes a cleanup callback to remove the cloned directory after the operation. Make sure this cleanup is properly set to avoid any residual issues.

Here is the relevant part of the CloneGitRepo function:

func CloneGitRepo(subTaskCtx plugin.SubTaskContext) errors.Error {
    taskData, ok := subTaskCtx.GetData().(*parser.GitExtractorTaskData)
    if !ok {
        panic("git repo reference not found on context")
    }
    op := taskData.Options
    storage := store.NewDatabase(subTaskCtx, op.RepoId)
    var err errors.Error
    logger := subTaskCtx.GetLogger()

    // temporary dir for cloning
    localDir, e := os.MkdirTemp("", "gitextractor")
    if e != nil {
        return errors.Convert(e)
    }

    // clone repo
    repoCloner := parser.NewGitcliCloner(subTaskCtx)
    err = repoCloner.CloneRepo(subTaskCtx, localDir)
    if err != nil {
        if errors.Is(err, parser.ErrNoData) {
            taskData.SkipAllSubtasks = true
            return nil
        }
        return err
    }

    // We have done comparison experiments for git2go and go-git, and the results show that git2go has better performance.
    var repoCollector parser.RepoCollector
    if *taskData.Options.UseGoGit {
        repoCollector, err = parser.NewGogitRepoCollector(localDir, op.RepoId, storage, logger)
    } else {
        repoCollector, err = parser.NewLibgit2RepoCollector(localDir, op.RepoId, storage, logger)
    }
    if err != nil {
        return err
    }

    // inject clean up callback to remove the cloned dir
    cleanup := func() {
        _ = os.RemoveAll(localDir)
    }
    if e := repoCollector.SetCleanUp(cleanup); e != nil {
        return errors.Convert(e)
    }

    // pass the collector down to next subtask
    taskData.GitRepo = repoCollector
    subTaskCtx.TaskContext().SetData(taskData)
    return nil
}

By following these steps, you should be able to resolve the "plain clone git error" in the gitextractor job [1][2].
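To make the shallow versus full clone distinction in step 3 concrete, here is a minimal sketch of how a clone command line could be assembled. The function `cloneArgs` and its `depth` parameter are hypothetical names chosen for illustration; this is not DevLake's actual cloning code, only the standard `git clone --depth` behaviour.

```go
package main

import (
	"fmt"
	"strconv"
)

// cloneArgs builds the argument list for a `git clone` invocation.
// noShallowClone and depth are hypothetical parameters for illustration;
// they do not reproduce DevLake's real option handling.
func cloneArgs(repoURL, localDir string, noShallowClone bool, depth int) []string {
	args := []string{"clone"}
	if !noShallowClone && depth > 0 {
		// Shallow clone: fetch only the most recent `depth` commits.
		args = append(args, "--depth", strconv.Itoa(depth))
	}
	// Full clone: no --depth flag, so the whole history is fetched.
	return append(args, repoURL, localDir)
}

func main() {
	url := "https://gitlab.example.com/group/repo.git" // placeholder URL
	fmt.Println(cloneArgs(url, "/tmp/repo", false, 1))
	fmt.Println(cloneArgs(url, "/tmp/repo", true, 1))
}
```

A full clone is slower on large repositories but avoids errors that only occur when a shallow history is later deepened.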

Details

About Dosu This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

dosubot[bot] avatar May 27 '24 14:05 dosubot[bot]

In which file is the shallow clone option located?

claudious05 avatar May 28 '24 06:05 claudious05

The shallow clone option is located in the CloneRepoConfig struct. You can find and modify this configuration in the clone_gitcli.go file within the backend/plugins/gitextractor/parser directory. Look for the CloneRepoConfig struct and set the NoShallowClone field to true.

dosubot[bot] avatar May 28 '24 06:05 dosubot[bot]

@dosu can't I just enable or disable it using the .env file or an environment variable?

claudious05 avatar May 28 '24 06:05 claudious05

Yes, you can enable or disable the shallow clone option using the .env file or an environment setting. Specifically, you can set the NoShallowClone option in the .env file to control this behavior.

Here is how you can do it:

  1. Open your .env file.
  2. Add the following line to set the NoShallowClone option:
NO_SHALLOW_CLONE=true

This will ensure that the NoShallowClone option is enabled, and a full clone will be performed instead of a shallow clone.

Make sure to restart your Docker Compose services after making changes to the .env file to apply the new settings.

dosubot[bot] avatar May 28 '24 06:05 dosubot[bot]

@dosu thanks, it worked. I now have another question: I want to add a GitLab project to my scope, but I can't because it has sub-projects, and I get the message "no data to select".

claudious05 avatar May 28 '24 07:05 claudious05

To add a GitLab project with sub-projects to your scope in DevLake when you receive a "no data to select" message, you need to ensure that the PutScopes function is correctly creating or updating the GitLab project data. Here is the relevant part of the code:

package api

import (
	"github.com/apache/incubator-devlake/core/errors"
	"github.com/apache/incubator-devlake/core/plugin"
	"github.com/apache/incubator-devlake/helpers/pluginhelper/api"
	"github.com/apache/incubator-devlake/plugins/gitlab/models"
)

type PutScopesReqBody api.PutScopesReqBody[models.GitlabProject]
type ScopeDetail api.ScopeDetail[models.GitlabProject, models.GitlabScopeConfig]

// PutScopes create or update gitlab project
// @Summary create or update gitlab project
// @Description Create or update gitlab project
// @Tags plugins/gitlab
// @Accept application/json
// @Param connectionId path int false "connection ID"
// @Param scope body PutScopesReqBody true "json"
// @Success 200  {object} []models.GitlabProject
// @Failure 400  {object} shared.ApiBody "Bad Request"
// @Failure 500  {object} shared.ApiBody "Internal Error"
// @Router /plugins/gitlab/connections/{connectionId}/scopes [PUT]
func PutScopes(input *plugin.ApiResourceInput) (*plugin.ApiResourceOutput, errors.Error) {
	return dsHelper.ScopeApi.PutMultiple(input)
}

Ensure that the PutScopes function is correctly set up to create or update the GitLab project data. If the data is not being fetched correctly, you might need to check the connection settings and the project ID being used.

Additionally, you can use the GetScopeList function to retrieve the list of GitLab projects and ensure they are being correctly fetched:

// GetScopeList get Gitlab projects
// @Summary get Gitlab projects
// @Description get Gitlab projects
// @Tags plugins/gitlab
// @Param connectionId path int false "connection ID"
// @Param searchTerm query string false "search term for scope name"
// @Param blueprints query bool false "also return blueprints using these scopes as part of the payload"
// @Success 200  {object} []ScopeDetail
// @Failure 400  {object} shared.ApiBody "Bad Request"
// @Failure 500  {object} shared.ApiBody "Internal Error"
// @Router /plugins/gitlab/connections/{connectionId}/scopes/ [GET]
func GetScopeList(input *plugin.ApiResourceInput) (*plugin.ApiResourceOutput, errors.Error) {
	return dsHelper.ScopeApi.GetPage(input)
}

Ensure that your configuration in the UI is set up correctly. The GitLabConfig in the config-ui/src/plugins/register/gitlab/config.tsx file should be properly configured to handle GitLab connections and data scopes:

import { ExternalLink } from '@/components';
import { DOC_URL } from '@/release';
import { IPluginConfig } from '@/types';

import Icon from './assets/icon.svg?react';

export const GitLabConfig: IPluginConfig = {
  plugin: 'gitlab',
  name: 'GitLab',
  icon: ({ color }) => <Icon fill={color} />,
  sort: 6,
  connection: {
    docLink: DOC_URL.PLUGIN.GITLAB.BASIS,
    initialValues: {
      endpoint: 'https://gitlab.com/api/v4/',
    },
    fields: [
      'name',
      {
        key: 'endpoint',
        multipleVersions: {
          cloud: 'https://gitlab.com/api/v4/',
          server: '(v11+)',
        },
        subLabel:
          'If you are using GitLab Server, please enter the endpoint URL. E.g. https://gitlab.your-company.com/api/v4/',
      },
      {
        key: 'token',
        label: 'Personal Access Token',
        subLabel: (
          <ExternalLink link={DOC_URL.PLUGIN.GITLAB.AUTH_TOKEN}>
            Learn how to create a personal access token
          </ExternalLink>
        ),
      },
      'proxy',
      {
        key: 'rateLimitPerHour',
        subLabel:
          'By default, DevLake uses dynamic rate limit around 12,000 requests/hour for optimized data collection for GitLab. But you can adjust the collection speed by entering a fixed value.',
        learnMore: DOC_URL.PLUGIN.GITLAB.RATE_LIMIT,
        externalInfo:
          'The maximum rate limit for GitLab Cloud is 120,000 requests/hour. Tokens under the same IP address share the rate limit, so the actual rate limit for your token will be lower than this number.',
        defaultValue: 12000,
      },
    ],
  },
  dataScope: {
    searchPlaceholder: 'Please enter at least 3 characters to search',
    title: 'Projects',
    millerColumn: {
      columnCount: 2.5,
      firstColumnTitle: 'Subgroups/Projects',
    },
  },
  scopeConfig: {
    entities: ['CODE', 'TICKET', 'CODEREVIEW', 'CROSS', 'CICD'],
    transformation: {
      envNamePattern: '(?i)prod(.*)',
      deploymentPattern: '',
      productionPattern: '',
    },
  },
};

Furthermore, the listGitlabRemoteScopes function in the backend code is responsible for listing GitLab projects and sub-projects. Ensure this function is correctly implemented to fetch the necessary data:

func listGitlabRemoteScopes(
	connection *models.GitlabConnection,
	apiClient plugin.ApiClient,
	groupId string,
	page GitlabRemotePagination,
) (
	children []dsmodels.DsRemoteApiScopeListEntry[models.GitlabProject],
	nextPage *GitlabRemotePagination,
	err errors.Error,
) {
	if page.Page == 0 {
		page.Page = 1
	}
	if page.PerPage == 0 {
		page.PerPage = 100
	}
	if page.Step == "" {
		page.Step = "group"
	}

	// load all groups unless groupId is user's own account
	if page.Step == "group" && !strings.HasPrefix(groupId, USERS_PREFIX) {
		children, nextPage, err = listGitlabRemoteGroups(connection, apiClient, groupId, page)
		if err != nil {
			return
		}
	}
	if groupId == "" || nextPage != nil {
		return
	}
	// no more groups, start to load projects under the group
	var moreChild []dsmodels.DsRemoteApiScopeListEntry[models.GitlabProject]
	moreChild, nextPage, err = listGitlabRemoteProjects(connection, apiClient, groupId, GitlabRemotePagination{
		Page:    1,
		PerPage: page.PerPage,
		Step:    "project",
	})
	if err != nil {
		return
	}
	children = append(children, moreChild...)
	return
}

By ensuring these functions and configurations are correctly implemented and the connection settings are accurate, you should be able to add a GitLab project with sub-projects to your scope in DevLake [1][2][3][4].

dosubot[bot] avatar May 28 '24 07:05 dosubot[bot]

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] avatar Jul 28 '24 00:07 github-actions[bot]

@claudious05 This seems like another issue to me, please file a new issue if the problem persists.

klesh avatar Jul 29 '24 06:07 klesh