incubator-devlake [Feature][Customize] Enhance Customize Plugin: Support CSV Import for issue_changelogs, issue_worklogs, sprints, and Add sprints Field to issues.csv

Search before asking

[x] I have searched the issues and found no similar feature request.

Use case

Many organizations use diverse or custom-built issue tracking systems that are not natively supported by dedicated DevLake plugins. The "Customize" plugin currently allows importing issues and issue_repo_commits via CSV, which is a valuable first step. However, to gain comprehensive insights into the issue tracking domain (similar to what's possible with natively supported tools like Jira), users need to import related data such as issue history, worklogs, and sprint information.

By expanding the Customize plugin's CSV import capabilities, users could:

Ingest data from any unsupported third-party or in-house issue tracking system.
Perform a complete analysis of their issue lifecycle, including status transitions, resolution times, and bottlenecks (using issue_changelogs).
Track and analyze effort spent on tasks and projects (using issue_worklogs).
Monitor sprint progress, team velocity, and scope changes (using sprints and linking them to issues).
Fully leverage DevLake's dashboards and analytical features for the issue tracking domain, regardless of the source system.

Description

The "Customize" plugin is a powerful tool for ingesting data into DevLake's domain layer. Currently, for the issue tracking domain, it supports CSV import for the issues and issue_repo_commits entities.

This feature request proposes to extend the capabilities of the Customize plugin to allow CSV import for the following additional issue tracking domain entities:

issue_changelogs: To capture the historical changes of issues (e.g., status updates, assignee changes, priority modifications). This would require defining a CSV schema for issue_changelogs.
issue_worklogs: To import time tracking data associated with issues (e.g., who logged time, how much time, and when). This would require defining a CSV schema for issue_worklogs.
sprints: To import sprint information (e.g., sprint name, start date, end date, status). This would require defining a CSV schema for sprints.

Additionally, to effectively link issues to their respective sprints, we propose adding a new field to the existing issues.csv schema:

sprints field in issues.csv: This field (e.g., storing a list of sprint IDs or names) would allow an issue to be associated with one or more sprints during the CSV import process.

Expected Outcome: With these enhancements, users will be able to:

Prepare CSV files for issues, issue_changelogs, issue_worklogs, and sprints extracted from their unsupported issue tracking systems.
Use the Customize plugin's API to import this data into the corresponding DevLake domain layer entities.
Utilize DevLake's full suite of issue tracking dashboards and analytical capabilities (e.g., lead time, cycle time, sprint velocity, bug trends) for data sourced via CSV.

This will significantly increase the flexibility of DevLake and allow a broader range of users to consolidate and analyze their development data for engineering excellence.

Related issues

No response

Are you willing to submit a PR?

[x] Yes I am willing to submit a PR!

Code of Conduct

[x] I agree to follow this project's Code of Conduct

May 21 '25 07:05 narrowizard

Based on the proposed plan, here's a detailed breakdown:

1. New CSV Template Structures:

We will implement the following new CSV template structures:

sprints.csv:
- id: varchar
- name: varchar
- url: varchar (Iteration web link, optional)
- started_date: datetime (Planned start time of iteration)
- ended_date: datetime (Planned end time, optional)
- completed_date: datetime (Actual completion time of iteration)
- status: enum (CLOSED | ACTIVE | FUTURE)
issue_changelogs.csv:
- id: varchar(255)
- issue_id: varchar(255)
- author_name: varchar(255) (Will generate account record via author name)
- field_name: enum (status | Sprint | assignee)
  - When field_name is status, original_from_value and original_to_value will be status values (e.g., Pending, In Progress, Done).
  - When field_name is Sprint, original_from_value and original_to_value will be comma-separated sprint IDs (e.g., sprint_id_1,sprint_id_2,sprint_id_3). An empty value indicates that no sprint is set.
  - When field_name is assignee, original_from_value and original_to_value will be assignee names, and will be converted to account_id during data import.
- original_from_value: text (Original value, different values based on field_name)
- original_to_value: text (New value; same format as original_from_value)
- created_date: datetime (Creation time)
issue_worklogs.csv:
- id: varchar(255)
- author_name: varchar(255) (Author name, will create account record and convert to id)
- comment: text (Worklog description, optional)
- time_spent_minutes: int (Work time, in minutes)
- logged_date: datetime (Log time)
- started_date: datetime (Start time)
- issue_id: varchar(255)
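
To make the templates above concrete, here is a minimal Python sketch that renders a sample issue_changelogs.csv and interprets its value columns according to field_name. The column names and enum values are copied from the lists above; the sample row, the ISO 8601 datetime string, and the helper names `render_csv` and `parse_changelog_value` are assumptions for illustration only, since this thread does not specify a datetime format or show sample data. It is not the plugin's actual implementation.

```python
import csv
import io

# Column names copied from the templates in this comment.
SPRINT_COLUMNS = ["id", "name", "url", "started_date", "ended_date",
                  "completed_date", "status"]
CHANGELOG_COLUMNS = ["id", "issue_id", "author_name", "field_name",
                     "original_from_value", "original_to_value", "created_date"]
WORKLOG_COLUMNS = ["id", "author_name", "comment", "time_spent_minutes",
                   "logged_date", "started_date", "issue_id"]

# Enum values listed in the templates.
SPRINT_STATUSES = {"CLOSED", "ACTIVE", "FUTURE"}
CHANGELOG_FIELDS = {"status", "Sprint", "assignee"}


def render_csv(columns, rows):
    """Hypothetical helper: render a list of dicts as CSV text with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def parse_changelog_value(field_name, value):
    """Interpret original_from_value / original_to_value based on field_name."""
    if field_name not in CHANGELOG_FIELDS:
        raise ValueError(f"unsupported field_name: {field_name!r}")
    if field_name == "Sprint":
        # Comma-separated sprint IDs; an empty value means no sprint is set.
        return [s.strip() for s in value.split(",") if s.strip()]
    return value  # status value or assignee name, kept as-is


# Sample data is invented for illustration.
changelog_rows = [{
    "id": "cl-1",
    "issue_id": "issue-1",
    "author_name": "alice",
    "field_name": "Sprint",
    "original_from_value": "",
    "original_to_value": "sprint_id_1,sprint_id_2",
    "created_date": "2025-05-01T09:00:00Z",  # format is an assumption
}]
changelog_csv = render_csv(CHANGELOG_COLUMNS, changelog_rows)
print(changelog_csv)
```

Because Sprint values contain commas, the csv module quotes that cell when writing, so any tool that produces these files should use a real CSV writer rather than joining strings by hand.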

2. New issues.csv field and sprint_issues table data import:

issues.csv: We will add a new field sprint_id (type: varchar) to the existing issues.csv. This field represents the sprint(s) an issue currently belongs to. It can be empty; multiple sprint IDs are separated by commas.
sprint_issues table: We will implement the logic to import data into the sprint_issues table. This table will be populated from the sprint_id field in issues.csv, establishing the many-to-many relationship between sprints and issues.
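
As a rough illustration of how the sprint_id field could expand into sprint_issues rows, the following Python sketch splits each comma-separated value into (sprint_id, issue_id) pairs. The function name `sprint_issue_pairs`, the reduced column set (`id`, `title`, `sprint_id`), and the sample rows are assumptions made for this example; the real issues.csv has more columns, and the actual import logic in the plugin may differ.

```python
import csv
import io


def sprint_issue_pairs(issues_csv_text):
    """Yield unique (sprint_id, issue_id) pairs from issues.csv text."""
    reader = csv.DictReader(io.StringIO(issues_csv_text))
    seen = set()
    for row in reader:
        raw = row.get("sprint_id") or ""  # an empty cell means no sprint
        for sprint_id in (s.strip() for s in raw.split(",")):
            pair = (sprint_id, row["id"])
            if sprint_id and pair not in seen:
                seen.add(pair)
                yield pair


# Sample data is invented; only id and sprint_id matter for this sketch.
sample = (
    "id,title,sprint_id\n"
    'issue-1,Fix login,"sprint-1,sprint-2"\n'
    "issue-2,Update docs,\n"
    "issue-3,Add export,sprint-2\n"
)
pairs = list(sprint_issue_pairs(sample))
print(pairs)
# [('sprint-1', 'issue-1'), ('sprint-2', 'issue-1'), ('sprint-2', 'issue-3')]
```

Each pair corresponds to one row of the many-to-many link: issue-1 belongs to two sprints, issue-2 to none, and sprint-2 contains two issues.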

This plan addresses the core requirements of importing comprehensive issue tracking data, including historical changes, worklogs, and sprint associations. We will ensure proper data parsing, transformation, and linkage to DevLake's domain layer entities.

All fields and their formats are designed with reference to the existing issue tracking domain layer.

May 29 '25 01:05 narrowizard

Implemented by #8456

Jul 26 '25 07:07 narrowizard