[feature request] Counting Co-authored-by tags in commit messages
GitHub supports the Co-authored-by tag in commit messages:
I would like devstats.cncf.io to keep track of contributions made by several authors.
In the source code (gha2db.go), I see that both the main author and commit message containing the Co-authored-by tags are inserted in the gha_commits table. So the information is there, but it is not indexed.
That's a big change, touching all commit-related dashboards and all commit analyses. There is no such field in the GitHub API or in git logs (the information is only stored in the commit message). It requires parsing commit messages, extracting the co-authors, and updating the database structure. I will work on that, but there is no ETA for this. cc @dankohn
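A minimal sketch of the parsing step mentioned above, assuming trailers follow the usual `Co-authored-by: Name <email>` shape on their own line. The helper name `find_co_authors` and the sample message are hypothetical; this is not the devstats implementation.

```python
import re

# Hypothetical helper, not taken from devstats: matches one
# "Co-authored-by: Name <email>" trailer per line, case-insensitively.
CO_AUTHOR_RE = re.compile(
    r"^[ \t]*co-authored-by:[ \t]*(?P<name>[^<\n]*?)[ \t]*<(?P<email>[^>\n]+)>[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_co_authors(message: str) -> list[tuple[str, str]]:
    """Return (name, email) for every Co-authored-by trailer in a commit message."""
    return [(m.group("name"), m.group("email")) for m in CO_AUTHOR_RE.finditer(message)]


# Made-up commit message used only for illustration.
message = """Fix race in snapshotter

Co-authored-by: Jane Doe <jane@example.com>
co-authored-by: John Roe <john@example.com>
"""

for name, email in find_co_authors(message):
    print(name, email)
```

A regular expression is enough for a sketch like this; a production parser would likely also need to handle malformed addresses and trailers that appear in the middle of a message body.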
@lukaszgryglicki any progress on this?
+1 on figuring this out (medium priority)
-- Cheers,
Chris Aniszczyk https://aniszczyk.org
@caniszczyk Context for the question was https://github.com/containerd/project/pull/106/files FYI
@dims and @caniszczyk - this requires a rather large amount of work. I can start on Friday, then I have two weeks of PTO, and when I'm back I hope to get new servers, because we're running at the limits right now. My plan was to add any new functionality (this and support for the last 6 months period) after we switch over to the new servers. I can figure this out before the switchover if needed, but again, this is not a 1 or 2 day task.
Also, if I add such authors as a new column in gha_commits, then current dashboards won't be aware of it, so those commits won't be counted. If I add them as new rows (same message/SHA) with a special column marking the row as a co-author, then some dashboards can double count commits (unless they use distinct sha). Finally, there can be multiple co-authors, so maybe it's better to add something like a gha_commit_co_authors table which refers to gha_commits. In that case, no current dashboard will be aware of it either.
My suggestion is to use gha_commit_co_authors, or we can even create gha_commit_roles (having commit_id, author, and role columns; the role can be co-author, but also other roles like Reviewed-by, Approved-by and so on). And then add a special dashboard that will use this extra table to get more detailed data for commits. Does that make sense, @dims & @caniszczyk?
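To make the trade-off above concrete, here is a rough sketch of the separate-table idea using an in-memory SQLite database as a stand-in for the real PostgreSQL schema. The `gha_commit_roles` columns (`commit_id`, `author`, `role`) follow the suggestion above; the simplified `gha_commits` columns and all sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-in: the real gha_commits table has many more columns.
    create table gha_commits (sha text primary key, author text);
    create table gha_commit_roles (
        commit_id text not null references gha_commits (sha),
        author text not null,
        role text not null
    );
    insert into gha_commits values ('abc123', 'alice');
    insert into gha_commit_roles values
        ('abc123', 'bob', 'Co-authored-by'),
        ('abc123', 'carol', 'Co-authored-by'),
        ('abc123', 'dave', 'Reviewed-by');
    """
)

# Existing dashboards only query gha_commits, so their counts stay unchanged.
commits = conn.execute("select count(*) from gha_commits").fetchone()[0]

# A new dashboard can read the roles table; distinct avoids counting
# the same commit once per co-author.
co_authored = conn.execute(
    "select count(distinct commit_id) from gha_commit_roles "
    "where role = 'Co-authored-by'"
).fetchone()[0]

co_authors = [
    row[0]
    for row in conn.execute(
        "select author from gha_commit_roles "
        "where role = 'Co-authored-by' order by author"
    )
]
print(commits, co_authored, co_authors)
```

Because the extra roles live in their own table, a commit with several co-authors still appears exactly once in gha_commits, which is why no existing dashboard can double count it.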
On it with the gha_commits_roles approach.
gha_commit_roles sounds more future proof! :)
➕1️⃣
Support in code is now added (see the last few commits on cncf/devstatscode). Now I'm populating this data for historical commits on all projects.
In the meantime I would like to get some feedback about the dashboard to display this - do you have any ideas @dims? Right now, even if I process all historical commits across all CNCF projects, there is no dashboard showing data from this new gha_commits_roles table. Ideally, I would like to have an example dashboard suggestion to implement (probably some tabular view?).
@lukaszgryglicki let's start with something straightforward and update the following:
Add a "Co Authored PR" column and display the total number of times they showed up in Co-authored-by tags in PRs.
- https://containerd.devstats.cncf.io/d/22/prs-authors-table?orgId=1 :
- https://containerd.devstats.cncf.io/d/55/company-prs-in-repository-groups-table?orgId=1
We could add "Co authored PRs" in the drop down in this:
- https://containerd.devstats.cncf.io/d/66/developer-activity-counts-by-companies?orgId=1
Do these make sense?
not sure yet, will think about this, for now I'm trying to get all data first, also to do some sanity checks.
Right now I have this:
allprj=# select count(*) from gha_commits_roles;
  count
---------
 1218678
(1 row)

allprj=# select role, count(distinct actor_name || actor_email) as actors from gha_commits_roles group by role order by actors desc;
      role      | actors
----------------+--------
 Co-authored-by |  23291
 Signed-off-by  |  21556
 Reviewed-by    |    192
 Reported-by    |    136
 Informed-by    |     33
 Tested-by      |     21
 Influenced-by  |      1
 Approved-by    |      1
(8 rows)

allprj=# select role, count(*) as actors from gha_commits_roles group by role order by actors desc;
      role      | actors
----------------+--------
 Co-authored-by | 619304
 Signed-off-by  | 601764
 Reviewed-by    |   1334
 Reported-by    |    639
 Tested-by      |    237
 Informed-by    |     78
 Influenced-by  |      2
 Approved-by    |      1
(8 rows)
@caniszczyk @dims I almost have all the data now. I found some bugs that I needed to fix along the way, but I'm very excited that we will now have the commit trailers/roles for all commits of all projects in our database (this means parsing every single commit message across all projects on both test & prod, and for summary projects like All CNCF and CNCF itself).
This is very interesting for Linux, because Linux uses commit trailers very heavily:
linux=# select count(*) from gha_commits;
 count
--------
 320787
(1 row)

linux=# select count(*) from gha_commits_roles;
 count
--------
 744047
(1 row)

linux=# select role, count(distinct actor_name || actor_email) as actors from gha_commits_roles group by role order by actors desc;
      role      | actors
----------------+--------
 Co-authored-by |  12006
 Signed-off-by  |  12000
 Reported-by    |   5279
 Reviewed-by    |   4575
 Informed-by    |   4096
 Tested-by      |   3713
 Influenced-by  |     45
 Resolved-by    |     10
(8 rows)

linux=# select role, count(*) as actors from gha_commits_roles group by role order by actors desc;
      role      | actors
----------------+--------
 Co-authored-by | 281728
 Signed-off-by  | 281727
 Reviewed-by    |  86447
 Informed-by    |  46921
 Reported-by    |  26497
 Tested-by      |  20551
 Influenced-by  |    153
 Resolved-by    |     23
(8 rows)
I'm mapping various non-standard trailers/roles names into standardized ones, see this code.
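As a rough illustration of what such a mapping could look like (the actual mapping lives in the code referenced above and may differ), the sketch below normalizes trailer keys case-insensitively. The canonical names are the roles visible in the query output above; the alias entries and the `normalize_role` helper are purely hypothetical.

```python
from typing import Optional

# Canonical role names as they appear in the query output above.
CANONICAL_ROLES = [
    "Co-authored-by", "Signed-off-by", "Reviewed-by", "Reported-by",
    "Informed-by", "Tested-by", "Influenced-by", "Approved-by", "Resolved-by",
]
_CANONICAL_BY_KEY = {name.lower(): name for name in CANONICAL_ROLES}

# ASSUMPTION: illustrative aliases only, not the mapping devstats really uses.
HYPOTHETICAL_ALIASES = {
    "co-developed-by": "Co-authored-by",
    "cc": "Informed-by",
    "suggested-by": "Influenced-by",
}


def normalize_role(trailer_key: str) -> Optional[str]:
    """Map a raw trailer key to a canonical role, or None if it is unknown."""
    key = trailer_key.strip().rstrip(":").strip().lower()
    if key in _CANONICAL_BY_KEY:
        return _CANONICAL_BY_KEY[key]
    return HYPOTHETICAL_ALIASES.get(key)


for raw in ["CO-AUTHORED-BY:", " cc ", "Change-Id"]:
    print(repr(raw), "->", normalize_role(raw))
```

Returning None for unknown keys keeps unrelated trailers out of the roles table instead of guessing a role for them.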
@dims regarding
Add a "Co Authored PR" column and display the total number of times they showed up in Co-authored-by tags in PRs.
https://containerd.devstats.cncf.io/d/22/prs-authors-table?orgId=1 :
https://containerd.devstats.cncf.io/d/55/company-prs-in-repository-groups-table?orgId=1
Those two dashboards are for PRs, not commits. They count users who opened PRs, so I'm not quite sure what adding commit roles there would mean: they're not related to commits at all, they count GitHub (not even git) PR openers. This is why I suggested an additional dashboard or dashboards. Even if we connect PRs with commits somehow, there can be multiple commits within a single PR, and I think there can also be multiple PRs opened for the same commit(s).
aha, so let's just do the last one, see how it looks and i'll ask other folks where this info would be good to be displayed.
OK will do that then. Thanks! Tomorrow is my last day before 2 weeks of PTO - can't promise anything, but it will be ready after my PTO in the worst case.
have a great vacation! see you when u get back.
Updated one dashboard and regenerated its full data for containerd here.
This dashboard now lists the number of commits per developer, which also includes commits co-authored.
Will update other projects as well.
cc @dims @caniszczyk @alban
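For readers wondering how "commits per developer, including co-authored ones" can be counted without double counting, here is a small sketch. It is not the dashboard's actual query; the function name, the tuple layouts, and the sample data are all assumptions.

```python
from collections import defaultdict


def commits_per_developer(commits, roles):
    """Count distinct commit SHAs per developer, as author or co-author.

    commits: iterable of (sha, author)       -- hypothetical layout
    roles:   iterable of (sha, actor, role)  -- hypothetical layout
    """
    shas = defaultdict(set)
    for sha, author in commits:
        shas[author].add(sha)
    for sha, actor, role in roles:
        if role == "Co-authored-by":
            # A set ensures a commit is counted once per developer,
            # even if they are both its author and a listed co-author.
            shas[actor].add(sha)
    return {dev: len(dev_shas) for dev, dev_shas in shas.items()}


# Made-up sample data.
commits = [("a1", "alice"), ("b2", "bob")]
roles = [
    ("a1", "bob", "Co-authored-by"),
    ("a1", "alice", "Signed-off-by"),
    ("b2", "bob", "Co-authored-by"),
]
print(commits_per_developer(commits, roles))
```

Other roles such as Signed-off-by are ignored here, so only authorship and co-authorship contribute to a developer's count.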
OK, this is implemented, now closing this issue. If you want to add counting co-authors in other dashboards please create separate feature request(s).
thanks @lukaszgryglicki !