[feature request] Counting Co-authored-by tags in commit messages

Open alban opened this issue 5 years ago • 1 comments

GitHub supports the Co-authored-by tag in commit messages:

I would like devstats.cncf.io to keep track of contributions made by several authors.

In the source code (gha2db.go), I see that both the main author and commit message containing the Co-authored-by tags are inserted in the gha_commits table. So the information is there, but it is not indexed.

alban avatar Feb 11 '20 12:02 alban

That's a big change, touching all commit-related dashboards and all commit analysis. There is no such field in the GitHub API or in git logs (it is only stored in the commit message). It requires parsing commit messages, finding the co-authors, and updating the database structure. I will work on that, but there is no ETA for this. cc @dankohn
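As a rough illustration of the parsing step mentioned above, here is a minimal Python sketch that pulls `*-by:` trailers out of a commit message. The regular expression, the `parse_trailers` helper, and the sample message are assumptions made for this example; devstats itself is written in Go and its actual parser may differ.

```python
import re

# Matches a trailer line such as "Co-authored-by: Jane Doe <jane@example.com>".
# This pattern is an assumption for illustration, not the devstats parser.
TRAILER_RE = re.compile(
    r"^(?P<role>[A-Za-z][A-Za-z-]*-by):\s*(?P<name>[^<]*?)\s*<(?P<email>[^>]+)>\s*$",
    re.IGNORECASE,
)


def parse_trailers(message):
    """Return (role, name, email) tuples for every '*-by:' trailer line."""
    found = []
    for line in message.splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            found.append((match["role"], match["name"], match["email"]))
    return found


# Hypothetical commit message used only to exercise the parser.
message = """Fix race in snapshotter

Co-authored-by: Jane Doe <jane@example.com>
Signed-off-by: John Roe <john@example.com>
"""

trailers = parse_trailers(message)
co_authors = [t for t in trailers if t[0].lower() == "co-authored-by"]
```

Here `trailers` holds both trailer lines and `co_authors` keeps only the `Co-authored-by` entry; the subject line is ignored because it does not match the pattern.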

lukaszgryglicki avatar Feb 11 '20 13:02 lukaszgryglicki

@lukaszgryglicki any progress on this?

dims avatar Feb 08 '23 18:02 dims

+1 on figuring this out (medium priority)

caniszczyk avatar Feb 08 '23 18:02 caniszczyk

@caniszczyk Context for the question was https://github.com/containerd/project/pull/106/files FYI

dims avatar Feb 08 '23 19:02 dims

@dims and @caniszczyk - this requires a rather big amount of work. I can start on Friday, then I have two weeks of PTO, and when I'm back I hope to get new servers, because we're running at the limits right now. My plan was to add any new functionality (this and support for a last-6-months period) after we switch over to the new servers. I can figure this out before the switchover if needed, but again, this is not a 1- or 2-day task.

lukaszgryglicki avatar Feb 09 '23 06:02 lukaszgryglicki

Also, if I add such authors as a new column in gha_commits, then current dashboards won't be aware of that, so those commits won't be counted. If I add them as new rows (same message/SHA) but with some special column marking the row as a co-author, then some dashboards can double count commits (unless they use distinct sha). Finally, there can be multiple co-authors, so maybe it's better to add something like a gha_commit_co_authors table which refers to gha_commits. In that case, no current dashboard will be aware of that either.
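The double-counting concern with the "extra rows" option can be shown with a small sketch. It uses Python's built-in `sqlite3` as a stand-in for the Postgres database; the `commits` table, its `is_co_author` column, and the sample rows are hypothetical and only illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: one row per (commit, person), with a flag for co-authors.
conn.execute("CREATE TABLE commits (sha TEXT, author TEXT, is_co_author INTEGER)")
conn.executemany(
    "INSERT INTO commits VALUES (?, ?, ?)",
    [
        ("abc123", "alice", 0),  # main author
        ("abc123", "bob", 1),    # co-author row for the same commit
        ("abc123", "carol", 1),  # second co-author row for the same commit
        ("def456", "alice", 0),
    ],
)

# A dashboard query written before the change counts rows, not commits.
naive_count = conn.execute("SELECT count(*) FROM commits").fetchone()[0]

# Counting distinct SHAs gives the real number of commits.
distinct_count = conn.execute("SELECT count(DISTINCT sha) FROM commits").fetchone()[0]
```

With this sample data the naive query reports 4 while there are only 2 commits, which is the kind of inflation existing dashboards would show unless they already use `distinct sha`.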

My suggestion is to use gha_commit_co_authors or we can even create gha_commit_roles (having commit_id, author, and role columns, the role can be co-author, but also other roles like Reviewed-by, Approved-by and so on). And then add a special dashboard that will use this extra table to get more detailed data for commits... make sense @dims & @caniszczyk ?
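A minimal sketch of the separate roles table, again using `sqlite3` in place of Postgres. The `commit_id`, `author`, and `role` columns come from the suggestion above; the column types, the simplified `gha_commits` table, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-in for the real gha_commits table.
    CREATE TABLE gha_commits (id INTEGER PRIMARY KEY, sha TEXT, author TEXT);

    -- One row per (commit, person, role); a commit can have many such rows.
    CREATE TABLE gha_commit_roles (
        commit_id INTEGER REFERENCES gha_commits(id),
        author TEXT,
        role TEXT
    );

    INSERT INTO gha_commits VALUES (1, 'abc123', 'alice'), (2, 'def456', 'bob');
    INSERT INTO gha_commit_roles VALUES
        (1, 'bob', 'Co-authored-by'),
        (1, 'carol', 'Co-authored-by'),
        (1, 'dave', 'Reviewed-by'),
        (2, 'alice', 'Co-authored-by');
    """
)

# Existing dashboards keep querying gha_commits and see the same commit count.
commit_count = conn.execute("SELECT count(*) FROM gha_commits").fetchone()[0]

# A new dashboard can aggregate the extra table by role.
role_counts = conn.execute(
    "SELECT role, count(*) FROM gha_commit_roles GROUP BY role ORDER BY role"
).fetchall()
```

Because the extra data lives in its own table, the commit count stays at 2 for old queries, while the new table can hold any number of co-authors and other roles per commit.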

lukaszgryglicki avatar Feb 09 '23 06:02 lukaszgryglicki

On it with the gha_commits_roles approach.

lukaszgryglicki avatar Feb 09 '23 08:02 lukaszgryglicki

gha_commit_roles sounds more future proof! :)

➕1️⃣

dims avatar Feb 09 '23 12:02 dims

Support in code is now added (see the last few commits on cncf/devstatscode), and I'm now populating this data for historical commits on all projects. In the meantime, I would like to get some feedback about the dashboard to display this - do you have any ideas @dims? Even once I process all historical commits across all CNCF projects, there is no dashboard showing data from this new gha_commits_roles table. Ideally, I would like to have an example dashboard suggestion to implement (probably some tabular view?).

lukaszgryglicki avatar Feb 09 '23 14:02 lukaszgryglicki

@lukaszgryglicki let's start with something straight forward, let's update the following as follows:

Add a "Co Authored PR" column and display the total number of times they showed up in Co-authored-by tags in PRs.

  • https://containerd.devstats.cncf.io/d/22/prs-authors-table?orgId=1 :
  • https://containerd.devstats.cncf.io/d/55/company-prs-in-repository-groups-table?orgId=1

We could add "Co authored PRs" in the drop down in this:

  • https://containerd.devstats.cncf.io/d/66/developer-activity-counts-by-companies?orgId=1

Do these make sense?

dims avatar Feb 09 '23 14:02 dims

not sure yet, will think about this, for now I'm trying to get all data first, also to do some sanity checks.

lukaszgryglicki avatar Feb 09 '23 15:02 lukaszgryglicki

Right now I have this:

allprj=# select count(*) from gha_commits_roles;
  count  
---------
 1218678
(1 row)

allprj=# select role, count(distinct actor_name || actor_email) as actors from gha_commits_roles group by role order by actors desc;
      role      | actors 
----------------+--------
 Co-authored-by |  23291
 Signed-off-by  |  21556
 Reviewed-by    |    192
 Reported-by    |    136
 Informed-by    |     33
 Tested-by      |     21
 Influenced-by  |      1
 Approved-by    |      1
(8 rows)

allprj=# select role, count(*) as actors from gha_commits_roles group by role order by actors desc;
      role      | actors 
----------------+--------
 Co-authored-by | 619304
 Signed-off-by  | 601764
 Reviewed-by    |   1334
 Reported-by    |    639
 Tested-by      |    237
 Informed-by    |     78
 Influenced-by  |      2
 Approved-by    |      1
(8 rows)

lukaszgryglicki avatar Feb 09 '23 15:02 lukaszgryglicki

@caniszczyk @dims I almost have all the data now. I found some bugs that I needed to fix on the way, but I'm very excited that we will now have commit trailers/roles for all commits of all projects in our database (this means parsing every single commit message across all projects on both test & prod, plus summary projects like All CNCF and CNCF itself).

lukaszgryglicki avatar Feb 09 '23 17:02 lukaszgryglicki

This is very interesting for Linux, because Linux uses commit trailers very heavily:

linux=# select count(*) from gha_commits;
 count  
--------
 320787
(1 row)

linux=# select count(*) from gha_commits_roles;
 count  
--------
 744047
(1 row)

linux=# select role, count(distinct actor_name || actor_email) as actors from gha_commits_roles group by role order by actors desc;
      role      | actors 
----------------+--------
 Co-authored-by |  12006
 Signed-off-by  |  12000
 Reported-by    |   5279
 Reviewed-by    |   4575
 Informed-by    |   4096
 Tested-by      |   3713
 Influenced-by  |     45
 Resolved-by    |     10
(8 rows)

linux=# select role, count(*) as actors from gha_commits_roles group by role order by actors desc;
      role      | actors 
----------------+--------
 Co-authored-by | 281728
 Signed-off-by  | 281727
 Reviewed-by    |  86447
 Informed-by    |  46921
 Reported-by    |  26497
 Tested-by      |  20551
 Influenced-by  |    153
 Resolved-by    |     23
(8 rows)

I'm mapping various non-standard trailers/roles names into standardized ones, see this code.
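As an illustration only, the kind of normalization described here could look like the sketch below. The `CANONICAL` and `ALIASES` tables and the `normalize_role` helper are hypothetical; the actual mapping lives in the devstatscode repository and may cover different spellings.

```python
# Canonical role names, keyed by their lowercase form.
CANONICAL = {
    "co-authored-by": "Co-authored-by",
    "signed-off-by": "Signed-off-by",
    "reviewed-by": "Reviewed-by",
}

# Hypothetical non-standard spellings mapped to a canonical key.
ALIASES = {
    "coauthored-by": "co-authored-by",
    "co-author": "co-authored-by",
    "signed-off": "signed-off-by",
    "review-by": "reviewed-by",
}


def normalize_role(raw):
    """Map a raw trailer name to a canonical role, or None if unknown."""
    # Lowercase, then treat underscores and runs of spaces as hyphens.
    key = "-".join(raw.strip().lower().replace("_", " ").split())
    key = ALIASES.get(key, key)
    return CANONICAL.get(key)


examples = [
    normalize_role(name)
    for name in ("Co-Authored-By", "signed off by", "coauthored_by", "Change-Id")
]
```

Unknown trailers such as `Change-Id` come back as `None`, so a caller can skip them instead of storing an unrecognized role.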

lukaszgryglicki avatar Feb 09 '23 17:02 lukaszgryglicki

@dims regarding

Add a "Co Authored PR" column and display the total number of times they showed up in Co-authored-by tags in PRs.

https://containerd.devstats.cncf.io/d/22/prs-authors-table?orgId=1 :
https://containerd.devstats.cncf.io/d/55/company-prs-in-repository-groups-table?orgId=1

Those two dashboards are for PRs, not commits - they count users who opened PRs. I'm not quite sure how you mean to add commit roles there - they're not related to commits at all; they count GitHub (not even git) PR openers. This is why I suggested an additional dashboard/dashboards. Even if we connect a PR with a commit somehow, there can be multiple commits within a single PR, and I also think there can be multiple PRs opened for the same commit(s).

lukaszgryglicki avatar Feb 09 '23 18:02 lukaszgryglicki

aha, so let's just do the last one, see how it looks and i'll ask other folks where this info would be good to be displayed.

dims avatar Feb 09 '23 18:02 dims

OK will do that then. Thanks! Tomorrow is my last day before 2 weeks of PTO - can't promise anything, but it will be ready after my PTO in the worst case.

lukaszgryglicki avatar Feb 09 '23 18:02 lukaszgryglicki

have a great vacation! see you when u get back.

dims avatar Feb 09 '23 18:02 dims

Updated one dashboard and regenerated its full data for containerd here. This dashboard now lists the number of commits per developer, which also includes co-authored commits. I will update other projects as well. cc @dims @caniszczyk @alban
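A per-developer count that includes co-authored commits can be sketched as a union of main authors and co-author rows. This uses `sqlite3` instead of Postgres; the `sha` column on `gha_commits_roles`, the simplified `gha_commits` layout, and the sample rows are assumptions, and the real dashboard query may be written differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical layouts for illustration.
    CREATE TABLE gha_commits (sha TEXT PRIMARY KEY, author TEXT);
    CREATE TABLE gha_commits_roles (sha TEXT, actor_name TEXT, role TEXT);

    INSERT INTO gha_commits VALUES ('abc123', 'alice'), ('def456', 'bob');
    INSERT INTO gha_commits_roles VALUES
        ('abc123', 'bob', 'Co-authored-by'),
        ('abc123', 'alice', 'Signed-off-by'),
        ('def456', 'bob', 'Co-authored-by');
    """
)

# Count each commit once per developer, whether they authored or co-authored it.
rows = conn.execute(
    """
    SELECT developer, count(DISTINCT sha) AS commits
    FROM (
        SELECT sha, author AS developer FROM gha_commits
        UNION
        SELECT sha, actor_name AS developer
        FROM gha_commits_roles
        WHERE role = 'Co-authored-by'
    ) AS contributions
    GROUP BY developer
    ORDER BY commits DESC, developer
    """
).fetchall()
```

In this sample, bob is credited with two commits (one authored, one co-authored) and alice with one; the `Signed-off-by` row does not add to alice's count, and bob being listed as both author and co-author of `def456` counts only once.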

lukaszgryglicki avatar Feb 10 '23 09:02 lukaszgryglicki

OK, this is implemented, now closing this issue. If you want to add counting co-authors in other dashboards please create separate feature request(s).

lukaszgryglicki avatar Feb 27 '23 06:02 lukaszgryglicki

thanks @lukaszgryglicki !

dims avatar Feb 27 '23 14:02 dims