Blank commit_ids
⚠️ Issue: It looks like commit_id is not stored properly.
Query:
select commit_id from github_events where repo_id = 1181927 order by event_day desc limit 10;
Response:
{
"requestedAt": "2022-11-05T09:20:25.860+00:00",
"finishedAt": "2022-11-05T09:20:28.086+00:00",
"spent": 2.226,
"sql": "SELECT `commit_id` FROM `github_events` WHERE `repo_id` = 1181927 ORDER BY `event_day` DESC LIMIT 10",
"fields": [
{
"name": "commit_id"
}
],
"data": [
{
"commit_id": ""
},
{
"commit_id": ""
},
{
"commit_id": ""
},
{
"commit_id": ""
},
{
"commit_id": ""
},
{
"commit_id": ""
},
{
"commit_id": ""
},
{
"commit_id": ""
},
{
"commit_id": ""
},
{
"commit_id": ""
}
]
}
Blank UI:

Another example:
Query:
SELECT
DISTINCT commit_id
FROM
github_events
WHERE
repo_id = 100401665;
Response:
{
"requestedAt": "2022-11-05T09:54:45.242+00:00",
"finishedAt": "2022-11-05T09:54:45.286+00:00",
"spent": 0.044,
"sql": "SELECT DISTINCT `commit_id` FROM `github_events` WHERE `repo_id` = 100401665 LIMIT 100",
"fields": [
{
"name": "commit_id"
}
],
"data": [
{
"commit_id": ""
}
]
}
Possible cause:
Example Github event:
{
"id": "25044688494",
"type": "PushEvent",
"actor": {
"id": 5184102,
"login": "CocoaPodsBot",
"display_login": "CocoaPodsBot",
"gravatar_id": "",
"url": "https://api.github.com/users/CocoaPodsBot",
"avatar_url": "https://avatars.githubusercontent.com/u/5184102?"
},
"repo": {
"id": 2365549,
"name": "CocoaPods/Specs",
"url": "https://api.github.com/repos/CocoaPods/Specs"
},
"payload": {
"push_id": 11561212689,
"size": 1,
"distinct_size": 1,
"ref": "refs/heads/master",
"head": "dad5d1730d00b686775e038cc7428692867fd849",
"before": "62a80dcdc175f423b5cb6d4f6a2e9c5dd6030304",
"commits": [
{
"sha": "dad5d1730d00b686775e038cc7428692867fd849",
"author": {
"email": "[email protected]",
"name": "Jorge Luis Rivera Ladino"
},
"message": "[Add] CommonUIKitComponents 1.2.3",
"distinct": true,
"url": "https://api.github.com/repos/CocoaPods/Specs/commits/dad5d1730d00b686775e038cc7428692867fd849"
}
]
},
"public": true,
"created_at": "2022-11-05T04:41:23Z",
"org": {
"id": 1189714,
"login": "CocoaPods",
"gravatar_id": "",
"url": "https://api.github.com/orgs/CocoaPods",
"avatar_url": "https://avatars.githubusercontent.com/u/1189714?"
}
}
I'd assume that the commit_id be picked from payload.commits[][sha] but looks like it instead gets picked from
event.dig("payload", "comment", "commit_id") || ''
at: https://github.com/pingcap/ossinsight/blob/22356832e00106162d9516a435db398cb9cbcbb7/etl/config/initializers/fetch_event.rb#L64?
For the CommitCommentEvent, the commit_id is the commented commit, and this data is correct.
For PushEvent, we really should extract commit_ids or commit_shas, this field will be added later, thank you for your reminder 👍
id | 22127996623
type | CommitCommentEvent
created_at | 2022-06-02 15:23:40
repo_id | 498029471
repo_name | luis-unlam/Trabajo-React-GM2-
actor_id | 35613825
actor_login | vercel[bot]
language |
additions | 0
deletions | 0
action |
number | 0
commit_id | b48fee47ee26207a358186d89a2955618c60fb2c
comment_id | 75183688
org_login |
org_id | 0
state |
closed_at | 1970-01-01 00:00:00
comments | 0
pr_merged_at | 1970-01-01 00:00:00
pr_merged | 0
pr_changed_files | 0
pr_review_comments | 0
pr_or_issue_id | 0
event_day | 2022-06-02
event_month | 2022-06-01
event_year | 2022
push_size | 0
push_distinct_size | 0
creator_user_login | vercel[bot]
creator_user_id | 35613825
pr_or_issue_created_at | 1970-01-01 00:00:00
this field will be added later
Can the existing field not be used? Why add a new field altogether and not reuse the existing field?
Assuming that CommitCommentEvent and PushEvent are mutually exclusive, using the same commit_id field should be possible, assuming the rest of the fields are redundant, i.e. multiple rows with same id but different commit_id. Looking at the schema, the id is not-unique.