kamu-cli
Struct not supported in nested data
I hit an issue while trying to pull a nested data source structured as follows:
```json
{
  "starred_at": "2020-07-05T01:34:55Z",
  "user": {
    "login": "onyalcin",
    "id": 7300802
  }
}
```
I received the following error, indicating that `STRUCT` is not supported:
```
Failed to pull com.github.kamu-cli.stargazers: 0: Internal error 1: This feature is not implemented: Unsupported SQL type Custom(ObjectName([Ident { value: "STRUCT", quote_style: None }]), ["id", "BIGINT", "login", "STRING"])
```
Currently we have to add a preprocessing step with jq to handle the nested object, which adds too much complexity. Could this be supported during the read phase in DataFusion instead?
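For illustration, the flattening that the jq workaround has to perform amounts to lifting the `user` fields into top-level columns, matching the `user_id` / `user_name` projection in the SQL preprocess step of the definition below. A minimal Python sketch, assuming the API returns a JSON array of records shaped like the sample above; `flatten_stargazer` and `flatten_all` are hypothetical helpers, not part of kamu:

```python
import json

# Hypothetical sample payload: a JSON array of stargazer records shaped like
# the example record in this issue (the array shape is an assumption).
RAW = """
[
  {"starred_at": "2020-07-05T01:34:55Z", "user": {"login": "onyalcin", "id": 7300802}}
]
"""


def flatten_stargazer(record: dict) -> dict:
    """Lift the nested `user` object into top-level columns, mirroring the
    projection done by the dataset's SQL preprocess step."""
    user = record.get("user") or {}  # tolerate a missing or null `user`
    return {
        "event_time": record["starred_at"],
        "user_id": user.get("id"),
        "user_name": user.get("login"),
    }


def flatten_all(raw: str) -> list:
    """Parse a JSON array of records and flatten each one."""
    return [flatten_stargazer(r) for r in json.loads(raw)]


if __name__ == "__main__":
    for row in flatten_all(RAW):
        print(json.dumps(row))
```

Supporting `STRUCT` in the read schema would make this extra step unnecessary, since the SQL preprocess query could then address `user.id` and `user.login` directly.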
Below is the dataset definition I worked with:
```yaml
kind: DatasetSnapshot
version: 1
content:
  name: com.github.kamu-cli.stargazers
  kind: Root
  metadata:
    - kind: SetPollingSource
      fetch:
        kind: Url
        url: https://api.github.com/repos/kamu-data/kamu-cli/stargazers
        headers:
          - name: User-Agent
            value: kamu
          - name: Accept
            value: application/vnd.github.star+json
      read:
        kind: Json
        schema:
          - starred_at TIMESTAMP
          - user STRUCT(id BIGINT, login STRING)
      preprocess:
        kind: Sql
        engine: datafusion
        query: |
          SELECT
            starred_at as event_time,
            user.id as user_id,
            user.login as user_name
          FROM input
      merge:
        kind: Snapshot
        primaryKey:
          - event_time
          - user_id
    - kind: SetInfo
      description: Stars of the selected github repository.
```
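To make the request concrete: each entry in `read.schema` is a `name TYPE` string, and a `STRUCT(...)` type describes a nested field whose members are themselves `name TYPE` pairs. The sketch below is a hypothetical illustration (not kamu's or DataFusion's parser) of turning such declarations into a nested schema description, which is the information a read-phase implementation would need. It assumes unquoted identifiers and no parameterised scalar types such as `DECIMAL(10, 2)`:

```python
import re


def split_top_level(s: str) -> list:
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_type(type_str: str):
    """Return a scalar type name, or a list of (name, type) pairs for STRUCT."""
    type_str = type_str.strip()
    m = re.fullmatch(r"STRUCT\s*\((.*)\)", type_str, flags=re.IGNORECASE | re.DOTALL)
    if m:
        # Recurse so that STRUCTs nested inside STRUCTs are handled too.
        return [parse_column(field) for field in split_top_level(m.group(1))]
    return type_str.upper()


def parse_column(decl: str):
    """Parse one 'name TYPE' declaration into a (name, type) pair."""
    name, _, type_str = decl.strip().partition(" ")
    if not type_str.strip():
        raise ValueError(f"missing type in column declaration: {decl!r}")
    return (name, parse_type(type_str))


if __name__ == "__main__":
    for decl in ["starred_at TIMESTAMP", "user STRUCT(id BIGINT, login STRING)"]:
        print(parse_column(decl))
```

Applied to the schema above, `user` resolves to the member list `[("id", "BIGINT"), ("login", "STRING")]`, which is exactly the nesting the SQL preprocess query then accesses as `user.id` and `user.login`.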