Vlad Isayko
@theycallmeswift Are there any files in the `/data` directory?
@theycallmeswift @jerpelea Hello, the real problem is outdated and incomplete documentation. We will fix this in the coming days. I'll keep you posted.
At the moment, this is the current way to start: 1. ```python3 osci-cli.py get-github-daily-push-events -d 2020-01-01``` 2. ```python3 osci-cli.py process-github-daily-push-events -d 2020-01-01``` 3. ```python3 osci-cli.py daily-active-repositories -d 2020-01-01``` 4. ```python3...
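For anyone scripting this, here is a minimal sketch of running the steps in order and stopping at the first failure. It covers only the three commands quoted in full above; the later steps are cut off in this comment and are deliberately not guessed. The helper names `build_commands` and `run_pipeline` are hypothetical, the sketch assumes `osci-cli.py` is in the current working directory, and it uses the current interpreter (`sys.executable`) in place of `python3`.

```python
import subprocess
import sys

# Only the sub-commands quoted in full in this thread; later steps are omitted.
KNOWN_STEPS = [
    "get-github-daily-push-events",
    "process-github-daily-push-events",
    "daily-active-repositories",
]


def build_commands(day, cli="osci-cli.py"):
    """Return one argv list per known step for the given date (YYYY-MM-DD)."""
    return [[sys.executable, cli, step, "-d", day] for step in KNOWN_STEPS]


def run_pipeline(day):
    """Run the steps in order; check=True raises CalledProcessError on the first failure."""
    for argv in build_commands(day):
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)
```

Calling `run_pipeline("2020-01-01")` would then stop at the first step that exits with a non-zero status, which makes it easier to see which step's output is missing.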
@jerpelea Can you also share which versions of pyspark and Spark you have?
@jerpelea Maybe there is a problem with the parquet file. We need to check it.
@jerpelea We use the same libraries with the same versions. Can you share some of the files that were generated in the staging area?
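A quick way to report the installed pyspark version, as a minimal sketch. The helper name `package_version` is hypothetical; it only reads package metadata, so it works even if Spark itself fails to start.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


print("pyspark:", package_version("pyspark"))
```

If Spark was installed separately rather than through pip, `spark-submit --version` should print the Spark version as well.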
@jerpelea Are there any files in `/staging/github/events/push/2021/01/01/`? Before step 6 there should be files in these directories:
- `/staging/github/raw-events/push/2021/01/01/`
- `/staging/github/repository/2021/01/`
- `/staging/github/events/push/2021/01/01/`
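To check the directories listed above in one go, something like the following sketch could help. It assumes the paths are on the local filesystem exactly as written (adjust the prefix if your staging area lives elsewhere); `summarize` is a hypothetical helper name. It also counts zero-byte files, since an empty output file can look like a successful step.

```python
from pathlib import Path

# Directories that should contain files before step 6, as listed above.
EXPECTED_DIRS = [
    "/staging/github/raw-events/push/2021/01/01/",
    "/staging/github/repository/2021/01/",
    "/staging/github/events/push/2021/01/01/",
]


def summarize(directory):
    """Return (file_count, empty_file_count) for a directory, or None if it is missing."""
    path = Path(directory)
    if not path.is_dir():
        return None
    files = [p for p in path.rglob("*") if p.is_file()]
    empty = sum(1 for p in files if p.stat().st_size == 0)
    return len(files), empty


for d in EXPECTED_DIRS:
    print(d, "->", summarize(d))
```

A result of `None` means the directory does not exist; a non-zero second number means some files are present but empty.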
@jerpelea Can you rerun step 5, `python3 osci-cli.py filter-unlicensed -d 2020-01-01`, and share the logs from this command? I think there is a problem at this step.
@jerpelea OK, it's strange that the repository file in staging is empty... Does this file exist: `/landing/github/repository/2021/01/2021-01-01.csv`? Can you share it?
@jerpelea So the error occurred at step 4, when getting information about the repositories from the GitHub API. I ran this step on my own with your source file and...