1708: Adjust data cleaning script to prune data points outside of LA neighborhood districts
Fixes #1708
- [ ] Up to date with main branch
- [ ] Branch name follows guidelines
- [ ] All PR Status checks are successful
- [ ] Peer reviewed and approved
Any questions? See the getting started guide
Hi @mru-hub, will review before Sunday 6/9. Thank you!
Hi @mru-hub, I reviewed the script and did some testing. I think we need a video chat session to discuss how to implement this before I can document my review here. I've sent you a Slack message; please let me know in Slack when would be a good time to chat.
Hi @mru-hub I pushed the modified script that we discussed to this PR.
@Skydodle I connected with Johnny, and we decided to test the code locally by passing a filtered Parquet file directly to the logic instead of using the Hugging Face file.
Hi @Skydodle I pushed the modified script to this PR. Kindly review.
Update: enabled testing in the browser by pointing the file registration to the filtered CSV file in the public folder. @mru-hub will continue in-depth testing and implement the finalized script to integrate with the cron job. Thanks.
Based on new requirements, @mru-hub will provide just one script (containing instructions on how to test locally).
Cleaning Script: updateHfDataset_FilterByBoundaries.py:
Added a new Python script, updateHfDataset_FilterByBoundaries.py, to streamline the data filtering based on geographic boundaries.
Local Testing Instructions: This script contains instructions on how to test the functionality locally.
Integration Testing Steps: Included steps for integration testing to facilitate future updates via cron jobs.
Functionality Changes from current/base 'updateHfDataset.py' script:
- Introduced a new function 'hfFilter{}' that filters points from a Parquet file based on the provided GeoJSON boundary.
- Maintained all existing code and functions from the previous version for consistency.
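For reviewers unfamiliar with boundary filtering, here is a minimal, dependency-free sketch of the idea behind 'hfFilter{}': keep a row only if its coordinate falls inside at least one polygon of a GeoJSON FeatureCollection. This is not the PR's implementation, which may use a library such as GeoPandas instead. It assumes hypothetical `latitude`/`longitude` fields on each row and uses a plain ray-casting point-in-polygon test. Reading the Parquet file (e.g. with `pandas.read_parquet` and `DataFrame.to_dict("records")`) is left out.

```python
from typing import Any

def point_in_ring(lon: float, lat: float, ring: list[list[float]]) -> bool:
    """Ray casting: toggle on each ring edge crossed by a ray cast east of the point."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]  # GeoJSON positions are [lon, lat]
        xj, yj = ring[j][0], ring[j][1]
        if (yi > lat) != (yj > lat):  # edge straddles the point's latitude
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside

def point_in_polygon(lon: float, lat: float, polygon: list) -> bool:
    """polygon = [outer_ring, hole_1, ...]; points inside a hole are excluded."""
    if not point_in_ring(lon, lat, polygon[0]):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in polygon[1:])

def polygons_from_geojson(geojson: dict[str, Any]) -> list:
    """Flatten Polygon and MultiPolygon features into a list of polygons."""
    polygons = []
    for feature in geojson["features"]:
        geom = feature.get("geometry")
        if geom is None:
            continue
        if geom["type"] == "Polygon":
            polygons.append(geom["coordinates"])
        elif geom["type"] == "MultiPolygon":
            polygons.extend(geom["coordinates"])
    return polygons

def filter_points(rows: list[dict], geojson: dict[str, Any]) -> list[dict]:
    """Keep rows whose (hypothetical) longitude/latitude fall inside any boundary."""
    polygons = polygons_from_geojson(geojson)
    return [
        row for row in rows
        if any(point_in_polygon(row["longitude"], row["latitude"], p) for p in polygons)
    ]

if __name__ == "__main__":
    # Toy boundary: a small square near downtown LA (illustrative only).
    boundary = {"type": "FeatureCollection", "features": [{
        "type": "Feature", "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [[
            [-118.3, 34.0], [-118.2, 34.0], [-118.2, 34.1], [-118.3, 34.1], [-118.3, 34.0],
        ]]},
    }]}
    rows = [
        {"id": 1, "longitude": -118.25, "latitude": 34.05},  # inside
        {"id": 2, "longitude": -118.50, "latitude": 34.05},  # outside
    ]
    kept = filter_points(rows, boundary)
    print([r["id"] for r in kept])  # prints [1]
```

Checking against every polygon is O(rows × vertices); for the full dataset, a spatial index or a spatial join would be faster.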