1708 adjust data cleaning script to prune data points outside of LA neighborhood districts

Open mru-hub opened this issue 1 year ago • 7 comments

Fixes #1708

  • [ ] Up to date with main branch
  • [ ] Branch name follows guidelines
  • [ ] All PR Status checks are successful
  • [ ] Peer reviewed and approved

Any questions? See the getting started guide

mru-hub avatar Jun 03 '24 04:06 mru-hub

Hi @mru-hub, will review before Sunday 6/9. Thank you!

Skydodle avatar Jun 04 '24 17:06 Skydodle

Hi @mru-hub, I reviewed the script and did some testing. I think we need a video chat to discuss how to implement this before I can add documentation for my review here. I've sent you a Slack message; please let me know there when would be a good time to chat.

Skydodle avatar Jun 15 '24 01:06 Skydodle

Hi @mru-hub I pushed the modified script that we discussed to this PR.

Skydodle avatar Jun 19 '24 00:06 Skydodle

@Skydodle Connected with Johnny and decided to test the code locally by directly passing a filtered Parquet file to the logic instead of using the Hugging Face file.

mru-hub avatar Jun 19 '24 07:06 mru-hub

Hi @Skydodle I pushed the modified script to this PR. Kindly review.

mru-hub avatar Jun 20 '24 18:06 mru-hub

Update: enabled in-browser testing by pointing the file registration to the filtered CSV file in the public folder. @mru-hub will continue in-depth testing and implement the finalized script to integrate with the cron job. Thanks.

Skydodle avatar Jun 22 '24 03:06 Skydodle

Based on new requirements, @mru-hub will just provide 1 script (containing instructions on how to test locally)

ryanfchase avatar Oct 19 '24 17:10 ryanfchase

Cleaning Script: updateHfDataset_FilterByBoundaries.py

Added a new Python script, updateHfDataset_FilterByBoundaries.py, to streamline the data filtering based on geographic boundaries.

Local Testing Instructions: This script contains instructions on how to test the functionality locally.

Integration Testing Steps: Included steps for integration testing to facilitate future updates via cron jobs.

Functionality Changes from current/base 'updateHfDataset.py' script:

  • Introduced a new function 'hfFilter()', which filters points from a Parquet file based on a provided GeoJSON boundary.
  • Maintained all existing code and functions from the previous version for consistency.
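For reviewers who want a feel for the boundary filter without opening the script, here is a minimal sketch of the idea in plain Python. It is not the code in updateHfDataset_FilterByBoundaries.py: the function names, the `longitude`/`latitude` keys, and the use of in-memory rows instead of a Parquet file are all assumptions made for illustration, and it handles only GeoJSON Polygon and MultiPolygon geometries.

```python
# Illustrative sketch only: names and column keys are hypothetical,
# and rows are plain dicts rather than records read from a Parquet file.

def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside a ring of [lon, lat] pairs?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # Does this edge straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(lon, lat, polygon_coords):
    """GeoJSON Polygon: first ring is the outer boundary, the rest are holes."""
    outer, holes = polygon_coords[0], polygon_coords[1:]
    if not point_in_ring(lon, lat, outer):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in holes)


def point_in_geometry(lon, lat, geometry):
    """Supports Polygon and MultiPolygon; other geometry types never match."""
    if geometry["type"] == "Polygon":
        return point_in_polygon(lon, lat, geometry["coordinates"])
    if geometry["type"] == "MultiPolygon":
        return any(point_in_polygon(lon, lat, p) for p in geometry["coordinates"])
    return False


def filter_rows_by_boundary(rows, geojson, lon_key="longitude", lat_key="latitude"):
    """Keep rows whose point falls inside any feature of a FeatureCollection.

    Rows with missing coordinates are dropped (an assumption of this sketch).
    """
    geometries = [feature["geometry"] for feature in geojson["features"]]
    kept = []
    for row in rows:
        lon, lat = row.get(lon_key), row.get(lat_key)
        if lon is None or lat is None:
            continue
        if any(point_in_geometry(lon, lat, g) for g in geometries):
            kept.append(row)
    return kept


if __name__ == "__main__":
    # A made-up square boundary, not real LA neighborhood data.
    boundary = {
        "type": "FeatureCollection",
        "features": [{
            "type": "Feature",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
            },
        }],
    }
    rows = [
        {"id": 1, "longitude": 5, "latitude": 5},
        {"id": 2, "longitude": 15, "latitude": 5},
        {"id": 3, "longitude": None, "latitude": 5},
    ]
    print([r["id"] for r in filter_rows_by_boundary(rows, boundary)])
```

On a full dataset a spatial library or a database-side spatial join would likely be faster than this per-row loop; the sketch only shows the inside/outside decision that the filter depends on.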

mru-hub avatar Oct 24 '24 02:10 mru-hub