Jingxuan H
# Description
Implementation of `clean_language` as described in #664.

# How Has This Been Tested?
By testing the default setting, specifying formats and changing knowledge bases.

# Snapshots:
...
## Summary
Implement `clean_language()` function to clean a table containing language.

## Design-level Explanation Actions
- [x] Investigate prior art solutions for cleaning and validating language.
- [x] follow the...
**Describe the bug** Some countries, such as Scotland and Yugoslavia, cannot be recognized by `clean_country`. **To reproduce** from this [dataset](https://www.kaggle.com/martj42/international-football-results-from-1872-to-2017):  Currently "Scotland", "England", "Wales" cannot be recognized by Dataprep.clean....
**Describe the bug** Given a set of email addresses with potential typos, `clean_email` cannot fix domains as expected. **To Reproduce** The examples are given by [this website](https://help.xmatters.com/ondemand/trial/valid_email_format.htm), or just see...
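To make the expected behavior concrete, here is a minimal sketch of one way a mistyped domain could be corrected, using fuzzy matching from Python's standard `difflib`. It is not how `clean_email` is implemented; `KNOWN_DOMAINS`, `suggest_domain`, and the `cutoff` value are hypothetical names and choices made for illustration only.

```python
from difflib import get_close_matches

# Hypothetical reference list; a real cleaner would need a much larger one.
KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]


def suggest_domain(email: str, cutoff: float = 0.8) -> str:
    """Return the email with its domain replaced by the closest known domain.

    The input is returned unchanged when it has no local part or "@",
    or when no known domain is similar enough (similarity below `cutoff`).
    """
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        return email
    matches = get_close_matches(domain.lower(), KNOWN_DOMAINS, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else email


if __name__ == "__main__":
    for address in ["user@gmial.com", "user@example.org", "not-an-email"]:
        print(address, "->", suggest_domain(address))
```

A high cutoff keeps the sketch conservative: domains that are not close to any entry in the list are left alone rather than rewritten.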
Currently `clean_lat_long` and `validate_lat_long` do not support NMEA format, which is given by GPS modules. **Is your feature request related to a problem? Please describe.** Here is an example from...
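For context, NMEA sentences encode latitude as `ddmm.mmmm` and longitude as `dddmm.mmmm` (degrees followed by decimal minutes), with a separate hemisphere letter. The sketch below shows the conversion to signed decimal degrees; `nmea_to_decimal` is a hypothetical helper written for illustration and is not part of `clean_lat_long` or `validate_lat_long`.

```python
def nmea_to_decimal(value: str, hemisphere: str) -> float:
    """Convert an NMEA (d)ddmm.mmmm coordinate to signed decimal degrees.

    `value` is the raw NMEA field, e.g. "4807.038".
    `hemisphere` is one of "N", "S", "E", "W"; south and west are negative.
    """
    hemi = hemisphere.strip().upper()
    if hemi not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")

    raw = float(value)
    degrees = int(raw // 100)        # leading digits are whole degrees
    minutes = raw - degrees * 100    # remaining mm.mmmm are minutes
    if minutes >= 60:
        raise ValueError(f"minutes out of range in {value!r}")

    decimal = degrees + minutes / 60
    return -decimal if hemi in ("S", "W") else decimal


if __name__ == "__main__":
    # Latitude and longitude fields as they might appear in a GGA sentence.
    print(round(nmea_to_decimal("4807.038", "N"), 4))
    print(round(nmea_to_decimal("01131.000", "E"), 4))
```

Because the degrees and minutes are packed into one number, dividing by 100 and taking the integer part recovers the degrees for both the two-digit latitude and three-digit longitude forms.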
Currently, `remove_auth` in `clean_url` supports two scenarios: 1. remove tokens in the default list, 2. remove tokens in the union of the default list and a user-provided list. However, there...
**Describe the bug** Some addresses with a single building name cannot be recognized by `clean_address` even if `must_contain` is set to null. **To reproduce** from this [dataset](https://www.kaggle.com/dustincm/chinese-delivery-drive), or see the...