edsnlp
Rule-based pseudonymisation
Description
Add pipeline components to handle rule-based pseudonymisation.
- Handle static rules, such as detection of phone numbers, email addresses, social security numbers (SSN), etc.
- Handle contextual information, using `nlp.pipe(zip(docs, contexts), as_tuples=True)`
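To illustrate the two bullet points above, here is a minimal sketch in plain Python. It is not the code from this PR: the rule patterns, the `STATIC_RULES` table and the `find_entities`, `pseudonymise` and `pipe` helpers are all hypothetical. The `pipe` helper only mimics the behaviour of spaCy's `nlp.pipe(..., as_tuples=True)`, which takes `(text, context)` pairs and yields `(doc, context)` pairs.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical static rules; the patterns are illustrative, not exhaustive.
STATIC_RULES: Dict[str, "re.Pattern"] = {
    "MAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # French-style ten-digit phone number, e.g. "01 23 45 67 89".
    "PHONE": re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b"),
}

Entity = Tuple[int, int, str]  # (start offset, end offset, label)


def find_entities(text: str, context: Dict[str, str]) -> List[Entity]:
    """Collect matches from the static rules and from the per-document context."""
    entities: List[Entity] = []
    # Static rules: the same patterns apply to every document.
    for label, pattern in STATIC_RULES.items():
        entities.extend((m.start(), m.end(), label) for m in pattern.finditer(text))
    # Contextual rules: values known from metadata (assumed shape: label -> value).
    for label, value in context.items():
        if value:
            pattern = re.compile(re.escape(value), flags=re.IGNORECASE)
            entities.extend((m.start(), m.end(), label) for m in pattern.finditer(text))
    # Sort by position and drop any entity that overlaps an earlier one.
    kept: List[Entity] = []
    for entity in sorted(entities):
        if not kept or entity[0] >= kept[-1][1]:
            kept.append(entity)
    return kept


def pseudonymise(text: str, context: Dict[str, str]) -> str:
    """Replace each detected entity with a `<LABEL>` placeholder."""
    # Replace from the end so that earlier offsets stay valid.
    for start, end, label in reversed(find_entities(text, context)):
        text = text[:start] + f"<{label}>" + text[end:]
    return text


def pipe(pairs: Iterable[Tuple[str, Dict[str, str]]]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Stream (text, context) pairs, yielding (pseudonymised text, context)."""
    for text, context in pairs:
        yield pseudonymise(text, context), context


docs = ["Contact Jean Dupont at 01 23 45 67 89 or jean.dupont@example.org."]
contexts = [{"NAME": "Jean Dupont"}]
results = list(pipe(zip(docs, contexts)))
```

Passing the context next to each document keeps the static rules stateless while still allowing document-specific values, such as a patient name taken from metadata, to be masked.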
Checklist
- [ ] If this PR is a bug fix, the bug is documented in the test suite.
- [ ] Changes were documented in the changelog (pending section).
- [ ] If necessary, changes were made to the documentation (e.g. new pipeline).
Codecov Report
Merging #24 (e08ae3e) into master (a39ea23) will increase coverage by 0.14%. The diff coverage is 98.70%.
@@ Coverage Diff @@
## master #24 +/- ##
==========================================
+ Coverage 95.37% 95.52% +0.14%
==========================================
Files 145 155 +10
Lines 3289 3443 +154
==========================================
+ Hits 3137 3289 +152
- Misses 152 154 +2
| Impacted Files | Coverage Δ | |
|---|---|---|
| edsnlp/pipelines/misc/dates/factory.py | 100.00% <ø> (ø) | |
| ...ipelines/misc/pseudonymisation/pseudonymisation.py | 96.15% <96.15%> (ø) | |
| edsnlp/pipelines/misc/context/context.py | 97.22% <97.22%> (ø) | |
| edsnlp/pipelines/core/context_matcher/__init__.py | 100.00% <100.00%> (ø) | |
| edsnlp/pipelines/core/context_matcher/factory.py | 100.00% <100.00%> (ø) | |
| edsnlp/pipelines/core/context_matcher/matcher.py | 100.00% <100.00%> (ø) | |
| edsnlp/pipelines/factories.py | 100.00% <100.00%> (ø) | |
| edsnlp/pipelines/misc/context/__init__.py | 100.00% <100.00%> (ø) | |
| edsnlp/pipelines/misc/dates/dates.py | 96.26% <100.00%> (+0.26%) | :arrow_up: |
| edsnlp/pipelines/misc/dates/models.py | 80.67% <100.00%> (+0.84%) | :arrow_up: |
| ... and 7 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a39ea23...e08ae3e.
Closing since most of these pipelines have been integrated into https://github.com/aphp/eds-pseudo and will be developed / improved there.