Rule-based pseudonymisation

Open · bdura opened this issue 4 years ago · 1 comment

Description

Add pipeline components to handle rule-based pseudonymisation.

  1. Handle static rules, such as detection of phone numbers, email addresses, social security numbers (SSN), etc.
  2. Handle contextual information, passed alongside each document using nlp.pipe(zip(docs, contexts), as_tuples=True).
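
The two kinds of rules above can be illustrated with a minimal, standard-library-only sketch. It is not the implementation from this PR: the regular expressions, the labels and every function name (find_static, find_contextual, pseudonymise, pipe) are hypothetical. The pipe helper only mimics the shape of spaCy's nlp.pipe(..., as_tuples=True), which accepts (text, context) pairs and yields (doc, context) pairs.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)

# Hypothetical static patterns, for illustration only.
STATIC_RULES = {
    "MAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b"),  # French-style 10-digit number
}


def find_static(text: str) -> List[Span]:
    """Detect entities that can be recognised from the text alone."""
    spans = []
    for label, pattern in STATIC_RULES.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def find_contextual(text: str, context: Dict[str, str]) -> List[Span]:
    """Detect values that are only known from per-document context."""
    spans = []
    for label, value in context.items():
        for match in re.finditer(re.escape(value), text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def pseudonymise(text: str, spans: List[Span]) -> str:
    """Replace each span with its label, right to left so offsets stay valid.

    Overlapping spans are not handled in this sketch.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"<{label}>" + text[end:]
    return text


def pipe(pairs: Iterable[Tuple[str, Dict[str, str]]]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Process (text, context) pairs and yield (result, context) pairs."""
    for text, context in pairs:
        spans = find_static(text) + find_contextual(text, context)
        yield pseudonymise(text, spans), context
```

For example, with the hypothetical context {"NAME": "Jean Dupont"}, the text "Contact Jean Dupont au 01 23 45 67 89 ou jean.dupont@example.org." becomes "Contact <NAME> au <PHONE> ou <MAIL>.". The name is found only through the context, while the phone number and the email address are found by the static rules.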

Checklist

  • [ ] If this PR is a bug fix, the bug is documented in the test suite.
  • [ ] Changes were documented in the changelog (pending section).
  • [ ] If necessary, changes were made to the documentation (e.g. a new pipeline).

bdura · Mar 18 '22 08:03

Codecov Report

Merging #24 (e08ae3e) into master (a39ea23) will increase coverage by 0.14%. The diff coverage is 98.70%.

@@            Coverage Diff             @@
##           master      #24      +/-   ##
==========================================
+ Coverage   95.37%   95.52%   +0.14%     
==========================================
  Files         145      155      +10     
  Lines        3289     3443     +154     
==========================================
+ Hits         3137     3289     +152     
- Misses        152      154       +2     

Impacted Files                                          Coverage Δ
edsnlp/pipelines/misc/dates/factory.py                  100.00% <ø> (ø)
...ipelines/misc/pseudonymisation/pseudonymisation.py   96.15% <96.15%> (ø)
edsnlp/pipelines/misc/context/context.py                97.22% <97.22%> (ø)
edsnlp/pipelines/core/context_matcher/__init__.py       100.00% <100.00%> (ø)
edsnlp/pipelines/core/context_matcher/factory.py        100.00% <100.00%> (ø)
edsnlp/pipelines/core/context_matcher/matcher.py        100.00% <100.00%> (ø)
edsnlp/pipelines/factories.py                           100.00% <100.00%> (ø)
edsnlp/pipelines/misc/context/__init__.py               100.00% <100.00%> (ø)
edsnlp/pipelines/misc/dates/dates.py                    96.26% <100.00%> (+0.26%)
edsnlp/pipelines/misc/dates/models.py                   80.67% <100.00%> (+0.84%)
... and 7 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update a39ea23...e08ae3e.

codecov-commenter · Mar 18 '22 10:03

Closing since most of these pipelines have been integrated into https://github.com/aphp/eds-pseudo and will be developed / improved there.

percevalw · Dec 12 '22 14:12