Make the html cleaning for microdata faster

Open jakubwasikowski opened this issue 6 years ago • 2 comments

Hey @kmike! Here is a small tweak to https://github.com/scrapinghub/extruct/pull/119.

However, according to my recent performance tests, the code from https://github.com/scrapinghub/extruct/pull/119 doesn't affect performance, and the code from this PR doesn't improve anything either, so from my point of view we can just close it.

Re technicals - it turned out that we can clean HTML just a single time, but without cleaning and tags.
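
For illustration only, the "clean once" idea can be sketched as below. This is not the code from this PR or from extruct: it uses the standard library's `xml.etree.ElementTree` rather than extruct's parser, and the `CountingCleaner` class, the two `texts_*` functions, the sample document and the `noscript` tag are all hypothetical placeholders chosen to make the sketch self-contained.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample document; "noscript" is only a placeholder for
# whatever markup a real cleaner would drop.
DOC = """<div itemscope="itemscope">
  <span itemprop="name">Widget<noscript>ignored</noscript> Pro</span>
  <span itemprop="price">9.99<noscript>ignored</noscript></span>
</div>"""


class CountingCleaner:
    """Toy cleaner that removes unwanted tags and counts cleaning passes."""

    def __init__(self, unwanted_tags):
        self.unwanted_tags = set(unwanted_tags)
        self.calls = 0

    def clean(self, element):
        """Return a cleaned deep copy; the input element is left untouched."""
        self.calls += 1
        cleaned = copy.deepcopy(element)
        for parent in list(cleaned.iter()):
            for child in list(parent):
                if child.tag in self.unwanted_tags:
                    self._drop(parent, child)
        return cleaned

    @staticmethod
    def _drop(parent, child):
        """Remove `child` but keep the text that follows it (its tail)."""
        tail = child.tail or ""
        index = list(parent).index(child)
        if index == 0:
            parent.text = (parent.text or "") + tail
        else:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + tail
        parent.remove(child)


def texts_clean_per_property(root, cleaner):
    """Clean every itemprop element separately: one pass per property."""
    return {
        el.get("itemprop"): "".join(cleaner.clean(el).itertext()).strip()
        for el in root.iter()
        if el.get("itemprop")
    }


def texts_clean_once(root, cleaner):
    """Clean the whole tree a single time, then read each property from it."""
    cleaned = cleaner.clean(root)
    return {
        el.get("itemprop"): "".join(el.itertext()).strip()
        for el in cleaned.iter()
        if el.get("itemprop")
    }
```

Both functions return the same property texts for the sample document; the only difference is how many cleaning passes they run, which is what the `calls` counter exposes. Whether that difference is measurable in practice is a separate question, as the performance tests mentioned above suggest.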

jakubwasikowski avatar Aug 02 '19 08:08 jakubwasikowski

Codecov Report

Merging #123 into master will decrease coverage by 0.05%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #123      +/-   ##
==========================================
- Coverage   87.78%   87.73%   -0.06%     
==========================================
  Files          11       11              
  Lines         475      473       -2     
  Branches      103      103              
==========================================
- Hits          417      415       -2     
  Misses         52       52              
  Partials        6        6

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| extruct/w3cmicrodata.py | 99.13% <100%> (-0.02%) | :arrow_down: |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 8683981...d8c03b7.

codecov[bot] avatar Aug 02 '19 11:08 codecov[bot]
