Nathan Thomas
Looks like this project is currently dead
I'm curious, would the fix for this be to add a space, if one doesn't already exist, for certain tags or classes that would be presented in HTML as having...
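To make the idea in that question concrete, here is a rough sketch of "add a space if one doesn't exist" at tag boundaries. It uses only Python's standard `html.parser` and is not how trafilatura works internally; the `BLOCK_TAGS` set, the `SpacedTextExtractor` class, and the `text_with_boundaries` helper are all hypothetical names chosen for illustration, and which tags should count as word boundaries is an assumption.

```python
from html.parser import HTMLParser

# Hypothetical list of tags treated as word boundaries (an assumption,
# not taken from trafilatura).
BLOCK_TAGS = {"p", "div", "li", "br", "td", "th",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class SpacedTextExtractor(HTMLParser):
    """Collects text and inserts a space at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def _add_space_if_missing(self):
        # Only add a space when some text exists and it does not
        # already end in whitespace.
        if self.parts and not self.parts[-1][-1:].isspace():
            self.parts.append(" ")

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._add_space_if_missing()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._add_space_if_missing()

    def handle_data(self, data):
        self.parts.append(data)


def text_with_boundaries(html_string):
    parser = SpacedTextExtractor()
    parser.feed(html_string)
    parser.close()
    # Collapse runs of whitespace so no doubled spaces remain.
    return " ".join("".join(parser.parts).split())
```

With this sketch, `text_with_boundaries("<div>First</div><div>This</div>")` keeps the two words apart, while inline tags such as `<b>` do not introduce a space. Whether the real fix belongs at this level or inside the extractor's own tree handling is an open question.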
This is actually a reduced version of a real webpage. However, in the real webpage only the squishing of words together occurs, not the doubling issue.
Here is an example that only includes the word squishing issue, where whitespace between words is sometimes removed: ``` def test_white_space_issue(): from trafilatura import extract html_string = """ First This...