
KeepEverythingWithMinKWordsExtractor not working

Open derlin opened this issue 7 years ago • 0 comments

First, thanks for the port.

When trying to use KeepEverythingWithMinKWordsExtractor, I get the error:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    extractor = Extractor(extractor='KeepEverythingWithMinKWordsExtractor', url=url, kMin=20)
  File "/private/tmp/html_extract/venv/lib/python3.6/site-packages/boilerpipe/extract/__init__.py", line 62, in __init__
    "de.l3s.boilerpipe.extractors."+extractor).INSTANCE
AttributeError: type object 'de.l3s.boilerpipe.extractors.KeepEverythingWithMin' has no attribute 'INSTANCE'

The problem is that KeepEverythingWithMinKWordsExtractor has no INSTANCE singleton, because its constructor takes an argument (see the Java code).

To fix this, line 60 in extract/__init__.py should be replaced with:

if extractor == "KeepEverythingWithMinKWordsExtractor":
    # this extractor has no INSTANCE singleton; pass kMin to its constructor
    kMin = kwargs.get("kMin", 1)  # set default to 1
    self.extractor = jpype.JClass(
        "de.l3s.boilerpipe.extractors."+extractor)(kMin)
else:
    self.extractor = jpype.JClass(
        "de.l3s.boilerpipe.extractors."+extractor).INSTANCE
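
The same branching can be sketched in isolation, without a JVM, to make the logic easier to check. This is only an illustration under stated assumptions: `build_extractor` and its `load_class` parameter are hypothetical names that do not exist in boilerpipe3, and in real use `load_class` would be `jpype.JClass`.

```python
# Hypothetical helper, not part of boilerpipe3: it isolates the choice
# between "call the constructor" and "use the INSTANCE singleton".
PACKAGE = "de.l3s.boilerpipe.extractors."


def build_extractor(load_class, name, **kwargs):
    """Return an extractor object for the given extractor class name.

    load_class is any callable that maps a fully qualified Java class
    name to a class proxy (assumed to be jpype.JClass in real use).
    """
    cls = load_class(PACKAGE + name)
    if name == "KeepEverythingWithMinKWordsExtractor":
        # This class has no INSTANCE; its constructor takes kMin.
        k_min = int(kwargs.get("kMin", 1))  # default of 1, as in the fix above
        return cls(k_min)
    # The other extractors are reached through their INSTANCE attribute.
    return cls.INSTANCE
```

With a started JVM, `build_extractor(jpype.JClass, "KeepEverythingWithMinKWordsExtractor", kMin=20)` would be the equivalent of the patched branch. Passing the loader in as a parameter is just a convenience so the branching can be exercised with stand-in classes.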

derlin, Mar 25 '18 06:03