cx-extractor-python

A Python implementation of the general-purpose web page content extraction algorithm based on line-block distribution functions, with added English support, so it handles both Chinese and English pages.
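As a rough illustration of how a line-block distribution extractor works, here is a minimal sketch. It is not the code in this repository: `extract_body` is a hypothetical name, and the block size `k` and the `threshold` are illustrative defaults. It strips tags, measures the non-whitespace length of each line, sums those lengths over blocks of `k` consecutive lines, and keeps the text of each region that starts at a block longer than `threshold` and continues until a block that contains no text at all.

```python
import re


def extract_body(html: str, k: int = 3, threshold: int = 86) -> str:
    """Simplified line-block distribution sketch (hypothetical helper).

    k: lines per block; threshold: minimum characters in the block that
    opens a body region. Both are illustrative, not the repo's settings.
    """
    # Remove script/style bodies, then every remaining tag; keep line breaks.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", html)
    text = re.sub(r"(?s)<[^>]*>", "", text)
    lines = [line.strip() for line in text.split("\n")]
    # Measure each line without whitespace so CJK and English count alike.
    sizes = [len(re.sub(r"\s+", "", line)) for line in lines]

    # Block i covers lines i .. i+k-1, so N lines give N-k+1 blocks.
    n_blocks = max(len(lines) - k + 1, 0)
    block_len = [sum(sizes[i:i + k]) for i in range(n_blocks)]

    kept = []
    i = 0
    while i < n_blocks:
        if block_len[i] > threshold:
            start = i
            # Grow the region until a block that contains no text at all.
            while i < n_blocks and block_len[i] > 0:
                i += 1
            # Blocks start..i-1 cover lines start .. (i-1)+(k-1).
            kept.extend(line for line in lines[start:i - 1 + k] if line)
        else:
            i += 1
    return "\n".join(kept)
```

Dense runs of text such as article paragraphs produce long blocks, while navigation links and footers produce short or empty ones, which is why a single threshold can separate them in this sketch. Real pages usually need extra cleanup (HTML comments, entities, inline whitespace) that is skipped here.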

2 cx-extractor-python issues

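```
Traceback (most recent call last):
  File "testEnglish.py", line 11, in
    textfile.write(text)
TypeError: write() argument must be str, not coroutine
```

Printing it shows a memory address instead of the text. The problem occurs both in testEnglish.py and in my own scripts. I'm new to multithreading and don't know how to handle this. Thanks in advance.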
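The traceback says the value passed to `write()` is a coroutine object, which is what calling an `async def` function returns before it has been awaited; printing that object shows its repr, including a memory address. The sketch below reproduces and fixes the situation with a hypothetical async function `get_text` and a hypothetical helper `save_text`. It does not reflect this repository's actual API, so check which function in your version is declared `async` and apply the same pattern to it.

```python
import asyncio
import io


async def get_text(url: str) -> str:
    """Hypothetical async extractor call, used only to reproduce the error."""
    await asyncio.sleep(0)  # stands in for network I/O
    return f"body text of {url}"


def save_text(url: str, textfile) -> str:
    # Calling an async function only builds a coroutine object; its body has
    # not run yet, which is why print() shows <coroutine object ...>.
    coro = get_text(url)
    # asyncio.run() (Python 3.7+) runs the coroutine to completion on a new
    # event loop and returns its result, here a str.
    text = asyncio.run(coro)
    textfile.write(text)  # write() now receives a str, not a coroutine
    return text


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for the real output file
    save_text("https://example.com", buf)
    print(buf.getvalue())
```

`asyncio.run()` cannot be called while an event loop is already running (for example, inside another `async def` function or a Jupyter notebook); there, use `text = await get_text(url)` instead. Coroutines run on an event loop in a single thread, so no threading code is needed for this fix.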

Shouldn't each web page have LinesNum(content)-K+1 blocks in total? If it is LinesNum(content)-K, the last block is never included in the check.
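The count in the question follows from the block definition. Assuming block i covers lines i through i+K-1 (0-indexed) and the page has N = LinesNum(content) lines, the last block that fits starts at line N-K, so valid start indices run from 0 to N-K, which is N-K+1 blocks. The hypothetical helpers below (not taken from the repository) show that a loop over only N-K start positions never examines the final block.

```python
from typing import List


def blocks(lines: List[str], k: int) -> List[List[str]]:
    """Every window of k consecutive lines; block i covers lines i..i+k-1."""
    # The last valid start index is len(lines) - k, so range() needs
    # len(lines) - k + 1 positions.
    return [lines[i:i + k] for i in range(len(lines) - k + 1)]


def blocks_off_by_one(lines: List[str], k: int) -> List[List[str]]:
    """Same, but with only len(lines) - k windows: the final block is dropped."""
    return [lines[i:i + k] for i in range(len(lines) - k)]


lines = ["l0", "l1", "l2", "l3", "l4"]   # N = 5 lines, K = 3
print(len(blocks(lines, 3)))             # 3 blocks: l0-l2, l1-l3, l2-l4
print(len(blocks_off_by_one(lines, 3)))  # 2 blocks: l2-l4 is never examined
```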