cola issues

docs: Fix a few typos

There are small typos in:
- cola/cluster/master.py
- cola/core/bloomfilter/__init__.py
- cola/core/opener.py

Fixes:
- Should read `experimentally` rather than `experimently`.
- Should read `entries` rather than `enteries`.
- Should read `continuously`...

timgates42

Developed 1 new feature, fixed 2 bugs

1

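hi chineking, the feature discussed in Issue #64 (setting size to auto) is now implemented. I also fixed two bugs: 1. An encoding error when storing Chinese text in MQ messages 2. In URL mode, the executor raised an error if the parser returned a value ``` list(res) ``` thanks, brightgems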

brightgems

Why does a task never exit after it finishes executing?

5

The `run` method of the `Task` class contains two loops, and the outermost loop exits only after the stop event is set. Why?

```
def run(self):
    try:
        curr_priority = 0
        while not self.stopped.is_set():
            priority_name = 'inc' if curr_priority == self.n_priorities \
                else curr_priority
            is_inc = priority_name == 'inc'
            while not self.nonsuspend.wait(5):...
```

brightgems
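As a rough illustration of the behaviour asked about above, here is a minimal sketch of a worker loop that honours a stop event but also exits on its own once no work remains. It is not cola's `Task.run`; `run_until_idle`, `get_work`, `handle`, `max_idle_polls`, and `poll_interval` are hypothetical names chosen for this example.

```python
import threading


def run_until_idle(get_work, handle, stopped, max_idle_polls=3, poll_interval=0.01):
    """Process items until `stopped` is set or the source stays empty.

    get_work: hypothetical callable returning the next item, or None if empty.
    handle:   hypothetical callable that processes one item.
    stopped:  threading.Event used as the external stop signal.
    Returns the number of items processed.
    """
    idle_polls = 0
    processed = 0
    while not stopped.is_set():
        item = get_work()
        if item is None:
            idle_polls += 1
            if idle_polls >= max_idle_polls:
                break  # nothing left to do: leave the loop without a stop event
            stopped.wait(poll_interval)  # sleep, but wake early if stop is set
            continue
        idle_polls = 0
        handle(item)
        processed += 1
    return processed
```

Without the idle counter, a loop of this shape keeps polling until something sets `stopped`, which matches the symptom described in this issue.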

Task state persistence: task state is saved under tmp, which is cleared when the PC reboots

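Task state persistence: 1. Task state is saved under tmp, which is cleared when the PC reboots. 2. Please provide an interface for deleting a task directory.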

tottilin

Primary/backup MQ synchronization for workers in distributed crawling

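In distributed crawling, the primary and backup MQs of a worker are not synchronized. While the primary is running normally, the backup keeps receiving URLs or bundles and storing them in its MQ. If the primary worker then goes down, the backup re-executes URLs or bundles that the primary has already processed, which wastes time. A periodic synchronization mechanism would keep the backup from accumulating so much redundant data.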

tottilin
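The periodic synchronization proposed above could be sketched as follows. This is only an illustration under stated assumptions, not cola's MQ: `BackupQueue` and its methods are hypothetical, and the primary is assumed to be able to report which URLs it has already processed.

```python
from collections import deque


class BackupQueue:
    """Hypothetical backup queue mirroring what the primary receives."""

    def __init__(self):
        self._items = deque()

    def put(self, url):
        # The backup stores every URL it is sent, in arrival order.
        self._items.append(url)

    def sync(self, processed_by_primary):
        """Drop entries the primary reports as done; return how many were removed."""
        done = set(processed_by_primary)
        before = len(self._items)
        self._items = deque(u for u in self._items if u not in done)
        return before - len(self._items)

    def pending(self):
        # What the backup would still have to run after a failover.
        return list(self._items)
```

Calling `sync` on a timer would bound the amount of already-processed work the backup repeats after a failover to whatever arrived since the last sync.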

Looked into it: the log matches the previous issue, so the MQ is probably not properly protected

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/crawl/code/cola-code/cola/core/mq/__init__.py", line 103, in _init_process...

tottilin

Handling of HTTP errors when fetching pages

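1. On HTTP error 404, the URL is not discarded. 2. For other errors, the crawler keeps retrying after it has finished executing; retries should not be unlimited (sometimes the crawl task never stops).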

tottilin
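The two requests in this issue (discard a URL on 404, cap retries for other errors) can be sketched as below. This is a minimal Python 3 illustration using only the standard library, not cola's opener or executor; `fetch_with_policy`, `MAX_RETRIES`, and the `opener` callable are hypothetical.

```python
import urllib.error

MAX_RETRIES = 3  # hypothetical limit, not a cola setting


def fetch_with_policy(url, opener, max_retries=MAX_RETRIES):
    """Return (status, body), where status is 'ok', 'dropped', or 'gave_up'.

    opener: hypothetical callable that fetches `url` and returns its body,
            raising urllib.error.HTTPError / URLError on failure.
    """
    attempts = 0
    while attempts < max_retries:
        attempts += 1
        try:
            return 'ok', opener(url)
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return 'dropped', None  # page is gone: discard, never retry
            # any other HTTP status: fall through and retry
        except urllib.error.URLError:
            pass  # network-level failure: retry
    return 'gave_up', None  # retry budget exhausted, so the task can finish
```

Because the loop is bounded by `max_retries`, a URL that keeps failing cannot keep the task alive indefinitely.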

question

Fetching a page's HTML in the parser hangs and never returns

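Previously I was crawling pinterest, where the hang took about five hours to reproduce, so with your help I started a daemon thread to call openUrl and raise an error on timeout to interrupt it (so that it does not stay stuck). I am now crawling tumblr and find that this happens very easily, about once every few minutes.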

tottilin
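The daemon-thread workaround mentioned in this issue might look roughly like the following. This is a sketch under assumptions, not the code actually used: `call_with_timeout` and `FetchTimeout` are hypothetical names, and `func` stands in for whatever call opens the URL.

```python
import threading


class FetchTimeout(Exception):
    """Raised when the wrapped call does not finish in time (hypothetical)."""


def call_with_timeout(func, args=(), timeout=30.0):
    """Run func(*args) in a daemon thread and give up after `timeout` seconds."""
    result = {}

    def target():
        try:
            result['value'] = func(*args)
        except Exception as exc:  # hand the error back to the caller's thread
            result['error'] = exc

    worker = threading.Thread(target=target)
    worker.daemon = True  # a stuck worker must not keep the process alive
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        raise FetchTimeout('call did not finish within %s seconds' % timeout)
    if 'error' in result:
        raise result['error']
    return result['value']
```

Note that this only unblocks the caller: the stuck thread itself is not killed, so frequent hangs will accumulate threads. Where the underlying fetch accepts a socket timeout, setting one is usually the lighter option.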

question

Cannot run on CentOS 6

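I installed Python 2.7 from the [IUS](https://iuscommunity.org/) repository (because it provides pip). The Master or Worker fails to start when no IP is given:

```
[root@localhost ~]# coca master -s
unknown command options
```

With an IP supplied, it starts normally:

```
[root@localhost ~]# coca master -s <address>
start master at: <address>:11103
```

But even with the IP supplied, it cannot be shut down:

```
[root@localhost ~]#...
```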

fengkaijia

Package-not-found error when running weibosearch

1

The exact error is as follows:

Traceback (most recent call last):
  File "__init__.py", line 62, in <module>
    from cola.worker.loader import load_job
ImportError: No module named worker.loader

liwei123o0
