Yu Pan

Results 7 comments of Yu Pan

The earlier error is resolved; it was caused by the go/thrift versions installed through yum being too old. However, the build still fails: src/trade_server/trade_server.go:33: cannot use handler (type *TradeServiceHandler) as type trade_service.TradeService in argument to trade_service.NewTradeServiceProcessor: *TradeServiceHandler does not implement trade_service.TradeService (wrong type for Buy method) have Buy(*trade_service.Trade) error want Buy(context.Context, *trade_service.Trade)...

Submitted the code again:
- move UrlExecutor.retury_error_times to Url.error_times
- auto-close the job if it has been idle 5 times and auto budget is enabled
- fix lint style

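When crawling weibo data I need to know how long each run takes, so I want the job to end as soon as all URLs have been fetched. On my branch I added a feature that allows job.size in the configuration file to be set to auto. In that case the budgets value exactly equals the number of URL/BUNDLE tasks to be fetched, and budget_client.finish is called after each URL has been processed; when new URLs are produced, inc_budget is called to increase budgets. This has been tested successfully in both standalone and distributed mode. Could the experts here tell me whether there is anything wrong with this approach?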
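The accounting described above can be illustrated with a small in-memory sketch. This is not the project's actual code: the `AutoBudget` class, the `crawl` function, and the `discover` callback are hypothetical names made up for illustration, and only `inc_budget` and `finish` echo the names mentioned in the comment. It assumes the budget starts at the number of seed tasks, grows by one for every newly discovered URL, and that the job is considered done once the number of finished tasks reaches the budget.

```python
from collections import deque


class AutoBudget:
    """Toy stand-in for a budget service when job.size is 'auto' (hypothetical)."""

    def __init__(self, initial_tasks):
        self.budgets = initial_tasks  # budget equals the number of known tasks
        self.finished = 0
        self.job_done = False

    def inc_budget(self, n=1):
        # Called when new URLs are discovered, so they are covered by the budget.
        self.budgets += n

    def finish(self):
        # Called after one URL has been processed.
        self.finished += 1
        if self.finished >= self.budgets:
            self.job_done = True


def crawl(seed_urls, discover):
    """Process URLs breadth-first; `discover(url)` returns the URLs found on that page."""
    budget = AutoBudget(len(seed_urls))
    queue = deque(seed_urls)
    seen = set(seed_urls)
    while queue:
        url = queue.popleft()
        for found in discover(url):
            if found not in seen:
                seen.add(found)
                queue.append(found)
                budget.inc_budget()  # grow the budget before finishing this URL
        budget.finish()
    return budget
```

Increasing the budget before calling `finish` for the current URL matters in this sketch: otherwise the finished count could briefly equal the budget and end the job while newly found URLs are still waiting.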

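The purpose is somewhat different from that of the information counter_client prints. Before auto size existed, if size was set too large the job could never finish; if it was set too small, "no budget" log entries appeared and not all URLs could be processed. With size set to auto the job finishes quickly, which shows off the performance advantage of a distributed crawler. ``` All objects have been fetched, try to finish job Counters during running: {'finishes': 4, 'pages': 5, 'processed_weibo_list_page': 5, 'secs': 42.1710000038147} Processing shutting down Shutdown...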

As far as I know, the best way to install on Windows is to disable compilation of variance_reduction. The current version of setup.py allows this by setting the environment variable LIBACT_BUILD_VARIANCE_REDUCTION=0.
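As a rough illustration, the variable can be set for a single install run from Python. The environment variable name comes from the comment above; the helper names and the choice of installing from a local source checkout with pip are assumptions made for this sketch.

```python
import os
import subprocess
import sys


def build_install_env(base_env=None):
    """Return a copy of the environment with variance_reduction compilation disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["LIBACT_BUILD_VARIANCE_REDUCTION"] = "0"
    return env


def install_command(source_dir="."):
    """Hypothetical install command: pip install from a local source checkout."""
    return [sys.executable, "-m", "pip", "install", source_dir]


def install(source_dir="."):
    # Runs pip with the modified environment; call this explicitly when ready.
    return subprocess.run(install_command(source_dir),
                          env=build_install_env(), check=True)
```

Calling `install()` from the root of the source checkout would then run the build with the flag set, without changing the environment of the surrounding shell.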

I ran into this as well: English word segmentation has obvious errors, and I have found no solution.