ck issues

Results 26 issues of

ck

After a successful POST in "raw" mode, the response is not shown in the "Content" tab, but switching to "x-www-form-urlencoded" replays and prints the "raw" response

**1. What went wrong?** After a successful POST in "raw" mode, the response is not shown in the "Content" tab, but switching to "x-www-form-urlencoded" replays and prints the "raw" response. **2. Steps to reproduce the problem:** Send in raw mode ![image](https://user-images.githubusercontent.com/21599896/85496186-81b2d680-b60e-11ea-8b31-63a03ef19870.png) Switch to x-www-form-urlencoded ![image](https://user-images.githubusercontent.com/21599896/85496265-9abb8780-b60e-11ea-9387-c77de11ccd36.png) **3. What is the expected behavior?** In raw mode, the response content should be displayed normally in the "Content" tab. **4. Did this...

ww

Fix incorrect domain concatenation in Baidu Baike result URLs + update the news search extraction logic.

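First of all, thank you for submitting a PR! 🎉🎉🎉 Please fill in the information below and then submit the request. **What did you do in this request?** Fixed a bug **Shortcomings of your code (possible bugs)** Incorrect domain concatenation in Baidu Baike result URLs **Detailed description of the request** The domain is concatenated incorrectly in Baidu Baike result URLs. Judging by the intent of the concatenation, the goal should be to prepend the Baike domain to URLs that lack domain information.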
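The intended behavior described in this PR (prepend the Baike domain only to URLs that lack one) can be sketched as below. This is a minimal illustration, not the PR's actual code: the function name `normalize_baike_url` and the constant `BAIKE_BASE` are hypothetical, and the base domain is an assumption.

```python
from urllib.parse import urljoin

# Assumed base domain for Baidu Baike; adjust if the project uses another one.
BAIKE_BASE = "https://baike.baidu.com/"


def normalize_baike_url(url: str, base: str = BAIKE_BASE) -> str:
    """Return an absolute URL, adding the base domain only when it is missing.

    urljoin leaves already-absolute URLs untouched and resolves
    path-only URLs (e.g. "/item/Python") against the base.
    """
    return urljoin(base, url)
```

Using `urljoin` instead of plain string concatenation avoids the doubled-domain result that occurs when the input URL is already absolute.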

V2 father

Automatic sync issue

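The full sync at service startup calls e_pipeline, but when mongo data is modified at runtime, only the specified m_collectionname is synced and e_pipeline is not executed. PS: in my custom e_pipeline I make several copies of specified attributes in m_collectionname and regex-replace them into new documents, but none of these fields are updated during automatic sync. Logs from bulkDataAndPip: -- bulk at startup ``` [ { "index":{ "_index":"corpus", "_type":"contents", "_id":"ImQs6IdHp" } }, { "title":"doc2019-03-24-2", "comments":"11111" } ] ``` -- bulk on update ``` [ { "update":{ "_index":"corpus", "_type":"contents", "_id":"ImQs6IdHp" } },...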

AutoMLSearchException: All pipelines in the current AutoML batch produced a score of np.nan on the primary objective

I just changed problem_type="binary" to "multiclass" #### ***************************** * Beginning pipeline search * ***************************** Optimizing for Log Loss Multiclass. Lower score is better. Using SequentialEngine to train and...

bug

How to call the feature processing code of the trained model to process the input in production?

In production, it is necessary to obtain a feature processing method consistent with the trained model.

Spider statistics are incorrect

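**Bug description** 1. After I delete task records under a spider and return to the spider list, the task statistics in the spider list still show the original task counts. 2. After I enable data deduplication, the total in the spider's data item list does not match the data total shown in the task statistics in the spider list. The statistics still include the data that was deduplicated. **Steps to reproduce** The bug can be reproduced as follows: 1. Start tasks several times at random, then delete the tasks. 2. Enable deduplication, then start tasks that crawl the same data repeatedly to exercise deduplication. 3. Return to the spider list under the project, check "Statistics", then open the spider details and check the corresponding data. **Screenshots** ![image](https://github.com/crawlab-team/crawlab/assets/21599896/7175c232-ea76-4441-815c-e0d6bbaf5610) ![image](https://github.com/crawlab-team/crawlab/assets/21599896/41c1a9b7-f5c5-49b7-ba7c-b55fb56d6fdb) ![image](https://github.com/crawlab-team/crawlab/assets/21599896/37ad8e19-c77a-4516-8231-edc02b138bbd)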

bug

results

Create a project-level database shared by the spiders under the project.

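**Please describe the problem this request tries to solve** Project management currently only serves as category management for spiders and has no real functionality. I would like multiple spiders under the same project to share the same database table, with project-level, cross-spider deduplication. Right now I am crawling news from 10 news sites; each spider crawls on its own, and in the end I still have to export the data and merge it. **Please describe the solution you think is feasible** For example, create a unified project-level items.py that every spider under the project must inherit from, so that the output of the multiple spiders under the project is unified.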
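The proposal above (one shared item schema plus cross-spider deduplication) could look roughly like the following sketch. It uses only the Python standard library; `ProjectItem` and `ProjectStore` are hypothetical names, the in-memory dict stands in for a shared database table, and deduplicating on the URL is an assumption rather than anything the request specifies.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass
class ProjectItem:
    """Hypothetical project-level schema shared by every spider in a project."""

    url: str
    title: str
    source: str  # name of the spider that produced the item

    def fingerprint(self) -> str:
        # Assumption: two items with the same URL are duplicates,
        # regardless of which spider produced them.
        return hashlib.sha1(self.url.encode("utf-8")).hexdigest()


class ProjectStore:
    """In-memory stand-in for a shared project-level table."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def save(self, item: ProjectItem) -> bool:
        """Store the item; return False if an equivalent item already exists."""
        key = item.fingerprint()
        if key in self._rows:
            return False
        self._rows[key] = asdict(item)
        return True

    def __len__(self) -> int:
        return len(self._rows)
```

With this shape, each spider only has to emit `ProjectItem` instances, and the store decides whether a record is new for the whole project rather than for a single spider.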

enhancement

spider

[CU-86934hnmg] How to remove 'related' from the returned result

/api/comments/api::article.article:1209/flat I am searching for comment results under the article by IDs, but each result has a 'related' attribute. Is it necessary to repeat the article object in every result?
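Until the API offers a way to omit the attribute, one client-side workaround is to drop the key after the response is parsed. The sketch below assumes the response has already been decoded into a list of dictionaries; `strip_related` is a hypothetical helper, and nothing here reflects an option of the plugin itself.

```python
def strip_related(comments: list[dict]) -> list[dict]:
    """Return copies of the comment dicts without the 'related' key.

    The input list is left unmodified.
    """
    return [
        {key: value for key, value in comment.items() if key != "related"}
        for comment in comments
    ]
```

This only reduces what the client has to handle; the server still sends the repeated article object over the wire.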

enhancement

feature request

backlog

in progress