bitJoy comments

Results: 19 comments by bitJoy

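Hi everyone, the Sohu news source has not been updated for a long time, so I switched to [China News Service](http://www.chinanews.com/scroll-news/news1.html) and deployed the search engine online: [http://news.bitjoy.net/](http://news.bitjoy.net/). If you are interested, take a look at my [new blog post](https://bitjoy.net/2020/04/05/introduction-to-building-a-search-engine-8/).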

/code/setup.py: after updating, the updated content cannot be found in search

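Right, the "recommended reading" part is indeed not well written. I call sklearn's pairwise_distances function directly to compute the pairwise similarities, but because the document-term matrix is sparse, this easily runs out of memory once the data set gets large. There are two solutions:

1. Don't use pairwise_distances; compute the pairwise similarities manually instead. For each news article, compute its similarity to every other article and maintain a top-k heap that keeps the k most similar articles. Because each article is processed independently, peak memory stays low.
2. A better approach is to use deep learning to compute dense vector representations of sentences, which avoids the sparse-matrix problem, but deep learning places high demands on hardware.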
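The first option can be sketched roughly as follows. This is only a minimal illustration, not the project's actual code: it assumes each article is stored as a sparse `{term: weight}` dict, it assumes cosine similarity as the measure, and the names `cosine` and `top_k_similar` are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector only
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(doc_id, vectors, k=5):
    """Return up to k (other_id, similarity) pairs, most similar first."""
    heap = []  # min-heap of (similarity, other_id); never holds more than k items
    query = vectors[doc_id]
    for other_id, vec in vectors.items():
        if other_id == doc_id:
            continue
        item = (cosine(query, vec), other_id)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # The new item beats the weakest kept item, so swap it in.
            heapq.heapreplace(heap, item)
    return [(i, s) for s, i in sorted(heap, reverse=True)]
```

Only one row of similarities and a heap of at most k entries are held in memory at any time, so the full n-by-n similarity matrix is never built. The cost is that every article still has to be compared against every other article.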

/code/setup.py: after updating, the updated content cannot be found in search

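AVG_L is the average length of all news articles, and it is used for normalization in the BM25 score; see [this blog post](https://bitjoy.net/2016/01/07/introduction-to-building-a-search-engine-4/) for details. To be honest, I don't know much about databases. Is mongodb a database designed specifically for storing text, and what advantages does it have over sqlite? Also, how to store the postings efficiently is a question worth thinking about. Here I simply serialize the document list and store it as one value, which is certainly not the optimal approach. I don't know whether switching to mongodb would lead to a better solution.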
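To make the role of AVG_L concrete, here is a minimal sketch of a commonly used Okapi BM25 form. It is an assumption that this matches the variant in the linked blog post; the function name `bm25_score`, the parameter defaults `k1=1.5` and `b=0.75`, and the smoothed IDF are illustrative choices, not taken from the project.

```python
import math


def bm25_score(query_terms, doc_tf, doc_len, avg_len, doc_freq, n_docs,
               k1=1.5, b=0.75):
    """Score one document against a query with a common BM25 form.

    doc_tf:   {term: term frequency in this document}
    doc_freq: {term: number of documents containing the term}
    avg_len:  average document length over the collection (the role of AVG_L)
    """
    # Length normalization: documents longer than average get a larger
    # denominator, so the same term frequency contributes less.
    norm = k1 * (1 - b + b * doc_len / avg_len)
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)
        # Smoothed IDF that stays non-negative (an assumed variant).
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        score += idf * tf * (k1 + 1) / (tf + norm)
    return score
```

With everything else equal, a document shorter than `avg_len` scores higher than a longer one for the same term frequency, which is the normalization effect described above.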

Is the project introduction website down?

Sorry, the site is blocked by the Great Firewall, so you need a proxy to access it.

Is the project introduction website down?

It can now be accessed from inside the firewall: https://bitjoy.net/category/0%e5%92%8c1/%e5%92%8c%e6%88%91%e4%b8%80%e8%b5%b7%e6%9e%84%e5%bb%ba%e6%90%9c%e7%b4%a2%e5%bc%95%e6%93%8e/

jreadability-1.3.jar — 1.3 release of JReadability Does not Work

@smecsia I am facing the same problem: `Sorry, readability was unable to parse this page for content.` Where can I get the jar for the 1.4-SNAPSHOT version? I can't find it in the [downloads section](https://github.com/wuman/JReadability/downloads).

jreadability-1.3.jar — 1.3 release of JReadability Does not Work

@smecsia Thank you. I have chosen [`jsoup`](http://jsoup.org/), but I will try what you suggested.

endless `onMoreAsked` callback?

@oneenam Hi, I solved my problem by using [android-pull-to-refresh](https://github.com/naver/android-pull-to-refresh) instead. You can try it.

pLINK 2 is stuck and I cannot go on with my analyses.

Please check whether the file exists: C:\Users\ieo4635\Desktop\PCGF3\pLink_task_2020.03.27.12.03.42\QEPlus_200307_ST_DP_P3_XL_3ul_01_HCDFT.pf2. If not, please install MSFileReader according to this wiki: https://github.com/pFindStudio/pLink2/wiki/FAQ#how-to-install-msfilereader