FSCrawler 2.10 does not update the index when more than one Elasticsearch node is specified.
Describe the bug
When more than one Elasticsearch node is added to the config file, FSCrawler generates an error (see the logs below). Indexing is also very slow.
Job Settings
name: "eng_drawings"
fs:
  url: "E:/Eng Drawings"
  update_rate: "10m"
  # includes:
  # - "*/*.pdf"
  excludes:
  - "*/~*"
  - "/PDF"
  - "/_UNKNOWN"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: false
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: false
  lang_detect: false
  continue_on_error: false
  # ocr:
  #   language: "eng"
  #   enabled: false
  #   pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  nodes:
  - url: "http://172.17.2.6:9200"
  - url: "http://172.17.2.94:9200"
  username: "elastic"
  password: "Password"
  bulk_size: 50
  flush_interval: "10s"
  byte_size: "5mb"
  ssl_verification: false
Logs
11:10:47,946 DEBUG [f.p.e.c.f.c.ElasticsearchClient] More than one node is available so we pick node number 0 from [http://172.17.2.6:9200, http://172.17.2.94:9200].
11:10:48,134 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Going to execute new bulk composed of 15 actions
11:10:48,134 DEBUG [f.p.e.c.f.c.ElasticsearchEngine] Sending a bulk request of [15] documents to the Elasticsearch service
11:10:48,134 DEBUG [f.p.e.c.f.c.ElasticsearchClient] bulk a ndjson of 8106 characters
11:10:48,134 DEBUG [f.p.e.c.f.c.ElasticsearchClient] More than one node is available so we pick node number 1 from [http://172.17.2.6:9200, http://172.17.2.94:9200].
11:10:48,181 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Executed bulk composed of 15 actions
11:10:48,728 WARN [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Can't find stored field name to check existing filenames in path [E:/Eng Drawings/002001_004000]. Please set store: true on field [file.filename]
11:10:48,728 WARN [f.p.e.c.f.FsParserAbstract] Error while crawling E:/Eng Drawings: Mapping is incorrect: please set stored: true on field [file.filename].
11:10:48,728 WARN [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename].
at fr.pilato.elasticsearch.crawler.fs.service.FsCrawlerManagementServiceElasticsearchImpl.getFileDirectory(FsCrawlerManagementServiceElasticsearchImpl.java:107) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.getFileDirectory(FsParserAbstract.java:362) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:322) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:304) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:152) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
at java.lang.Thread.run(Thread.java:832) ~[?:?]
Versions:
- OS: Windows Server 2012 R
- Version: 2.10-SNAPSHOT
The problem is here:
Mapping is incorrect: please set stored: true on field [file.filename].
I think you need to remove the two indices and restart from scratch.
It is set to true. I am not sure I understand what removing the two indices means. The documentation said I could provide multiple Elasticsearch nodes.
Can you run:
GET /eng_drawings*/_mapping
And share the result?
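For anyone reading the mapping output by hand, here is a minimal Python sketch of the check the error message refers to: whether a field carries store: true in each index. It assumes the typeless mapping layout returned by Elasticsearch 7 and later; the helper name field_is_stored, the index names and the sample response are all hypothetical and only illustrate the shape of the data.

```python
import json
from typing import Any, Dict


def field_is_stored(mapping_response: Dict[str, Any], field_path: str) -> Dict[str, bool]:
    """Map each index name to True if field_path is declared with store: true.

    Assumes the Elasticsearch 7+ response shape:
    {index: {"mappings": {"properties": {...}}}}
    """
    result: Dict[str, bool] = {}
    for index_name, body in mapping_response.items():
        node = body.get("mappings", {})
        # Walk the dotted path, descending through nested "properties" objects.
        for part in field_path.split("."):
            node = node.get("properties", {}).get(part, {})
        result[index_name] = node.get("store", False) is True
    return result


# Hypothetical, trimmed response used only to illustrate the layout.
sample = json.loads("""
{
  "index_a": {"mappings": {"properties": {"file": {"properties": {
      "filename": {"type": "keyword", "store": true}}}}}},
  "index_b": {"mappings": {"properties": {"file": {"properties": {
      "filename": {"type": "keyword"}}}}}}
}
""")

print(field_is_stored(sample, "file.filename"))
```

An index that reports False here was created with a mapping that does not store the field, which is the situation the warning in the logs describes.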
Regarding "The document said I could provide multiple elastic search nodes": yes, you can. The problem is not related to this, unless the nodes don't belong to the same cluster.
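One way to rule out the "different clusters" case is to compare the cluster_uuid that each node reports on its root endpoint. The following is a rough Python sketch under that assumption; the function names fetch_node_info and same_cluster are made up for this example, and the credentials are passed as HTTP basic auth because the job settings use a username and password.

```python
import base64
import json
import urllib.request
from typing import Dict, List


def fetch_node_info(url: str, username: str, password: str, timeout: float = 5.0) -> Dict[str, str]:
    """GET the node's root endpoint and keep the cluster identity fields."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        info = json.load(response)
    return {
        "url": url,
        "cluster_name": info["cluster_name"],
        "cluster_uuid": info["cluster_uuid"],
    }


def same_cluster(infos: List[Dict[str, str]]) -> bool:
    """True when every node reports the same cluster_uuid."""
    return len({info["cluster_uuid"] for info in infos}) <= 1


# Usage against the two nodes from the job settings (not executed here):
# nodes = ["http://172.17.2.6:9200", "http://172.17.2.94:9200"]
# infos = [fetch_node_info(url, "elastic", "Password") for url in nodes]
# print(same_cluster(infos), infos)
```

If same_cluster returns False, the two URLs point at separate clusters, and each one may hold its own copy of the indices with its own mapping.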
No answer on this one. Please feel free to open a new discussion about this if you are still hitting this problem.