FSCrawler 2.10 does not update the index when more than one Elasticsearch node is specified

Open ashmidt opened this issue 3 years ago • 4 comments

Describe the bug

When more than one Elasticsearch node is added to the config file, FSCrawler logs an error (see the logs below), and indexing is very slow.

Job Settings

name: "eng_drawings"
fs:
  url: "E:/Eng Drawings"
  update_rate: "10m"
#  includes:
#  - "*/*.pdf"
  excludes:
  - "*/~*"
  - "/PDF"
  - "/_UNKNOWN"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: false
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: false
  lang_detect: false
  continue_on_error: false
#  ocr:
#    language: "eng"
#    enabled: false
#    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  nodes:
  - url: "http://172.17.2.6:9200"
  - url: "http://172.17.2.94:9200"
  username: "elastic"
  password: "Password"
  bulk_size: 50
  flush_interval: "10s"
  byte_size: "5mb"
  ssl_verification: false

Logs

11:10:47,946 DEBUG [f.p.e.c.f.c.ElasticsearchClient] More than one node is available so we pick node number 0 from [http://172.17.2.6:9200, http://172.17.2.94:9200].
11:10:48,134 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Going to execute new bulk composed of 15 actions
11:10:48,134 DEBUG [f.p.e.c.f.c.ElasticsearchEngine] Sending a bulk request of [15] documents to the Elasticsearch service
11:10:48,134 DEBUG [f.p.e.c.f.c.ElasticsearchClient] bulk a ndjson of 8106 characters
11:10:48,134 DEBUG [f.p.e.c.f.c.ElasticsearchClient] More than one node is available so we pick node number 1 from [http://172.17.2.6:9200, http://172.17.2.94:9200].
11:10:48,181 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Executed bulk composed of 15 actions
11:10:48,728 WARN  [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Can't find stored field name to check existing filenames in path [E:/Eng Drawings/002001_004000]. Please set store: true on field [file.filename]
11:10:48,728 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling E:/Eng Drawings: Mapping is incorrect: please set stored: true on field [file.filename].
11:10:48,728 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename].
	at fr.pilato.elasticsearch.crawler.fs.service.FsCrawlerManagementServiceElasticsearchImpl.getFileDirectory(FsCrawlerManagementServiceElasticsearchImpl.java:107) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.getFileDirectory(FsParserAbstract.java:362) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:322) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:304) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:152) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
	at java.lang.Thread.run(Thread.java:832) ~[?:?]

Expected behavior

A clear and concise description of what you expected to happen.

Versions:

  • OS: Windows Server 2012 R
  • Version [e.g. 2.5]

Attachment

If the bug is related to a given file, please share this file so we can reuse it in tests to reproduce the problem and maybe use it in our integration tests.

ashmidt avatar Jul 27 '22 16:07 ashmidt

The problem is here:

Mapping is incorrect: please set stored: true on field [file.filename].

You need to remove the 2 indices and restart from scratch, I think.

dadoonet avatar Jul 27 '22 19:07 dadoonet

It is set to true. I am not sure I understand removing two indices. The documentation said I could provide multiple Elasticsearch nodes.

ashmidt avatar Jul 29 '22 01:07 ashmidt

Can you run:

GET /eng_drawings*/_mapping

And share the result?
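
For anyone reading the output of that request, here is a minimal sketch of how the response could be checked for `store: true` on `file.filename`. It assumes the typeless mapping layout returned by Elasticsearch 7 and later (`index -> mappings -> properties`). The function name `stored_flags` and the sample index names and values are made up for illustration; they are not taken from this cluster.

```python
from typing import Any, Dict


def stored_flags(mapping_response: Dict[str, Any],
                 field_path: str = "file.filename") -> Dict[str, bool]:
    """Return {index_name: bool} telling whether field_path has store: true.

    Assumes the Elasticsearch 7+ response shape of GET /<index>/_mapping.
    """
    result = {}
    for index_name, body in mapping_response.items():
        node = body.get("mappings", {})
        # Walk down the nested "properties" objects, one path segment at a time.
        for part in field_path.split("."):
            node = node.get("properties", {}).get(part, {})
        # A field without an explicit "store" entry is not stored.
        result[index_name] = node.get("store", False) is True
    return result


# Invented sample response: one index with the stored field, one without.
sample = {
    "index_ok": {"mappings": {"properties": {"file": {"properties": {
        "filename": {"type": "keyword", "store": True}}}}}},
    "index_bad": {"mappings": {"properties": {"file": {"properties": {
        "filename": {"type": "keyword"}}}}}},
}

print(stored_flags(sample))
```

Any index reported as `False` here would trigger the "Mapping is incorrect" error shown in the logs.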

dadoonet avatar Jul 29 '22 08:07 dadoonet

The documentation said I could provide multiple Elasticsearch nodes.

Yes. The problem is not related to this, unless the nodes don't belong to the same cluster.
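
One way to check the cluster membership: the root endpoint of each node (`GET /`) returns a `cluster_name` and a `cluster_uuid`, and two nodes of the same cluster report the same `cluster_uuid`. The sketch below only compares responses that have already been fetched. The function name `same_cluster` and the sample values are hypothetical, not real output from the nodes in this issue.

```python
from typing import Any, Dict, Iterable


def same_cluster(root_responses: Iterable[Dict[str, Any]]) -> bool:
    """Return True if every root response reports the same cluster_uuid."""
    uuids = {resp.get("cluster_uuid") for resp in root_responses}
    # Exactly one distinct value, and it must not be a missing uuid.
    return len(uuids) == 1 and None not in uuids


# Invented root responses, trimmed to the relevant keys.
node_a = {"name": "node-a", "cluster_name": "prod", "cluster_uuid": "uuid-1"}
node_b = {"name": "node-b", "cluster_name": "prod", "cluster_uuid": "uuid-1"}
node_c = {"name": "node-c", "cluster_name": "prod", "cluster_uuid": "uuid-2"}

print(same_cluster([node_a, node_b]))
print(same_cluster([node_a, node_c]))
```

Comparing `cluster_uuid` rather than `cluster_name` is deliberate: two independent clusters can share a name, as `node_a` and `node_c` do in the sample, but they will not share a UUID.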

dadoonet avatar Jul 29 '22 08:07 dadoonet

No answer on this one. Please feel free to open a new discussion about this if you are still hitting this problem.

dadoonet avatar Aug 22 '22 15:08 dadoonet