importer
Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, whatever its format (HTML, PDF, Word, etc.). In addition, it allows...
Bumps [org.apache.tika:tika-core](https://github.com/apache/tika) from 1.27 to 1.28.3. Changelog Sourced from org.apache.tika:tika-core's changelog. Release 1.28.3 - 5/23/2022 General dependency upgrades (TIKA-3770). Release 1.28.2 - 4/26/2022 General dependency upgrades (TIKA-3688). Upgrade to PDFBox...
I already installed all the Importer classes by downloading them, but it still shows the error. If anyone has an idea, please help ASAP. Note: I am using...
I mistakenly posted an issue on Collector about this problem. It turns out that Collector pulls in Importer as a transitive dependency, which in turn pulls in Tika 1.27. My application...
Hello Pascal, I'd like to use several methods (e.g. `csv` and `regex`) in the `KeepOnlyTagger`, but it seems that only one `fieldMatcher` is allowed: ```xml crawl_date,type,content,collector.depth,document.language (thumbnailImage|imagePHash).* ``` Error: ``` 1...
This is the web page that I want to extract text from: http://www.jornada.unam.mx/2018/02/21/politica/005n1pol If I use the `complex-config.xml` file from the examples, I get all the content of the web page....
I created an OpenSearch domain on AWS inside a VPC. To read from or write data to the domain from my laptop, I have to run SecureCRT, create a session...
JDK 17 no longer ships the Nashorn JS engine (it was deprecated in JDK 11 and removed in JDK 15). My colleague Tracy, who is actively using the product, is not using the script tagger as she is more of a Java programmer...
I set up a domain named "mysearch" on AWS OpenSearch. Its endpoint is https://search-mysearch-abcd1234.us-east-1.es.amazonaws.com/ and it's set to be publicly accessible. I installed Norconex HTTP Collector 3.0.0 and ElasticSearch...
I have a field such as the one below: ``` /ip/Bolthouse-Farms-Organics-Premium-Matchstix-Julienne-Carrots-10-oz/44933639 ``` With the configuration below, I expect "44933639" to be written back to the same field. Instead, the original...
Hello Pascal, one quick question: do you plan to develop an external API tagger / transformer? Similar to the existing ones, but instead of starting an executable, calling an external...