importer

Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, whatever its format (HTML, PDF, Word, etc.). In addition, it allows...

Results: 15 importer issues

Bumps [org.apache.tika:tika-core](https://github.com/apache/tika) from 1.27 to 1.28.3.

Changelog (sourced from org.apache.tika:tika-core's changelog):
- Release 1.28.3 - 5/23/2022: General dependency upgrades (TIKA-3770).
- Release 1.28.2 - 4/26/2022: General dependency upgrades (TIKA-3688). Upgrade to PDFBox...

dependencies

![image](https://user-images.githubusercontent.com/109271998/218655516-359e10f6-26cc-4a84-a5d5-28f7ae3672f8.png)

I have already installed all the Importer classes by downloading them, but it still shows the error. If anyone has an idea, please help as soon as possible. Note: I am using...

I mistakenly posted an issue on Collector about this problem; it turns out that Collector pulls in Importer as a transitive dependency, which in turn pulls in Tika 1.27. My application...

feature-request

Hello Pascal, I'd like to use several methods (e.g. `csv` and `regex`) in the `KeepOnlyTagger`, but it seems only one `fieldMatcher` is allowed:

```xml
crawl_date,type,content,collector.depth,document.language
(thumbnailImage|imagePHash).*
```

Error:

```
1...
```

feature-request
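As a rough illustration of the combined behavior requested above (keep a field if its name is listed in the CSV value *or* matches the regex), here is a plain-Java sketch. It does not use the Importer API: `KeepOnlySketch` and `keepOnly` are hypothetical names, and the field map only stands in for document metadata.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class KeepOnlySketch {
    // Hypothetical helper: keeps a field if its name is in the CSV list
    // OR fully matches the regex; every other field is dropped.
    static Map<String, List<String>> keepOnly(
            Map<String, List<String>> fields, String csv, String regex) {
        Set<String> names = Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
        Pattern pattern = Pattern.compile(regex);
        Map<String, List<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            String name = e.getKey();
            if (names.contains(name) || pattern.matcher(name).matches()) {
                kept.put(name, e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample metadata; LinkedHashMap keeps insertion order for printing.
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("crawl_date", List.of("2022-05-23"));
        fields.put("content", List.of("..."));
        fields.put("thumbnailImage.url", List.of("x.png"));
        fields.put("imagePHash.value", List.of("abc"));
        fields.put("author", List.of("someone"));
        Map<String, List<String>> kept = keepOnly(fields,
                "crawl_date,type,content,collector.depth,document.language",
                "(thumbnailImage|imagePHash).*");
        System.out.println(kept.keySet()); // [crawl_date, content, thumbnailImage.url, imagePHash.value]
    }
}
```

Evaluating both matchers with a logical OR is one reasonable design; an AND, or a "first matcher wins" rule, would give different results.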

This is the web page I want to extract text from: http://www.jornada.unam.mx/2018/02/21/politica/005n1pol. If I use the `complex-config.xml` file from the examples, I get all the content of the web page...

resolved
feature-request

I created an OpenSearch domain on AWS inside a VPC. To read data from or write data to the domain from my laptop, I have to run SecureCRT, create a session...

JDK 17 no longer includes the Nashorn JS engine (it was deprecated in JDK 11 and removed in JDK 15). My colleague Tracy, who is actively using the product, is not using the script tagger because she is more of a Java programmer...

question
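A quick way to see whether a JavaScript engine is available on a given JVM is the standard `javax.script` API, where `getEngineByName` returns `null` when no engine is registered under that name. This is a minimal sketch; `ScriptEngineCheck` and `isEngineAvailable` are hypothetical names and are not part of the Importer.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class ScriptEngineCheck {
    // Returns true if a JSR-223 script engine is registered under the given name.
    static boolean isEngineAvailable(String name) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(name);
        return engine != null;
    }

    public static void main(String[] args) {
        // On a plain JDK 17 this is expected to report false for "nashorn"
        // unless a separate engine has been added to the classpath.
        System.out.println("Java " + System.getProperty("java.specification.version")
                + ", nashorn available: " + isEngineAvailable("nashorn"));
    }
}
```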

I set up a domain named "mysearch" on AWS OpenSearch. Its endpoint is https://search-mysearch-abcd1234.us-east-1.es.amazonaws.com/ and it is set to be publicly accessible. I installed Norconex HTTP Collector 3.0.0 and Elasticsearch...

feature-request

I have a field such as the one below: ``` /ip/Bolthouse-Farms-Organics-Premium-Matchstix-Julienne-Carrots-10-oz/44933639 ``` With the configuration below, I expect "44933639" to be written back to the same field. Instead, the original...
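For comparison, the intended extraction can be sketched in plain `java.util.regex`, assuming the ID is always the last path segment and purely numeric. This is not the Importer's tagger configuration; `TrailingIdSketch` and `extractTrailingId` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingIdSketch {
    // Matches the whole value, ending in "/<digits>", and captures the digits.
    private static final Pattern TRAILING_ID = Pattern.compile("^.*/(\\d+)$");

    // Returns the trailing numeric ID, or the original value unchanged if none is found.
    static String extractTrailingId(String value) {
        Matcher m = TRAILING_ID.matcher(value);
        return m.matches() ? m.group(1) : value;
    }

    public static void main(String[] args) {
        String field = "/ip/Bolthouse-Farms-Organics-Premium-Matchstix-Julienne-Carrots-10-oz/44933639";
        System.out.println(extractTrailingId(field)); // prints 44933639
    }
}
```

Note the fallback: when the pattern does not match the whole value, the original text comes back untouched, so a pattern that fails to match is one thing worth ruling out when a field is written back unchanged.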

Hello Pascal, a quick question: do you plan to develop an external API tagger/transformer? It would be similar to the existing ones, but instead of starting an executable, it would call an external...
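To make the idea concrete, this is roughly what "calling an external API" could look like with the JDK's `java.net.http` client (Java 11+). Everything specific here is an assumption: the endpoint URL, sending the document as a plain-text POST body, and the names `ExternalApiSketch`, `buildRequest` and `call` are hypothetical, and nothing reflects an existing Importer tagger or transformer.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ExternalApiSketch {
    // Builds a POST request carrying the document text to a (hypothetical) endpoint.
    static HttpRequest buildRequest(String endpoint, String content) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(content))
                .build();
    }

    // Sends the request and returns the response body (e.g. a value to store in a field).
    static String call(HttpClient client, HttpRequest request) throws Exception {
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IllegalStateException("API returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest req = buildRequest("http://localhost:8080/tag", "document text");
        System.out.println(req.method() + " " + req.uri());
        // Sending requires a running service at the endpoint:
        // System.out.println(call(HttpClient.newHttpClient(), req));
    }
}
```

Compared with starting an executable, an HTTP call avoids a process launch per document, but it adds network timeouts and error statuses that the tagger would have to handle.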