jetnet

Results 10 issues of jetnet

Is it possible to configure a log file per crawler, as it worked in v.2? I tried the following config, but `sd:type` does not get resolved. Thanks ``` %d{HH:mm:ss.SSS} [%t] %-5level...

feature-request
resolved

We need to crawl many Internet sites and encountered an issue with the `www` prefix: some sites redirect to their domains without `www`, others the other way round. Unfortunately, such a case cannot...

feature-request

## Describe the bug **Command Name** `az aks command invoke -n $AKS_NAME -c "kubectl cluster-info"` **Errors:** ``` (KubernetesOperationError) Failed to run command due to cluster perf issue, container command-0be71db980254f398cdecce07419fbed in...

Service Attention
AKS
customer-reported
Service
needs-team-attention
feature-request
Auto-Assign

Hello Pascal, I'd like to use several methods (e.g. `csv` and `regex`) in the `KeepOnlyTagger`, but it seems only one `fieldMatcher` is allowed: ```xml crawl_date,type,content,collector.depth,document.language (thumbnailImage|imagePHash).* ``` Error: ``` 1...

feature-request

Hello Pascal, one quick question: do you plan to develop an external API tagger / transformer? Similar to the existing ones, but instead of starting an executable, calling an external...

Hello Pascal, is it possible to configure the Document Parser to apply OCR processing only to images starting from a given size / dimension? There are some metadata that could be...

feature-request

It would be great if the documentation included examples of how to set various web-driver capabilities, in particular: * proxy with and without auth * user agent *...

### What is the issue? * Podman container start: ```bash podman run -d \ --device nvidia.com/gpu=all \ --memory=100g \ -v ollama:$HOME/.ollama \ -v /local_path/ollama/models:/models \ -p 11434:11434 \ -e OLLAMA_MODELS=/models...

bug

I'd like to disable Tesseract OCR for images. Norconex v.3.1. Rendered config for `documentParsedFactory`: ```xml deu,eng DISABLED_image/(jpeg|png|gif) ``` does not prevent `tesseract` from being started, e.g.: ``` └─java -Dlog4j2.configurationFile=file:/storage/norconex/etc/test/log4j2.xml -Xms4G -Xmx16G...

Requirements: - Download text content and images from the "main" site - Download images from the main site and from external sites, which are referenced on the "main" site Need...