elasticsearch-java

PutPipelineRequest Json Parser ignores attachment processor

Open haarli opened this issue 3 months ago • 3 comments

Java API client version

9.1.5

Java version

21

Elasticsearch Version

9.1.5

Problem description

We are trying to create a pipeline with two processors from a JSON file. The request returns successfully, but only the "remove" processor is added; the "attachment" processor is ignored completely. The same JSON works fine when sent via cURL.

JSON:

{
    "description": "Extract attachment information",
    "processors": [
        {
            "attachment": {
                "target_field": "fileData.attachment",
                "field": "fileData.data"

            },
            "remove": {
                "field": "fileData.data"
            }
        }
    ]
}

Java API call:

// `is` is an InputStream reading the JSON file above
PutPipelineRequest pr = PutPipelineRequest.of(b -> b.id("attachment").withJson(is));
this.client.ingest().putPipeline(pr);

GET http://localhost:9200/_ingest/pipeline/attachment

{
    "attachment": {
        "description": "Extract attachment information",
        "processors": [
            {
                "remove": {
                    "field": [
                        "fileData.data"
                    ]
                }
            }
        ]
    }
}

haarli, Oct 24 '25 08:10

Hello! The `processors` field should contain a list of processors, one object per processor. The JSON you posted is instead a list containing a single object with two fields, which the Java client does not recognize. This is the correct representation, which the client deserializes correctly :D

{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "target_field": "fileData.attachment",
        "field": "fileData.data"
      }
    },
    {
      "remove": {
        "field": "fileData.data"
      }
    }
  ]
}

And of course the safest way to be sure that requests are correct is to use the Java DSL:

        PutPipelineRequest dslPr = PutPipelineRequest.of(b -> b
            .id("attachment")
            .processors(List.of(
                Processor.of(p -> p
                    .attachment(a -> a
                        .targetField("fileData.attachment")
                        .field("fileData.data")
                    )
                ),
                Processor.of(p -> p
                    .remove(r -> r
                        .field("fileData.data")
                    )
                )
            ))
        );

What I'm not sure about is whether the Elasticsearch server accepts the syntax you posted as an alternative or is just lenient when parsing. I'll keep this open until I've verified this :)

l-trotta, Oct 24 '25 08:10

OMG, thank you so much, I couldn't see the forest for the trees...

But yes, the server accepts it. It also returns the processors as a single object, yet the processing seems to work correctly.

GET http://localhost:9200/_ingest/pipeline/attachment

{
    "attachment": {
        "description": "Extract attachment information",
        "processors": [
            {
                "attachment": {
                    "target_field": "fileData.attachment",
                    "field": "fileData.data"
                },
                "remove": {
                    "field": "fileData.data"
                }
            }
        ]
    }
}

haarli, Oct 24 '25 08:10

Ah, that confirms it! Unfortunately, for strongly typed clients such as the Java one, supporting both syntaxes is pretty complex. It should be documented somewhere, though, that the list syntax (one object per processor) is the only one supported by the clients.
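For anyone who has existing pipeline files in the merged form, here is a minimal sketch of a pre-flight fix that rewrites them into the list form before building the request. It is not part of the Java client: `ProcessorListNormalizer` and `normalize` are hypothetical names, and the sketch assumes the `processors` array has already been parsed into ordered maps by whatever JSON library you use, and that the key order inside the merged object is the intended execution order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of elasticsearch-java.
public class ProcessorListNormalizer {

    /**
     * Splits every processor object that carries more than one key into
     * one single-key object per processor, keeping the original key order.
     * Objects that already hold a single processor pass through unchanged.
     */
    public static List<Map<String, Object>> normalize(List<Map<String, Object>> processors) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> entry : processors) {
            for (Map.Entry<String, Object> processor : entry.entrySet()) {
                Map<String, Object> single = new LinkedHashMap<>();
                single.put(processor.getKey(), processor.getValue());
                result.add(single);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The merged form from the original report: one object, two processors.
        // LinkedHashMap keeps "attachment" before "remove".
        Map<String, Object> merged = new LinkedHashMap<>();
        merged.put("attachment", Map.of(
            "target_field", "fileData.attachment",
            "field", "fileData.data"));
        merged.put("remove", Map.of("field", "fileData.data"));

        List<Map<String, Object>> fixed = normalize(List.of(merged));

        System.out.println(fixed.size());            // prints 2
        System.out.println(fixed.get(0).keySet());   // prints [attachment]
        System.out.println(fixed.get(1).keySet());   // prints [remove]
    }
}
```

The normalized list can then be serialized back to JSON and passed to `withJson`, or the file can simply be fixed once by hand, which is probably the simpler option for a single pipeline.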

l-trotta, Oct 24 '25 08:10