PutPipelineRequest Json Parser ignores attachment processor
Java API client version
9.1.5
Java version
21
Elasticsearch Version
9.1.5
Problem description
We are trying to create a pipeline with two processors from a JSON file. The request succeeds, but only the "remove" processor is added; the "attachment" processor is silently ignored. The same JSON works fine when sent via cURL.
JSON:
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "target_field": "fileData.attachment",
        "field": "fileData.data"
      },
      "remove": {
        "field": "fileData.data"
      }
    }
  ]
}
Java API call:
// "is" is an InputStream reading the JSON file above
PutPipelineRequest pr = PutPipelineRequest.of(b -> b.id("attachment").withJson(is));
this.client.ingest().putPipeline(pr);
GET http://localhost:9200/_ingest/pipeline/attachment
{
  "attachment": {
    "description": "Extract attachment information",
    "processors": [
      {
        "remove": {
          "field": [
            "fileData.data"
          ]
        }
      }
    ]
  }
}
Hello! The processors field should contain a list of processors, each one an object with a single key (the processor type). The JSON you posted is instead a list containing a single object with two fields, which the Java client does not recognize. This is the correct representation, which the client deserializes correctly :D
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "target_field": "fileData.attachment",
        "field": "fileData.data"
      }
    },
    {
      "remove": {
        "field": "fileData.data"
      }
    }
  ]
}
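If the pipeline JSON comes from files you don't fully control, one option is to normalize the processors list before handing it to the client. The following is only a minimal sketch, not part of the client: it assumes the JSON has already been parsed into ordered maps (for example with a JSON library of your choice), and the class name ProcessorListNormalizer and its split method are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Elasticsearch Java client.
final class ProcessorListNormalizer {

    private ProcessorListNormalizer() {
    }

    // Turns every processor object that carries several keys into one
    // single-key object per processor. Order follows the iteration order
    // of the input maps, so the input should use ordered maps
    // (e.g. LinkedHashMap) if processor order matters.
    // Objects without any key produce no output entry.
    static List<Map<String, Object>> split(List<Map<String, Object>> processors) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> entry : processors) {
            for (Map.Entry<String, Object> processor : entry.entrySet()) {
                Map<String, Object> single = new LinkedHashMap<>();
                single.put(processor.getKey(), processor.getValue());
                result.add(single);
            }
        }
        return result;
    }
}
```

Applied to the JSON from the original post, this would yield two list entries, "attachment" followed by "remove", which matches the list form shown above. The normalized structure would then still need to be serialized back to JSON before being passed to withJson.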
And of course the safest way to be sure that requests are correct is to use the Java DSL:
PutPipelineRequest dslPr = PutPipelineRequest.of(b -> b
    .id("attachment")
    .processors(List.of(
        Processor.of(p -> p
            .attachment(a -> a
                .targetField("fileData.attachment")
                .field("fileData.data")
            )
        ),
        Processor.of(p -> p
            .remove(r -> r
                .field("fileData.data")
            )
        )
    ))
);
What I'm not sure about is whether the Elasticsearch server accepts the syntax you posted as an alternative and is just lenient when parsing. I'll keep this open until I've verified this :)
OMG, thank you so much, I couldn't see the forest for the trees...
But yes, the server is accepting it. It also returns the processors as one object, but the processing seems to work correctly.
GET http://localhost:9200/_ingest/pipeline/attachment
{
  "attachment": {
    "description": "Extract attachment information",
    "processors": [
      {
        "attachment": {
          "target_field": "fileData.attachment",
          "field": "fileData.data"
        },
        "remove": {
          "field": "fileData.data"
        }
      }
    ]
  }
}
Ah, that confirms it! Unfortunately, for strongly typed clients such as the Java one, supporting both syntaxes is pretty complex. It should be documented somewhere, though, that the list form is the only syntax supported by the clients.