[BUG] Unexpected schema validation error comes from model interface
What is the bug?
When predicting with an ML model, even though the connector payload is filled in, I still receive an error from the model interface.
{
"error": {
"root_cause": [
{
"type": "status_exception",
"reason": "Error validating input schema: Validation failed: [$.parameters: required property 'inputs' not found] for instance: {\"algorithm\":\"REMOTE\",\"parameters\":{\"prompt\":\"hello, who are you?\"},\"action_type\":\"PREDICT\"} with schema: {\n \"type\": \"object\",\n \"properties\": {\n \"parameters\": {\n \"type\": \"object\",\n \"properties\": {\n \"inputs\": {\n \"type\": \"string\"\n }\n },\n \"required\": [\n \"inputs\"\n ]\n }\n },\n \"required\": [\n \"parameters\"\n ]\n}"
}
],
"type": "status_exception",
"reason": "Error validating input schema: Validation failed: [$.parameters: required property 'inputs' not found] for instance: {\"algorithm\":\"REMOTE\",\"parameters\":{\"prompt\":\"hello, who are you?\"},\"action_type\":\"PREDICT\"} with schema: {\n \"type\": \"object\",\n \"properties\": {\n \"parameters\": {\n \"type\": \"object\",\n \"properties\": {\n \"inputs\": {\n \"type\": \"string\"\n }\n },\n \"required\": [\n \"inputs\"\n ]\n }\n },\n \"required\": [\n \"parameters\"\n ]\n}"
},
"status": 400
}
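For context, the failure above is a plain JSON Schema required-property check: the auto-generated interface demands parameters.inputs, while the modified connector expects parameters.prompt. A minimal Python sketch (not the ml-commons implementation, just an illustration of the check) that reproduces the message:

```python
def find_missing_required(instance: dict, schema: dict, path: str = "$"):
    """Return JSON-path messages for required properties absent from instance."""
    missing = []
    for name in schema.get("required", []):
        if name not in instance:
            missing.append(f"{path}: required property '{name}' not found")
    # Recurse into nested object properties that are present
    for name, subschema in schema.get("properties", {}).items():
        if name in instance and subschema.get("type") == "object":
            missing += find_missing_required(instance[name], subschema, f"{path}.{name}")
    return missing

# The auto-generated interface schema from the error message
schema = {
    "type": "object",
    "properties": {
        "parameters": {
            "type": "object",
            "properties": {"inputs": {"type": "string"}},
            "required": ["inputs"],
        }
    },
    "required": ["parameters"],
}

# The request actually sent (uses prompt, not inputs)
request = {
    "algorithm": "REMOTE",
    "parameters": {"prompt": "hello, who are you?"},
    "action_type": "PREDICT",
}

print(find_missing_required(request, schema))
# ["$.parameters: required property 'inputs' not found"]
```

The connector template would render fine with prompt, but validation runs against the generated interface first, so the request never reaches the connector.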
How can one reproduce the bug?
- Create a connector for the Claude v2 model from the connector blueprint. Only change ${parameters.inputs} to ${parameters.prompt}.
POST /_plugins/_ml/connectors/_create
{
"name": "Amazon Bedrock",
"description": "Test connector for Amazon Bedrock",
"version": 1,
"protocol": "aws_sigv4",
"credential": {
"access_key": "...",
"secret_key": "..."
},
"parameters": {
"region": "...",
"service_name": "bedrock",
"model": "anthropic.claude-v2"
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"headers": {
"content-type": "application/json"
},
"url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
"request_body": """{"prompt":"\n\nHuman: ${parameters.prompt}\n\nAssistant:","max_tokens_to_sample":300,"temperature":0.5,"top_k":250,"top_p":1,"stop_sequences":["\\n\\nHuman:"]}"""
}
]
}
- Register model with connector
POST /_plugins/_ml/models/_register
{
"name": "anthropic.claude-v2",
"function_name": "remote",
"description": "test model",
"connector_id": "6uQTs5IB3hywALe5wWxu"
}
- Run model prediction with only the inputs parameter. You'll receive a connector payload validation error since the prompt parameter is needed. This is correct.
POST /_plugins/_ml/models/IY9qt5IB1dQFBoCTo8nv/_predict/
{
"parameters": {
"inputs": "hello, who are you?"
}
}
- Run model prediction with only the prompt parameter. You'll receive a model interface error.
POST /_plugins/_ml/models/IY9qt5IB1dQFBoCTo8nv/_predict/
{
"parameters": {
"prompt": "hello, who are you?"
}
}
What is the expected behavior?
I should receive the model response as long as the connector payload is filled in. I also find it quite weird that the bug is not reproduced if I omit step 3.
What is your host/environment?
- OS: MacOS
- Version main
- Plugins ml-commons
Hi @b4sjoo! I understand you are quite busy these days. Can you take a look at this issue when you're free? No need to hurry.
The expected behavior of the model interface feature should remain consistent no matter how the user invokes prediction on the ML model.
I am sorry, but this is expected... The automated model interface only works when you strictly follow the blueprint. If you change something, you need to update the model interface accordingly. We will improve this in the future, but at this time it still works this way.
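For illustration, updating the interface to match the modified blueprint might look like the following. This is a sketch only: it assumes the update model API accepts the input schema as a stringified JSON Schema (as the error message above suggests), and reuses the model ID from the reproduction steps; the exact field name and escaping are unverified.

```json
PUT /_plugins/_ml/models/IY9qt5IB1dQFBoCTo8nv
{
  "interface": {
    "input": "{\"type\":\"object\",\"properties\":{\"parameters\":{\"type\":\"object\",\"properties\":{\"prompt\":{\"type\":\"string\"}},\"required\":[\"prompt\"]}},\"required\":[\"parameters\"]}"
  }
}
```

After this, a predict request supplying parameters.prompt should pass interface validation, since the schema now requires prompt instead of inputs.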
@yuye-aws Curious, what happens when you omit step 3? Is the request with prompt successful?
Yes. It's successful.
@yuye-aws looks like this issue is resolved. I am going to close the issue; feel free to reopen it. @b4sjoo please raise a new issue to fix this.
Is there a PR fixing this issue?
@mingshl I found that I can still reproduce this issue with the 4 steps in the description, so I have to reopen this issue.
I also found this issue impacts the text embedding processor. The model interface blocks text embedding with the following steps:
- Create a Titan embedding connector following the blueprint
POST /_plugins/_ml/connectors/_create
{
"name": "Amazon Bedrock Connector: embedding",
"description": "The connector to bedrock Titan embedding model",
"version": 1,
"protocol": "aws_sigv4",
"parameters": {
"region": "us-west-2",
"service_name": "bedrock",
"model": "amazon.titan-embed-text-v1"
},
"credential": {
"access_key": "...",
"secret_key": "..."
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
"headers": {
"content-type": "application/json",
"x-amz-content-sha256": "required"
},
"request_body": "{ \"inputText\": \"${parameters.inputText}\" }",
"pre_process_function": "connector.pre_process.bedrock.embedding",
"post_process_function": "connector.post_process.bedrock.embedding"
}
]
}
- Create a model from the connector
POST /_plugins/_ml/models/_register
{
"name": "anthropic.claude-v2",
"function_name": "remote",
"description": "test model",
"connector_id": "LhNR_5IB5-TXIyuBmqPe"
}
- Test the embedding model. This step should be ok.
POST /_plugins/_ml/models/MRNT_5IB5-TXIyuBQqOe/_predict/
{
"parameters": {
"inputText": "hello, who are you?"
}
}
- Create an ingest pipeline with text embedding processor.
PUT /_ingest/pipeline/nlp-ingest-pipeline
{
"description": "A text embedding pipeline",
"processors": [
{
"text_embedding": {
"model_id": "MRNT_5IB5-TXIyuBQqOe",
"field_map": {
"passage_text": "passage_embedding"
}
}
}
]
}
- Test the ingest pipeline. You will then receive the Error validating input schema error.
POST _ingest/pipeline/nlp-ingest-pipeline/_simulate
{
"docs": [
{
"_index": "testindex1",
"_id": "1",
"_source":{
"passage_text": "hello world"
}
}
]
}
I'm currently testing out another bug and I am facing this same issue with the text embedding processor. It fails on v1 with ingest pipeline but works with v2.
"cause": {
"type": "status_exception",
"reason": """Error validating input schema: Validation failed: [$: required property 'parameters' not found] for instance: {"algorithm":"REMOTE","text_docs":["Abraham Lincoln was born on February 12, 1809, the second child of Thomas Lincoln and Nancy Hanks Lincoln, in a log cabin on Sinking Spring Farm near Hodgenville, Kentucky."],"return_bytes":false,"return_number":true,"target_response":["sentence_embedding"]} with schema: {
"type": "object",
"properties": {
"parameters": {
"type": "object",
"properties": {
"inputText": {
"type": "string"
}
},
"required": [
"inputText"
]
}
},
"required": [
"parameters"
]
}"""
},
"status": 400
}
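The error instance above is telling: it contains text_docs, not parameters.inputText. This is consistent with interface validation running before the connector's pre_process_function: the text_embedding processor sends text_docs, and only the pre-process step rewrites that into the parameters.inputText shape that the interface schema describes. A rough sketch of this assumed ordering (hypothetical helper names, not ml-commons source):

```python
def pre_process_bedrock_embedding(raw: dict) -> dict:
    """Mimic connector.pre_process.bedrock.embedding: map text_docs -> inputText.
    (Simplified to a single document; the real function handles batches.)"""
    return {"parameters": {"inputText": raw["text_docs"][0]}}

def satisfies_interface(payload: dict) -> bool:
    """Check the interface's only requirement: parameters.inputText present."""
    return "inputText" in payload.get("parameters", {})

# The shape the text_embedding processor actually sends, per the error instance
raw = {"algorithm": "REMOTE", "text_docs": ["hello world"]}

print(satisfies_interface(raw))                                  # False
print(satisfies_interface(pre_process_bedrock_embedding(raw)))   # True
```

If this reading is right, direct _predict calls pass because the caller already supplies parameters.inputText, while ingest-pipeline calls fail because validation sees the raw text_docs payload.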
I've tried modifying the input text field to inputText, but it still fails:
PUT /_ingest/pipeline/nlp-ingest-pipeline
{
"description": "A text embedding pipeline",
"processors": [
{
"text_embedding": {
"model_id": "MRNT_5IB5-TXIyuBQqOe",
"field_map": {
"inputText": "inputText_embedding"
}
}
}
]
}
POST _ingest/pipeline/nlp-ingest-pipeline/_simulate
{
"docs": [
{
"_index": "testindex1",
"_id": "1",
"_source":{
"inputText": "hello world"
}
}
]
}
+1. I'm hitting exactly the same problem.
@b4sjoo Can you prioritize this issue? It sounds like a code bug.
Ack, will take a look
I've figured out a workaround. For me, there's no rush to resolve this issue. Take your time @b4sjoo.
Can you share if possible?
I guess it's just to update the interface field into an empty interface?
The workaround is to not declare the model in the parameters. You need to modify the connector blueprint into:
POST /_plugins/_ml/connectors/_create
{
"name": "Amazon Bedrock Connector: embedding",
"description": "The connector to bedrock Titan embedding model",
"version": 1,
"protocol": "aws_sigv4",
"parameters": {
"region": "us-west-2",
"service_name": "bedrock"
},
"credential": {
"access_key": "...",
"secret_key": "..."
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/amazon.titan-embed-text-v1/invoke",
"headers": {
"content-type": "application/json",
"x-amz-content-sha256": "required"
},
"request_body": "{ \"inputText\": \"${parameters.inputText}\" }",
"pre_process_function": "connector.pre_process.bedrock.embedding",
"post_process_function": "connector.post_process.bedrock.embedding"
}
]
}
Can you please elaborate more? How can an empty interface work?
@yuye-aws I previously missed replying to this. You can simply update the model_interface field to an empty object, "model_interface": {}, or "model_interface": {"input":{},"output":{}} if the former does not work. Essentially, not declaring the model is the same as this solution, because it disables automated generation of the model interface when no model is found.
I understand the workaround works, but for customers without enough context who are copying connector blueprints to perform text embedding, the model interface error would be super confusing.
Hi @b4sjoo ! Is there any update for this issue?
I can confirm the workaround works for my environments.
Could we expect this workaround to remain as a solution if a fix is implemented?
Is it worth updating the documentation to reflect this workaround?
We have this doc now https://opensearch.org/docs/latest/ml-commons-plugin/api/model-apis/update-model/
If the model interface is no longer needed, you can remove both the input and output schemas in order to bypass model schema validation:
PUT /_plugins/_ml/models/IMcNB5UB7judm8f45nXo
{
"interface": {
"input": null,
"output": null
}
}
@yuye-aws, your connector is not the same as the blueprint; the blueprint uses parameters.inputs, not parameters.prompt.
"request_body": "{\"prompt\":\"\\n\\nHuman: ${parameters.inputs}\\n\\nAssistant:\",\"max_tokens_to_sample\":300,\"temperature\":0.5,\"top_k\":250,\"top_p\":1,\"stop_sequences\":[\"\\\\n\\\\nHuman:\"]}"
Thanks for pointing that out. Does this also work for the text embedding processor with a user-defined field?