openapi-python-client icon indicating copy to clipboard operation
openapi-python-client copied to clipboard

fix: Fix multipart body file array

Open micha91 opened this issue 2 years ago โ€ข 3 comments

As described in #692 arrays of files are not handled correctly, if they are part of multipart/form-data. This is fixed in this PR by letting to_multipart return a List[Tuple[str, Any]] instead of a Dict[str, Any].

micha91 avatar Jan 13 '24 09:01 micha91

Thanks for adding this support! I believe this is a breaking change, since it's possible someone was using the JSON serialization behavior before (even though it seems wrong in general). I'll need to do some manual testing just to make sure I know how to describe the breaking change ๐Ÿ˜…. I'd also love to get someone to test a real running API and verify it all functions (maybe you've already done this?)

dbanty avatar Jan 15 '24 00:01 dbanty

Yes, I already use the updated model template as a custom template in a project. Unfortunately this project is not public yet and the API it is developed for, isn't publicly accessible, too

micha91 avatar Jan 15 '24 06:01 micha91

Hi, just tested this out with an api I am working on. It works great.

ratgen avatar Feb 15 '24 12:02 ratgen

@dbanty @micha91 Thank you so much for providing the openapi-python-client package and this PR!

I tested it in the following way: pipx install git+https://github.com/openapi-generators/openapi-python-client.git@refs/pull/938/head

openapi-python-client update --path ../../swagger.json --config openapi-python-generator-config.yml

    "/upload": {
      "post": {
        "tags": [
          "Microsoft.KernelMemory.ServiceAssembly"
        ],
        "summary": "Upload a new document to the knowledge base",
        "description": "Upload a document consisting of one or more files to extract memories from. The extraction process happens asynchronously. If a document with the same ID already exists, it will be overwritten and the memories previously extracted will be updated.",
        "operationId": "UploadDocument",
        "requestBody": {
          "description": "Document to upload and extract memories from",
          "content": {
            "multipart/form-data": {
              "schema": {
                "type": "object",
                "properties": {
                  "index": {
                    "type": "string",
                    "description": "Name of the index where to store memories generated by the files."
                  },
                  "documentId": {
                    "type": "string",
                    "description": "Unique ID used for import pipeline and document ID."
                  },
                  "tags": {
                    "type": "object",
                    "additionalProperties": {
                      "type": "string"
                    },
                    "description": "Tags to apply to the memories extracted from the files."
                  },
                  "steps": {
                    "type": "array",
                    "items": {
                      "type": "string"
                    },
                    "description": "How to process the files, e.g. how to extract/chunk etc."
                  },
                  "files": {
                    "type": "array",
                    "items": {
                      "type": "string",
                      "format": "binary"
                    },
                    "description": "Files to process and extract memories from."
                  }
                }
              }
            }
          }
        },

In general it works! I would greatly appreciate if we could have similar functionality to solve https://github.com/openapi-generators/openapi-python-client/issues/692 in the main branch, soon.

@dbanty What do you think about an alternative where a user could just overwrite how a

"files": {
      "type": "array",
      "items": {
        "type": "string",
        "format": "binary"
      },

is handled?

This could either generate:

        files: Union[Unset, tuple[None, bytes, str]] = UNSET
        if not isinstance(self.files, Unset):
            _temp_files = []
            for files_item_data in self.files:
                files_item = files_item_data.to_tuple()

                _temp_files.append(files_item)
            files = (None, json.dumps(_temp_files).encode(), "application/json")

or

        files: Union[Unset, tuple[None, bytes, str]] = UNSET
        if not isinstance(self.files, Unset):
            files = files

The later works well in the case and should be just a minimal, non-breaking and optional change?

FabianSchurig avatar Mar 23 '25 12:03 FabianSchurig

I reimplemented the changes based on the latest main branch. I tried to merge changes from main in here, but there were just too many changes in the end and the review view was just messed up completely. @dbanty I would really like to get this merged. We use the generator to build our REST API client for Polarion, which is a proprietary tool so we cannot adjust its API. As the current template version on main does not work for arrays of files at all (leads to a JSON serialization error as bytes are not JSON serializable) we have to use a custom template to get a working client. And updating these custom templates for new generator releases is really hard work. Because you asked before for a real word example - the full specification of the Polarion REST API can be found here and an endpoint using this array of files can be found here.

micha91 avatar May 13 '25 08:05 micha91

@micha91 thanks for updating! I'm slowing making some changes as I get time. A couple things still left to look into:

  1. According to the docs we should actually be using the data param instead of the files param for everything that's not a file... which would simplify things a lot I think. But it also doesn't specify what types are allowed here, so I have to look into that.
  2. I want to add multiple files to the integration tests just to make sure we keep it working long term

Hoping to get this all wrapped up soon. Thanks for your hard work and patience on this ๐Ÿ™‡

dbanty avatar May 17 '25 03:05 dbanty

Integration tests are updated, so left to do:

  • [x] Figure out what the new minimum version of httpx is or how to continue supporting 0.20 (https://github.com/openapi-generators/openapi-python-client/actions/runs/15366450734/job/43239989641?pr=938)
  • [x] Remove now-dead code triggering coverage warning
  • [x] ~Use data instead of files when appropriate per the docs~ Decided not to do this because data doesn't give us control over encoding, which we'll need some day.

dbanty avatar May 31 '25 18:05 dbanty

@micha91 (and all following) thanks for all the work, I think this is almost ready! One thing I've noticed as I go through and test/fix more stuff around multipart is that having a top-level schema of an array really didn't work anyway. The test we had with this schema:

"multipart/form-data": {
    "schema": {
        "type": "array",
        "items": {
            "type": "string",
            "format": "binary"
        }
    }
}

Produced code like this:

for body_item_data in body:
        body_item = body_item_data.to_tuple()

        _body.append(body_item)

    _kwargs["files"] = _body

Which as far as I can tell is completely invalid, since each .to_tuple() will have the (file_name, content, mime_type) but no field name, like ("files" (file_name, content, mime_type)). So I just removed that.

Any reason you can think of that we should keep some sort of top-level array-of-files support? How would that even work in multipart?

dbanty avatar Jun 03 '25 02:06 dbanty

@dbanty Thanks for your additional efforts on this! I will give the current implementation a try today to see if it also works in our real world client ๐Ÿ‘

Any reason you can think of that we should keep some sort of top-level array-of-files support? How would that even work in multipart?

I think it is simply unsupported when using multipart, so it should be absolutely correct to drop it

micha91 avatar Jun 03 '25 07:06 micha91

@dbanty Thanks for your additional efforts on this! I will give the current implementation a try today to see if it also works in our real world client ๐Ÿ‘

Any reason you can think of that we should keep some sort of top-level array-of-files support? How would that even work in multipart?

I think it is simply unsupported when using multipart, so it should be absolutely correct to drop it

Works as expected. I still need custom some custom templates, but that's due to issues in the specification

micha91 avatar Jun 05 '25 17:06 micha91