fix: Fix multipart body file array
As described in #692 arrays of files are not handled correctly, if they are part of multipart/form-data. This is fixed in this PR by letting to_multipart return a List[Tuple[str, Any]] instead of a Dict[str, Any].
Thanks for adding this support! I believe this is a breaking change, since it's possible someone was using the JSON serialization behavior before (even though it seems wrong in general). I'll need to do some manual testing just to make sure I know how to describe the breaking change ๐ . I'd also love to get someone to test a real running API and verify it all functions (maybe you've already done this?)
Yes, I already use the updated model template as a custom template in a project. Unfortunately this project is not public yet and the API it is developed for, isn't publicly accessible, too
Hi, just tested this out with an api I am working on. It works great.
@dbanty @micha91 Thank you so much for providing the openapi-python-client package and this PR!
I tested it in the following way:
pipx install git+https://github.com/openapi-generators/openapi-python-client.git@refs/pull/938/head
openapi-python-client update --path ../../swagger.json --config openapi-python-generator-config.yml
"/upload": {
"post": {
"tags": [
"Microsoft.KernelMemory.ServiceAssembly"
],
"summary": "Upload a new document to the knowledge base",
"description": "Upload a document consisting of one or more files to extract memories from. The extraction process happens asynchronously. If a document with the same ID already exists, it will be overwritten and the memories previously extracted will be updated.",
"operationId": "UploadDocument",
"requestBody": {
"description": "Document to upload and extract memories from",
"content": {
"multipart/form-data": {
"schema": {
"type": "object",
"properties": {
"index": {
"type": "string",
"description": "Name of the index where to store memories generated by the files."
},
"documentId": {
"type": "string",
"description": "Unique ID used for import pipeline and document ID."
},
"tags": {
"type": "object",
"additionalProperties": {
"type": "string"
},
"description": "Tags to apply to the memories extracted from the files."
},
"steps": {
"type": "array",
"items": {
"type": "string"
},
"description": "How to process the files, e.g. how to extract/chunk etc."
},
"files": {
"type": "array",
"items": {
"type": "string",
"format": "binary"
},
"description": "Files to process and extract memories from."
}
}
}
}
}
},
In general it works! I would greatly appreciate if we could have similar functionality to solve https://github.com/openapi-generators/openapi-python-client/issues/692 in the main branch, soon.
@dbanty What do you think about an alternative where a user could just overwrite how a
"files": {
"type": "array",
"items": {
"type": "string",
"format": "binary"
},
is handled?
This could either generate:
files: Union[Unset, tuple[None, bytes, str]] = UNSET
if not isinstance(self.files, Unset):
_temp_files = []
for files_item_data in self.files:
files_item = files_item_data.to_tuple()
_temp_files.append(files_item)
files = (None, json.dumps(_temp_files).encode(), "application/json")
or
files: Union[Unset, tuple[None, bytes, str]] = UNSET
if not isinstance(self.files, Unset):
files = files
The later works well in the case and should be just a minimal, non-breaking and optional change?
I reimplemented the changes based on the latest main branch. I tried to merge changes from main in here, but there were just too many changes in the end and the review view was just messed up completely. @dbanty I would really like to get this merged. We use the generator to build our REST API client for Polarion, which is a proprietary tool so we cannot adjust its API. As the current template version on main does not work for arrays of files at all (leads to a JSON serialization error as bytes are not JSON serializable) we have to use a custom template to get a working client. And updating these custom templates for new generator releases is really hard work. Because you asked before for a real word example - the full specification of the Polarion REST API can be found here and an endpoint using this array of files can be found here.
@micha91 thanks for updating! I'm slowing making some changes as I get time. A couple things still left to look into:
- According to the docs we should actually be using the
dataparam instead of thefilesparam for everything that's not a file... which would simplify things a lot I think. But it also doesn't specify what types are allowed here, so I have to look into that. - I want to add multiple files to the integration tests just to make sure we keep it working long term
Hoping to get this all wrapped up soon. Thanks for your hard work and patience on this ๐
Integration tests are updated, so left to do:
- [x] Figure out what the new minimum version of
httpxis or how to continue supporting 0.20 (https://github.com/openapi-generators/openapi-python-client/actions/runs/15366450734/job/43239989641?pr=938) - [x] Remove now-dead code triggering coverage warning
- [x] ~Use
datainstead offileswhen appropriate per the docs~ Decided not to do this becausedatadoesn't give us control over encoding, which we'll need some day.
@micha91 (and all following) thanks for all the work, I think this is almost ready! One thing I've noticed as I go through and test/fix more stuff around multipart is that having a top-level schema of an array really didn't work anyway. The test we had with this schema:
"multipart/form-data": {
"schema": {
"type": "array",
"items": {
"type": "string",
"format": "binary"
}
}
}
Produced code like this:
for body_item_data in body:
body_item = body_item_data.to_tuple()
_body.append(body_item)
_kwargs["files"] = _body
Which as far as I can tell is completely invalid, since each .to_tuple() will have the (file_name, content, mime_type) but no field name, like ("files" (file_name, content, mime_type)). So I just removed that.
Any reason you can think of that we should keep some sort of top-level array-of-files support? How would that even work in multipart?
@dbanty Thanks for your additional efforts on this! I will give the current implementation a try today to see if it also works in our real world client ๐
Any reason you can think of that we should keep some sort of top-level array-of-files support? How would that even work in multipart?
I think it is simply unsupported when using multipart, so it should be absolutely correct to drop it
@dbanty Thanks for your additional efforts on this! I will give the current implementation a try today to see if it also works in our real world client ๐
Any reason you can think of that we should keep some sort of top-level array-of-files support? How would that even work in multipart?
I think it is simply unsupported when using multipart, so it should be absolutely correct to drop it
Works as expected. I still need custom some custom templates, but that's due to issues in the specification