yada Incorrect parsing of multipart data results in unfilled form parameters and NPE in logs

yada 1.3.0-alpha10. Take a look at this:

I don't think I can provide you the data itself, but let me know if I can provide anything else.

May 08 '19 18:05 p-himik

Trying to debug this myself. I placed a breakpoint in yada.multipart/copy-bytes-after-delimiter and I'm trying to look through the values of window. Of course maybe I don't understand something and window can be reused partially, but right now it seems that there's broken content - sometimes no content-type after the boundary, sometimes mixed content of XML and PDF files that I'm trying to upload. As if there's some concurrency issue and multiple threads are writing to the same buffer without using any locks.

May 08 '19 18:05 p-himik

Do you have any test data you can share that replicates the issue?

May 08 '19 19:05 malcolmsparks

Couldn't share the original data but managed to reproduce the issue with some completely random data. Note that it's rather sensitive to the sizes. Form data:

------WebKitFormBoundaryAhflCu7e161nGb1Z
Content-Disposition: form-data; name="edit"

["^ ","~:media",["^ ","~i-43",["^ ","~:name","1111122222aaaaabbbbbccccc.sib","~:kind","~:main-sib","~:score-id",-1,"~:id",-43],"~i-44",["^ ","^2","11111222223333.pdf","^3","~:main-pdf","^5",-1,"^6",-44],"~i-45",["^ ","^2","11111111aaaaabbbbbccccccc.xml","^3","~:main-xml","^5",-1,"^6",-45]],"~:score",["^ ","~:description","","~:sequence-type-display-text","interval","~:default-tempo",100,"~:collection-id",49,"~:difficulty","Intermediate","~:transpose-limit",null,"~:percent-random",0,"~:transpose",false,"^2","aaaaaaaa","~:duration",90,"~:tempo-control-mode","Multiplier","~:play-as-chords",false]]
------WebKitFormBoundaryAhflCu7e161nGb1Z
Content-Disposition: form-data; name="main-sib"; filename="1111122222aaaaabbbbbccccc.sib"
Content-Type: application/octet-stream


------WebKitFormBoundaryAhflCu7e161nGb1Z
Content-Disposition: form-data; name="main-pdf"; filename="11111222223333.pdf"
Content-Type: application/pdf


------WebKitFormBoundaryAhflCu7e161nGb1Z
Content-Disposition: form-data; name="main-xml"; filename="11111111aaaaabbbbbccccccc.xml"
Content-Type: text/xml


------WebKitFormBoundaryAhflCu7e161nGb1Z--

The files themselves: files.zip

May 08 '19 19:05 p-himik

Huh, it's not that rare actually - I just encountered it 3 more times with different data. Also, I just noticed that there's this in the logged stack traces:

clojure.lang.ExceptionInfo: No end delimiter
at yada.multipart$finish_up.invokeStatic(multipart.clj:305)
at yada.multipart$finish_up.invoke(multipart.clj:229)
at yada.multipart$process_chunk.invokeStatic(multipart.clj:340)
at yada.multipart$process_chunk.invoke(multipart.clj:338)
at yada.multipart$parse_multipart$this__10282__auto____41431$fn__41432$fn__41433.invoke(multipart.clj:426)
at manifold.deferred$fn__10164$chain___10185.invoke(deferred.clj:859)
at manifold.deferred$fn__10164$subscribe__10165$fn__10174.invoke(deferred.clj:815)
at manifold.deferred.Listener.onSuccess(deferred.clj:219)
at manifold.deferred.Deferred$fn__9975$fn__9976.invoke(deferred.clj:398)
at clojure.lang.AFn.run(AFn.java:22)
at io.aleph.dirigiste.Executor$Worker$1.run(Executor.java:62)
at manifold.executor$thread_factory$reify__9488$f__9489.invoke(executor.clj:47)
at clojure.lang.AFn.run(AFn.java:22)
at java.base/java.lang.Thread.run(Thread.java:834)

But the initial error is still there in the stack traces as well, so I don't really think that the "No end delimiter" is really relevant.

May 09 '19 21:05 p-himik

@malcolmsparks I think this line https://github.com/juxt/yada/blob/master/ext/multipart/src/yada/multipart.clj#L187 should use copy-bytes-after-delimiter instead. What do you think?

May 10 '19 10:05 p-himik

Clarification: I'm seeing similar symptoms with version 1.2.15 and 1.2.16.

@p-himik I suspect I'm running into this exact issue as well. It seems to trigger more often with certain arg combinations, and for certain combos it fails 50% of the time locally, 100% of the time on our server. It definitely smells like a concurrency issue of some sort.

Did ya' try out that change that you suggested?

Jul 02 '19 22:07 leguma

@leguma Yep, so far so good since I've created this issue. I just copied the file into the source directory of my own project and modified it there. The classpath preference makes sure that it's the one that's used instead of the yada's original one.

Jul 08 '19 18:07 p-himik

@p-himik I verified that fixing that line worked! In case someone wants some reproduction information, here's a bunch of tests I ran attempting to isolate the issue: Yada Multipart Testing.xlsx

It can be difficult to reproduce with a universal set of args since it seems very platform specific, but you should be able to do so with some tinkering.

I'll submit a PR with the fix, since I don't see one submitted yet for some reason. For anyone coming in before it's in a stable release, modify the line that @p-himik mentioned (pass in m directly instead of destructuring it): :bytes (copy-bytes-after-delimiter m s e)

I think the issue has something to do with multipart chunking coming in different orders sometimes? Example good and bad slices it's trying to parse:

Failure: ([] [----------------------------035613219007199590421006] [Content-Disposition form-data; name="file2"; filename="file2.pdf"] [Content-Type application/pdf])

Success: ([Content-Disposition form-data; name="file1"; filename="file1.jpg"] [Content-Type image/jpeg])

When it goes to grab info from those parsed headers, it grabs first and second elements, which are obviously wrong in the failure example.

Jul 08 '19 21:07 leguma

This is good news. Which platforms are involved? I have been unable to replicate the issue on Linux, but very happy to review/merge a PR

Jul 09 '19 08:07 malcolmsparks

@malcolmsparks I am on Linux as well, I was able to replicate it locally and on Heroku. But as I mentioned on Gitter, I was unable to make a reliable test case. I don't think it's platform specific - seems to be dependent on the overall scenario of execution. I.e. the test data I have provided worked when I used it with a running application but didn't work in a test. Most likely, one can just tweak payload size while taking CHUNK-SIZE into account (maybe some other buffer size, can't tell for sure) to make it reproducible in a test.

Jul 09 '19 08:07 p-himik

That's useful info. I'll try to replicate locally in that case. The implementation was subjected to a round of property-based testing but obviously that didn't catch this.

Jul 09 '19 08:07 malcolmsparks

But I think that the fix that I've come up with can and should be implemented even if it proves impossible to create a robust test case. After all, copy-bytes-after-delimiter is definitely the correct method to use there since it is expected to have a boundary delimiter there.

Jul 09 '19 08:07 p-himik

Yes I agree.

Jul 09 '19 08:07 malcolmsparks

This is good news. Which platforms are involved? I have been unable to replicate the issue on Linux, but very happy to review/merge a PR

It happens on any platform. I've reproduced it in a Linux docker image, a Windows machine, and on OSX. As ya' can see from the tests I ran, the order and size of the files seem to matter, as well as some fuzziness caused by environment-specific stuff. I have reproduced it with XHR and Postman. The only pattern that I can immediately point to is "large" files followed by "small" ones may somehow be a trigger. Try using 4+ files of differing sizes and juggling them around until you trip the switch.

Jul 09 '19 16:07 leguma

@malcolmsparks Wanna review #303? It should take all of 10 seconds. :)

Jul 16 '19 16:07 leguma