serverless-offline

Upload image issue

Open dfloresgonz opened this issue 7 years ago • 52 comments

My image seems to be corrupted when uploaded to the server; it only works with .txt files.

I'm using sls 1.27, ExpressJS, and multiparty. Is there a way to upload images without any problems?

Thanks.

function uploadImage(req, res) {
    let multiparty = require('multiparty');
    let fs = require('fs');
    let form = new multiparty.Form();
    form.parse(req, function(err, fields, files) {
        console.log(files);
        //uploading file...
        })).then(result => {
            res.status(200).send({ msj : 'Images were uploaded'});
        }).catch(err => {
            res.status(500).send(err);
        });
    });
}

const serverless = require('serverless-http');
const express = require('express')
const app = express()
require('./middlewares/authenticated');

var bodyParser = require('body-parser');
var cors       = require('cors');

app.use(cors({'origin' : '*'}));
app.use(bodyParser.urlencoded({ extended : false }) )
   .use(bodyParser.json());

service: my-backend

provider:
  name: aws
  runtime: nodejs6.10
  stage: prod
  region: us-east-1

plugins:
  - serverless-offline
  - serverless-s3-local

functions:
  app:
    handler: index.handler
    events:
      - http: ANY /
      - http: 'ANY {proxy+}'
  upload:
    handler: index.handler
    events:
      - http: 'POST /api/registro/uploadImage/'

dfloresgonz avatar Jul 27 '18 01:07 dfloresgonz

@dfloresgonz did you ever get this working? I'm running into the same issue.

evangow avatar Sep 04 '18 17:09 evangow

@evangow I read that sls offline doesn't support binary types, so it will never work, but the problem only occurs offline. I used base64 to make it work in my local env.
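
A minimal sketch of what such a base64 workaround can look like, assuming the client sends the file as a base64 string inside a JSON body. The helper names `encodeForUpload` and `decodeUpload` and the `image` field are hypothetical and not taken from this thread.

```javascript
// Client side: turn raw bytes into a base64 string that survives string handling.
function encodeForUpload(fileBytes) {
  return JSON.stringify({ image: Buffer.from(fileBytes).toString('base64') });
}

// Server side: recover the original bytes from the JSON body.
function decodeUpload(jsonBody) {
  const { image } = JSON.parse(jsonBody);
  return Buffer.from(image, 'base64');
}

// Sample bytes that are not valid UTF-8 (the start of a PNG signature plus extras).
const sample = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0xff, 0x00]);
const roundTripped = decodeUpload(encodeForUpload(sample));

// Logs whether the decoded bytes match the original bytes.
console.log(roundTripped.equals(sample));
```

Base64 inflates the payload by roughly a third, so this trades size for safety against string re-encoding.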

dfloresgonz avatar Sep 04 '18 17:09 dfloresgonz

@dfloresgonz Are you converting it to base64 on the client side, then sending it to the server? Or are you using something like the gist linked below to parse the event and pass it on to serverless-http?

Would you happen to have a code sample or gist you could share if you're doing it on the server side? I think I could probably figure it out on the client side, but I've been banging my head against the wall for hours looking through issues trying to sort this out.

Parser Gist: https://gist.github.com/lteacher/9ef1c7bc5908418b30a18719521ff3c7#file-parsers-js-L12-L41
^ Found in this issue: https://github.com/dherault/serverless-offline/issues/230

evangow avatar Sep 04 '18 20:09 evangow

same here using

curl -X POST -F 'file=@/home/jan/figub/pass.mp3;type=audio/mpeg' localhost:3333/upload -H "Content-Type: audio/mpeg" --verbose -H "Accept: audio/mpeg"

janwirth avatar Oct 22 '18 22:10 janwirth

With both serverless-offline-python and serverless-offline on a nodejs environment, the body contains a payload which is ~30% larger than the original file. A diff between the binary files showed they were identical up to about two thirds of the way through, after which a bunch of data is appended in the resulting file.

janwirth avatar Oct 22 '18 22:10 janwirth

Same here, cannot get it working, we looked through all the other code time after time without understanding what was happening. It would be good to at least throw something if it is not supported - took us more than a day to find this problem.

malnor avatar Dec 03 '18 09:12 malnor

We would gladly accept a PR on this. It shouldn't be much, just a hapi.js tweak.

dherault avatar Jan 08 '19 12:01 dherault

FYI - I chose a different architecture. The lambda now just requests a temporary URL from my file bucket. The file bucket can be hidden behind an internal (transparent) redirect.

janwirth avatar Jan 08 '19 22:01 janwirth

I have encountered this issue and after extensive investigation, I found that https://github.com/dherault/serverless-offline/blob/master/src/index.js#L490 causes the issue.

If I try to upload an image via multipart/form-data, it is converted with toString('binary') (which seems to be deprecated). Then it's passed to express (in my case) and the request goes through https://www.npmjs.com/package/multer, where it is decoded incorrectly and the image becomes corrupted. But if I comment out that toString call, express with its middleware correctly handles the raw buffer.

It was added with https://github.com/dherault/serverless-offline/pull/394 as a workaround? As there was behavior where binary data was converted to a utf8 string.

At least from initial testing, it seems that serverless-offline works fine without line 490. Maybe @dherault, @lteacher and @daniel-cottone could comment on whether it's still required?

arnas avatar Feb 15 '19 08:02 arnas

If the line is removed then I assume it would work, because the line does some magic check to ensure the toString('utf8') doesn't happen for multipart/form-data. So if it never did that, the encoding detection wouldn't be needed.

Actually I haven't been using serverless for 2 years, but we still have a service running that parses multipart form data, though it was using an old branch... I did switch to the latest serverless-offline just now and found that it still worked OK.

So regarding the above question, @arnas, I also removed the line to see, and as expected it worked just fine for me. That wasn't too surprising though, because the check was added because of the toString('utf8'), and that was there to fix another issue, so I have no idea if that issue would be a problem when removing that line. See #224

I also spent a bit of time checking out what the whole problem is around here and it was painful... I guess at the end of the day the issue seems to come down to the encoding (I wasn't able to get an exact answer with code links to provide). So probably for most of the above, it's something like this: the parsing magic is done on pipe, the encoding has been set to binary string, and the data is not a buffer when it reaches your logic.

For sure it's just the above referenced line that converts to string, so if it's removed and that doesn't break whatever that old issue was, then it would be OK. Otherwise you could try setting the encoding; there is some check for whether it is an actual buffer, so it would probably still work both online and offline...

So in summary, try setting the encoding locally... In this gist the write function is called and given the encoding. In the examples from above, the magical middleware you are all using is doing it in the pipe, so maybe try setting req.setEncoding('binary') before passing it to the parse... which is a random guess since I didn't test it.

lteacher avatar Feb 15 '19 11:02 lteacher

Well, it seems that this issue only appears for images. My colleague has tested and, for example, uploading .txt works fine. It's a bit strange, but from my findings only images are affected.

@lteacher I have checked PR #224 and, at least from a first look, it seems that toString() is unnecessary: to prevent hapi.js from parsing the payload you can simply set the payload parse option to false. And I am assuming that if hapi parse is set to false, toString() should do nothing, as the payload is already a string.

arnas avatar Feb 15 '19 13:02 arnas

@arnas sorry, I don't get exactly what you mean with the last sentence.

Problem History

  • #224 added the toString for whatever reason. This change breaks everyone who uploads data, as it converts the payload to 'utf8'.
  • #230 added a function to ensure that, just for multipart/form-data, an encoding is chosen that doesn't destroy the data, which is then 'binary'.
  • In this issue thread, people are piping the req data without setting any encoding, but the data has already been converted to a string, so that won't work. To resolve it without changing anything here they need to set the encoding; otherwise the parser finds the data is not a buffer, processes it as a string, and defaults to utf8, which is the wrong encoding (side note: I saw in the multiparty code that it will throw an error if you try to set the encoding).
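
To make the difference between the two encodings in the history above concrete, here is a small sketch using only Node's built-in Buffer. The sample bytes are made up for illustration; nothing here is serverless-offline's actual code.

```javascript
// A few bytes that are not valid UTF-8 (typical for image data).
const bytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0xff, 0xd8]);

// Decoding as 'utf8' replaces invalid sequences with U+FFFD, so the original bytes are lost.
const viaUtf8String = Buffer.from(bytes.toString('utf8'), 'utf8');

// 'binary' (an alias of 'latin1') maps each byte to exactly one character, so it is reversible.
const viaBinaryString = Buffer.from(bytes.toString('binary'), 'binary');

console.log('utf8 round trip intact:', viaUtf8String.equals(bytes));
console.log('binary round trip intact:', viaBinaryString.equals(bytes));
```

The 'binary' string only stays safe as long as whoever consumes it also decodes it as 'binary'; if it is later read as utf8, the data is damaged again.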

Fixing in this repo

So to resolve the issue based on what you said above, the best thing is to not do any toString and remove this logic, but you might need to fix the issue from #224 some other way, maybe with that parse option or whatever.

Fixing original author issues (assuming the fix is not done here in the repo)

  • Don't use multiparty.
  • When using multer, set the req encoding.
  • When using formidable, pass the encoding as an option.
  • When using something else, I dunno, check the source code and find out how to set the encoding.

Also, for the original author: if you are using wrapping packages for this kind of thing (file data etc.), you need to check closely everything in the packages you are using, like this doc over here. As I recall (but it might have changed over the last 2 years), you can only get binary content to AWS lambda functions in base64 anyway (but of course even the base64 string is destroyed if it is encoded incorrectly to utf8).

@arnas I assume that you have success already on AWS and that you use something like serverless-http.

lteacher avatar Feb 16 '19 02:02 lteacher

@lteacher I have updated the wording of the last sentence of my previous comment.

For fixing the repo, I believe that would be the best way. I am hoping that removing toString while the hapi payload parse option is set to false will be enough. I will look into it if I have time.

@lteacher I am using serverless-http, but I haven't deployed it to AWS, as I am assuming it will work without much problem; at least according to the documentation, AWS API Gateway supports that: https://aws.amazon.com/about-aws/whats-new/2016/11/binary-data-now-supported-by-api-gateway/

arnas avatar Feb 16 '19 09:02 arnas

@arnas oh well, should be np if using serverless-http

Actually I was thinking of this again for some reason, and I had another look and noticed that the parse option is actually already used over here. However, I didn't notice before, when I added that detectEncoding, that the payload is just needed as a string for the JSON.parse in there. There is some createLambdaProxyContext which uses the payload, but someone haxed something in to not parse if it's not a string there (using rawPayload).

There's a velocity template method too that seems to assume the payload is JSON, so dunno about that since sometimes it isn't. Otherwise it might be possible to just move that stringifying stuff into the relevant scope where it's needed, which is where it needs to be parsed per the last reference?

lteacher avatar Feb 18 '19 04:02 lteacher

@lteacher I have done a bit of debugging and, from a first look, it seems that this issue is not only on the serverless-offline side. Basically I have created a plain handler.js which uses serverless-http to pass requests to express and multer to deal with multipart data. https://gist.github.com/arnas/d8fed4b78ff2940a3b390e754090cea9
Without changing any dependencies I have confirmed that txt files were encoded and decoded correctly, but there was an issue with png files: they grew about 30 percent bigger. (I think somebody mentioned that.)
Also, I have confirmed that if I encode the buffer and decode it afterward via Buffer(binaryString, 'binary') everything is fine. But as we are giving serverless-http a string, it uses Buffer.from(binaryString, 'utf-8') and corrupts the file unless it was a text file :/ https://github.com/dougmoscrop/serverless-http/blob/c70957b47ac66e363c7179a6e0a4cd8c6d78a2ee/lib/request.js#L15.
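
The effect described above can be reproduced with plain Node Buffers. This is only an illustrative sketch with made-up sample bytes, not the code from the gist or from serverless-http: a 'binary' string re-read as utf-8 grows because every byte at or above 0x80 becomes a two-byte sequence, while pure ASCII text is unaffected.

```javascript
// Ten sample bytes; three of them (0x89, 0xff, 0xd8) are >= 0x80.
const original = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0xff, 0xd8]);
const binaryString = original.toString('binary');

// Decoding the string with the same encoding restores the bytes.
const decodedAsBinary = Buffer.from(binaryString, 'binary');

// Decoding it as utf-8 encodes each character >= 0x80 as two bytes, so the buffer grows.
const decodedAsUtf8 = Buffer.from(binaryString, 'utf-8');

console.log(original.length, decodedAsBinary.length, decodedAsUtf8.length);

// ASCII-only content survives either way, which matches txt files working fine.
const text = Buffer.from('hello', 'ascii');
const textRoundTrip = Buffer.from(text.toString('binary'), 'utf-8');
console.log(textRoundTrip.equals(text));
```

How much a real file grows depends on how many of its bytes are at or above 0x80, so the exact percentage will vary from file to file.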

tl;dr the bug happens because serverless-http wrongly encodes the request.

Suggestions for solution

  • I haven't looked into these velocity templates and I am not sure what they are for, but I believe that encoding to a binary string is not required. So I would suggest removing them and assuming that the lambda handler will be able to handle a raw buffer.

  • Wait and see if serverless-http fixes it on their side. There were some issues with binary files https://github.com/dougmoscrop/serverless-http/issues/50, but I will raise a new one.

I could create a PR for the first point if this approach is acceptable to the lib maintainers.

Side note rawPayload is always equal to https://github.com/dherault/serverless-offline/blob/master/src/index.js#L490

arnas avatar Feb 22 '19 23:02 arnas

Hey @arnas I think that you are missing some setup on AWS side. If you want to use binary content you need to do some different setup per these docs: https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-payload-encodings.html

That's why I posted this thing from serverless-http

I knew about that Buffer.from(binaryString, 'utf-8') when I last looked and posted my response, but I didn't post it here because it's not relevant if the AWS setup is correct. That is because on the AWS side, if you are using binary content, it should arrive base64 encoded, and there is a flag called isBase64Encoded. Actually I looked in the serverless-http code lol and I saw this flag, so I assumed they used the flag, but... actually now I see they detect it or something, so instead you have to provide some option, to set binary: true?

If you look here in this gist you can see the actual property that your aws lambda-proxy binary content enabled handler will receive on the event. That is not a magical value I added for the gist, sadly that is an AWS thing that gets attached to the event.

I actually don't use serverless-http so dunno about that option. I didn't think it would be nice to include express or koa etc. inside this stuff. I have a separate package I created to make it nice for adding parsing etc., then I use the exact gist I posted above for parsing before the file handler.

In summary, I think you can change your AWS setup to correctly handle binary content; then you are going to start receiving base64 encoded blobs in serverless-http. Then you need to be sure that serverless-http is expecting it to be isBase64Encoded. After that it should already work on AWS. Then to get serverless-offline working is where the actual issues are left because it doesn't do the base64 encoding magic that the AWS stuff is doing so thats all this entire issue so far.
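
As a rough illustration of the isBase64Encoded handling described above, here is a sketch of turning a proxy event body into raw bytes. `body` and `isBase64Encoded` are the fields on the API Gateway Lambda proxy event; the helper name `bodyToBuffer` and the sample events are hypothetical, and this is not how serverless-http itself is implemented.

```javascript
// Decode the body of an API Gateway proxy event into raw bytes.
function bodyToBuffer(event) {
  if (event.body == null) {
    return Buffer.alloc(0);
  }
  // Binary content arrives base64 encoded when the flag is set; otherwise treat it as text.
  return Buffer.from(event.body, event.isBase64Encoded ? 'base64' : 'utf8');
}

// Hypothetical events for illustration only.
const binaryEvent = {
  body: Buffer.from([0x89, 0x50, 0xff]).toString('base64'),
  isBase64Encoded: true,
};
const textEvent = { body: 'hello', isBase64Encoded: false };

console.log(bodyToBuffer(binaryEvent));
console.log(bodyToBuffer(textEvent).toString('utf8'));
```

A local emulator that hands over the body as a plain string without setting the flag would take the utf8 branch here, which is where binary uploads get damaged.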

lteacher avatar Feb 23 '19 02:02 lteacher

I'm running into this issue using serverless-http + serverless-offline and multer-s3 for handling multipart/form-data when uploading images.

@lteacher do you have a recommendation for getting serverless-offline to handle this correctly? My service works when deployed on AWS through API Gateway. For local development it does not. Referring to your last comment above: "Then to get serverless-offline working is where the actual issues are left because it doesn't do the base64 encoding magic that the AWS stuff is doing so thats all this entire issue so far."

hqnarrate avatar Oct 24 '19 20:10 hqnarrate

@hqnarrate Sorry, I'm just not up to date on this issue any more. There seems to be a PR, #784, that was working to address this somewhat through the config options.

Looking back over the comments it does seem like there is a way to refactor the offending line out but I just don't have any time to investigate that at the moment and I don't use any of the packages mentioned in these issues except serverless-offline so I can't really test any resolution.

Maybe @cmuto09 can give an update on where the PR referenced above is at, it looks like it needs rebasing at minimum.

lteacher avatar Oct 24 '19 23:10 lteacher

@hqnarrate @lteacher a bit of a problem is that @cmuto09's PR depends on the https://www.npmjs.com/package/serverless-apigw-binary plugin, but it seems serverless supports this now as well.

dnalborczyk avatar Oct 25 '19 00:10 dnalborczyk

some references for serverless support:

https://serverless.com/framework/docs/providers/aws/events/apigateway/#binary-media-types
https://serverless.com/blog/framework-release-v142/
https://github.com/serverless/serverless/pull/6063

additional issues:
https://github.com/serverless/serverless/issues/2797
https://forum.serverless.com/t/returning-binary-data-jpg-from-lambda-via-api-gateway/796
https://github.com/dougmoscrop/serverless-http/issues/88

dnalborczyk avatar Oct 25 '19 00:10 dnalborczyk

@dnalborczyk - I was not clear. My service works fine when deployed to the AWS environment with apiGateway.binaryMediaTypes set correctly. It is when I'm running in offline mode with 'serverless offline start' that my uploads become corrupted.

hqnarrate avatar Oct 25 '19 00:10 hqnarrate

@hqnarrate sorry, I think I haven't been clear. 😄 I knew what you meant. The links above are pointers for me (or anyone else getting to it) for the implementation.

dnalborczyk avatar Oct 25 '19 02:10 dnalborczyk

@dnalborczyk Any update or workarounds on this issue?

pvsvamsi avatar Jan 20 '20 13:01 pvsvamsi

My solution: https://stackoverflow.com/a/61003498/9585130

Just add this to serverless.yml.

provider:
  apiGateway:
    binaryMediaTypes:
      - '*/*'

And I'm using "aws-serverless-express-binary": "^1.0.1"

abinhho avatar Apr 03 '20 01:04 abinhho

Does anybody know a way around this in offline mode? I am trying to upload files / pdf and this does not work for me

lassesteffen avatar Apr 12 '20 14:04 lassesteffen

@pvsvamsi @lassesteffen Try aws-serverless-express-binary instead of aws-serverless-express

abinhho avatar Apr 14 '20 00:04 abinhho

I am using apollo-server-lambda not express

lassesteffen avatar Apr 14 '20 06:04 lassesteffen

Try upgrading to the latest versions of serverless and serverless-offline. I believe this was fixed, but I could not find anything about it in the changelog

andriesss avatar Apr 14 '20 06:04 andriesss

Same issue there

@abinhho I tried your solution but it is not working in my case; Binary Media Types seems to be for download, not upload.

My solution was to convert my file to base64 before upload. Not really nice, but it's the only way I found to make it work.

ajouve avatar Apr 25 '20 06:04 ajouve

@ajouve This upload function is still running in my previous project, so I'm sure it works. Maybe something is wrong in your code. If you can share your code, I think someone can help.

abinhho avatar Apr 25 '20 08:04 abinhho