
When uploading a file using LargeFileUploadTask (streaming, not in one part), the file gets corrupted on the SharePoint server

Open ihudedi opened this issue 1 year ago • 22 comments

Describe the bug

I am using the latest version of the Graph API, 6.24.0. When I upload any file using `graphServiceClient.drives().byDriveId(driveId).items().byDriveItemId(getDriveItemIdFromRootPath(filePath)).content().put(fileContent)`, the file is uploaded correctly.

When I upload via LargeFileUploadTask, the file is corrupted for every file type I have tried, such as MS Office files, .dat files, and .jar files. MS Office files can't be opened, and .dat files end up with null bytes inserted into them. This is the code I am using for the streaming upload:

```java
LargeFileUploadTask<DriveItem> largeFileUploadTask = null;
IProgressCallback callback = null;
UploadResult<DriveItem> uploadResult = null;
try {
    int spoPartSize = 5242880;

    // Set body of the upload session request
    // This is used to populate the request to create an upload session
    DriveItemUploadableProperties driveItemUploadableProperties = new DriveItemUploadableProperties();
    driveItemUploadableProperties.getAdditionalData().put("@microsoft.graph.conflictBehavior", "replace");

    // Finish setting up the request body
    CreateUploadSessionPostRequestBody uploadSessionPostRequestBody = new CreateUploadSessionPostRequestBody();
    uploadSessionPostRequestBody.setItem(driveItemUploadableProperties);

    // Create the upload session
    UploadSession uploadSession = graphServiceClient.drives()
            .byDriveId(driveId)
            .items()
            .byDriveItemId(getDriveItemIdFromRootPath(filePath))
            .createUploadSession().post(uploadSessionPostRequestBody);

    if (null == uploadSession) {
        throw new SPOException("SPOWrapper::uploadObjectInParallel: Could not create upload session");
    }

    // Create the large file upload task
    largeFileUploadTask = new LargeFileUploadTask<>(graphServiceClient.getRequestAdapter(),
            uploadSession,
            fileContent,
            fileSize,
            spoPartSize,
            DriveItem::createFromDiscriminatorValue);

    // Create a callback used by the upload provider
    callback = new SPOProgressCallback(spoFileCopy);

    // Do the upload
    uploadResult = largeFileUploadTask.upload(spoRequestMaxAttempts, callback);
    if (uploadResult == null || !uploadResult.isUploadSuccessful()) {
        throw new SPOException(String.format("SPOWrapper::uploadObjectInParallel: filePath= '%s' wasn't uploaded successfully via upload method", originalFilePath));
    } else {
        logger.debug(String.format("SPOWrapper::uploadObjectInParallel: filePath= '%s' uploaded successfully via upload method", originalFilePath));
    }
} catch (Exception e) {
    boolean spoIgnoreFailureWhenUploadingFileInParts = Boolean.valueOf(hostProperties.getParameterValueFromAdditionalParamsOrFromAftConfig(mftPropertiesConfig, PropertyData.spoIgnoreFailureWhenUploadingFileInParts, AdditionalParametersConsts.spoIgnoreFailureWhenUploadingFileInParts));
    logger.error(String.format("Error in SPOWrapper::uploadObjectInParallel, filePath = '%s', fileSize = '%s' : %s", originalFilePath, fileSize, e.getMessage()), e);
    /* Retry via resume() is currently disabled:
    try {
        if (largeFileUploadTask != null && callback != null) {
            uploadResult = largeFileUploadTask.resume(spoRequestMaxAttempts, callback);
            if (uploadResult == null || !uploadResult.isUploadSuccessful()) {
                throw new SPOException(String.format("SPOWrapper::uploadObjectInParallel: filePath= '%s' wasn't uploaded successfully via resume method", originalFilePath));
            } else {
                logger.debug(String.format("SPOWrapper::uploadObjectInParallel: filePath= '%s' uploaded successfully via resume method", originalFilePath));
                return;
            }
        }
    } catch (Exception ex) {
        logger.error(String.format("Error in SPOWrapper::uploadObjectInParallel, filePath = '%s', fileSize = '%s' : %s", originalFilePath, fileSize, ex.getMessage()), ex);
    }
    */
    throw e;
} finally {
    logger.debug(String.format("End SPOWrapper::uploadObjectInParallel: filePath= '%s', fileSize= %s", originalFilePath, fileSize));
}
```

Thanks, Itay

Expected behavior

The file should be uploaded correctly, with the same bytes, and not be corrupted.

How to reproduce

Uploading file using LargeFileUploadTask

SDK Version

6.24.0

Latest version known to work for scenario above?

No response

Known Workarounds

No response

Debug output

No response


Configuration

No response

Other information

No response

ihudedi avatar Dec 24 '24 06:12 ihudedi

Thanks for reporting this @ihudedi.

Would you mind sharing the result of some checks here? This will help with debugging.

These will validate that the number of bytes uploaded equals the number of bytes in the file, and that the contents match, using a checksum:

```java
// Calculate checksum before upload
String uploadFileChecksum = DigestUtils.md5Hex(fileContent);

// You might need to rewind the file content stream or open a new stream
// with the same file path here. We'll validate that the checksum of the
// local file matches the uploaded file's after the large file upload.

// Create your upload session & execute the large file upload task
// ...

// After getting uploadResult...
// Use the drive item ID in the uploadResult to fetch the uploaded file
DriveItem uploadedFile = graphServiceClient.drives()
        .byDriveId(driveId)
        .items()
        .byDriveItemId(uploadResult.itemResponse.getId())
        .get();

assertEquals(fileSize, uploadedFile.getSize());

InputStream downloadedFile = graphServiceClient.drives()
        .byDriveId(driveId)
        .items()
        .byDriveItemId(uploadResult.itemResponse.getId())
        .content()
        .get();

assertEquals(uploadFileChecksum, DigestUtils.md5Hex(downloadedFile));
```
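If commons-codec (`DigestUtils`) isn't on the classpath, the same hex checksum can be computed with the JDK alone. A minimal sketch (the `Md5Demo` class and `md5Hex` helper are hypothetical, not part of the SDK):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Demo {
    // Streams the input through an MD5 digest, mirroring what
    // DigestUtils.md5Hex(InputStream) does, using only java.security.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String hash = md5Hex(new ByteArrayInputStream("hello".getBytes("UTF-8")));
        System.out.println(hash); // prints 5d41402abca7b362816a254ab966f7b4
    }
}
```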

Ndiritu avatar Dec 24 '24 08:12 Ndiritu

Hi @Ndiritu, this code is used by our customers, and some of them report the same issue. I can't reproduce it in my environment, but I am trying to analyze the files they upload and their OS type. Thanks, Itay

ihudedi avatar Dec 24 '24 09:12 ihudedi

@ihudedi thanks for letting me know. I will need to create some test files and validate this. My tests work for .txt, .csv and .xlsx files.

Ndiritu avatar Dec 24 '24 09:12 Ndiritu

Hi @Ndiritu, for MS Office files the server changes the size. When I upload an Excel file, it ends up with a different size but is still readable. The size also changes when I upload the same Excel file through the web UI or from a SharePoint site: "Destination file size vs. source file size validation failed: source file size = 21641457, destination file size = 21647213". So for Office files the file is not corrupted, but the server always modifies it. Thanks, Itay

ihudedi avatar Dec 24 '24 09:12 ihudedi

Thanks for clarifying @ihudedi. So the issue is with binary file formats (.dat, .jar, ...) where the file is corrupted? I'll need this to log a ticket with the API team.

Ndiritu avatar Jan 03 '25 08:01 Ndiritu

Hi @Ndiritu, I got an example of a file that is corrupted after upload. I tried to reproduce it in my environment with no luck. How is it possible that this file gets corrupted in one environment and not in mine? Both environments are Linux machines. Attached are the file before the upload and the file after the upload to SharePoint. If there are any logs I can check, or anything to look at on the server side, let me know. files.zip

The zip file contains the file before the upload and after the upload. Thanks, Itay

ihudedi avatar Jan 05 '25 09:01 ihudedi

Hi @Ndiritu Is there any progress? Who can analyze the server logs to check why the file is being corrupted? Thanks, Itay

ihudedi avatar Jan 07 '25 11:01 ihudedi

Hi @Ndiritu Any updates?

Thanks, Itay

ihudedi avatar Jan 15 '25 11:01 ihudedi

@ihudedi unfortunately I'm also unable to reproduce this behaviour when uploading & downloading a JAR. If your customer can share the client-request-id response header value of an irregular upload, we can look at the API logs:

You can add a HeaderInspectionOption to retrieve the headers using the sample here

Ndiritu avatar Jan 15 '25 13:01 Ndiritu

Hi @Ndiritu, could you please share the code to display the request ID? Thanks, Itay

ihudedi avatar Jan 15 '25 14:01 ihudedi

Hi @Ndiritu The client-request-id is [0f37091d-ff8e-4f02-8d76-7b4e752f9cc9] Please check why the file is being corrupted in the server Thanks, Itay

ihudedi avatar Jan 19 '25 07:01 ihudedi

Hi @Ndiritu Any updates? Thanks, Itay

ihudedi avatar Jan 20 '25 21:01 ihudedi

Hi @Ndiritu Have you checked the logs with the above request ID? Thanks Itay.

ihudedi avatar Jan 22 '25 12:01 ihudedi

@ihudedi please share the request-id and correlation-id headers if available as well and I'll escalate these to the API service team.

Ndiritu avatar Jan 23 '25 09:01 Ndiritu

Hi @Ndiritu, I already sent the client-request-id: [0f37091d-ff8e-4f02-8d76-7b4e752f9cc9]. In addition, I reproduced the issue. When uploading a file from the local file system (FileInputStream) everything is OK, but when I transfer a file from a remote system (via SharePoint/SFTP/FTP) it gets corrupted. For SharePoint I use `InputStream inputStream = graphServiceClient.drives().byDriveId(driveId).items().byDriveItemId(driveitemId).content().get();` and the input stream I get and pass to the LargeFileUploadTask ends up corrupted on the server. I tried to upload a text file, a jar file, and a dat file, and all were corrupted. When I upload from a remote system using SFTP/FTP to Azure it works fine. It seems that when uploading from such a stream, some bytes are being corrupted. Thanks, Itay
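One way to narrow this down is to buffer the remote stream fully in memory before handing it to LargeFileUploadTask; if buffered uploads succeed, the corruption is tied to how the task reads from the stream. A minimal sketch (the `BufferDemo` class and `buffer` helper are hypothetical, and this assumes the file fits in memory):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferDemo {
    // Reads the remote stream fully into memory, so the upload task
    // sees a ByteArrayInputStream, which always satisfies read() requests
    // for as many bytes as remain.
    static InputStream buffer(InputStream remote) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = remote.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new ByteArrayInputStream(out.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        InputStream buffered = buffer(new ByteArrayInputStream("payload".getBytes("UTF-8")));
        byte[] all = new byte[7];
        System.out.println(buffered.read(all)); // prints 7: the buffer fills in one call
        System.out.println(new String(all, "UTF-8")); // prints payload
    }
}
```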

ihudedi avatar Jan 23 '25 10:01 ihudedi

Hi @Ndiritu @baywet, I found the bug in the code that causes the file corruption when the InputStream is not a FileInputStream or ByteArrayInputStream. LargeFileUploadTask::chunkInputStream reads `length` bytes with a single call:

```java
private byte[] chunkInputStream(InputStream stream, int length) throws IOException {
    byte[] buffer = new byte[length];
    int lengthAssert = stream.read(buffer);
    assert lengthAssert == length;
    return buffer;
}
```

When the stream can't fill the whole buffer in one read, the returned buffer keeps NULL (zero) values in its tail. This happens when the source is a remote system (SFTP/FTP, and also SharePoint) that has a maximum number of bytes it can serve per read.

I modified the method as follows, and it works for me:

```java
private byte[] chunkInputStream(InputStream stream, int length) throws IOException {
    byte[] buffer = new byte[length];
    int offset = 0;
    int bytesRead;
    while (offset < length && (bytesRead = stream.read(buffer, offset, length - offset)) != -1) {
        offset += bytesRead;
    }
    assert offset == length;
    return buffer;
}
```
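The short-read behaviour is easy to demonstrate without any network: per the `InputStream.read(byte[], int, int)` contract, a single call may return fewer bytes than requested. A self-contained sketch (the `ThrottledStream` wrapper is hypothetical, simulating a remote stream that serves at most 4 bytes per call):

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ShortReadDemo {
    // Simulates a network-backed stream (SFTP/FTP/Graph download) that
    // returns at most 4 bytes per read() call, which is legal per the
    // InputStream contract.
    static class ThrottledStream extends FilterInputStream {
        ThrottledStream(InputStream in) { super(in); }
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 4));
        }
    }

    // The fixed chunking loop: keep reading until the buffer is full or EOF.
    static byte[] readChunk(InputStream stream, int length) throws IOException {
        byte[] buffer = new byte[length];
        int offset = 0;
        int bytesRead;
        while (offset < length && (bytesRead = stream.read(buffer, offset, length - offset)) != -1) {
            offset += bytesRead;
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes("US-ASCII");

        // A single read() stops short: only 4 bytes arrive and the rest of the
        // buffer stays zeroed, which matches the corruption seen on the server.
        byte[] single = new byte[10];
        int n = new ThrottledStream(new ByteArrayInputStream(data)).read(single, 0, 10);
        System.out.println("single read: " + n + " bytes"); // prints single read: 4 bytes

        // The loop fills the whole buffer.
        byte[] full = readChunk(new ThrottledStream(new ByteArrayInputStream(data)), 10);
        System.out.println("loop matches source: " + Arrays.equals(data, full)); // prints loop matches source: true
    }
}
```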

Thanks, Itay

ihudedi avatar Jan 27 '25 06:01 ihudedi

Hi @Ndiritu Are you planning to fix this issue? Thanks Itay

ihudedi avatar Feb 02 '25 12:02 ihudedi

Hi @Ndiritu Any updates regarding this issue? Thanks, Itay

ihudedi avatar Feb 10 '25 09:02 ihudedi

Hi @Ndiritu Any updates regarding this issue? Thanks, Itay

ihudedi avatar Feb 16 '25 06:02 ihudedi

Hi @ihudedi, my apologies for the delayed response. I have slightly limited capacity on this project at the moment. I will get this bumped up the team's priority list and hopefully get this resolved soon, thanks.

Ndiritu avatar Feb 16 '25 15:02 Ndiritu

Hi @Ndiritu, any update regarding this issue? Eagerly waiting for this fix so we can upgrade :)

Thanks Anbu

anburabo avatar Mar 20 '25 10:03 anburabo

Hi @Ndiritu Any updates? Thanks, Itay

ihudedi avatar Jun 22 '25 06:06 ihudedi