When uploading a file using LargeFileUploadTask (streaming, not in one part), the file gets corrupted on the SharePoint server
### Describe the bug
I am using the latest version of the Graph API - 6.24.0. When I upload any file using `graphServiceClient.drives().byDriveId(driveId).items().byDriveItemId(getDriveItemIdFromRootPath(filePath)).content().put(fileContent)`, the file uploads correctly.
When I upload via LargeFileUploadTask, the file is corrupted for any file type, such as an MS Office file, a .dat file, or a .jar file. MS Office files can't be opened, and for .dat files null values are inserted into the file. The code I am using for the streaming upload is:

```java
LargeFileUploadTask<DriveItem> largeFileUploadTask = null;
IProgressCallback callback = null;
UploadResult<DriveItem> uploadResult = null;
try {
    int spoPartSize = 5242880;
    // Set body of the upload session request
    // This is used to populate the request to create an upload session
    DriveItemUploadableProperties driveItemUploadableProperties = new DriveItemUploadableProperties();
    driveItemUploadableProperties.getAdditionalData().put("@microsoft.graph.conflictBehavior", "replace");
    // Finish setting up the request body
    CreateUploadSessionPostRequestBody uploadSessionPostRequestBody = new CreateUploadSessionPostRequestBody();
    uploadSessionPostRequestBody.setItem(driveItemUploadableProperties);
    // Create the upload session
    UploadSession uploadSession = graphServiceClient.drives()
            .byDriveId(driveId)
            .items()
            .byDriveItemId(getDriveItemIdFromRootPath(filePath))
            .createUploadSession().post(uploadSessionPostRequestBody);
    if (null == uploadSession) {
        throw new SPOException("SPOWrapper::uploadObjectInParallel: Could not create upload session");
    }
    // Create the large file upload task
    largeFileUploadTask = new LargeFileUploadTask(graphServiceClient.getRequestAdapter(),
            uploadSession,
            fileContent,
            fileSize,
            spoPartSize,
            DriveItem::createFromDiscriminatorValue);
    if (null == largeFileUploadTask) {
        throw new SPOException("SPOWrapper::uploadObjectInParallel: Could not create upload task");
    }
    // Create a callback used by the upload provider
    callback = new SPOProgressCallback(spoFileCopy);
    // Do the upload
    uploadResult = largeFileUploadTask.upload(spoRequestMaxAttempts, callback);
    if (uploadResult == null || !uploadResult.isUploadSuccessful()) {
        throw new SPOException(String.format("SPOWrapper::uploadObjectInParallel: filePath= '%s' wasn't uploaded successfully via upload method", originalFilePath));
    } else {
        logger.debug(String.format("SPOWrapper::uploadObjectInParallel: filePath= '%s' uploaded successfully via upload method", originalFilePath));
    }
} catch (Exception e) {
    boolean spoIgnoreFailureWhenUploadingFileInParts = Boolean.valueOf(hostProperties.getParameterValueFromAdditionalParamsOrFromAftConfig(mftPropertiesConfig, PropertyData.spoIgnoreFailureWhenUploadingFileInParts, AdditionalParametersConsts.spoIgnoreFailureWhenUploadingFileInParts));
    logger.error(String.format("Error in SPOWrapper::uploadObjectInParallel, filePath = '%s', fileSize = '%s' : %s", originalFilePath, fileSize, e.getMessage()), e);
    /* Resume logic, currently disabled:
    try {
        if (uploadTask != null && callback != null) {
            uploadResult = uploadTask.resume(spoRequestMaxAttempts, callback);
            if (uploadResult == null || !uploadResult.isUploadSuccessful()) {
                throw new SPOException(String.format("SPOWrapper::uploadObjectInParallel: filePath= '%s' wasn't uploaded successfully via resume method", originalFilePath));
            } else {
                logger.debug(String.format("SPOWrapper::uploadObjectInParallel: filePath= '%s' uploaded successfully via resume method", originalFilePath));
                return;
            }
        }
    } catch (Exception ex) {
        logger.error(String.format("Error in SPOWrapper::uploadObjectInParallel, filePath = '%s', fileSize = '%s' : %s", originalFilePath, fileSize, ex.getMessage()), ex);
        throw ex;
    }
    */
} finally {
    logger.debug(String.format("End SPOWrapper::uploadObjectInParallel: filePath= '%s', fileSize= %s", originalFilePath, fileSize));
}
```
Thanks, Itay
### Expected behavior
The file should be uploaded correctly, with the same bytes, and not be corrupted.
### How to reproduce
Upload a file using LargeFileUploadTask.
### SDK Version
6.24.0
### Latest version known to work for scenario above?
No response
### Known Workarounds
No response
### Debug output
### Configuration
_No response_
### Other information
_No response_
Thanks for reporting this @ihudedi.
Would you mind sharing the result of some checks here? This will help with debugging.
This will validate that the number of bytes uploaded is equal to the number of bytes in the file, and that the contents match, using a checksum:
```java
// Calculate checksum before upload
String uploadFileChecksum = DigestUtils.md5Hex(fileContent);
// You might need to rewind the file content stream / open a new stream with the same file path here.
// We'll validate that the checksum of the local file matches the one of the uploaded file
// after the large file upload.

// Create your upload session & execute the large file upload task
// ...

// After getting uploadResult...
// Use the drive item ID in the uploadResult to fetch the uploaded file
DriveItem uploadedFile = graphServiceClient.drives()
        .byDriveId(driveId)
        .items()
        .byDriveItemId(uploadResult.itemResponse.getId())
        .get();
assertEquals(fileSize, uploadedFile.getSize());

InputStream downloadedFile = graphServiceClient.drives()
        .byDriveId(driveId)
        .items()
        .byDriveItemId(uploadResult.itemResponse.getId())
        .content()
        .get();
assertEquals(uploadFileChecksum, DigestUtils.md5Hex(downloadedFile));
```
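For reference, the same checksum comparison can be done with only the JDK, avoiding the commons-codec dependency. This is a minimal sketch (the `Md5Check` class and `md5Hex` helper are hypothetical names, not part of the SDK) that streams the input in chunks so large files never need to fit in memory:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {
    // Compute the MD5 digest of an InputStream as a lowercase hex string,
    // reading in 8 KB chunks.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Compare the digest of the "local" bytes with the "downloaded" bytes.
        byte[] data = "abc".getBytes("UTF-8");
        String local = md5Hex(new ByteArrayInputStream(data));
        String downloaded = md5Hex(new ByteArrayInputStream(data));
        System.out.println(local.equals(downloaded)); // true
        System.out.println(local); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```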
Hi @Ndiritu This code is used by our customers, and some customers complain about the same issue. In my environment I can't reproduce it, but I am trying to analyze the files they are trying to upload and the OS type. Thanks, Itay
@ihudedi thanks for letting me know. I will need to create some test files and validate this. My tests work for .txt, .csv and .xlsx files.
Hi @Ndiritu For MS Office files, the server changes the size. This is what happens when uploading an Excel file: the file gets a different size but remains readable. When I upload an Excel file from the web UI, the size also changes. Destination file size vs. source file size validation failed: source file size = 21641457, destination file size = 21647213. The file is not corrupted, but the server always modifies the file, also when I upload from the SharePoint site. Thanks, Itay
Thanks for clarifying @ihudedi. So the issue is with binary file formats (.dat, .jar, ...) where the file is corrupted? I'll need this to log a ticket with the API team.
Hi @Ndiritu I got an example of a file that is transferred and corrupted after the upload. I tried to reproduce on my environment with no luck. How is it possible that this file is corrupted in one environment and not in mine? Both environments are Linux machines. Attached are the file before the upload and the file after the upload to SharePoint. If there are any logs I can check, let me know, or if there is anything to look at in the server logs. files.zip
The zip file contains the file before the upload and after the upload. Thanks, Itay
Hi @Ndiritu Is there any progress? Who can analyze the server logs to check why the file is being corrupted? Thanks, Itay
Hi @Ndiritu Any updates?
Thanks, Itay
@ihudedi unfortunately I'm also unable to reproduce this behaviour uploading & downloading a JAR
If your customer can share the client-request-id response header value of an irregular upload, we can look at API logs:
You can add a HeaderInspectionOption to retrieve the headers using the sample here
Hi @Ndiritu Could you please share the code to display the request ID? Thanks, Itay
Hi @Ndiritu The client-request-id is [0f37091d-ff8e-4f02-8d76-7b4e752f9cc9]. Please check why the file is being corrupted on the server. Thanks, Itay
Hi @Ndiritu Any updates? Thanks, Itay
Hi @Ndiritu Have you checked the logs with the above request ID? Thanks Itay.
@ihudedi please share the request-id and correlation-id headers if available as well and I'll escalate these to the API service team.
Hi @Ndiritu I already sent the client-request-id: [0f37091d-ff8e-4f02-8d76-7b4e752f9cc9]. In addition, I reproduced the issue. When uploading a file from the local file system everything is OK (FileInputStream), but when I transfer a file from a remote system (via SharePoint/SFTP/FTP) it fails. For SharePoint I use `InputStream inputStream = graphServiceClient.drives().byDriveId(driveId).items().byDriveItemId(driveitemId).content().get();`. The input stream that I get and pass to LargeFileUploadTask ends up corrupted on the server. I tried to upload a text file, a jar file, and a dat file, and all were corrupted. When I upload from a remote system using SFTP/FTP to Azure, it works fine. It seems that when uploading the stream, some bytes are being corrupted. Thanks, Itay
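One possible workaround for this symptom, sketched here under the assumption that the local-stream path works while remote streams fail (the `toBuffered` helper below is hypothetical, not part of the SDK): drain the remote stream fully before handing it to LargeFileUploadTask, so the task only ever sees an in-memory stream whose reads always fill the requested buffer. Note this holds the whole file in memory, so for very large files spooling to a temporary file may be preferable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferStream {
    // Drain a (possibly slow, short-read-prone) remote stream completely into
    // memory, then expose it as a ByteArrayInputStream, whose read(byte[]) fills
    // the requested length in one call while data remains.
    static ByteArrayInputStream toBuffered(InputStream remote) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = remote.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new ByteArrayInputStream(out.toByteArray());
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[100_000];
        for (int i = 0; i < payload.length; i++) payload[i] = (byte) i;
        InputStream buffered = toBuffered(new ByteArrayInputStream(payload));
        // A single read on the buffered stream now returns the full chunk.
        byte[] chunk = new byte[65536];
        int read = buffered.read(chunk);
        System.out.println(read); // 65536
    }
}
```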
Hi @Ndiritu @baywet I found the bug in the code that causes the file to be corrupted when the InputStream is not of type FileInputStream or ByteArrayInputStream. In LargeFileUploadTask::chunkInputStream, this method reads `length` bytes:

```java
private byte[] chunkInputStream(InputStream stream, int length) throws IOException {
    byte[] buffer = new byte[length];
    int lengthAssert = stream.read(buffer);
    assert lengthAssert == length;
    return buffer;
}
```

When the stream can't read the whole buffer length in a single call, the returned buffer contains NULL (zero) values in the unread tail. This happens when the source is a remote system (SFTP/FTP, and also SharePoint) and the source has a maximum size it can fetch per read.
I modified this method so that it works for me:

```java
private byte[] chunkInputStream(InputStream stream, int length) throws IOException {
    byte[] buffer = new byte[length];
    int offset = 0;
    int bytesRead;
    while (offset < length && (bytesRead = stream.read(buffer, offset, length - offset)) != -1) {
        offset += bytesRead;
    }
    assert offset == length;
    return buffer;
}
```
Thanks, Itay
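To make the failure mode concrete, here is a small self-contained sketch (the `ShortReadInputStream` class is hypothetical, simulating a network stream that returns at most a few KB per `read` call, which the `InputStream` contract permits): a single `stream.read(buffer)` leaves the tail of the buffer as zero bytes, while a read loop like the fix above fills it completely.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ShortReadDemo {
    // Simulates a remote (SFTP/FTP/HTTP) stream that never returns more than
    // maxPerRead bytes from a single read call.
    static class ShortReadInputStream extends InputStream {
        private final InputStream delegate;
        private final int maxPerRead;
        ShortReadInputStream(InputStream delegate, int maxPerRead) {
            this.delegate = delegate;
            this.maxPerRead = maxPerRead;
        }
        @Override public int read() throws IOException { return delegate.read(); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            return delegate.read(b, off, Math.min(len, maxPerRead));
        }
    }

    // Read-loop chunking: loop until the buffer is full or EOF is reached.
    static byte[] chunkInputStream(InputStream stream, int length) throws IOException {
        byte[] buffer = new byte[length];
        int offset = 0;
        int bytesRead;
        while (offset < length && (bytesRead = stream.read(buffer, offset, length - offset)) != -1) {
            offset += bytesRead;
        }
        return buffer;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[5_242_880];
        Arrays.fill(data, (byte) 0x42);

        // Buggy behavior: one read call fills only part of the 5 MB chunk.
        InputStream shortStream = new ShortReadInputStream(new ByteArrayInputStream(data), 4096);
        byte[] buggyChunk = new byte[5_242_880];
        int readOnce = shortStream.read(buggyChunk);
        System.out.println(readOnce);         // 4096, not 5242880
        System.out.println(buggyChunk[5000]); // 0 -> zeroed (corrupted) tail

        // Looped reads fill the whole chunk with real data.
        InputStream fresh = new ShortReadInputStream(new ByteArrayInputStream(data), 4096);
        byte[] goodChunk = chunkInputStream(fresh, 5_242_880);
        System.out.println(goodChunk[5000]);  // 66 (0x42)
    }
}
```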
Hi @Ndiritu Are you planning to fix this issue? Thanks Itay
Hi @Ndiritu Any updates regarding this issue? Thanks, Itay
Hi @Ndiritu Any updates regarding this issue? Thanks, Itay
Hi @ihudedi, my apologies for the delayed response. I have slightly limited capacity on this project at the moment. I will get this bumped up the team's priority list and hopefully get this resolved soon, thanks.
Hi @Ndiritu Any update regarding this issue? Eagerly waiting for this fix to give an upgrade :)
Thanks Anbu
Hi @Ndiritu Any updates? Thanks, Itay