Improve memory efficiency of base64-encoded byte array decoding operations
Currently, decoding a base64-encoded byte array with the Java API client is very inefficient in terms of memory usage.
Reading large attachments from the Gmail API in memory-constrained environments such as App Engine is quite a challenge because of this. https://developers.google.com/gmail/api/v1/reference/users/messages/attachments/get
On a 64-bit Java 7 VM, it required at least 115 MB of heap (-Xmx115M) to read a 12 MB attachment from the Gmail API (roughly a 17 MB base64 string in the response from the API).
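As a back-of-the-envelope check on those numbers (the exact multiplier depends on how many intermediate copies the HTTP and JSON layers make, so this only accounts for the unavoidable inflation):

```java
public class Base64HeapMath {
    public static void main(String[] args) {
        long raw = 12L * 1024 * 1024;          // attachment size in bytes
        long base64 = ((raw + 2) / 3) * 4;     // 4 base64 chars per 3 raw bytes
        long asString = base64 * 2;            // a Java String holds UTF-16 chars (2 bytes each)
        System.out.println(base64 / (1024 * 1024));   // ~16 MB of base64 text
        System.out.println(asString / (1024 * 1024)); // ~32 MB once held in a String
    }
}
```

So before any decoding happens, one String copy of the payload already costs ~32 MB, and each intermediate copy (HTTP buffer, parser token, decoded byte[]) adds a comparable amount, which is how a 12 MB attachment climbs past 100 MB of heap.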
The code to test it is dead simple (MESSAGE_ID / ATTACHMENT_ID references a big attachment):
gmail.users().messages().attachments().get("me", MESSAGE_ID, ATTACHMENT_ID).execute()
Here is the stacktrace when trying to load the attachment with only 110 MB of heap:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.
Obtaining a stream from the data field in the response would help keep the memory usage lower.
Hi, any update or workaround for this important concern? Ideally, providing an API similar to the one for downloading a file as a stream from Drive would be a good solution. Thanks
As a workaround, parse the response manually:
Gmail.Users.Messages.Attachments.Get get = api.getGmail().users().messages()
        .attachments().get(ME, gid, attachId);
// Very important: request compact output with only the data field,
// so the response starts exactly with {"data":"
get.setPrettyPrint(false).setFields("data");
InputStream is = get.executeAsInputStream();
is = InputStreamJsonField.getBase64DataStream(is, "data");
And InputStreamJsonField is just a custom InputStream wrapper that skips the leading JSON characters and delegates to org.apache.commons.codec.binary.Base64InputStream:
import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.codec.binary.Base64InputStream;

/**
 * Very simple utility to stream data from a single-field JSON response.
 * Currently only used for the Gmail API, where we can get a JSON like this:
 *   {"FIELD":"LARGE_DATA"}
 * on a single line without spaces, so we can easily extract the data.
 */
public class InputStreamJsonField extends InputStream {

  private final String field;
  private String expectedPrefix;
  private final InputStream is;

  public static InputStream getBase64DataStream(InputStream is, String field) {
    return new Base64InputStream(new InputStreamJsonField(is, field));
  }

  public InputStreamJsonField(InputStream is, String field) {
    this.field = field;
    this.is = is;
  }

  @Override
  public int read() throws IOException {
    if (expectedPrefix == null) {
      // i.e. {"FIELD":"DATA_RETURNED"}
      expectedPrefix = "{\"" + field + "\":\"";
      byte[] buff = new byte[expectedPrefix.length()];
      String prefix;
      int pos = 0;
      do {
        int c = is.read();
        if (c == -1) return -1;
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
          // swallow all blanks
        } else {
          buff[pos++] = (byte) c;
        }
        prefix = new String(buff, 0, pos);
        if (!expectedPrefix.startsWith(prefix)) {
          break; // unexpected JSON shape
        }
      } while (!expectedPrefix.equals(prefix));
      if (!prefix.equals(expectedPrefix)) {
        throw new IllegalStateException(prefix + " != " + expectedPrefix);
      }
      return is.read();
    }
    int c = is.read();
    if ('"' == c) {
      // the closing quote ends the field: consume the trailing } and the EOF
      is.read();
      is.read();
      return -1;
    }
    return c;
  }

  @Override
  public void close() throws IOException {
    is.close();
  }
}
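For completeness, here is a self-contained sketch of the same technique using only the JDK (`java.util.Base64.getUrlDecoder().wrap(...)` instead of commons-codec, which matters because the Gmail API returns web-safe/URL-safe base64). The class name, method name, and sample payload are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class JsonFieldStreamDemo {

    /** Streams the base64url value of a single compact JSON field without buffering it. */
    public static InputStream openFieldStream(InputStream in, String field) throws IOException {
        // Skip past the {"field":" prefix (assumes compact output, as with setPrettyPrint(false))
        String prefix = "{\"" + field + "\":\"";
        for (int i = 0; i < prefix.length(); i++) {
            if (in.read() != prefix.charAt(i)) throw new IOException("unexpected JSON shape");
        }
        // Present EOF at the closing quote so the decoder stops before the trailing "}
        InputStream untilQuote = new InputStream() {
            @Override public int read() throws IOException {
                int c = in.read();
                return (c == '"' || c == -1) ? -1 : c;
            }
        };
        // The JDK's streaming decoder; no commons-codec dependency needed
        return Base64.getUrlDecoder().wrap(untilQuote);
    }

    public static void main(String[] args) throws IOException {
        // Simulated compact response; the Gmail API encodes attachment data as URL-safe base64
        byte[] payload = "hello attachment".getBytes(StandardCharsets.UTF_8);
        String json = "{\"data\":\""
                + Base64.getUrlEncoder().withoutPadding().encodeToString(payload) + "\"}";
        InputStream decoded = openFieldStream(
                new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8)), "data");
        System.out.println(new String(decoded.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

At no point does the full base64 string exist as a Java String; memory use stays proportional to the read buffer, not the attachment size.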
Any efficient fixes or workarounds for this issue?
It may be useful for someone: I found another way to handle large data without base64-decoding it: https://github.com/googleapis/google-api-java-client-services/issues/7717#issuecomment-821024597