google-api-java-client

Improve memory efficiency of base64-encoded byte array decoding operations

Open • clementdenis opened this issue 10 years ago • 4 comments

Currently, decoding a base64-encoded byte array with the Java API client is very inefficient in terms of memory usage.

This makes reading big attachments from the Gmail API quite a challenge in memory-constrained environments like App Engine: https://developers.google.com/gmail/api/v1/reference/users/messages/attachments/get

On a 64-bit Java 7 VM, it required at least 115 MB of heap (-Xmx115M) to read a 12 MB attachment from the Gmail API (which is a 17 MB base64 string in the API response).
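As a rough back-of-the-envelope check of these numbers: base64 encodes every 3 bytes as 4 characters, and a Java 7 String stores each character as a 2-byte char. The sketch below only does that arithmetic; the class and method names are made up for illustration, and it says nothing about how many copies the parser actually makes.

```java
public class Base64Footprint {
    /** Length of the padded base64 encoding of n bytes: 4 characters per 3-byte group. */
    public static long encodedLength(long n) {
        return 4 * ((n + 2) / 3);
    }

    /** Bytes needed to hold that many characters in a char[] (2 bytes per char). */
    public static long charBytes(long chars) {
        return 2 * chars;
    }

    public static void main(String[] args) {
        long attachment = 12L * 1024 * 1024;        // 12 MB of binary data
        long encoded = encodedLength(attachment);   // characters in the JSON string value
        System.out.println("base64 chars:    " + encoded);
        System.out.println("bytes as char[]: " + charBytes(encoded));
    }
}
```

Every in-memory copy of the encoded text as Java chars costs about twice its encoded size, so a handful of intermediate copies is enough to reach a heap requirement of this order.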

The code to test it is dead simple (MESSAGE_ID and ATTACHMENT_ID reference a big attachment):

gmail.users().messages().attachments().get("me", MESSAGE_ID, ATTACHMENT_ID).execute()

Here is the stack trace when trying to load the attachment with only 110 MB of heap:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:2694)
    at java.lang.String.<init>(String.java:203)
    at java.lang.StringBuilder.toString(StringBuilder.java:405)
    at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:360)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:277)
    at com.google.api.client.json.jackson2.JacksonParser.getText(JacksonParser.java:76)
    at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:850)
    at com.google.api.client.json.JsonParser.parse(JsonParser.java:471)
    at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:780)
    at com.google.api.client.json.JsonParser.parse(JsonParser.java:381)
    at com.google.api.client.json.JsonParser.parse(JsonParser.java:354)
    at com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:87)
    at com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:81)
    at com.google.api.client.http.HttpResponse.parseAs(HttpResponse.java:459)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
    at LoadEmail.main(LoadEmail.java:31)

Obtaining a stream from the data field in the response would help keep the memory usage lower.
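To illustrate why a stream would help, here is a minimal sketch that decodes base64url text through a small fixed buffer instead of materializing one large String. It assumes Java 8 or later for java.util.Base64 (the measurements above were taken on Java 7), and the class and method names are invented for this example; none of this is part of the client library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

public class StreamingDecode {
    /**
     * Decodes base64url text read from 'encoded' and writes the raw bytes to 'out'.
     * Only an 8 KB buffer is held at any time. Returns the number of bytes written.
     */
    public static long decodeTo(InputStream encoded, OutputStream out) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (InputStream decoded = Base64.getUrlDecoder().wrap(encoded)) {
            int n;
            while ((n = decoded.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for attachment data; a real caller would pass the response stream.
        byte[] original = new byte[3000];
        byte[] text = Base64.getUrlEncoder().encode(original);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long written = decodeTo(new ByteArrayInputStream(text), sink);
        System.out.println(written + " bytes decoded");
    }
}
```

The input here must contain only the base64 text itself; the surrounding JSON of a real response would still have to be stripped before decoding.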

clementdenis, Sep 03 '15 12:09

Hi, is there any update or workaround for this important concern? Ideally, an API similar to the one for downloading a file as a stream from Drive would be a good solution. Thanks.

ndupont, Jun 02 '17 09:06

You can work around this with some manual parsing code:

Gmail.Users.Messages.Attachments.Get get = api.getGmail().users().messages()
                .attachments().get(ME, gid, attachId);

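// Very important: request only the "data" field, with pretty-printing off,
// so the response is a single line of the form {"data":"..."}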
get.setPrettyPrint(false).setFields("data");

InputStream is = get.executeAsInputStream();

is = InputStreamJsonField.getBase64DataStream(is, "data");

InputStreamJsonField is just a custom InputStream wrapper that skips the leading JSON characters; its output is then fed to org.apache.commons.codec.binary.Base64InputStream.Base64InputStream(InputStream), which does the decoding:

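import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.codec.binary.Base64InputStream;

/**
 * Very simple utility to stream the value of a single-field JSON object.
 * For now it is only used with the Gmail API, where we can get JSON like this:
 *   {"FIELD":"LARGE_DATA"}
 * on a single line without spaces, so the data is easy to extract.
 */
public class InputStreamJsonField extends InputStream {
    private final String field;
    // Built lazily on the first read(); null until the prefix has been consumed.
    private String expectedPrefix;
    private final InputStream is;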

    public static InputStream getBase64DataStream(InputStream is, String field) {
        return new Base64InputStream(new InputStreamJsonField(is, field));
    }

    public InputStreamJsonField(InputStream is, String field) {
        this.field = field;
        this.is = is;
    }


    @Override
    public int read() throws IOException {

        if (expectedPrefix == null) {
            // i.e: {"FIELD":"DATA_RETURNED"}
            expectedPrefix = "{\"" + field + "\":\"";

            byte[] buff = new byte[expectedPrefix.length()];

            String prefix;
            int pos = 0;
            do {
                int c = is.read();

                if (c == -1) return -1;

                if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                    // swallow all blanks
                } else {
                    buff[pos++] = (byte) c;
                }

                prefix = new String(buff, 0, pos);
                if (!expectedPrefix.startsWith(prefix)) {
                    // error
                    break;
                }

            } while (!expectedPrefix.equals(prefix));

            if (!prefix.equals(expectedPrefix)) {
                throw new IllegalStateException(prefix + " != " + expectedPrefix);
            }

            return is.read();
        }

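        int c = is.read();
        if ('"' == c) {
            // Closing quote: the value is complete. Drain the trailing "}" and
            // anything after it (e.g. a newline) so that we report EOF here.
            while (c != -1) {
                c = is.read();
            }
        }
        return c;
    }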

    @Override
    public void close() throws IOException {
        is.close();
    }
}
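
Once a decoded stream is available (for example the one returned by getBase64DataStream above), it can be written straight to disk so the attachment never has to fit in the heap. A minimal sketch, with a hypothetical helper name and an in-memory stand-in for the decoded stream:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SaveAttachment {
    /** Copies 'decoded' to 'target', closes the stream, and returns the bytes written. */
    public static long saveTo(InputStream decoded, Path target) throws IOException {
        try (InputStream in = decoded) {
            // Files.copy reads and writes in chunks rather than buffering the whole stream.
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("attachment", ".bin");
        // Stand-in data; a real caller would pass the decoded attachment stream.
        long written = saveTo(new ByteArrayInputStream(new byte[] {1, 2, 3}), target);
        System.out.println(written + " bytes written to " + target);
    }
}
```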

qtxo, Jun 05 '19 10:06

Are there any efficient fixes or workarounds for this issue?

samrajcse, Sep 03 '19 16:09

This may be useful to someone: I found another way to parse large data without base64-decoding: https://github.com/googleapis/google-api-java-client-services/issues/7717#issuecomment-821024597

vheneraliuk, Apr 16 '21 08:04