Update PDFStreamEngine.java
There is no need to allocate a new ArrayList here. Reusing the existing list reduces text extraction time from 16 seconds to 14 seconds on a 4.2M PDF.
This is a read-only mirror. Please close this and open an issue in JIRA: https://issues.apache.org/jira/browse/PDFBOX
Of course every speed increase is welcome, but this change should be discussed with "the rest of the gang". What if one of the processOperator methods keeps a reference to the argument list? If not now, maybe at a later time? Your change would pull the rug out from under it.
@THausherr What do you mean by "keep the argument list"? I assume you mean that someone wants to keep the elements of arguments inside processOperator. In that case, the clear method only removes the elements from arguments; it does not destroy them, so if someone keeps references to the elements, that will still work.
Any progress on this? The users of the passed array must make a copy of the arguments array.
No progress; this is a read-only mirror. I asked you to create an issue in JIRA. I won't create it myself because I'm not persuaded by this. If "The users of the passed array must make a copy of the arguments array.", then where would the speed gain be?
I should have written: the users of the passed array that need to keep the list of arguments must make a copy of it. However, I agree that this kind of optimization must be investigated further, so that there are no unexpected side effects.
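To make the concern concrete, here is a minimal sketch of what happens when the caller reuses one list and calls clear() after each operator. The class, the fields, and the simplified processOperator signature are hypothetical and only illustrate the aliasing issue; this is not the actual PDFStreamEngine code.

```java
import java.util.ArrayList;
import java.util.List;

public class ArgumentReuseSketch {
    // Hypothetical consumer state: three ways a handler might hold on to operands.
    static List<String> keptByReference; // keeps the caller's list itself
    static List<String> keptByCopy;      // keeps a private copy of the list
    static String keptElement;           // keeps a single element only

    // Simplified stand-in for a processOperator implementation (assumed signature).
    static void processOperator(String operator, List<String> arguments) {
        keptByReference = arguments;
        keptByCopy = new ArrayList<>(arguments);
        keptElement = arguments.get(0);
    }

    // Simplified stand-in for the parsing loop that reuses a single list.
    static void run() {
        List<String> arguments = new ArrayList<>();
        arguments.add("/F1");
        arguments.add("12");
        processOperator("Tf", arguments);
        arguments.clear(); // reuse the list instead of allocating a new one
    }

    public static void main(String[] args) {
        run();
        System.out.println(keptByReference.size()); // 0: the shared list was emptied
        System.out.println(keptByCopy.size());      // 2: the copy is unaffected
        System.out.println(keptElement);            // /F1: elements themselves survive
    }
}
```

Under these assumptions, a handler that keeps only individual elements is safe, while a handler that keeps the list itself silently loses its contents unless it makes a copy. Only the handlers that actually retain the list would pay for that copy.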
I've created https://github.com/apache/pdfbox/pull/38, which investigates whether the ArrayList is still in use after the call to the processor. My first impression is that it is not, and that the optimization is possible.