Enhancement: better element IDs
Is your feature request related to a problem? Please describe.
Currently, element_id values are simply a hash of the element's text. This is a problem because two elements with identical text get the same ID, so IDs may be duplicated within a page or document.
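A minimal sketch of the collision, assuming a text-only SHA-256 hash truncated to 32 hex characters. The function `text_only_id` is a hypothetical stand-in for illustration, not the library's actual code:

```python
import hashlib


def text_only_id(text: str) -> str:
    # Hypothetical stand-in for a text-only scheme: the ID depends on
    # nothing but the element's text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:32]


# Two elements share the same text, e.g. a repeated footer line.
elements = ["Introduction", "Continued on next page", "Body text", "Continued on next page"]
ids = [text_only_id(t) for t in elements]

# Fewer distinct IDs than elements means at least one collision.
has_duplicates = len(set(ids)) < len(ids)
```

Anything that looks elements up by ID, such as a parent/child link, cannot tell the two repeated elements apart.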
Proposal
Deterministic element IDs should be a hash of (text, page_num, seq_no_in_page). Element IDs would then be unique (with extremely high probability) within a document. If pages are processed in parallel, the resulting element IDs should be identical to those produced by serial processing, which is how pages are currently processed.
This implies that metadata_page_number_begin must also be an optional parameter for both partition() and the API, so that a range of pages processed on its own can be numbered from its true starting page.
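The proposal could be sketched roughly as follows. This is only an illustration under stated assumptions: `element_id`, `ids_for_pages`, and the `page_number_begin` argument are hypothetical names, and SHA-256 truncated to 32 hex characters is an arbitrary choice of hash.

```python
import hashlib
from typing import List


def element_id(text: str, page_num: int, seq_no_in_page: int) -> str:
    # Hypothetical: hash (text, page_num, seq_no_in_page) together.
    # The numeric fields come first and are NUL-delimited, so the key
    # is unambiguous even if the text itself contains a NUL byte.
    key = f"{page_num}\x00{seq_no_in_page}\x00{text}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]


def ids_for_pages(pages: List[List[str]], page_number_begin: int = 1) -> List[str]:
    # `pages` is a list of pages, each a list of element texts.
    # `page_number_begin` is the true page number of the first page given.
    return [
        element_id(text, page_number_begin + page_offset, seq_no)
        for page_offset, page in enumerate(pages)
        for seq_no, text in enumerate(page)
    ]


document = [["Title", "Note"], ["Note", "Note"], ["End"]]

# Serial processing of the whole document.
serial_ids = ids_for_pages(document)

# Simulated parallel processing: each worker gets a slice of pages plus
# the page number that slice starts at.
parallel_ids = ids_for_pages(document[:1], 1) + ids_for_pages(document[1:], 2)
```

Because each ID depends only on the element's text, its page number, and its position within the page, a worker that knows its starting page number produces the same IDs as a serial run, and the repeated "Note" elements all get distinct IDs.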
Other considerations
Including other metadata in the hash is potentially fair game, as a way to keep IDs distinct between documents. Determinism is a must, however.
Initially, this implementation would not affect the partition parameter unique_element_ids=True.
I'm running into this as well: parent IDs are set incorrectly because of the duplicated element IDs.