Feature request: Parsing citations and references in research papers

Open deutranium opened this issue 1 year ago • 4 comments

Requested feature

Parsing citations and references in research papers.

Alternatives

We could hardcode the common citation and bibliography styles as regular expressions and match them directly in the text. This could also be extended to extract author information, publication venue, year of publication, etc. However, it is likely to produce mismatches if the patterns are not specific enough or have not been tested on a large enough dataset.
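As a rough illustration of the regex alternative, here is a minimal sketch that works on plain text that has already been extracted from a paper. It does not use any docling API, and the names `NUMERIC`, `AUTHOR_YEAR`, `expand_numeric`, and `find_citations` are hypothetical. It only covers two common in-text styles (numeric brackets such as `[1, 3-5]` and single-citation author-year parentheses such as `(Smith et al., 2020)`), so many real-world styles would slip through, which is exactly the weakness described above.

```python
import re

# Numeric bracket style: [1], [2, 3], [4-6], [1, 3-5].
# \u2013 is an en dash, which some papers use for ranges.
_ITEM = r"\d+(?:\s*[-\u2013]\s*\d+)?"
NUMERIC = re.compile(r"\[(" + _ITEM + r"(?:\s*,\s*" + _ITEM + r")*)\]")

# Author-year style with one citation per parenthesis:
# (Smith, 2020), (Smith et al., 2020), (Smith and Jones, 2019)
AUTHOR_YEAR = re.compile(
    r"\(([A-Z][A-Za-z'-]+"
    r"(?:\s+et\s+al\.|\s+(?:and|&)\s+[A-Z][A-Za-z'-]+)?)"
    r",?\s+((?:19|20)\d{2}[a-z]?)\)"
)


def expand_numeric(body):
    """Turn the inside of a bracket, e.g. '1, 3-5', into [1, 3, 4, 5]."""
    numbers = []
    for part in body.split(","):
        bounds = re.split(r"\s*[-\u2013]\s*", part.strip())
        if len(bounds) == 2:
            # A range such as '3-5' expands to every number in between.
            numbers.extend(range(int(bounds[0]), int(bounds[1]) + 1))
        else:
            numbers.append(int(bounds[0]))
    return numbers


def find_citations(text):
    """Return numeric and author-year citations found in `text`."""
    numeric = [expand_numeric(m.group(1)) for m in NUMERIC.finditer(text)]
    author_year = [(m.group(1), m.group(2)) for m in AUTHOR_YEAR.finditer(text)]
    return {"numeric": numeric, "author_year": author_year}


if __name__ == "__main__":
    sample = (
        "Transformers [1, 3-5] outperform RNNs (Smith et al., 2020), "
        "as noted in [12]."
    )
    print(find_citations(sample))
```

Expanding ranges into individual numbers makes it easier to join in-text citations against numbered bibliography entries later. Anything the two patterns do not anticipate, such as several author-year citations inside one parenthesis, is silently missed, so a sketch like this would need a sizable test set before being relied on.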

Uses

The feature would be particularly useful for building tools for the scientific community, e.g. mapping paper sections to the references they cite, or assessing the importance of cited papers with respect to the claims they are cited for.

Example citations in text:

Image

Example bibliography:

Image

deutranium avatar Feb 06 '25 20:02 deutranium

+1

verakutsenko avatar Feb 07 '25 02:02 verakutsenko

@deutranium I think we could start with a simple example notebook on how to do this.

PeterStaar-IBM avatar Feb 07 '25 07:02 PeterStaar-IBM

Yup sure, I'll keep you posted!

deutranium avatar Feb 07 '25 08:02 deutranium

Any updates on this?

Necmttn avatar May 26 '25 08:05 Necmttn