commonmark-java

Support for footnotes

Open c0d1f1ed opened this issue 3 years ago • 14 comments

Footnotes are arguably an important markup feature but don't appear to be supported by CommonMark yet. GFM supports them and we're seeing an increasing number of uses, e.g. in Chromium and Android, but these render correctly neither in Google Code Search nor in gitiles/Gerrit.

(Google bug b/255316523, contact hanwen@ for integration when fixed)

c0d1f1ed avatar Oct 24 '22 18:10 c0d1f1ed

Hey! Hmm, interesting. I see footnotes explained in GitHub's docs but not in their spec. But yeah, would be good to support that.

The thing that will be a bit tricky is that they overlap with link references, so e.g. this:

Text[^1]

[^1]: https://example.com

Is currently valid Markdown and renders like this (try it out on dingus): Text^1

But e.g. this is not a valid link reference definition, and it results in the second line being rendered as plain text:

Text[^1]

[^1]: https://example.com test

Looks like GitHub also allows footnotes and link reference definitions to be mixed, as long as the links come first. In other words, this works:

Text[^1] [foo]

[foo]: https://example.com/foo
[^1]: https://example.com/1 test

But this turns the [foo]: https://example.com/foo into the second line of the footnote text:

Text[^1] [foo]

[^1]: https://example.com/1 test
[foo]: https://example.com/foo

Some other interesting cases:

[^1][]

[^1]: /footnote

This is a footnote, followed by [] as literal text. In CM, it's <a href="/footnote">^1</a>.

But this is parsed as a full reference link instead:

[^1][foo]

[^1]: /footnote

[foo]: /url

So I think what's happening is that [^1]: is parsed as a footnote definition, and thus not as a link reference definition. In the second case though, foo exists as a reference and so it takes precedence. (Without [foo]: /url, it's a footnote followed by literal [foo] text.)

robinst avatar Oct 27 '22 00:10 robinst

I'm actually looking to try and implement this functionality but my attempts so far have failed. Do you have pointers as to where I can start? Is there an extension you could recommend that I follow to get things started? I was using commonmark-ext-ins as a basis but I think it's getting messed up at org/commonmark/internal/InlineParserImpl.java:156 in the switch statement. Basically my test case is failing.

Here are the new files I've added for the extension. Let me know if I should commit them as a feature branch for easier collaboration:

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   commonmark-ext-footnotes/pom.xml
	new file:   commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/Footnote.java
	new file:   commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/FootnoteExtension.java
	new file:   commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/internal/FootnoteDelimiterProcessor.java
	new file:   commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/internal/FootnoteHtmlNodeRenderer.java
	new file:   commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/internal/FootnoteNodeRenderer.java
	new file:   commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/internal/FootnoteTextContentNodeRenderer.java
	new file:   commonmark-ext-footnotes/src/test/java/org/commonmark/ext/footnote/FootnotesTest.java

motopascyyy avatar Feb 21 '23 22:02 motopascyyy

I am looking into adding footnote support as well. If links referenced as [1] were implemented as an extension, I would borrow it and make it recognize ^ at the start and do something different for my project. Any help on how to implement footnotes would be super helpful. 🙇‍♂️

MykolaGolubyev avatar Sep 28 '23 11:09 MykolaGolubyev

@robinst @motopascyyy did you have any luck figuring it out?

MykolaGolubyev avatar Nov 16 '23 13:11 MykolaGolubyev

No, I ended up using a different library for my project as this was distracting me too much. At some point in the future I’ll get back to it and issue a PR but right now it’s on hold.

motopascyyy avatar Nov 16 '23 15:11 motopascyyy

That would be awesome. I am too deep into commonmark to change. Do you have any files you could attach to help me write a custom extension? I am on the verge of committing a regexp crime.

MykolaGolubyev avatar Nov 16 '23 16:11 MykolaGolubyev

I'd be also pretty interested in footnotes, trying to see what is missing from the current extensibility. I guess parsing the footnote body as custom blocks starting with [^label]: wouldn't be a problem, but parsing the inline footnote reference would need #263 and the possibility to validate the footnote reference against a registry of existing labels from the block-level pass (similar to LinkReferenceDefinitions). Maybe a generic way to attach metadata to the document root visible from the inline parsing context? (might be a similar issue to #285).

Would allowing custom delimiter processors to use reserved characters like brackets be a step forward?

zampino avatar Apr 05 '24 11:04 zampino

Alright, you made me curious and I've started looking into this :).

Apart from reverse-engineering how GFM's footnotes work, we can also look at the source code of cmark-gfm. Here are some interesting bits:

  • The nodes are called CMARK_NODE_FOOTNOTE_DEFINITION and CMARK_NODE_FOOTNOTE_REFERENCE
  • The scanner for valid footnote labels ('[^' ([^\] \r\n\x00\t]+) ']:' [ \t]*): https://github.com/github/cmark-gfm/blob/587a12bb54d95ac37241377e6ddc93ea0e45439b/src/scanners.re#L362
  • Where the definition is parsed (not sure where continuation lines happen there): https://github.com/github/cmark-gfm/blob/587a12bb54d95ac37241377e6ddc93ea0e45439b/src/blocks.c#L1219
  • Where the reference is created (after failing to parse it as a link): https://github.com/github/cmark-gfm/blob/587a12bb54d95ac37241377e6ddc93ea0e45439b/src/inlines.c#L1238
  • process_footnotes which is done at the end of parsing: https://github.com/github/cmark-gfm/blob/c123e68e81725d59f30d5a9bee719125538a6c77/src/blocks.c#L465
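To make the scanner rule above concrete, here is a rough translation of it into a java.util.regex pattern. This is only a sketch: the class and method names (FootnoteLabelScanner, scanLabel) are made up for illustration and are not part of commonmark-java or cmark-gfm, and it assumes the caller has already stripped leading indentation from the line.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of commonmark-java.
final class FootnoteLabelScanner {
    // Java translation of the cmark-gfm re2c rule:
    //   '[^' ([^\] \r\n\x00\t]+) ']:' [ \t]*
    private static final Pattern DEFINITION_START =
            Pattern.compile("\\[\\^([^\\] \\r\\n\\x00\\t]+)\\]:[ \\t]*");

    /**
     * Returns the footnote label if the line starts a footnote definition.
     * Assumes leading indentation was already removed by the caller.
     */
    static Optional<String> scanLabel(String line) {
        Matcher m = DEFINITION_START.matcher(line);
        // lookingAt() anchors the match at the start of the line only.
        return m.lookingAt() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(scanLabel("[^1]: https://example.com test"));
        System.out.println(scanLabel("[foo]: https://example.com/foo"));
        System.out.println(scanLabel("[^a b]: spaces are not allowed in labels"));
    }
}
```

Note that the label may not contain spaces, tabs, line breaks, NUL or a closing bracket, which is stricter than what a link reference label allows.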

Note that it looks like something like [^1] in the text is always parsed as a footnote reference, and only in process_footnotes is there a check whether it's in the definition map or not. If not, it is then replaced by a text node. I'm not sure why it's not done the same way as link reference definitions, where references are resolved during inline parsing. (Maybe @kivikakk knows :).)
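As I read it, that deferred check boils down to something like the following. This is a simplified sketch of the idea, not how cmark-gfm is actually structured: the Inline class and the resolve method are hypothetical stand-ins, and label normalization (e.g. case folding) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the post-processing step; names are illustrative only.
final class DeferredFootnoteResolution {
    /** Minimal stand-in for an inline node: plain text or a footnote reference. */
    static final class Inline {
        final boolean footnoteReference;
        final String content; // literal text, or the label for a reference

        Inline(boolean footnoteReference, String content) {
            this.footnoteReference = footnoteReference;
            this.content = content;
        }
    }

    /**
     * Runs after all parsing is done: references whose label has no
     * definition are turned back into literal "[^label]" text.
     */
    static List<Inline> resolve(List<Inline> inlines, Set<String> definedLabels) {
        List<Inline> result = new ArrayList<>();
        for (Inline inline : inlines) {
            if (inline.footnoteReference && !definedLabels.contains(inline.content)) {
                result.add(new Inline(false, "[^" + inline.content + "]"));
            } else {
                result.add(inline);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Inline> parsed = List.of(
                new Inline(false, "Text"),
                new Inline(true, "1"),   // has a definition
                new Inline(true, "2"));  // has no definition
        for (Inline inline : resolve(parsed, Set.of("1"))) {
            System.out.println((inline.footnoteReference ? "footnote: " : "text: ") + inline.content);
        }
    }
}
```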

robinst avatar Apr 06 '24 10:04 robinst

Unfortunately my memory doesn't go back that far! 🤍

Footnotes are typically defined after their references, so we can't decide if it's a valid reference or not before we've finished reading the entire document. If we don't parse them eagerly, there's a chance some other part of the parser might decide (some part of) the reference should instead be parsed as (part of) something else, but that's likely never correct.

So I think it's a sound way to do things, generally, but I could well be wrong :) I don't even recognise that code as mine any more.

kivikakk avatar Apr 06 '24 10:04 kivikakk

! I only just noticed the part of your comment about link reference definitions. I couldn't tell you why, alas.

kivikakk avatar Apr 06 '24 10:04 kivikakk

Heh, thanks for chiming in :). Yeah for reference links, the definitions are all parsed as part of block parsing, which is the first pass of parsing (before any inline parsing is done). Then during inline parsing, we have all the definitions and can look them up directly.
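Here is a toy version of that two-pass flow, just to illustrate why a definition that comes after its reference still resolves. It is a sketch under heavy assumptions and not the real parser: TwoPassReferenceSketch and render are invented names, only single-line "[label]: destination" definitions and shortcut "[label]" references are handled, labels are normalized by lowercasing only, and no HTML escaping is done.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of block pass + inline pass; not commonmark-java code.
final class TwoPassReferenceSketch {
    private static final Pattern DEFINITION = Pattern.compile("\\[([^\\]]+)\\]:\\s*(\\S+)\\s*");
    private static final Pattern REFERENCE = Pattern.compile("\\[([^\\]]+)\\]");

    static String render(List<String> lines) {
        Map<String, String> definitions = new HashMap<>();
        List<String> textLines = new ArrayList<>();

        // Pass 1 (block parsing): collect every definition before touching inlines.
        for (String line : lines) {
            Matcher m = DEFINITION.matcher(line);
            if (m.matches()) {
                // As in CommonMark, the first definition for a label wins.
                definitions.putIfAbsent(m.group(1).toLowerCase(Locale.ROOT), m.group(2));
            } else if (!line.isEmpty()) {
                textLines.add(line);
            }
        }

        // Pass 2 (inline parsing): all definitions are known, so look them up directly.
        StringBuilder out = new StringBuilder();
        for (String line : textLines) {
            Matcher m = REFERENCE.matcher(line);
            int last = 0;
            while (m.find()) {
                out.append(line, last, m.start());
                String destination = definitions.get(m.group(1).toLowerCase(Locale.ROOT));
                if (destination != null) {
                    out.append("<a href=\"").append(destination).append("\">")
                            .append(m.group(1)).append("</a>");
                } else {
                    out.append(m.group()); // unknown label stays literal text
                }
                last = m.end();
            }
            out.append(line, last, line.length()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(List.of(
                "Text [foo] and [bar]",
                "",
                "[foo]: https://example.com/foo")));
    }
}
```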

robinst avatar Apr 06 '24 10:04 robinst

Branch here, with block parsing of footnote definitions (that part is straightforward): https://github.com/commonmark/commonmark-java/compare/footnotes-extension?expand=1

robinst avatar Apr 06 '24 11:04 robinst

@robinst thanks for looking into this!

why it's not done the same way as link reference definitions, where references are resolved during inline parsing

I'd also expect a procedure similar to link/definitions parsing

zampino avatar Apr 08 '24 08:04 zampino

PR is ready now:

  • https://github.com/commonmark/commonmark-java/pull/332

I've also found some interesting edge cases that GitHub doesn't handle well, see https://github.com/commonmark/commonmark-java/pull/332#issuecomment-2212453622 :)

robinst avatar Jul 07 '24 13:07 robinst