crystal icon indicating copy to clipboard operation
crystal copied to clipboard

Trim anchors

Open Blacksmoke16 opened this issue 2 years ago • 24 comments

Currently when you permalink to a specific method within a type, the anchor includes the FQN of the type itself. Such that you end up with really long redundant URLs. E.g. https://athenaframework.org/Framework/Controller/ValueResolvers/RequestAttribute/#Athena::Framework::Controller::ValueResolvers::RequestAttribute#initialize.

Given the anchors are already isolated to the page itself, would it not be enough to just sub out the type name from the id of the method (and related types)? E.g. something like:

id=def.abs_id | replace("%s#" % obj.abs_id, '') | replace("%s." % obj.abs_id, '')

Where I updated the method to be referenced as def such that the type can still be accessed via obj.

Blacksmoke16 avatar Feb 18 '24 16:02 Blacksmoke16

Given the anchors are already isolated to the page itself

They are not, though.. just that every user happens to use it this way

oprypin avatar Feb 18 '24 16:02 oprypin

Relevant ongoing work:

  • https://github.com/mkdocstrings/autorefs/pull/39

oprypin avatar Feb 18 '24 16:02 oprypin

Yep, with the changes in the mentioned PR, it could be possible to change the anchor of objects while retaining the complete identifier for cross-refs.

pawamoy avatar Feb 18 '24 16:02 pawamoy

Gotcha, I guess I shall keep an eye on future releases then! Thanks!

Blacksmoke16 avatar Feb 18 '24 16:02 Blacksmoke16

id=def.abs_id | replace("%s#" % obj.abs_id, '') | replace("%s." % obj.abs_id, '')

Hm so this is nice and all, but that makes it impossible to distinguish class methods and instance methods.

This would be some solution to distinguish them, although still looking rather weird:

  • Foo/##bar vs
  • Foo/#.bar

Because I think you're suggesting basically

  • Foo/#bar vs
  • Foo/#bar

oprypin avatar Feb 27 '24 20:02 oprypin

Easiest fix would be just do what the stdlib doc generator docs and do something like:

id=def.abs_id | replace("%s#" % obj.abs_id, 'instance-') | replace("%s." % obj.abs_id, 'class-')

Then you'd end up with Foo/#instance-bar and Foo/#class-bar which is probably good enough?

EDIT: Or add it at the end 🤷. Either or.

Blacksmoke16 avatar Feb 27 '24 20:02 Blacksmoke16

@pawamoy Oof this one may be harder than predicted

If I have a heading with this id: Foo::Bar--usage-notes

Then I want this to remain the global identifier, but the actual local id to become: usage-notes

So I need to suppress IdPrependingTreeprocessor but still somehow not suppress it for the purpose of making this the global identifier for autorefs.

oprypin avatar Feb 27 '24 21:02 oprypin

I'm not sure to understand. IIUC, this is what we currently have:

  • Foo::Bar docstring
Docstring summary.

## Usage notes

[Link to usage notes](#usage-notes).
  • mkdocstrings will prefix ids and hrefs (as well as shift heading levels), so this becomes
<p>Docstring summary.</p>

<h4 id="Foo::Bar--usage-notes">Usage notes</h4>

<p><a href="#Foo::Bar--usage-notes">Link to usage notes</a></p>
  • and then autorefs correctly registers Foo::Bar--usage-notes as identifier

pawamoy avatar Feb 27 '24 22:02 pawamoy

Right but the request is to have it become

<h4 id="usage-notes">Usage notes</h4>

but the need to have a globally linkable identifier still remains

oprypin avatar Feb 27 '24 22:02 oprypin

I'm not sure a subheading (a heading in a docstring) can ever retain just its slug as id (usage-notes), because the shortest id we can give to its parent (the object using this docstring, Foo::Bar) is the last component of the object FQN (Bar), so the shortest id for a subheading would at least be prefixed with that name (Bar--usage-notes).

Unless we're documenting Foo::Bar on a page of its own? :thinking: Then there's no heading for the object itself, and its docstring headings stay the same. Hmmm.

pawamoy avatar Feb 27 '24 22:02 pawamoy

The parent's id is not so important, sure it needs to be at least Bar. But all its children don't need to contain even that.

The point that this issue is trying to make is that this is redundant:

/Foo/Bar/#Foo::Bar--usage-notes (subheading)

/Foo/Bar/#Foo::Bar.some_method (child)

oprypin avatar Feb 27 '24 22:02 oprypin

Yes the point is about the usage where the item is documented on its own on the page. As much as we say that it's not necessarily the case, I'm guessing 95%+ of usages are like that

oprypin avatar Feb 27 '24 22:02 oprypin

Yeah, right. Then we must add aliases with FQN above headings, so that autorefs registers those and redirects to the headings with short ids. So I think the two mkdocstrings filters do_heading and do_convert_markdown must take an additional argument for the short name (in addition to the FQN). And this short name can probably be inferred from the current page's path (or somehow computed given some additional user configuration).

pawamoy avatar Feb 27 '24 22:02 pawamoy

There's also the really nasty case

/Foo/Bar/#Foo::Bar--usage-notes wants to become #usage-notes but /Foo/Bar/#Foo::Bar.some_method--usage-notes probably wants to become #some_method--usage-notes 😵‍💫 while still retaining the differently prefixed global identifier

oprypin avatar Feb 27 '24 22:02 oprypin

Alternatively we could say that people who are so certain about the page paths should just forfeit the autorefs syntax and use normal links (absolute links will work properly in the next mkdocs release)

oprypin avatar Feb 27 '24 22:02 oprypin

Hmmm, I think it's feasible. If we pass both the FQN and the current path (left part of the FQN down to where we are in terms of page tree): FQN=Foo::Bar, CurrentQN=Foo::Bar, then we can compute the short names relative to the current qualified name, i.e. Foo::Bar.some_method--usage-notes gives some_method--usage-notes. It seems to become dependent on the language though.

<p>Docstring summary.</p>

<!-- Foo::Bar - Foo::Bar = "" -->
<a id="Foo::Bar--usages-notes"></a>
<h4 id="usage-notes">Usage notes</h4>

<p><a href="#usage-notes">Link to usage notes</a></p>

<!-- Foo::Bar.some_method - Foo::Bar = some_method -->
<a id="Foo::Bar.some_method"></a>
<h4 id="some_method">some method</h4>

<p>Some method summary.</p>

<a id="Foo::Bar.some_method--usages-notes"></a>
<h4 id="some_method--usage-notes">Usage notes</h4>

<p><a href="#some_methhod--usage-notes">Link to usage notes</a></p>

pawamoy avatar Feb 27 '24 22:02 pawamoy

Aha. It's definitely doable somehow and I was almost able to finish coding all this, but it just becomes super complicated

oprypin avatar Feb 27 '24 22:02 oprypin

That's what I meant by "a bit more work" in the autorefs PR hahah

pawamoy avatar Feb 27 '24 22:02 pawamoy

I don't think it's that complicated. Computing names relative to other names seems however to depend on the language (different separators), so this would need some kind of hook between mkdocstrings and the handlers... and yeah that part complicates things.

  • Foo::Bar.some_method relative to Foo: Bar.some_method
  • Foo::Bar.some_method relative to Foo::Bar: some_method

pawamoy avatar Feb 27 '24 23:02 pawamoy

Alternatively we could say that people who are so certain about the page paths should just forfeit the autorefs syntax and use normal links

I will say I'd be okay with this. For my specific use case I think I'd have to do this anyway to link something from one component to another, unless something is done to integrate https://squidfunk.github.io/mkdocs-material/plugins/projects/.

Blacksmoke16 avatar Feb 27 '24 23:02 Blacksmoke16

this would need some kind of hook between mkdocstrings and the handlers

Actually if handlers can expose a path components on objects (string tuple), no need for a hook:

  • Bar exposes its path components ("Foo", "Bar")
  • some_method exposes its path components ("Foo", "Bar", "some_method")

If we can infer we're down to ("Foo", "Bar") (for example because current page ends with Foo/Bar), then we can compute the relative path of Foo::Bar.some_method to be some_method.

This would be language-agnostic.

pawamoy avatar Feb 27 '24 23:02 pawamoy

I think the main difficulty is still here

https://github.com/mkdocstrings/autorefs/blob/95d32b492bf038fff6974addc2d572582a37dadd/src/mkdocs_autorefs/plugin.py#L187

We are still relying on "autorefs" to scan the headings, and it is hardcoded to look at id of headings, which can't easily be changed because this info is obtained from toc_tokens.

oprypin avatar Feb 28 '24 17:02 oprypin

Argh, my brain is exploding.

IIUC, headings in docstrings are solved (see previous comments, and thanks to the fact that the autorefs extension also runs when converting docstrings from markdown to html). The remaining issue is for headings generated by mkdocstrings itself with its do_heading Jinja filter, because we pass these headings through toc to retrieve them in on_page_content, calling map_urls on them. If we were to add anchors above headings in our Jinja templates, they would simply be ignored because the generated HTML is stashed and autorefs won't operate on it.

So, one solution would be to somehow call autorefs' register methods directly from our do_heading filter? :thinking:

pawamoy avatar Feb 29 '24 15:02 pawamoy

We are still relying on "autorefs" to scan the headings, and it is hardcoded to look at id of headings, which can't easily be changed because this info is obtained from toc_tokens.

After taking a good, serious look at everything again, it turns out that the on_page_content hook of the autorefs plugin, where toc items are iterated on to register anchors (same link as above), is only meant for headings written directly in Markdown pages, and not for headings generated by the do_heading filter, nor for subheadings appearing in docstrings.

These latter headings (API object headings and their subheadings in docstrings) are registered directly from within our mkdocstrings outer extension, in AutoDocProcessor.run. We don't explicitly pass them through toc, so it should be easy to add the necessary info to make this work. For subheadings in docstrings, they do pass through toc to gain an id, then our id prepending processor prepends the current object id to these ids. The prefix is the HTML id we pass to convert_markdown.

Note that autorefs in full-mode (listed in mkdocs.yml plugins) will see these headings (object headings and their docstring subheadings) and register them again. This plays a role in what follows: if we implement custom (trimmed) anchors for object headings and their subheadings by changing their HTML id and providing "aliases", autorefs in full-mode will register the aliases too, which we don't want.

Three examples.

Example 1, with root object heading:

  • mkdocstrings somehow registers Foo::Bar.baz -> /Foo/Bar/#baz
  • autorefs (full-mode) registers baz -> /Foo/Bar/#baz

This will cause issues if there are other non-API-object baz headings throughout the site.

Example 2, with root object sub-heading:

  • mkdocstrings somehow registers Foo::Bar--qux -> /Foo/Bar/#qux
  • autorefs (full-mode) registers qux -> /Foo/Bar/#qux

This will cause issues too if there are other non-API-object qux headings throughout the site. This one is more problematic because text headings are more likely to be reused.

Example 3, with sub-object sub-heading:

  • mkdocstrings somehow registers Foo::Bar.baz--qux -> /Foo/Bar/#baz--qux
  • autorefs (full-mode) registers baz--qux -> /Foo/Bar/#baz--qux

This one is less problematic because users are not supposed to try and cross-reference something using the short baz--qux version, and equivalent plain text headings # baz--qux would have their id slugified as baz-qux (single dash) anyway. In this case, autorefs would simply register anchors that are never used or usable (simply put, harmless noise in the URL map).

Conclusion: we have to find a way to prevent autorefs from registering short anchors again.


For the implementation itself of trimmed anchors:

  • We have to find a way to pass the page object (or just its URL) down to the mkdocstrings AutoDocProcessor.
  • From there, the processor will pass the page to the Jinja context used to render objects.
  • Thanks to it, the HTML id can be computed (in templates) relative to the current page (given the objects collected by the handler can provide their "path components").
  • The do_heading and do_convert_markdown filters should be changed to accept both the full id and the trimmed id.
  • Both filters should then record headings with both full and trimmed ids, so that AutoDocProcessor.run can later register those correctly against autorefs.
  • As noted above, find a way to prevent autorefs in full-mode from registering again with the short ids.

pawamoy avatar May 25 '24 14:05 pawamoy