lychee icon indicating copy to clipboard operation
lychee copied to clipboard

Fix relative links

Open mre opened this issue 1 year ago • 12 comments

Fixes support for relative links.

This is not ready yet. Just putting it here so that I don't forget about it.

Fixes https://github.com/lycheeverse/lychee/issues/1574

mre avatar Feb 06 '25 13:02 mre

Note to self: https://github.com/danreeves/path-clean

mre avatar Feb 07 '25 11:02 mre

Fragment handling doesn't work yet, e.g.

[ERROR] file:///Users/foo/bar/file.md%23some-fragment | Cannot find file

but the Markdown file contains the fragment:

### Some Fragment

mre avatar Feb 11 '25 15:02 mre

Files with spaces don't work either:

[ERROR] file:///Users/foo/bar/My%2520Image.png | Cannot find file

mre avatar Feb 11 '25 15:02 mre

Test-cases:

  • https://github.com/DioxusLabs/dioxus/actions/runs/13970715169/job/39111494101?pr=3891

mre avatar Mar 20 '25 15:03 mre

@katrinafyi, this is an old branch where I attempted to resolve the confusion around base URL and root path. Needless to say, it didn't go too well. 😆

From the outside, it could sound like relative links have nothing to do with base URLs or root paths, but that's very much the case, actually.

My solution is super hacky and half-baked at the moment. If you like, could you take a look at my attempt and see if we can cherry-pick some changes into a clean branch which would get us closer to resolve some problems around path handling? Alternatively, let's ditch it and start anew. The new documentation from https://github.com/lycheeverse/lychee/pull/1787 might come in handy.

mre avatar Aug 28 '25 11:08 mre

@mre Hmm I see. I'm fairly new here so I'm not quite across what the issues are/were. It seems to be about relative links in the presence of base-url? I see that there is some confusion about the meaning of the arguments, so hopefully https://github.com/lycheeverse/lychee/pull/1787 does help reduce the confusion.

That said, in my time working with Lychee, I've formed the opinion that --base-url is inconsistent and best avoided. If you don't specify it, then relative links seem to "just work" as you'd expect, and root-relative links (beginning with /) require --root-dir to be specified, also as you'd expect. Adding --base-url only causes problems because Lychee (currently) tries to make everything relative to that base URL.

Re potential fixes, I can't quite tell what changes this PR tried to make (the changes are many and I'm unfamiliar). I don't know if you saw, but in https://github.com/lycheeverse/lychee/issues/1718#issuecomment-3170382677, I wrote about what I imagined as a new behaviour for --base-url and --root-dir, with the goal of being detailed enough to implement. You can check this and give feedback as to whether it lines up with your plans :)

Also in the same comment, under the But what about remaps? section, I talk about how you can get a sensible base-url-like behaviour with the current Lychee options. You can test this and see if it works. If it does, it might be good to promote this workaround in the short term - it seems lots of people are affected by this base-url confusion.

Finally, I'd be happy to help but I think this touches a lot of Lychee's internals which I'm not familiar with. Lychee has some kind of async pipeline of handlers to process the links? I assume that implementing https://github.com/lycheeverse/lychee/issues/1718#issuecomment-3170382677 would need to be done in a certain step of this pipeline. But I'm really not sure.

katrinafyi avatar Aug 28 '25 13:08 katrinafyi

True, I remember coming across your proposal in the comment you mentioned and it sounded sensical to me. We can go forward with your approach.

My understanding of base-url still stems from the early lychee days where I always assumed that it should behave similar to the HTML <base> tag: it would only be applicable to remote URLs and never the file system, which is where I mentally always drew the line between base-url and root-dir. But, of course, nowadays base-url is used very differently. We prepend the base-url to a relative path in the file system to build a remote URL and this is where most of the confusion comes from.

The modern interpretation of base-url (as documented in the new help messages) makes a lot more sense.

At the moment, users get a lot of InvalidBaseJoin warnings as shown here, which is not pretty. My idea was to try and fix that with this PR. I believe the fixtures in this PR could be helpful to document the behavior for the future. Also the tests from https://github.com/lycheeverse/lychee/pull/1624/files#diff-aa1ee21334d5352e4b009dccb9d6c9d58020bd48300963a9a621b4e73c8c7cb3.

Finally, I'd be happy to help but I think this touches a lot of Lychee's internals which I'm not familiar with. Lychee has some kind of async pipeline of handlers to process the links? I assume that implementing https://github.com/lycheeverse/lychee/issues/1718#issuecomment-3170382677 would need to be done in a certain step of this pipeline. But I'm really not sure.

Yes, that is correct.

This part is here. It's very ugly right now. It's where most of my failed attempts to find a coherent way to harmonize root-dir and base-url took place. This is unmergeable as-is, but could serve as a checklist/blueprint for future work.

The rest of the PR is just tests and moving stuff around, but the core logic is in that section linked above. Guess you could ignore all the async channel handling for now. No rush on my end to get these changes in, but comments/changes are welcome. Maybe we have to break this PR down into smaller pieces which we can merge.

mre avatar Aug 28 '25 14:08 mre

I've had some time to think about this and try some things. I thought I'd check in to make sure we're on the same page. What is the actual problem which we are trying to solve?

I had quick a look through open issues which mention relative links:

  • https://github.com/lycheeverse/lychee/issues/1265 is fixed by adding --root-dir. Also, I think that absolute links are no longer silently ignored; if you have absolute links without --root-dir, I think you do get an InvalidBaseJoin warning now. I think that this warning is correct, but maybe the message can be made more actionable.
  • https://github.com/lycheeverse/lychee/issues/671 is hard because of reasons you've noted. I think this will be solved with https://github.com/lycheeverse/lychee/pull/1842.
  • https://github.com/lycheeverse/lychee/issues/1574 is a lot of things, but the initial OP is an incorrect use of base by passing a relative path.

So, I am not sure which issues remain, and an implementation of my plan in https://github.com/lycheeverse/lychee/issues/1718#issuecomment-3170382677 is not really a direct fix to these issues. Rather, it is a fix to the (imo) surprising interaction of --base-url and --root-dir. This is what I want, but is it what you want? :-)

If it is, in my current code (not ready to see the light of day), I have it so that the behaviour should be unchanged if only one of --root-dir or --base-url is specified. The new behaviour only applies if both are given, and it applies the base URL by mapping subpaths of root-dir into subpaths of base-url. If you only specify --base-url, that will still apply indiscriminately and uniformly to all inputs.

I don't know if this is a reasonable way to re-interpret the command-line options, or whether new flags should be added.

I'm also running into lots of terrific issues edge case issues ;-;

Edit: I should add that I think base-url should be changed to require root-dir, and the current base-url behaviour without root-dir could be provided as --fallback-base-url.

Also, in terms of issues that need fixing, I think that https://lychee.cli.rs/recipes/base-url/ is incorrect and misleading. It should be changed to line up with the current behaviour, or it should be rewritten after base-url is changed.

Edit 2: my work can be previewed at https://github.com/rina-forks/lychee/pull/3. commentary is welcome.

katrinafyi avatar Sep 07 '25 03:09 katrinafyi

What is the actual problem which we are trying to solve?

I want to get rid of message like this:

[WARN] Error creating request: InvalidBaseJoin("...")

In my opinion, we should either fix that root cause or alternative set the logging level to trace.

So, I am not sure which issues remain, and an implementation of my plan in https://github.com/lycheeverse/lychee/issues/1718#issuecomment-3170382677 is not really a direct fix to these issues. Rather, it is a fix to the (imo) surprising interaction of --base-url and --root-dir. This is what I want, but is it what you want? :-)

To be honest, I'm not 100% sure. Probably if we have a clear distinction between --base-url and --root-dir, my issue will also go away or at least it will be easier to fix. In general, users should never have to deal with InvalidBaseJoin; either we can resolve the path or we can't. If we can't we ignore the link. But I want lychee to do the "right thing" in every scenario where the user specified --base-url, --root-dir or a combination of those. The current error message is very confusing even for me who implemented it. As you said, not very actionable.

I'm also running into lots of terrific issues edge case issues ;-;

Sorry to hear that. But I think it's also one of the hardest problems to crack right now in the entire codebase. So hang in there. 😅

Also, in terms of issues that need fixing, I think that lychee.cli.rs/recipes/base-url is incorrect and misleading. It should be changed to line up with the current behaviour, or it should be rewritten after base-url is changed.

Totally. It's not great. We should completely rewrite it and perhaps add a warning for now.

mre avatar Sep 09 '25 15:09 mre

I see. In that case, I can suggest:

  • (1) Require absolute domain for base-url. This should prevent InvalidBaseJoin when a base-url is specified. It happens when a user gives a base URL that is unsuitable for basing relative URLs.
  • (2) For the case where a user tries to check relative links without giving base-url or root-dir, this should fail and show a clear and actionable message. I think that, in these cases, the InvalidBaseJoin error should be promoted to a real error that's reported as a failed link check. This will allow displaying a suggestion message using the logic that exists for other errors.
  • (3) Also, I think that hiding the message and silently skipping relative URLs would be bad.

These changes would improve the user experience and suggestions would direct them to correct usage of the flags, reducing confusion.

However, even with these changes, I think that the behaviour when using root-dir and base-url together is unavoidably surprising. For doing the "right thing" in that case, that would be the space of https://github.com/lycheeverse/lychee/issues/1718#issuecomment-3170382677, which we will call (4).

katrinafyi avatar Sep 10 '25 11:09 katrinafyi

Thanks for the thoughtful comment. I'm wondering if in this case it makes sense to even continue with this PR or fix it for good using the ideas from the linked comment. I personally don't care too much. If you believe that this PR could be the basis for future work, I can try to brush it up. If not, I'll just go ahead and close it and we work on the proper fix. What do you think? 😀

mre avatar Sep 10 '25 20:09 mre

Idk about this PR. It has lot of changes but it was never clear to me how it would fix relative links. So, my plans don't require the progression of this PR.

That said, maybe there are some useful test cases in here. Also, it might be useful to keep the PR open to show it's being worked on and discussed? At least until we have some other PRs working on this issue.

katrinafyi avatar Sep 11 '25 14:09 katrinafyi