cargo-docset Docset index contains relocated symbols that only result in a redirection

Describe the bug When a crate re-exports symbols from a private module, cargo doc still produces HTML files for the re-exported symbols under the private module's path. These files simply redirect to the public location, and cargo doc omits the private location from its JS search index. Unfortunately, cargo-docset still indexes these private locations.

To Reproduce

$ cargo init --lib --name doctest
$ cat >src/lib.rs <<EOF
mod foo {
    /// Yes this is Bar.
    pub struct Bar;
}
pub use foo::Bar;
EOF
$ cargo docset
$ sqlite3 target/docset/doctest.docset/Contents/Resources/docSet.dsidx 'SELECT * FROM searchIndex;'
1|doctest|Package|doctest/index.html
2|doctest::Bar|Struct|doctest/struct.Bar.html
3|doctest::foo::Bar|Struct|doctest/foo/struct.Bar.html

Expected behavior The index should look like

1|doctest|Package|doctest/index.html
2|doctest::Bar|Struct|doctest/struct.Bar.html

Desktop (please complete the following information):

OS: macOS 11.2.3 (20D91)
Documentation browser: Dash
Version: cargo-docset 0.2.1

Additional context These redirection files look like

<!DOCTYPE html>
<html lang="en">
<head>
    <meta http-equiv="refresh" content="0;URL=../../doctest/struct.Bar.html">
</head>
<body>
    <p>Redirecting to <a href="../../doctest/struct.Bar.html">../../doctest/struct.Bar.html</a>...</p>
    <script>location.replace("../../doctest/struct.Bar.html" + location.search + location.hash);</script>
</body>
</html>

It shouldn't be hard to detect the redirection and ignore the symbol.
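As a rough illustration of what such a check might look like, here is a minimal sketch that treats a page as a redirect stub if it contains the meta refresh tag shown above. The function name `looks_like_redirect` is hypothetical and not part of cargo-docset; the check is a plain substring heuristic, not an HTML parser.

```rust
/// Hypothetical helper (not cargo-docset's API): returns true if `html`
/// contains a meta refresh tag like the one in the redirect stub above.
/// This is a substring heuristic and does not parse the HTML.
fn looks_like_redirect(html: &str) -> bool {
    // Lowercase first so attribute casing does not matter.
    let lower = html.to_ascii_lowercase();
    lower.contains(r#"http-equiv="refresh""#)
}
```

An indexer could call this on each page's contents and skip the symbol when it returns true.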

I also wonder how hard it would be to extract the JSON search index from target/doc/search-index.js and use that to build the docset index? That seems like a more long-term effort though.

May 05 '21 02:05 lilyball

Thanks for this report and the other ones, I'll look into this and make a new release.

May 16 '21 16:05 Robzz

I finally looked into this and it appears redirection pages now include <title>Redirection</title> in the head section, which is probably the easiest way to identify them. I implemented this in #45, which seems to work, and I haven't noticed any missing entries as a consequence. Docset generation does seem a bit slower as a result of reading the beginning of every HTML page, but it's still fast enough that I didn't feel the need to measure it.
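For readers curious what the title-based check could look like, here is a minimal sketch. It is not the code from #45: the names `head_marks_redirect` and `is_redirection_file` and the 1024-byte prefix length are assumptions made for illustration. It reads only the beginning of each file, as described above, and looks for the title marker.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Assumed prefix length; chosen here on the guess that the <head> of a
/// redirect page fits well within the first kilobyte.
const PREFIX_LEN: u64 = 1024;

/// Returns true if the given text contains the redirect title marker.
fn head_marks_redirect(prefix: &str) -> bool {
    prefix.contains("<title>Redirection</title>")
}

/// Reads at most PREFIX_LEN bytes from `path` and checks for the marker.
fn is_redirection_file(path: &Path) -> io::Result<bool> {
    let mut buf = Vec::new();
    File::open(path)?.take(PREFIX_LEN).read_to_end(&mut buf)?;
    // Lossy conversion: a prefix may end in the middle of a UTF-8 sequence.
    Ok(head_marks_redirect(&String::from_utf8_lossy(&buf)))
}
```

Capping the read keeps the extra cost per page small, at the risk of missing a title that appears later than the assumed prefix.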

I'll make a point release in a few days if I don't notice anything wrong by then.

Sep 21 '22 06:09 Robzz