readium-sdk icon indicating copy to clipboard operation
readium-sdk copied to clipboard

CFIs created by Readium add incorrect assertions for spine item references

Open marcuswu opened this issue 12 years ago • 7 comments

The CFIs generated by Readium seem to contain an ID assertion for an <itemref> element which has an IDREF attribute but no ID attribute. Per the CFI specification[0], the ID assertion should be used for elements with an ID but says nothing on an IDREF. See also[1]

[0] - http://www.idpf.org/epub/linking/cfi/epub-cfi.html#sec-path-xmlid [1] - https://jira.bibliolabs.com:8443/browse/BBW-1756

marcuswu avatar Apr 28 '14 14:04 marcuswu

Your second link is behind a login screen. Can you provide a test case?

dmitrym0 avatar Apr 29 '14 15:04 dmitrym0

On the Google sample Moby Dick epub, the following CFI was generated: epubcfi(/6/12[xepigraph_001]!/4/2/2/2/1:0). The segment leading up to the assertion, "/6/12," references the following spine itemref entry: . The 'xepigraph_001' assertion would make sense if the CFI pointed to the manifest item which does have that id, but the spine itemref entry does not have an id attribute.

marcuswu avatar Apr 29 '14 17:04 marcuswu

Should this be in the CFI-JS repo?

rkwright avatar Oct 23 '14 16:10 rkwright

@rkwright No as I believe this is limited to a problem in the SDK. To my knowledge the CFI-JS library isn't set up to generate CFIs with package document references only content document components, aka partial CFIs. There is code in the CFI-JS library to generate a full CFI if the package document is passed on to it, but a quick glance at the code (https://github.com/readium/readium-cfi-js/blob/master/src/models/cfi_generator.js#L126 https://github.com/readium/readium-cfi-js/blob/master/src/models/cfi_generator.js#L315) shows that this issue shouldn't happen there.

I'm investigating..

jccr avatar Oct 23 '14 17:10 jccr

The culprit: https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/package.cpp#L1094

jccr avatar Oct 23 '14 18:10 jccr

thanks @JCCR (looks easy to fix, just need to check for other potential occurrences of similar issues)

unfortunately the unit-tests do not assert against "real" OPF files, they just evaluate expressions: https://github.com/readium/readium-sdk/blob/develop/UnitTests/cfi_tests.cpp

danielweck avatar Oct 23 '14 18:10 danielweck

So, time to revive this issue.

A full CFI is meant to contain a SPINE ITEM reference (not MANIFEST ITEM). However:

https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/package.cpp#L1218

const CFI Package::CFIForSpineItem(shared_ptr<SpineItem> item) const
{
    CFI result;
    result._components.emplace_back(_spineCFIIndex);
    result._components.emplace_back(_Str((item->Index()+1)*2, "[", item->Idref(), "]!"));
    return result;
}

The (shared_ptr<SpineItem>)item->Idref() should not be inserted into the CFI assertion [ELEMENT_ID]. Instead, it should be the value of the opf>spine>itemref@id attribute (not opf>spine>itemref@idref which is opf>manifest>item@id!)

Similarly, this seems wrong:

https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/package.cpp#L1211

const CFI Package::CFIForManifestItem(shared_ptr<ManifestItem> item) const
{
    CFI result;
    result._components.emplace_back(_spineCFIIndex);
    result._components.emplace_back(_Str((IndexOfSpineItemWithIDRef(item->Identifier())+1)*2, "[", item->Identifier(), "]!"));
    return result;
}

...and this:

https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/package.cpp#L155

string PackageBase::CFISubpathForManifestItemWithID(const string &ident) const
{
    size_t sz = IndexOfSpineItemWithIDRef(ident);
    if ( sz == size_t(-1) )
        throw std::invalid_argument(_Str("Identifier '", ident, "' was not found in the spine."));

    return _Str(_spineCFIIndex, "/", (sz+1)*2, "[", ident, "]!");
}

Importantly, this assertion function is wrong too! (see how it checks for pItem->Idref() instead of pItem->Id()):

https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/package.cpp#L252

shared_ptr<SpineItem> PackageBase::ConfirmOrCorrectSpineItemQualifier(shared_ptr<SpineItem> pItem, CFI::Component *pComponent) const
{
    if ( pComponent->HasQualifier() && pItem->Idref() != pComponent->qualifier )
    {
        // find the item with the qualifier
        pItem = _spine;
        uint32_t idx = 2;

        while ( pItem != nullptr )
        {
            if ( pItem->Idref() == pComponent->qualifier )
            {
                // found it-- correct the CFI
                pComponent->nodeIndex = idx;
                break;
            }

            pItem = pItem->Next();
        }
    }
    else if ( pComponent->HasQualifier() == false )
    {
        HandleError(EPUBError::CFINonAssertedXMLID);
    }

    return pItem;
}

Judging by this human-readable message, the original implementor of this CFI feature seems to have assumed that spine@IDREF is used instead of spine@ID:

https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/package.cpp#L1261 HandleError(EPUBError::CFIIndirectionTargetMissing, "CFI spine node qualifier doesn't match any spine item idref");

So, it's a bug in the SDK C++ implementation, and it's not actually trivial to fix in the Readium SDK C++ code (because we also need to check for potential regression bugs in implementations where the API contract was used as-is). As mentioned before in this conversation thread, it's not a major issue in ReadiumJS, which uses its own internal CFI processor (and no full CFI syntax, so no spine item ID issues). Thankfully the CFI API does not seem to be used anywhere, and assertions are not a critical part of the CFI "applications" in Readium (in other words: no check of CFI robustness to assert whether the DOM has changed). http://www.idpf.org/epub/linking/cfi/epub-cfi.html#sec-path-xmlid

Thoughts?

danielweck avatar Mar 11 '16 15:03 danielweck