CFIs created by Readium add incorrect assertions for spine item references
The CFIs generated by Readium seem to contain an ID assertion for an <itemref> element which has an IDREF attribute but no ID attribute. Per the CFI specification[0], the ID assertion should be used for elements with an ID but says nothing on an IDREF. See also[1]
[0] - http://www.idpf.org/epub/linking/cfi/epub-cfi.html#sec-path-xmlid [1] - https://jira.bibliolabs.com:8443/browse/BBW-1756
Your second link is behind a login screen. Can you provide a test case?
On the Google sample Moby Dick epub, the following CFI was generated: epubcfi(/6/12[xepigraph_001]!/4/2/2/2/1:0). The segment leading up to the assertion, "/6/12," references the following spine itemref entry:
Should this be in the CFI-JS repo?
@rkwright No as I believe this is limited to a problem in the SDK. To my knowledge the CFI-JS library isn't set up to generate CFIs with package document references only content document components, aka partial CFIs. There is code in the CFI-JS library to generate a full CFI if the package document is passed on to it, but a quick glance at the code (https://github.com/readium/readium-cfi-js/blob/master/src/models/cfi_generator.js#L126 https://github.com/readium/readium-cfi-js/blob/master/src/models/cfi_generator.js#L315) shows that this issue shouldn't happen there.
I'm investigating..
The culprit: https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/package.cpp#L1094
thanks @JCCR (looks easy to fix, just need to check for other potential occurrences of similar issues)
unfortunately the unit-tests do not assert against "real" OPF files, they just evaluate expressions: https://github.com/readium/readium-sdk/blob/develop/UnitTests/cfi_tests.cpp
So, time to revive this issue.
A full CFI is meant to contain a SPINE ITEM reference (not MANIFEST ITEM). However:
https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/package.cpp#L1218
const CFI Package::CFIForSpineItem(shared_ptr<SpineItem> item) const
{
CFI result;
result._components.emplace_back(_spineCFIIndex);
result._components.emplace_back(_Str((item->Index()+1)*2, "[", item->Idref(), "]!"));
return result;
}
The (shared_ptr<SpineItem>)item->Idref() should not be inserted into the CFI assertion [ELEMENT_ID]. Instead, it should be the value of the opf>spine>itemref@id attribute (not opf>spine>itemref@idref which is opf>manifest>item@id!)
Similarly, this seems wrong:
https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/package.cpp#L1211
const CFI Package::CFIForManifestItem(shared_ptr<ManifestItem> item) const
{
CFI result;
result._components.emplace_back(_spineCFIIndex);
result._components.emplace_back(_Str((IndexOfSpineItemWithIDRef(item->Identifier())+1)*2, "[", item->Identifier(), "]!"));
return result;
}
...and this:
https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/package.cpp#L155
string PackageBase::CFISubpathForManifestItemWithID(const string &ident) const
{
size_t sz = IndexOfSpineItemWithIDRef(ident);
if ( sz == size_t(-1) )
throw std::invalid_argument(_Str("Identifier '", ident, "' was not found in the spine."));
return _Str(_spineCFIIndex, "/", (sz+1)*2, "[", ident, "]!");
}
Importantly, this assertion function is wrong too! (see how it checks for pItem->Idref() instead of pItem->Id()):
https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/package.cpp#L252
shared_ptr<SpineItem> PackageBase::ConfirmOrCorrectSpineItemQualifier(shared_ptr<SpineItem> pItem, CFI::Component *pComponent) const
{
if ( pComponent->HasQualifier() && pItem->Idref() != pComponent->qualifier )
{
// find the item with the qualifier
pItem = _spine;
uint32_t idx = 2;
while ( pItem != nullptr )
{
if ( pItem->Idref() == pComponent->qualifier )
{
// found it-- correct the CFI
pComponent->nodeIndex = idx;
break;
}
pItem = pItem->Next();
}
}
else if ( pComponent->HasQualifier() == false )
{
HandleError(EPUBError::CFINonAssertedXMLID);
}
return pItem;
}
Judging by this human-readable message, the original implementor of this CFI feature seems to have assumed that spine@IDREF is used instead of spine@ID:
https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/package.cpp#L1261
HandleError(EPUBError::CFIIndirectionTargetMissing, "CFI spine node qualifier doesn't match any spine item idref");
So, it's a bug in the SDK C++ implementation, and it's not actually trivial to fix in the Readium SDK C++ code (because we also need to check for potential regression bugs in implementations where the API contract was used as-is). As mentioned before in this conversation thread, it's not a major issue in ReadiumJS, which uses its own internal CFI processor (and no full CFI syntax, so no spine item ID issues). Thankfully the CFI API does not seem to be used anywhere, and assertions are not a critical part of the CFI "applications" in Readium (in other words: no check of CFI robustness to assert whether the DOM has changed). http://www.idpf.org/epub/linking/cfi/epub-cfi.html#sec-path-xmlid
Thoughts?