Is it OK to have a missing version directory?
Fixture https://github.com/OCFL/fixtures/pull/79 / E010_missing_versions brings up an interesting question for me. Is it really necessary to have a version directory for every version? There will be a version directory if there is an inventory for every version, but this is not required. But if an implementation isn't storing an inventory for every version and a version doesn't add any new content files, is an empty version directory required?
I brought this issue up previously (https://github.com/OCFL/spec/issues/535) and at the time was told that you must always have a version and if you aren't storing an inventory for every version you're doing it wrong.
Getting back to the core OCFL principles, I would argue that the first paragraph of "3.3 Version Directories" defines an important characteristic of OCFL. However, given the loophole raised in https://github.com/OCFL/spec/issues/535, I would suggest we add wording to the middle of the first paragraph of "3.7 Version Inventory and Inventory Digest" along the lines of:
In the case where no files have been added or updated in a given version, which would result in an empty and therefore absent "content" directory (see https://ocfl.io/1.0/spec/#content-directory), such a version directory MUST include an inventory file.
I feel uncomfortable with the idea that we might require an inventory.json just as a way to keep the version directory in implementations that choose otherwise not to have an inventory in the version directories.
Suggested change to 3.3 Version Directories. Paragraph 1 should read (changes are highlighted):
OCFL Object content MUST be stored as a sequence of one or more versions. The sequence of version numbers is the sequence of positive, base-ten integers: 1, 2, 3, etc., and the version directory name is constructed by adding the prefix v. The version number sequence MUST start at 1 and MUST be continuous without missing integers. Each object version MUST be stored in a version directory under the object root.
and then the last paragraph should read (changes are highlighted):
There MUST be no other files as children of a version directory, other than an inventory file, an inventory digest, or a .no_content file. The version directory SHOULD NOT contain any directories other than the designated content sub-directory. Once created, the contents of a version directory are expected to be immutable.
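For anyone wanting to see what the continuity requirement in paragraph 1 means for a validator, here is a minimal sketch. It assumes unpadded directory names of the form `v1`, `v2`, ... exactly as the quoted text describes, and `missing_versions` is a hypothetical helper name, not something from the spec or any existing tool.

```python
import re

def missing_versions(dir_names):
    """Return the version numbers absent from the sequence 1..max.

    Names that do not look like a version directory (for example
    "extensions") are ignored. Only the unpadded form "v<N>" is handled,
    which is an assumption made for this sketch.
    """
    numbers = set()
    for name in dir_names:
        match = re.fullmatch(r"v([1-9][0-9]*)", name)
        if match:
            numbers.add(int(match.group(1)))
    if not numbers:
        return []
    return [n for n in range(1, max(numbers) + 1) if n not in numbers]
```

Note that a directory listing alone cannot reveal a missing final version; that would need a comparison against the versions listed in the inventory.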
I don't think the suggested changes indicate that you can't have an empty version directory which is of course the whole point of this ticket.
We need additional language that lets readers know .no_content file should exist when 1. you don't store an inventory file in your version directories AND 2. your version does not have any content to be stored (e.g. the version was created to document a file name change).
Suggestion on language to use welcome.
cc: @zimeon @awoods
Per Slack discussion with @neilsjefferies, I don't see any benefit in making the no_content file a "dot"/hidden file.
Questions:
- Is it allowed to have a no_content file AND an inventory? (I think YES)
- Is it allowed to have a no_content file AND a content sub-directory? (I think NO)
- Is it preferred to have a no_content file in the case that there is no content sub-directory, even if there is an inventory? (I think YES)
- What should the content of the no_content file be? (I suggest empty, SHOULD?)
Assuming my answers to the above, I think we could write something like the following, although it ends up as a bit of a mouthful:
There MUST be no files as children of a version directory except an inventory file, an inventory digest, or a no_content file. The version directory SHOULD NOT contain any directories other than the designated content sub-directory. The version directory MUST NOT be empty and in the case that there is no content sub-directory there SHOULD be a no_content file. If present, the no_content file SHOULD be empty. Once created, the contents of a version directory are expected to be immutable.
That language does not enforce point 2
Is there any harm in making a no_content file mandatory in the absence of a content subdirectory?
We also discussed whether or not 'no_content' should have content and felt that for validation it didn't matter; we would just check for presence. Therefore we would remain silent on whether or not there is content in the no_content file. @zimeon can you explain why we need to dictate that no_content is zero length?
@pwinckles re. https://github.com/OCFL/spec/issues/540#issuecomment-982779257 - yes indeed, good point
@neilsjefferies re. https://github.com/OCFL/spec/issues/540#issuecomment-982783608 - if we make it mandatory then we don't have backward compatibility with 1.0... but now I think about it, requiring no_content when there isn't a content directory is also not backwards compatible so maybe this whole change has to wait for 2.0?
@rosy1280 re. https://github.com/OCFL/spec/issues/540#issuecomment-982786512 - no particular reason why no_content should have no content (though you have to admit it is kinda cute). I do think it is better to recommend something as that avoids someone having to make an arbitrary implementation decision.
Taking the above into account, a revised proposal might be:
There MUST be no files as children of a version directory except an inventory file, an inventory digest, or a no_content file. The version directory SHOULD NOT contain any directories other than the designated content sub-directory. The version directory MUST NOT be empty. In the case that there is no designated content sub-directory there [SHOULD|MUST] be a file named no_content, and there MUST NOT be a file or directory named no_content otherwise. If present, the no_content file [SHOULD be empty|MAY be empty or have any content]. Once created, the contents of a version directory are expected to be immutable.
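To make the revised wording easier to reason about, here is a rough sketch of how a validator might apply it to one version directory. This is only an illustration of the proposal, not agreed spec behaviour: `check_version_dir` and its `marker_required` flag are hypothetical, the flag stands in for the open [SHOULD|MUST] choice, and the contents of the no_content file are not inspected, since that question is also still open. The inventory digest is assumed to be a sidecar file named `inventory.json.<algorithm>`.

```python
MARKER = "no_content"

def check_version_dir(files, dirs, content_dir="content", marker_required=False):
    """Apply the proposed rules to the names found in one version directory.

    files / dirs: names of the direct children of the version directory.
    marker_required: True models the MUST reading, False the SHOULD reading.
    Returns (errors, warnings) as two lists of messages.
    """
    errors, warnings = [], []

    # Only an inventory, an inventory digest, or the marker file are allowed.
    for name in files:
        is_inventory = name == "inventory.json" or name.startswith("inventory.json.")
        if not is_inventory and name != MARKER:
            errors.append(f"unexpected file: {name}")

    # Other directories are a SHOULD NOT, so they only produce a warning.
    for name in dirs:
        if name != content_dir:
            warnings.append(f"unexpected directory: {name}")

    if not files and not dirs:
        errors.append("version directory is empty")

    if content_dir in dirs:
        # With a content directory, nothing named no_content may exist.
        if MARKER in files or MARKER in dirs:
            errors.append("no_content present alongside content directory")
    elif MARKER not in files:
        message = "no content directory and no no_content file"
        (errors if marker_required else warnings).append(message)

    return errors, warnings
```

For example, `check_version_dir(["inventory.json", "inventory.json.sha512"], [])` yields a single warning under the SHOULD reading and a single error when called with `marker_required=True`.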
@zimeon 👍🏼 to it may be a breaking change. I've been wondering that as we drafted this. Should have said something sooner.
Given the fact that we do not want to introduce any breaking changes in a 1.1 release, would a softening of my earlier suggestion to a SHOULD instead of MUST be sufficient guidance for this release?
In the case where no files have been added or updated in a given version, which would result in an empty and therefore absent "content" directory (see https://ocfl.io/1.0/spec/#content-directory), such a version directory SHOULD include an inventory file.
I am now leaning towards @awoods' suggestion as the minimal change to the spec required to resolve the issue. It is a little untidy only if you are taking the NOT RECOMMENDED route of not having version inventories.
I do not think we should make the earlier suggestion but with SHOULD instead of MUST because it doesn't solve the problem: it would still only be a warning to not have a version directory, albeit now two warnings (no inventory and no directory).
I think we should punt this to v2.0 with the understanding that in v1.0 (and v1.1) it is possible (though not recommended) to not have a version directory in the case of no files updated and no version inventories stored. I don't see a non-breaking correction/fix without other implications.
I agree that my updated suggestion does not solve the problem. It does, however, provide clear guidance on how to address the empty version directory scenario.
If that guidance is less helpful than not, I am happy to leave the text as-is, and punt to 2.0.
Is it spelled out somewhere what the compatibility between 1.0 and 1.1 is supposed to be? Is 1.1 supposed to just be 1.0, but with a few validations made explicit?
It's true that the no_content change would make the representation on disk of 1.0 and 1.1 versions substantively different so that some 1.0 versions would be invalid per 1.1 and some 1.1 versions would be invalid per 1.0. However, this is only true depending on how validators are intended to behave.
If an object is created under 1.0 and is later "upgraded" to 1.1, should the 1.0 versions be validated against the 1.0 spec or the 1.1 spec?
For my validators, I had originally planned on simply updating everything to validate to 1.1, because the majority of the changes were providing clarity to constraints that could already be inferred from the 1.0 spec.
If 1.1 were to include the no_content change then things become more complicated, and I was thinking of validating versions based on the spec version that they were created under. It additionally introduces the complication for OCFL clients that would need to create versions slightly differently depending on the current spec version the object conforms to.
All of that to say, I think you could put the no_content change in 1.1 if you wanted. It would make clients and validators more complicated, but it wouldn't "break" anything. Personally, I'm just as happy punting on it because it means I have less work to do, and this is a very niche edge case.
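As a rough illustration of the per-version validation described above, the sketch below decides whether the no_content rule would apply to a version, based on the spec version in that version's inventory `type` value (e.g. `https://ocfl.io/1.0/spec/#inventory`). The function names and the `introduced_in=(1, 1)` default are hypothetical: they model the "what if 1.1 included the change" scenario only, and the approach assumes a version inventory is available to read the type from.

```python
import re

def parse_spec_version(inventory_type):
    """Extract (major, minor) from an inventory type URI."""
    match = re.fullmatch(r"https://ocfl\.io/(\d+)\.(\d+)/spec/#inventory", inventory_type)
    if match is None:
        raise ValueError(f"unrecognised inventory type: {inventory_type}")
    return int(match.group(1)), int(match.group(2))

def marker_rule_applies(inventory_type, introduced_in=(1, 1)):
    """True if the version was written under a spec that has the rule.

    introduced_in is a hypothetical parameter for this sketch; no released
    spec version is being claimed to contain the no_content rule.
    """
    return parse_spec_version(inventory_type) >= introduced_in
```

A client creating a new version would need the same decision in reverse: write the marker only when the object currently conforms to a spec version that has the rule.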
@pwinckles : per your comment about validating versions: The spec is clear that versions should be validated against the version they were written to conform to, but this isn't actually clear without a version inventory... I have created https://github.com/OCFL/spec/issues/569 to discuss
Requiring a namaste file in all versions would solve the empty directory problem. :D
+1 Punt this one to 2.0
...is there any mileage in putting something about this in the Implementation Notes?
Agreement in community call to delay until 2.0 (@rosy1280 @awoods @julianmorley @zimeon present). Removing 1.1 tag