bagit-java icon indicating copy to clipboard operation
bagit-java copied to clipboard

various fixes

Open jscancella opened this issue 7 years ago • 14 comments

fixes #124 - by renaming to make it more clear that we verify all manifests fixes #123 - by fixing the typo in the message formatting so the path gets added to the message fixes #122 - fixed when checking if a file is in at least one manifest and file is hidden. It was not skipping if it was hidden and the ignoreHidden setting was set fixes #121 - changes the number of threads to be limited by the cpu count instead of unbounded fixes #119 - by adding test to ensure we don't get warnings when there shouldn't be any

jscancella avatar Jul 22 '18 16:07 jscancella

Coverage Status

Coverage increased (+0.07%) to 98.354% when pulling a06001ce68b91b2f532a9ec23979f631621ce309 on jscancella:master into 2b7002e62d721f5eb50461e4c4c70a6ef643ec1d on LibraryOfCongress:master.

coveralls avatar Jul 22 '18 16:07 coveralls

@jscancella Not sure if I can ask this here or if I should make a new issue for this. However, since you're touching the class I wanted to ask about in this PR, I'd try it here...

I was wondering if there is any special reason for BagVerifier to be final and checkHashes to be package private. In my project we would really benefit from extending from this class and calling checkHashes from another sequence of validation steps (like your isComplete and isValid. I mean, if the final is there only to protect isComplete and/or isValid from being overridden and not conforming to the specs, you could also use final keywords on these methods themselves.

rvanheest avatar Jul 23 '18 20:07 rvanheest

@rvanheest The original reason was to convey that these classes were not designed to be overridden, That's not to say you would be in danger if you did, merely that you were on your own if you wanted to fork this and change the class to be one you can override.

I also personally like to make everything as final as possible to decrease the attack surface should a vulnerability be found in the library or dependency library. But I no longer work at the Library of Congress so if the maintainers wish to make this public that is their choice. What are you doing that you need to check the hashes again later without using BagVerifer?

jscancella avatar Jul 24 '18 13:07 jscancella

@jscancella I understand why you made these classes final. We defined a another kind of validness (called 'virtually valid'), which is basically the same as 'valid', but with some different rules for the fetch file. So it would be ideal if I could extend from BagVerifier and create a new method that uses the existing functionality for validating the checksums (checkHashes). By the way, I'm not asking to make checkHashes public! That would expose it too much. Personally I would choose protected in this case. But I'll leave that to the new maintainers, indeed.

Finally, sorry to hear that you left Library of Congress. It was a pleasure for my coworkers and me to collaborate and interact with you via GitHub. Best wishes in your new position.

rvanheest avatar Jul 24 '18 13:07 rvanheest

@rvanheest interesting, virtually valid makes me think you are building a bag and then checking it later, is this a correct assumption? If so my only comment would be that I tend to think of bagit as a method of transferring files correctly using (typically) a "sneakernet".

Thanks for the kind remarks. As for leaving the Library of Congress, it was most unfortunate but as you can see I am still active on bagit and hope to remain so (even if it is on my own time).

jscancella avatar Jul 24 '18 14:07 jscancella

@jscancella, as I mentioned in another place to you, we use the concept of Bags, BagIt, etc. as one of our storage formats. Indeed we also use it for transferring datasets from our clients to us, but also use it for storage (after having done some cleanup and metadata enrichment). See our public documentation for a description of virtuallyValid.

In the implementation of virtuallyValid I could use bagit-java's implementation of checkHashes as 'stand-alone'. As you know however, this method is currently package private. I'll prepare a pull request for this later, such that you and the maintainers can comment on my proposed changes.

rvanheest avatar Jul 25 '18 10:07 rvanheest

@rvanheest could you not do something like create a derivative bag that doesn't include the files listed in the fetch file in the manifest(s)? That way you can use all the tools normally and you don't need to define a "virtually" valid bag.

For example, something like this:

BagReader reader = new BagReader();
Bag bag = reader.read(<YOUR PATH HERE>);
for(FetchItem item : bag.getItemsToFetch()){
  for(Manifest manifest : bag.getPayLoadManifests()){
    manifest.getFileToChecksumMap().remove(PathUtils.getDataDir(bag).resolve(item.getPath()));
  }
}

//TODO update tag manifest(s)...
BagWriter.write(bag, <SOME PLACE YOU KEEP STORE BAGS>);

jscancella avatar Jul 25 '18 18:07 jscancella

@acdha just a side note, this PR includes your changes in #120

jscancella avatar Aug 15 '18 13:08 jscancella

@jscancella Would you please share the compiled jar with me that I can use in my project as I am also having the Thread pool issue for the larger bag.

smntb avatar Aug 20 '18 12:08 smntb

@smntb I would wait until someone from the Library of Congress publishes an official jar. But if you really can't wait, you can clone my repo and use the instructions in the README to build it yourself

jscancella avatar Aug 21 '18 02:08 jscancella

@acdha thoughts? It looks like there are people who are interested in this being merged in and deployed to the main java repositories (maven central and jcenter)

jscancella avatar Aug 23 '18 13:08 jscancella

I don't have access to this repo. Perhaps @dbrunton knows who's picking up maintainership?

acdha avatar Aug 23 '18 13:08 acdha

Any update on this pull request?

smntb avatar Sep 04 '18 09:09 smntb

bump

jscancella avatar Oct 11 '18 16:10 jscancella