Need the ability to skip md5 check for some files in snapshot
I am trying to avoid taking the md5sum for some files in my snapshot.
I found an example of one possible approach that was shared online, and modified it as follows:
nextflow_pipeline {
    name "Test main Pipeline"
    script "main.nf"
    test("Should run without failures") {
        when {
            params {
                // NOTE: make sure 'outdir' is defined inside the JSON!
                load("$baseDir/examples/params.small.json")
            }
        }
        then {
            assert workflow.success
            // Files whose names end with any of these suffixes are left out of the snapshot
            def exclude_suffix = [".html", "_complete", "_invocation",
                "_outs", "_vdrkill", "_args",
                "_jobinfo", "_log", "_stderr", "_stdout",
                "_chunk_defs", "_stage_defs", "_disabled",
                "_cmdline", "_filelist", "_finalstate", "_jobmode", "_mrosource", "_perf", "_sitecheck",
                "_tags", "_timestamp", "_uuid", "_versions"]
            assert snapshot(
                workflow,
                path("${params.outdir}")
                    .list()
                    .collect { getRecursiveFileNames(it, "${params.outdir}") }
                    .flatten()
                    // any() stops at the first matching suffix; a return inside each() only
                    // leaves the current iteration and does not exit the loop early
                    .findAll { name ->
                        !exclude_suffix.any { suffix -> name.toString().endsWith(suffix) }
                    }
            ).match()
        }
    }
}
def getRecursiveFileNames(fileOrDir, outputDir) {
    if(file(fileOrDir.toString()).isDirectory()) {
        return fileOrDir.list().collect { getRecursiveFileNames(it, outputDir) }
    }
    return fileOrDir.toString().replace("${outputDir}/", "")
}
This successfully excludes the files with the listed suffixes, but the snapshot now contains only a list of file names, with no md5 checksums for the remaining files. I also realized that what I really want is to omit only the md5 for the files with inconsistent hashes, rather than removing those files from the snapshot entirely. I am not sure how to implement that. Could this be built into nf-test directly?
I think this is related to https://github.com/askimed/nf-test/issues/116; the main difference is that I still want to check that the files exist, just not their md5.
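For illustration, the behavior being asked for (every file name recorded, md5 recorded only for files with stable content) can be sketched in plain Groovy, outside nf-test. This is a minimal sketch under assumptions, not an nf-test feature: `md5Of` and `describeOutputs` are hypothetical helper names, the suffix list is shortened, and the demo files are made up.

```groovy
import groovy.io.FileType
import java.nio.file.Files
import java.security.MessageDigest

// Hypothetical helper: hex-encoded md5 of a file's bytes.
String md5Of(File f) {
    MessageDigest.getInstance('MD5').digest(f.bytes).encodeHex().toString()
}

// Hypothetical helper: lists every file under outdir by relative path,
// but computes an md5 only for files that do NOT end with an unstable suffix.
Map describeOutputs(File outdir, List<String> unstableSuffixes) {
    def files = []
    outdir.eachFileRecurse(FileType.FILES) { files << it }

    def relative = { File f -> outdir.toPath().relativize(f.toPath()).toString() }
    def stable = files.findAll { f -> !unstableSuffixes.any { f.name.endsWith(it) } }

    [
        names: files.collect(relative).sort(),                                    // existence check for all files
        md5  : new TreeMap(stable.collectEntries { [(relative(it)): md5Of(it)] }) // content check for stable files
    ]
}

// Demo on a throwaway directory (made-up file names).
def outdir = Files.createTempDirectory('outdir').toFile()
new File(outdir, 'sub').mkdirs()
new File(outdir, 'sub/counts.tsv').text = 'gene\t1\n'
new File(outdir, 'report.html').text = 'generated ' + System.nanoTime()

def result = describeOutputs(outdir, ['.html', '_log'])
println result.names
println result.md5.keySet()
```

The two parts of the result play different roles: `names` changes only when a file appears or disappears, while `md5` changes only when a stable file's content changes. Inside an nf-test `then` block, a structure like this could presumably be passed to `snapshot(...)`, though that wiring is an assumption rather than something shown in this thread.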
I was in a somewhat similar situation and resorted to the following logic for the orthofinder module:
import groovy.io.FileType
.
.
.
assert process.success
def all_files = []
file(process.out.orthofinder[0][1]).eachFileRecurse (FileType.FILES) { file ->
    all_files << file
}
def all_file_names = all_files.collect { it.name }.sort(false)
def stable_file_names = [
    'Statistics_PerSpecies.tsv',
    'SpeciesTree_Gene_Duplications_0.5_Support.txt',
    'SpeciesTree_rooted.txt'
]
def stable_files = all_files.findAll { it.name in stable_file_names }
assert snapshot(
    all_file_names,
    stable_files,
    process.out.versions[0]
).match()
https://github.com/nf-core/nft-utils has this functionality
I think nft-utils is the best way to implement this logic, so I am closing this issue.