
Old file archives are not correct

Open SantaMcCloud opened this issue 1 year ago • 2 comments

Hello,

Sorry for raising this here; I couldn't find an email address for contacting the CAMI staff. I'm currently working on my bachelor thesis, which includes building a workflow on the web server https://usegalaxy.eu/, which serves many different bioinformatics tools. Since AMBER is available there now, I need some benchmarks to test the workflow, and I discovered that you provide the old archives such as CAMI low and mouse gut toy. I have worked with the CAMI low and the mouse gut toy low archives, but I also want to test the high and medium archives, and that is where the problem is. I downloaded both tarballs [from http://gigadb.org/dataset/100344] and unpacked them, but they contain only the samples. The GSA and binning files, which should also be there, are missing from both tarballs. Is it possible to fix this, or is there another source from which the correct tarballs can be downloaded?

This would be a great help. Thank you in advance, and again, I'm sorry if this is the wrong place for this topic!

SantaMcCloud avatar Sep 07 '24 15:09 SantaMcCloud

You can download the binning gold standards for the Medium and High pooled assemblies here:
Medium: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_MEDIUM/pooled_gsa_mapping.binning.tsv
High: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_HIGH/gsa_mapping_pool.binning
Other files are available, as in the description of each dataset at https://data.cami-challenge.org/participate. The camiClient.jar can be useful sometimes to list and download available files.

fernandomeyer avatar Sep 13 '24 05:09 fernandomeyer

Yes, this helped, thank you, but there is a problem with the high dataset: the reads and the binning files of the sample don't have matching IDs. I don't know whether this happens only because the reads were downloaded from GigaDB and not from the OpenStack storage. I tried to download them from there, but I don't have access, at least for the first sample; I did not try the others.

Then I tried switching to the CAMI2 Toy set, which has results in these repositories, but the archive in the dataset directory is missing some 'tar.gz' files. For example, sample_3 contains only the contigs, while the reads are missing. Since there is a 'README.txt' file for every file, I assume the missing files should be in there? Would it be possible to make the missing files accessible?

Sorry for asking this kind of question, but if there is no way to update this archive or to fix the mismatched sequence IDs in the CAMI high dataset, just let me know!

Thank you in advance!

SantaMcCloud avatar Oct 05 '24 00:10 SantaMcCloud
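
The ID mismatch reported above for the CAMI high dataset could be narrowed down with a short script that compares the sequence IDs in a binning file against those in a reads file. The following is only a minimal sketch under several assumptions that are not confirmed in this thread: that the binning file is tab-separated with the sequence ID in the first column and header lines starting with `@`, that the reads are in four-line FASTQ records, and that paired reads may carry a `/1` or `/2` suffix. The function names and the demo IDs are hypothetical.

```python
import io


def read_binning_ids(handle):
    """Collect sequence IDs from a binning file.

    Assumption: tab-separated, ID in the first column, and header
    lines (e.g. '@Version:...', '@@SEQUENCEID ...') start with '@'.
    """
    ids = set()
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("@") or line.startswith("#"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def read_fastq_ids(handle, strip_pair_suffix=True):
    """Collect read IDs from a four-line-per-record FASTQ file.

    Assumption: a trailing '/1' or '/2' marks the mate of a pair and
    can be dropped before comparing against the binning file.
    """
    ids = set()
    for i, line in enumerate(handle):
        if i % 4 != 0:  # only the first line of each record holds the ID
            continue
        parts = line[1:].split()  # drop the leading '@' and any description
        if not parts:
            continue
        name = parts[0]
        if strip_pair_suffix and name[-2:] in ("/1", "/2"):
            name = name[:-2]
        ids.add(name)
    return ids


def compare_ids(binning_ids, read_ids, preview=5):
    """Summarise the overlap and show a few IDs unique to each side."""
    return {
        "shared": len(binning_ids & read_ids),
        "only_in_binning": sorted(binning_ids - read_ids)[:preview],
        "only_in_reads": sorted(read_ids - binning_ids)[:preview],
    }


# Hypothetical in-memory demo data; real files would be opened instead.
demo_binning = (
    "@Version:0.9.1\n@SampleID:demo\n@@SEQUENCEID\tBINID\n"
    "read_1\tbinA\nread_2\tbinB\n"
)
demo_fastq = "@read_1/1\nACGT\n+\nIIII\n@read_3/1\nACGT\n+\nIIII\n"

summary = compare_ids(
    read_binning_ids(io.StringIO(demo_binning)),
    read_fastq_ids(io.StringIO(demo_fastq)),
)
print(summary)
```

For real data, the `io.StringIO` objects would be replaced by open file handles, for example `gzip.open(path, "rt")` for compressed reads. Looking at the previewed IDs from each side usually shows whether the two files differ only by a prefix or suffix, or whether they describe different kinds of sequences, such as contigs versus reads.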