subset not working as expected

Open jvolkening opened this issue 4 years ago • 0 comments

Hello,

I've been using kallisto/sleuth for DE analysis of a virus/host dataset. It has been working well for analyzing the full transcript set, but now I would like to normalize and analyze only the viral genes separately. I tried the following code to subset the kallisto objects:

# old_path and new_path point to the existing and the to-be-created HDF5 files
library(sleuth)
library(magrittr)  # provides %>%

dir.create(dirname(new_path), recursive=TRUE)
read_kallisto_h5(old_path, read_bootstrap=TRUE) %>%
    subset_kallisto(target_ids=viral_ids) %>%
    write_kallisto_hdf5(fname=new_path)

This runs without error. However, after troubleshooting subsequent sleuth errors and dumping the contents of the newly written HDF5 files, I found that every dataset in the new HDF5 file is empty except for bias_observed, bias_normalized, fld, and the bootstrap datasets. For instance, the ids dataset is empty, which was the initial source of my downstream problems. I double-checked that the entries in viral_ids are correct and match the IDs used in kallisto, and sleuth reports the expected number of transcripts filtered.
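For anyone trying to reproduce the check, here is a minimal sketch of one way to list the datasets in the written file and flag the empty ones. It assumes the Bioconductor package rhdf5 is installed and uses its h5ls() listing; empty_datasets() is a hypothetical helper written for this illustration, not part of sleuth or rhdf5, and the way it parses the dim column is an assumption about how h5ls() formats dimensions.

```r
# Hypothetical helper: takes the data frame returned by rhdf5::h5ls() and
# returns the full paths of datasets that report no elements.
empty_datasets <- function(listing) {
  datasets <- listing[listing$otype == "H5I_DATASET", , drop = FALSE]
  dims <- as.character(datasets$dim)
  # Assumption: dims look like "12" or "4 x 3"; pull out every integer.
  sizes <- regmatches(dims, gregexpr("[0-9]+", dims))
  # Flag a dataset when no size is reported or any dimension is zero.
  # Scalar datasets may be reported differently and could be flagged too.
  is_empty <- vapply(
    sizes,
    function(d) length(d) == 0 || any(as.numeric(d) == 0),
    logical(1)
  )
  if (!any(is_empty)) {
    return(character(0))
  }
  paste0(sub("/$", "", datasets$group[is_empty]), "/", datasets$name[is_empty])
}

# Usage: only runs when rhdf5 is available and new_path points to a real file.
if (requireNamespace("rhdf5", quietly = TRUE) &&
    exists("new_path") && file.exists(new_path)) {
  listing <- rhdf5::h5ls(new_path)
  print(listing)                  # full listing with dimensions
  print(empty_datasets(listing))  # paths of the datasets that look empty
}
```

The helper is kept separate from the file access so it can be tried on a hand-built listing without any HDF5 file at hand.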

The problem appears to be in subset_kallisto(): if I skip the subsetting step and only read and write the object directly, the resulting HDF5 file seems to be valid. I have seen similar issues, e.g. #204, but I'm not sure whether they are directly related.
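The comparison described above (plain read/write versus read/subset/write) could be sketched roughly as follows. It uses only the sleuth functions already shown in this report plus rhdf5::h5ls(); compare_dims() and the temporary file paths are hypothetical names introduced for this illustration, and old_path and viral_ids are assumed to be defined as in the snippet above.

```r
# Hypothetical helper: put the dimensions reported by two h5ls() listings
# side by side, keyed by the full path of each entry.
compare_dims <- function(control, subsetted) {
  key <- function(x) paste0(sub("/$", "", x$group), "/", x$name)
  control_dims <- setNames(as.character(control$dim), key(control))
  subset_dims <- setNames(as.character(subsetted$dim), key(subsetted))
  paths <- union(names(control_dims), names(subset_dims))
  data.frame(
    path = paths,
    control = unname(control_dims[paths]),    # NA if missing from the control
    subsetted = unname(subset_dims[paths]),   # NA if missing from the subset
    stringsAsFactors = FALSE
  )
}

# Usage: only runs when both packages and the inputs are available.
if (requireNamespace("sleuth", quietly = TRUE) &&
    requireNamespace("rhdf5", quietly = TRUE) &&
    exists("old_path") && exists("viral_ids")) {
  kal <- sleuth::read_kallisto_h5(old_path, read_bootstrap = TRUE)
  control_path <- tempfile(fileext = ".h5")  # plain read/write
  subset_path <- tempfile(fileext = ".h5")   # read/subset/write
  sleuth::write_kallisto_hdf5(kal, fname = control_path)
  sleuth::write_kallisto_hdf5(
    sleuth::subset_kallisto(kal, target_ids = viral_ids),
    fname = subset_path
  )
  print(compare_dims(rhdf5::h5ls(control_path), rhdf5::h5ls(subset_path)))
}
```

Entries whose subsetted column shows 0 while the control column does not would be the ones lost in the subsetting step.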

I just tried with sleuth installed from GitHub today using devtools, with the same result.

Thanks in advance for any help.

Oct 13 '21 22:10 jvolkening