Add method sam_open_write
-
The new method
sam_open_writecan open ahtsFilefile in writing mode only, attach a header struct (sam_hdr_t) to it and write the header data to the disk, before any alignment data. -
New getter method
hts_get_fnis used to fetch the internal file name ofhtsFile. -
The index file name inside
htsFileis now managed by HTSlib, and no longer points to an application variable.
If we change fnidx to be managed by HTSlib (which I think we can get away with, as I haven't found anything that tries to fiddle with it), we should also change the documentation for sam_idx_init() as it's no longer necessary for the pointer to remain valid.
Looks good. Maybe squash the 1st and 3rd commits together?
Looks good. Maybe squash the 1st and 3rd commits together?
I squashed all three of them, as the second didn't make much sense by itself.
@jmarshall so... were there any comments....?
To continue the conversation from the previous PR #1052,
It could be argued that the samtools API was better. I've never entirely seen the point of having to pass in the header to sam_read1() and sam_write1()
When there are lots of different sets of headers around, e.g., when implementing samtools merge, I think the overall code is clearer when it is explicit which headers go with which bam1_t records and which files. (e.g., user code will use hdr->target_name[mybam->core.tid] to get their bam1_t's RNAME string and needs to be clear which hdr to use with that mybam — which is clearer if the right hdr also appears in nearby sam_read1/sam_write1 calls.)
Having said that, I can see the point of attaching the headers to the htsFile and having the option of not passing them explicitly to sam_read1() and sam_write1(). So FWIW I would support adding this sam_write1(fp, NULL, b) mode of operation so that user code can choose which style to use.
There are two ways of opening a file for use with sam_read1()/sam_write1(): hts_open()/hts_open_format() and hts_hopen(). In the latter case, this might be via hopen("filename") … hts_hopen() or hdopen(arbitrary_fd) … hts_hopen().
The proposed sam_open_write() limits you to directly providing the filename à la hts_open(). This means it's badly factored w.r.t. the rest of the API, as you can't use it with an hFILE* that you already have. The new function also in the end doesn't buy you much: it just bundles the sam_hdr_write and attaching of the headers into the single opening function.
I would suggest instead of adding this sam_open_write(), instead make sam_hdr_write() itself attach the headers to the file pointer. Then new-style user code could be:
samFile *fout = hts_open_format(filename, "w", &myfmt); // or hts_hopen etc
if (fout == NULL) { "error can't open: permission problem etc" }
if (sam_hdr_write(fout, headers) < 0) { "error headers borked or I/O error" }
…
…sam_write1(fout, NULL, b)…
which keeps the handling for the distinct failure modes separate and is still only two function calls (compared to the proposed sam_open_write's one function call).
Due to the header reference counting (which needs to be verified as still correct with either form of more widespread header attaching), having sam_hdr_write() implicitly attach the headers to the file pointer should have no adverse effect on old-style code that specifies the headers explicitly in sam_write1().
(If OTOH you do go with sam_open_write(), I have additional comments about error handling. hts_open_format reports operating system failures via errno, so the other error paths need to set errno too.)
On the whole I think I prefer the simplicity of only having to call one function. It is true that it lacks features compared to hts_open etc., but then the hts_open interfaces also aren't completely consistent - for example, there's no hts_dopen which can lead to quite a dance when you want to write a bam file into one. You also can't pass an htsFormat struct to hts_hopen and you can't pass varargs into hts_open - thinking of which, maybe we should allow varargs in sam_open_write() which could be passed on when creating the hFILE (would require a vhopen())?
We could always add sam_hopen_write() and sam_dopen_write() later to complete the set, and simplify the sam file into file descriptor problem.