[BUG] ClustalO wrapper example from the docs is not working correctly
Describe the bug The example throws an error when it wants to read some temporary files
System.IO.IOException: The process cannot access the file 'C:\Users\schne\AppData\Local\Temp\ac7bce1d-efe2-45ba-8ffa-90d0a60e0f49.fasta' because it is being used by another process. at System.IO.FileSystem.DeleteFile(String fullPath)
at System.IO.File.Delete(String path)
at BioFSharp.IO.ClustalOWrapper.ClustalOWrapper.AlignSequences(IEnumerable`1 input, IEnumerable`1 parameters, FSharpOption`1 name) in C:\Users\schne\source\repos\CSBiology\BioFSharp\src\BioFSharp.IO\ClustalOWrapper.fs:line 347
at <StartupCode$FSI_0010>.$FSI_0010.main@()
Stopped due to error
To Reproduce
#r "nuget: BioFSharp, 2.0.0-beta5"
#r "nuget: BioFSharp.IO, 2.0.0-beta5"
open BioFSharp
open BioFSharp.IO
open ClustalOWrapper
let cw = ClustalOWrapper()
let sequences =
[
TaggedSequence.createTaggedSequence "pep1" ("AAGECGEK")
TaggedSequence.createTaggedSequence "pep2" ("AAGEGEK")
TaggedSequence.createTaggedSequence "pep3" ("AAAGECGEK")
TaggedSequence.createTaggedSequence "pep4" ("AAGECGEL")
]
let alignedSequences =
cw.AlignSequences(sequences,Seq.empty)
Additional context
Also, conversion functions or overloads for AlignSequences that work on all BioSequences (BioArray, BioList, BioSeq), so something like TaggedSequence<string,seq<IBioItem>> would be awesome. Tagging @HLWeil as you're the creator ;)
Maybe related: #61
If I remember correctly, the ClustalO tool is executed and the output is directly read back into the F# runtime afterwards. It seems like the reader is not waiting until the ClustalO is finished. I'll look into this..
The issue #61 is unrelated but definitely needs some attention 😅 (Edit: Whoops definitely related to your additional context)
I could reproduce the problem but: The code you wrote above shouldn't even get that far because using the ClustalOWrapper() constructor with no parameters will be looking for a default clustalo.exe in a nonsense path (mea culpa).
Downloading the tool and giving it's exe path to the constructor fixes this first hurdle:
let clustalToolPath = System.IO.Path.Combine(__SOURCE_DIRECTORY__,"clustal-omega-1.2.2-win64\clustalo.exe")
let cw = ClustalOWrapper(clustalToolPath)
But calling the AlignSequences methods fails with the error you described. The reason for this seems to be the clustalo tool failing:
> Starting -i C:\Users\HLWei\AppData\Local\Temp\5a1b3b45-1685-4741-a450-28f071ce30cf.fasta -o C:\Users\HLWei\AppData\Local\Temp\0508777c-193d-4bb7-bd27-d903cffb787a.aln --infmt=fa --outfmt=clu ...
FATAL: File C:\Users\HLWei\AppData\Local\Temp\5a1b3b45-1685-4741-a450-28f071ce30cf.fasta does not appear to be in FASTA format at line 1. You may want to specify the file format on the command line. Usually this is done with an option --infmt <fmt>.
I can reproduce this error when running the tool by hand in the powershell, which is weird as the input file is definitely in fasta format:
>pep1
AAGECGEK
>pep2
AAGEGEK
>pep3
AAAGECGEK
>pep4
AAGECGEL
Maybe some line ending bonanza or the fasta format was changed recently :shrug:
Edit: Whoops misclicked on close issue. Will investigate further
The line FastA.write id inPath (sequences |> Seq.map tsToFasta) does write in UTF8-BOM encoding which does not seem to be supported by ClustalO. Copying the same fastA file into a UTF8 encoded file does work properly.
Can you check whether your temporary files are encoded in UTF8-BOM or UTF8 not?
Also the FastA.write does not seem to close the writer when finished, blocking the file.
If this is also the case for you I think we should open an additional issue.
AdditionalInfo:
let tsToFasta (ts:TaggedSequence.TaggedSequence<string,char>) =
{FastA.Header = ts.Tag;
FastA.Sequence = ts.Sequence}
Good catch, UTF8BOM was already an issue when is tried to use fasta files for TargetP2, could have thought about that one. Ill fix that one
Should be fixed with https://github.com/CSBiology/BioFSharp/commit/beb4158e8e8d71e7c956347db1df2f2e99f8a5e4
Adaptions to work on biosequences would still be highly appreciated
Also, the rootpath works just fine for me as long is i have the BioFSharp repo on my machine, was that not the intention of the design?
well yes, but actually no
This is the path it tries for me: C:\Users\schne\source\repos\CSBiology\BioFSharp\lib\clustal-omega\clustalo.exe 💯
I think either we change it to look for a global clustal installation or we make the parameter mandatory?
(Edit: Yes that was the intention. Not sure if building the repo yourself should be regarded as a a case in the code though)