The most appropriate way to estimate sequence coverage

Open andosl opened this issue 1 year ago • 1 comments

Hi,

Thanks for a great tool! I have a question about the expected coverage value that is passed as input to fit_model_extra.py. When there is no reason to suspect a systematic increase or decrease in coverage, what is the most appropriate way to estimate it? For example, the median or the arithmetic mean of the output of samtools depth?

"## Run fit_model_extra.py to fit the model
docker run \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  mobinasri/flagger:v0.3.2 \
  python3 /home/programs/src/fit_gmm.py \
  --counts ${INPUT_DIR}/read_alignment.counts \
  --cov ${EXPECTED_COVERAGE} \
  --output ${OUTPUT_DIR}/read_alignment.table"

Kind regards, Andreas

Mar 05 '24 08:03 andosl

Hi @andosl, thanks for using Flagger. The result should be robust to using either the median or the mean, as long as the two are not very different, but the median is the better choice. Flagger uses the value you pass only as the initial value when fitting the model parameters with the EM algorithm, so the final output should not be sensitive to it.
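
For readers who want to compare the two estimates before choosing a value for `--cov`, here is a minimal sketch that computes both from `samtools depth` output. It is not part of Flagger. The function name `summarize_depth` and the example records are hypothetical. It assumes the default three-column, tab-separated output (reference name, 1-based position, depth) and that the whole file fits in memory.

```python
import statistics
from typing import Iterable, Tuple


def summarize_depth(lines: Iterable[str]) -> Tuple[float, float]:
    """Return (median, mean) of the depth column of `samtools depth` output."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Columns: reference name, 1-based position, depth
        fields = line.split("\t")
        depths.append(int(fields[2]))
    if not depths:
        raise ValueError("no depth records found")
    return statistics.median(depths), statistics.fmean(depths)


# Made-up records: five positions near 30x and one high-depth outlier.
example = [
    "chr1\t1\t28",
    "chr1\t2\t30",
    "chr1\t3\t31",
    "chr1\t4\t29",
    "chr1\t5\t30",
    "chr1\t6\t120",
]
median_cov, mean_cov = summarize_depth(example)
print(f"median={median_cov:.1f} mean={mean_cov:.1f}")

# With a real file (hypothetical name), pass the open file handle instead:
# with open("depth.txt") as fh:
#     median_cov, mean_cov = summarize_depth(fh)
```

In this made-up example the single outlier pulls the mean well above the median, which is the kind of difference worth checking for. Note also that `samtools depth` skips zero-depth positions unless `-a` is given, which affects both statistics.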

Apr 29 '24 18:04 mobinasri
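
The point about EM initialization can be illustrated with a toy example. The sketch below is not Flagger's model or code: it fits a plain two-component 1-D Gaussian mixture to simulated depths, and the choice to start the component means at `0.5 * init_cov` and `init_cov` is an assumption made only for this illustration. The function name `fit_two_component_gmm` and all numbers are hypothetical. It shows that two moderately different starting values can converge to essentially the same fitted mean.

```python
import math
import random


def fit_two_component_gmm(data, init_cov, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture with EM.

    Toy assumption: means start at 0.5 * init_cov and init_cov.
    Returns the fitted mean of the second (main) component.
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    means = [0.5 * init_cov, init_cov]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean and variance of each component
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # avoid variance collapse
            weights[k] = nk / n

    return means[1]


# Simulated depths: a main cluster near 30x and a smaller one near 15x.
rng = random.Random(0)
data = [rng.gauss(30, 3) for _ in range(800)] + [rng.gauss(15, 2) for _ in range(200)]

fit_from_30 = fit_two_component_gmm(data, init_cov=30.0)
fit_from_34 = fit_two_component_gmm(data, init_cov=34.0)
print(f"start 30 -> {fit_from_30:.3f}, start 34 -> {fit_from_34:.3f}")
```

EM only guarantees convergence to a local optimum, so this kind of agreement is expected when the starting values are reasonably close to the truth, not for arbitrary ones. That matches the caveat that the median and mean should not be highly different.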