Experiment Plot planning for Tree Simulations

Open lutteropp opened this issue 5 years ago • 11 comments

Now that the columns in the results CSV are fixed (see https://github.com/lutteropp/NetRAX/issues/14), let's list which plots we need. Here are some that I believe make sense to have:

  • Use the dataset IDs as x-values. For each combination of MSA-size, simulator-type, sampling-type, likelihood-type, do the following plots:

Plot 1: BIC score of

  • true simulated network
  • raxml-ng best tree
  • inferred network with NetRAX starting from raxml-ng best tree
  • inferred network with NetRAX starting from 10 random + 10 parsimony trees

Plot 2: Network loglikelihood score of

  • true simulated network
  • raxml-ng best tree
  • inferred network with NetRAX starting from raxml-ng best tree
  • inferred network with NetRAX starting from 10 random + 10 parsimony trees

Plot 3: Normalized RF distance with true simulated tree and

  • raxml-ng best tree
  • inferred network with NetRAX starting from raxml-ng best tree
  • inferred network with NetRAX starting from 10 random + 10 parsimony trees

Plot 4:

  • number of near-zero branches in best raxml-ng tree
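As a rough illustration of what Plot 4 would need, here is a minimal sketch that counts near-zero branches in a Newick string. The threshold value, the function name, and the regex-based parsing are assumptions for illustration only; nothing in this thread fixes how "near-zero" is defined, and a proper Newick parser would be more robust (for example with quoted labels that contain colons).

```python
import re

# Hypothetical cutoff: branch lengths at or below this value count as "near-zero".
NEAR_ZERO_THRESHOLD = 1e-6

# Captures the number after a ':' in a Newick string, e.g. ":0.0153" or ":1e-07".
BRANCH_LENGTH_RE = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)")


def count_near_zero_branches(newick: str, threshold: float = NEAR_ZERO_THRESHOLD) -> int:
    """Count the branch lengths in a Newick string that are <= threshold."""
    lengths = [float(text) for text in BRANCH_LENGTH_RE.findall(newick)]
    return sum(1 for length in lengths if length <= threshold)


# Toy tree (not from the experiments): two of its five branches are near-zero.
example_tree = "((A:0.1,B:0.0000001):0.0000005,(C:0.2,D:0.3):0.05);"
print(count_near_zero_branches(example_tree))
```

The count per dataset could then be plotted against the dataset ID like the other plots.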

lutteropp avatar Nov 30 '20 13:11 lutteropp

sounds good

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/lutteropp/NetRAX/issues/15, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXB6TDQ3FKAXWM7RGKH5DSSOLYDANCNFSM4UHSCWZA.

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

stamatak avatar Nov 30 '20 19:11 stamatak

Current state of my attempts to plot things: it turns out the BIC scores lie too close together. Instead of plotting the raw BIC scores, it likely makes more sense to plot the absolute difference to the BIC score of the "true" network.

[Screenshot from 2020-11-30 22-36-14]

lutteropp avatar Nov 30 '20 21:11 lutteropp

[Screenshot from 2020-11-30 22-50-25]

Yes, this looks slightly better, but it is still not useful. I am switching to the relative BIC difference instead of the absolute difference here. Also, a histogram likely works better for this kind of data.

lutteropp avatar Nov 30 '20 21:11 lutteropp

Here is the plot with relative BIC differences; values smaller than zero mean a BIC improvement.

[Screenshot from 2020-11-30 22-57-53]

This is slightly more useful, but a histogram is likely still better here.
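For reference, one way such a relative difference could be computed is sketched below. The exact formula used for the plot is not stated in this thread, so the normalization by the absolute "true" BIC and the function name are assumptions; the sign convention (negative means improvement) matches the description above.

```python
def relative_bic_difference(bic_inferred: float, bic_true: float) -> float:
    """Return (bic_inferred - bic_true) / |bic_true|.

    Lower BIC is better, so a negative result means the inferred
    topology scored better than the "true" network.
    """
    if bic_true == 0:
        raise ValueError("bic_true must be non-zero to normalize the difference")
    return (bic_inferred - bic_true) / abs(bic_true)


# Toy numbers (not from the experiments): the inferred BIC is 10 units lower.
print(relative_bic_difference(19990.0, 20000.0))
```

Dividing by the magnitude of the "true" BIC keeps datasets of different size on a comparable scale, which the absolute difference does not.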

lutteropp avatar Nov 30 '20 21:11 lutteropp

Maybe for the BIC score, what we are really interested in is how often each of these situations occurs:

  • NetRAX (starting from best raxml-ng tree) BIC was less-or-equal (better) than "true" BIC
  • NetRAX (starting from best raxml-ng tree) BIC was larger (worse) than "true" BIC
  • NetRAX (starting from 10 random + 10 parsimony trees) BIC was less-or-equal (better) than "true" BIC
  • NetRAX (starting from 10 random + 10 parsimony trees) BIC was larger (worse) than "true" BIC
  • raxml-ng best tree BIC was less-or-equal (better) than "true" BIC
  • raxml-ng best tree BIC was larger (worse) than "true" BIC
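The six counts listed above could be tallied along these lines. This is only a sketch: the column names (`bic_true`, `bic_raxml`, and so on) are hypothetical placeholders and not the actual columns of the results CSV, and rows are assumed to be dictionaries such as those produced by `csv.DictReader`.

```python
from collections import Counter

# Hypothetical mapping from method label to results-CSV column name.
METHOD_COLUMNS = {
    "raxml_best_tree": "bic_raxml",
    "netrax_from_raxml": "bic_netrax_from_raxml",
    "netrax_from_multistart": "bic_netrax_from_multistart",
}


def count_bic_outcomes(rows):
    """Count, per method, how often its BIC was <= or > the "true" BIC."""
    counts = Counter()
    for row in rows:
        bic_true = float(row["bic_true"])
        for method, column in METHOD_COLUMNS.items():
            # Lower BIC is better, so "less-or-equal" counts as better.
            outcome = "better_or_equal" if float(row[column]) <= bic_true else "worse"
            counts[(method, outcome)] += 1
    return counts


# Toy rows (not real results), shaped like csv.DictReader output.
rows = [
    {"bic_true": "1000.0", "bic_raxml": "1010.0",
     "bic_netrax_from_raxml": "1000.0", "bic_netrax_from_multistart": "995.0"},
    {"bic_true": "2000.0", "bic_raxml": "1990.0",
     "bic_netrax_from_raxml": "2005.0", "bic_netrax_from_multistart": "2000.5"},
    {"bic_true": "1500.0", "bic_raxml": "1600.0",
     "bic_netrax_from_raxml": "1490.0", "bic_netrax_from_multistart": "1480.0"},
]
outcome_counts = count_bic_outcomes(rows)
for key in sorted(outcome_counts):
    print(key, outcome_counts[key])
```

Running this once per combination of MSA size, simulator type, sampling type, and likelihood type would give one small table of counts per configuration.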

lutteropp avatar Nov 30 '20 22:11 lutteropp

I got the BIC score plots to look like this now:

[Figure: SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_bic_stats]

[Figure: SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_bic_plot]

lutteropp avatar Nov 30 '20 22:11 lutteropp

For relative RF distance, I currently have such plots:

[Figure: SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_rfdist_stats]

[Figure: SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_rfdist_plot]

A set of histograms would definitely fit better here.
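A minimal sketch of the binning behind such a histogram, assuming the relative RF distances are normalized to the range [0, 1]. The function name and the choice of ten equal-width bins are illustrative assumptions; in practice a plotting library would do the binning itself.

```python
def bin_rf_distances(values, num_bins=10):
    """Bin values from [0, 1] into num_bins equal-width bins and return the counts."""
    counts = [0] * num_bins
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"expected a normalized distance in [0, 1], got {value}")
        # A value of exactly 1.0 goes into the last bin instead of overflowing.
        index = min(int(value * num_bins), num_bins - 1)
        counts[index] += 1
    return counts


# Toy distances (not real results).
rf_distances = [0.0, 0.05, 0.12, 0.25, 0.55, 1.0]
bin_counts = bin_rf_distances(rf_distances)
for i, count in enumerate(bin_counts):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}): {'#' * count}")
```

One such histogram per inference method (raxml-ng best tree, NetRAX from the raxml-ng tree, NetRAX from 10 random + 10 parsimony trees) would make the methods directly comparable.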

lutteropp avatar Nov 30 '20 23:11 lutteropp

I guess so

On 30.11.20 23:37, Sarah Lutteropp wrote:

current state in trying to plot things, turns out the BICs are too closely lying together. Instead of plotting the BIC scores, it likely makes more sense to print absolute difference to BIC score of "true" network... Screenshot from 2020-11-30 22-36-14 https://user-images.githubusercontent.com/1059869/100668712-a39e4b00-335c-11eb-87a3-0b23f4bc7178.png

stamatak avatar Dec 01 '20 05:12 stamatak

This one looks pretty good

On 30.11.20 23:59, Sarah Lutteropp wrote:

Here with relative BIC differences, values smaller than zero meaning a BIC improvement Screenshot from 2020-11-30 22-57-53 https://user-images.githubusercontent.com/1059869/100670749-9cc50780-335f-11eb-8965-08adcf5d3be1.png Slightly more useful, but still... histogram is likely better here.

stamatak avatar Dec 01 '20 05:12 stamatak

This looks really good now. It will only be a bit difficult to present in the paper, as the various combinations might cause confusion. Presenting just 3-4 such configurations and moving the rest into the supplement might be a good idea.

On 01.12.20 00:41, Sarah Lutteropp wrote:

I got the BIC score plots to look like this now SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_bic_stats https://user-images.githubusercontent.com/1059869/100674523-90dc4400-3365-11eb-980c-6fa487be6e99.png SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_bic_plot https://user-images.githubusercontent.com/1059869/100674525-9174da80-3365-11eb-9d1a-184e22bad109.png

stamatak avatar Dec 01 '20 05:12 stamatak

Does zero refer to near-zero branch lengths?

alexis

On 01.12.20 01:19, Sarah Lutteropp wrote:

For relative RF distance, I currently have such plots: SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_rfdist_stats https://user-images.githubusercontent.com/1059869/100677201-d4857c80-336a-11eb-9b1a-1e397432a35f.png SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_rfdist_plot https://user-images.githubusercontent.com/1059869/100677203-d5b6a980-336a-11eb-95bb-306259984ae4.png

A set of histograms would definitely fit better here.

stamatak avatar Dec 01 '20 05:12 stamatak