KeyError
Hello,
I'm trying to run GADMA with the dadi engine on an SFS file for 2 populations, and GADMA throws this error, which I don't understand:
gadma -p param_file_Toronto_downsampled -o ./test
UserWarning: Parameters will be in genetic units (Relative parameters). Engines dadi and moments require mutation rate and sequence length for unit translation (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/cli/settings_storage.py:1360)
UserWarning: Code for momentsLD will not be generated as: VCF input data is required. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/cli/settings_storage.py:1477)
Data reading
False
WARNING:Spectrum_mod:Creating Spectrum with data_folded = True, but data has non-zero values in entries which are nonsensical for a folded Spectrum.
WARNING:Spectrum_mod:Creating Spectrum with data_folded = True, but mask is not True for all entries which are nonsensical for a folded Spectrum.
UserWarning: Spectrum file /scratch/projects/trifolium/glue/demography/glue_demography/results/gadma/Toronto/Toronto_4fold_r_u_dadi_format_good.fs is in an old format - without population labels, so they will be taken from the corresponding parameter: RURAL, URBAN. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_moments_common.py:348)
Number of populations: 2
Projections: [30, 30]
Population labels: ['RURAL', 'URBAN']
Outgroup: False
--Successful data reading--
--Successful arguments parsing--
Parameters of launch are saved in output directory: /scratch/projects/trifolium/glue/demography/glue_demography/results/gadma/test/params_file
All output is saved in output directory: /scratch/projects/trifolium/glue/demography/glue_demography/results/gadma/test/GADMA.log
--Start pipeline--
Run launch number 1
RuntimeWarning: Mean of empty slice. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3474)
UserWarning: Additional evaluation for theta. Nothing to worry if this warning is seldom. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_moments_common.py:138)
[000:01:00]
All best by log-likelihood models
Number log-likelihood Model
UserWarning: Additional evaluation for theta. Nothing to worry if this warning is seldom. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_moments_common.py:138)
Traceback (most recent call last):
File "/home/caizergu/.conda/envs/gadma_test2/bin/gadma", line 8, in <module>
sys.exit(main())
File "/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/core/core.py", line 145, in main
print_runs_summary(start_time, shared_dict, settings_storage)
File "/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/core/draw_and_generate_code.py", line 242, in print_runs_summary
theta = engine.get_theta(x, *settings.get_engine_args())
File "/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_engine.py", line 169, in get_theta
return super(DadiEngine, self).get_theta(values, pts)
File "/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_moments_common.py", line 141, in get_theta
theta = self.saved_add_info[key]
KeyError: ((0.4519932843626896, 0.0004799626451119927, 6.59285705597225, 5.5977869812679915, 'Sud', 'Sud', 0, 0), (30, 40, 50))
I have attached the files I used to this message (note that I had to add the .txt extension so that GitHub would let me attach them).
Thank you for your help,
Aude
[param_file_Toronto_downsampled.txt](https://github.com/ctlab/GADMA/files/9270591/param_file_Toronto_downsampled.txt)
Toronto_4fold_r_u_dadi_format_good.fs.txt
Hi Aude,
Glad you are still working with GADMA. Thank you very much for this feedback. I just checked GADMA's code and I think I know why it is happening. Funny that it did not happen before. I think the upper split bound partly caused it.
I need some time to fix it. I will probably make a new release of GADMA with the update tomorrow. Sorry for the inconvenience!
Best regards, Ekaterina
Now I am not so sure. Could you please check whether this error still happens if you remove the upper split bound?
Could you also tell me which versions of numpy and scipy you have?
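A minimal sketch for collecting these versions in one go, using only the standard library's `importlib.metadata`. The helper name `report_versions` is hypothetical, and it assumes the packages are installed under the distribution names `numpy`, `scipy`, `dadi` and `gadma`; anything not found is reported as not installed rather than raising.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("numpy", "scipy", "dadi", "gadma")):
    """Return a dict mapping each distribution name to its installed version.

    Packages that cannot be found map to None instead of raising.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Run it with the Python interpreter of the same conda environment GADMA uses (here `gadma_test2`); otherwise it reports the packages of a different environment.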
Hi Ekaterina!
I'm glad to be using it again! It worked perfectly on my simulated VCF, but now I have to use it on my real data through an SFS. I just tried to re-run it after removing the upper split bound, but I get a similar (though not identical on the last line) error:
gadma -p param_file_Toronto_downsampled -o ./test
UserWarning: Parameters will be in genetic units (Relative parameters). Engines dadi and moments require mutation rate and sequence length for unit translation (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/cli/settings_storage.py:1360)
UserWarning: Code for momentsLD will not be generated as: VCF input data is required. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/cli/settings_storage.py:1477)
Data reading
False
WARNING:Spectrum_mod:Creating Spectrum with data_folded = True, but data has non-zero values in entries which are nonsensical for a folded Spectrum.
WARNING:Spectrum_mod:Creating Spectrum with data_folded = True, but mask is not True for all entries which are nonsensical for a folded Spectrum.
UserWarning: Spectrum file /scratch/projects/trifolium/glue/demography/glue_demography/results/gadma/Toronto/Toronto_4fold_r_u_dadi_format_good.fs is in an old format - without population labels, so they will be taken from the corresponding parameter: RURAL, URBAN. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_moments_common.py:348)
Number of populations: 2
Projections: [30, 30]
Population labels: ['RURAL', 'URBAN']
Outgroup: False
--Successful data reading--
--Successful arguments parsing--
Parameters of launch are saved in output directory: /scratch/projects/trifolium/glue/demography/glue_demography/results/gadma/test/params_file
All output is saved in output directory: /scratch/projects/trifolium/glue/demography/glue_demography/results/gadma/test/GADMA.log
--Start pipeline--
Run launch number 1
RuntimeWarning: Mean of empty slice. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3474)
UserWarning: Additional evaluation for theta. Nothing to worry if this warning is seldom. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_moments_common.py:138)
[000:01:00]
All best by log-likelihood models
Number log-likelihood Model
UserWarning: Additional evaluation for theta. Nothing to worry if this warning is seldom. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_moments_common.py:138)
Traceback (most recent call last):
File "/home/caizergu/.conda/envs/gadma_test2/bin/gadma", line 8, in <module>
My numpy version is 1.22.4 and my scipy version is 1.8.1.
Thank you, as always, for such rapid answers!
Aude
Hi Aude,
Thanks for the information. I think my first guess about the bug might be correct, and I will fix it. Unfortunately, I don't think the fix will help much in your case: the error seems to happen because dadi fails to evaluate the likelihood every time. I tried to reproduce your run using a spectrum of my own and it worked fine. Your numpy and scipy versions should also be okay.
So I would recommend checking your spectrum, because I see the following warnings in your output:
WARNING:Spectrum_mod:Creating Spectrum with data_folded = True, but data has non-zero values in entries which are nonsensical for a folded Spectrum.
WARNING:Spectrum_mod:Creating Spectrum with data_folded = True, but mask is not True for all entries which are nonsensical for a folded Spectrum.
Maybe this is what is causing the whole problem.
Please check whether your spectrum is folded or unfolded. If it should be folded, these warnings tell us that the data and mask are not correct for a folded spectrum. It is better to use dadi (or moments) to fold the spectrum correctly.
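The two warnings can be checked outside GADMA. Below is a numpy-only sketch that, to my understanding, mirrors the check dadi performs: with per-population sample sizes n_k = shape[k] - 1 and n = sum(n_k), any entry whose summed index exceeds n // 2 is "folded out", so a correctly folded spectrum must hold zeros there and mask them. The function names are hypothetical and this is not dadi's code; for the folding itself, dadi's (or moments') `Spectrum.fold()` remains the safer route.

```python
import numpy as np


def total_count_per_entry(shape):
    """For each entry of the spectrum, the sum of its indices (i + j + ...)."""
    return np.indices(shape).sum(axis=0)


def folded_out_entries(shape):
    """Boolean array marking entries that are meaningless in a folded spectrum.

    With sample sizes n_k = shape[k] - 1 and n = sum(n_k), these are the
    entries whose total count exceeds n // 2 (the major-allele side).
    """
    total_samples = sum(dim - 1 for dim in shape)
    return total_count_per_entry(shape) > total_samples // 2


def folding_diagnostics(data, mask):
    """Report the two conditions behind the 'data_folded = True' warnings."""
    data = np.asarray(data, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    out = folded_out_entries(data.shape)
    return {
        # "data has non-zero values in entries which are nonsensical"
        "nonzero_in_folded_out": bool(np.any(data[out] != 0)),
        # "mask is not True for all entries which are nonsensical"
        "unmasked_in_folded_out": bool(np.any(~mask[out])),
    }


if __name__ == "__main__":
    # Two populations with 2 chromosomes each: shape (3, 3), n = 4.
    unfolded = np.arange(9, dtype=float).reshape(3, 3)
    no_mask = np.zeros((3, 3), dtype=bool)
    print(folding_diagnostics(unfolded, no_mask))
```

For a 31 x 31 spectrum (30 chromosomes per population, as in the projections above, so n = 60), the warnings concern the entries with i + j > 30.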
I am not sure, but those are my thoughts for now. I will publish a new release in four to five hours. If the SFS is okay and the problem still happens, I suggest sending me all your files by e-mail (of course, only if you wish and are able to), and I will run my tests to find the problem.
Best regards, Ekaterina
Hi!
Yes, indeed, I thought it might be coming from the SFS... I created a 2D SFS with ANGSD and converted it to dadi format (folded) with https://github.com/z0on/2bRAD_denovo/blob/master/realsfs2dadi.pl. I also had to manually change the header to match the number of individuals. Do you have a better process for creating these input files? I have also sent you the files (SFS and param_file) by e-mail.
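When editing the header by hand, it may help to sanity-check the file. A minimal sketch, assuming dadi's plain-text layout as I understand it: optional `#` comment lines; a header with the array dimensions (sample size + 1 per population, i.e. 2N + 1 for N diploid individuals), optionally `folded`/`unfolded` and quoted population labels; then one line of data and an optional line of 0/1 mask values. A missing mask line is treated here as nothing masked, which may differ from dadi's own defaults. `inspect_fs_text` and `expected_header_dims` are hypothetical helpers.

```python
import shlex

import numpy as np


def expected_header_dims(n_individuals, ploidy=2):
    """Header dimensions for the given number of individuals per population."""
    return [ploidy * n + 1 for n in n_individuals]


def inspect_fs_text(text):
    """Parse dadi-style plain-text spectrum content and summarise it.

    Assumed layout: optional '#' comment lines; a header with the array
    dimensions, optionally 'folded'/'unfolded' and quoted population labels;
    one line of data; optionally one line of 0/1 mask values.
    """
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    header = shlex.split(lines[0])  # also strips the quotes around labels
    dims = []
    while header and header[0].isdigit():
        dims.append(int(header.pop(0)))
    folded = None  # None means the header does not say
    if header and header[0] in ("folded", "unfolded"):
        folded = header.pop(0) == "folded"
    labels = header  # whatever remains is taken as population labels

    n_values = int(np.prod(dims))
    data = np.array(lines[1].split(), dtype=float)
    if data.size != n_values:
        raise ValueError(f"header implies {n_values} values, data line has {data.size}")
    if len(lines) > 2:
        mask = np.array(lines[2].split(), dtype=float) != 0
        if mask.size != n_values:
            raise ValueError(f"header implies {n_values} values, mask line has {mask.size}")
    else:
        # Assumption: no mask line means nothing is masked.
        mask = np.zeros(n_values, dtype=bool)

    return {
        "dims": dims,
        "sample_sizes": [d - 1 for d in dims],
        "folded": folded,
        "pop_labels": labels,
        "n_masked": int(mask.sum()),
        "all_masked": bool(mask.all()),
    }
```

For example, `inspect_fs_text(open("Toronto_4fold_r_u_dadi_format_good.fs").read())` would summarise the file, and `expected_header_dims([15, 15])` gives `[31, 31]` for 15 diploids per population. An empty `pop_labels` list corresponds to the "old format" warning GADMA prints, and `all_masked` being True would mean no entry is left for the likelihood.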
Thank you for your help again!
Aude
Hi Aude,
I am not sure about the script you used. I know of another one from moments: parse_angsd_output.py. Maybe you can try that one?
Best regards, Ekaterina
Thank you very much for your files. I checked the SFS and it is read incorrectly: all entries are masked out. That is why the error occurs. I fixed it, but now some tests fail and I need to fix them before the release.
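A toy illustration of why a fully masked spectrum can end in a `KeyError` like the one in the traceback (not GADMA's or dadi's actual code; all names are hypothetical). The Poisson-optimal theta is sum(data) / sum(model) over unmasked entries; with every entry masked this becomes 0 / 0 = NaN, and a cache that only stores finite results then has nothing to return when theta is looked up afterwards.

```python
import numpy as np

# Hypothetical cache keyed by (parameters, grid), loosely like the key in the traceback.
theta_cache = {}


def toy_optimal_theta(model, data, mask):
    """Poisson-optimal scaling: sum(data) / sum(model) over unmasked entries."""
    keep = ~mask
    # Suppress the 0/0 warning; the result is NaN when nothing is unmasked.
    with np.errstate(invalid="ignore", divide="ignore"):
        return data[keep].sum() / model[keep].sum()


def toy_evaluate(key, model, data, mask):
    """Evaluate theta and cache it only if the evaluation succeeded."""
    theta = toy_optimal_theta(model, data, mask)
    if np.isfinite(theta):
        theta_cache[key] = theta
    return theta


def toy_get_theta(key):
    """Plain dict lookup: raises KeyError if no successful evaluation was cached."""
    return theta_cache[key]


if __name__ == "__main__":
    shape = (31, 31)
    model = np.ones(shape)
    data = np.full(shape, 2.0)
    key = ((0.45, 0.0005), (30, 40, 50))
    toy_evaluate(key, model, data, np.ones(shape, dtype=bool))  # every entry masked
    try:
        toy_get_theta(key)
    except KeyError as err:
        print("KeyError:", err)
```

With an unmasked spectrum the same call stores theta and the lookup succeeds, which is why the fix is on the data side: making sure the SFS is read with the intended entries unmasked.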
I suggest trying the script from moments; hopefully everything will work well then.
Best regards, Ekaterina
Hi Aude,
I have finished fixing the tests and everything is working fine. You can install the updated version of GADMA with:
pip install -i https://test.pypi.org/simple/ gadma
Keep me updated about your SFS.
Best regards, Ekaterina
Thanks for everything! I'll try the moments script then. Do you have any idea which SFS input format parse_angsd_output.py (https://bitbucket.org/simongravel/moments/src/main/examples/fs_from_angsd/parse_angsd_output.py) expects? I can't quite figure out whether it's the 2D SFS from the realSFS command in ANGSD or the dadi-formatted SFS output by `realSFS dadi`. Do you have a clue?
Thanks again, I'll keep you posted !
Aude
On Mon, 8 Aug 2022 at 14:44, Ekaterina Noskova wrote:
Aude CAIZERGUES, PhD, Postdoctoral researcher, Department of Biology, University of Toronto - Mississauga
As far as I know, it should convert ANGSD output into an SFS for dadi/moments.