📜 the Great Automatic Nomenclator — The Next Million Names for Archaea and Bacteria
GAN: The Great Automatic Nomenclator
The Next Million Names for Archaea and Bacteria
Citation
Mark J. Pallen et al. The Next Million Names for Archaea and Bacteria, Trends in Microbiology (2020). DOI: 10.1016/j.tim.2020.10.009
Principle
To generate a large number of new names, we apply a combinatorial approach: starting from two or three sets of curated roots, we produce all their possible combinations while keeping track of each root's grammatical metadata, so that a valid etymology can be drafted for every name.
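The idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used by the scripts in this repository: the root lists, the dictionary keys (`stem`, `origin`, `word`, `meaning`) and the `combine` function are all hypothetical, and the etymology layout is a simplified placeholder.

```python
from itertools import product

# Hypothetical root sets (illustrative placeholders, not the curated tables).
# Each root carries the metadata needed to draft an etymology.
FIRST = [
    {"stem": "halo", "origin": "Gr. masc. n.", "word": "hals", "meaning": "salt"},
    {"stem": "thermo", "origin": "Gr. masc. adj.", "word": "thermos", "meaning": "hot"},
]
SECOND = [
    {"stem": "bacter", "origin": "N.L. masc. n.", "word": "bacter", "meaning": "a rod"},
    {"stem": "coccus", "origin": "N.L. masc. n.", "word": "coccus", "meaning": "a grain"},
]


def combine(*root_sets):
    """Return every combination of roots with a drafted etymology."""
    names = []
    # product() yields one tuple per combination, taking one root from each set.
    for parts in product(*root_sets):
        # Join the stems and capitalise the first letter of the resulting name.
        name = "".join(p["stem"] for p in parts).capitalize()
        # Carry the metadata of each root into the etymology string.
        etymology = "; ".join(
            f'{p["origin"]} {p["word"]}, {p["meaning"]}' for p in parts
        )
        names.append({"name": name, "etymology": etymology})
    return names


names = combine(FIRST, SECOND)
for entry in names:
    print(entry["name"], "|", entry["etymology"])
```

The number of generated names is the product of the sizes of the input sets (here 2 x 2 = 4), which is why a few hundred curated roots are enough to reach very large numbers of candidate names.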

Dependencies
The scripts in this repository require Python (at least 3.6) and these modules:
- itertools (ships with Python)
- pandas (>1.0)
- xlrd (1.2.0)
To run the scripts in this repository, we suggest creating a conda environment as follows:
conda create -c conda-forge -n gan python=3.8 pandas pip ipython
conda activate gan
pip install xlrd==1.2.0
Genera generator
A set of two (or three) Excel tables, formatted as shown below, is used to generate the list of combinations in JSON, HTML and LaTeX formats.

Synopsis:
usage: gan-genus.py [-h] -1 FIRST -2 SECOND [-3 THIRD] -o OUTDIR [-p PREFIX] [-c CONNECTOR] [-v]
For full usage and installation instructions, please check the documentation.
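To make the synopsis easier to read, here is a hypothetical `argparse` declaration that would accept the same options. It is a sketch reconstructed from the usage line only, not the parser of `gan-genus.py`: the `dest` names, the assumption that `-v` is a simple on/off flag, and the example file names are all guesses made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage line (assumed, not the real one)."""
    parser = argparse.ArgumentParser(prog="gan-genus.py")
    # Options without brackets in the synopsis are mandatory.
    parser.add_argument("-1", dest="first", required=True)
    parser.add_argument("-2", dest="second", required=True)
    # The third table is optional.
    parser.add_argument("-3", dest="third")
    parser.add_argument("-o", dest="outdir", required=True)
    parser.add_argument("-p", dest="prefix")
    parser.add_argument("-c", dest="connector")
    # Assumption: -v takes no value and acts as a flag.
    parser.add_argument("-v", dest="verbose", action="store_true")
    return parser


# Hypothetical invocation with two input tables and an output directory.
args = build_parser().parse_args(
    ["-1", "first.xlsx", "-2", "second.xlsx", "-o", "out"]
)
print(args.first, args.second, args.third, args.outdir)
```

Only `-1`, `-2` and `-o` are required; the meaning of the remaining options is described in the documentation.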
Example output
Using three small files in the input_test directory (8, 11 and 8 words, respectively), GAN produced 968 (8 x 11 x 8) combinations:
- in PDF format
- in HTML format
Etymology
"The Great Automatic Nomenclator" is a reference to the short story "The Great Automatic Grammatizator" by the British author Roald Dahl.