Running the `evodiff/generate.py` script
I've been having trouble getting the conda environment to work properly, so that may be contributing to the issue below.
(evodiff3) ubuntu@209-20-159-77:~/evodiff_repo$ python evodiff/generate.py --model-type oa_dm_38M --num-seqs 100
Traceback (most recent call last):
  File "evodiff/generate.py", line 323, in <module>
    main()
  File "evodiff/generate.py", line 40, in main
    data = UniRefDataset('data/uniref50/', 'train', structure=False, max_len=2048)
  File "/home/ubuntu/miniconda3/envs/evodiff3/lib/python3.8/site-packages/sequence_models/datasets.py", line 330, in __init__
    with open(data_dir + 'splits.json', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/uniref50/splits.json'
(evodiff3) ubuntu@209-20-159-77:~/evodiff_repo$
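The traceback shows that `UniRefDataset` opens `data_dir + 'splits.json'`, so the run fails as soon as that file is absent. A minimal pre-flight check, as a sketch: the helper name `missing_dataset_files` is hypothetical, and `splits.json` is the only file the traceback confirms is needed; the dataset may require others.

```python
from pathlib import Path


def missing_dataset_files(data_dir="data/uniref50/", required=("splits.json",)):
    """Return the names in `required` that are not present in `data_dir`.

    Hypothetical helper: only `splits.json` is known from the traceback.
    """
    root = Path(data_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing from data/uniref50/:", ", ".join(missing))
    else:
        print("All checked files are present.")
```

Running this from the repository root before `generate.py` tells you whether the `FileNotFoundError` will occur.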
@Amelie-Schreiber I ran into the same error and found that you have to download the uniref50 dataset (see https://github.com/microsoft/evodiff/issues/10#issuecomment-1747536718) to run the code.
I believe that's not necessary; you can hack the code to bypass it.
- Comment out the code at these locations:
  - https://github.com/microsoft/evodiff/blob/32e3bd8d1ada1d786795e3ba1b84c855b22b4702/evodiff/generate.py#L40-L41
  - https://github.com/microsoft/evodiff/blob/32e3bd8d1ada1d786795e3ba1b84c855b22b4702/evodiff/generate.py#L128-L129
  - https://github.com/microsoft/evodiff/blob/32e3bd8d1ada1d786795e3ba1b84c855b22b4702/evodiff/generate.py#L148
- Add one line of code at https://github.com/microsoft/evodiff/blob/32e3bd8d1ada1d786795e3ba1b84c855b22b4702/evodiff/generate.py#L130:

      + # the sequence length you want to sample from, for example (30, 200)
      + seq_len = np.random.choice(np.arange(30, 200))

- Run the bash command:

      export AMLT_OUTPUT_DIR=YOUR_OUTPUT_DIR; python evodiff/generate.py --model-type oa_dm_38M --num-seqs 10 --amlt
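For reference, the `seq_len` line in the workaround draws a length uniformly from 30 to 199; `np.arange(30, 200)` excludes the upper bound. Here is a standalone sketch of the same sampling step. The function name `sample_seq_len` and its parameters are hypothetical, not part of `generate.py`, and it uses NumPy's `Generator` API in place of `np.random.choice` so a seed can be passed for reproducibility.

```python
import numpy as np


def sample_seq_len(min_len=30, max_len=200, rng=None):
    """Draw one sequence length uniformly from [min_len, max_len).

    Hypothetical helper mirroring np.random.choice(np.arange(30, 200)).
    """
    if min_len >= max_len:
        raise ValueError("min_len must be smaller than max_len")
    rng = np.random.default_rng() if rng is None else rng
    # Generator.integers excludes the upper bound by default,
    # matching np.arange(min_len, max_len).
    return int(rng.integers(min_len, max_len))


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # fixed seed only so reruns match
    print([sample_seq_len(rng=rng) for _ in range(5)])
```

If you want lengths up to and including 200, pass `max_len=201`.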