Running the `evodiff/generate.py` script

Open • Amelie-Schreiber opened this issue 2 years ago • 1 comment

I've been having trouble getting the conda environment to work properly, which may be contributing to the issue below.

(evodiff3) ubuntu@209-20-159-77:~/evodiff_repo$ python evodiff/generate.py --model-type oa_dm_38M --num-seqs 100
Traceback (most recent call last):
  File "evodiff/generate.py", line 323, in <module>
    main()
  File "evodiff/generate.py", line 40, in main
    data = UniRefDataset('data/uniref50/', 'train', structure=False, max_len=2048)
  File "/home/ubuntu/miniconda3/envs/evodiff3/lib/python3.8/site-packages/sequence_models/datasets.py", line 330, in __init__
    with open(data_dir + 'splits.json', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/uniref50/splits.json'
(evodiff3) ubuntu@209-20-159-77:~/evodiff_repo$

Dec 30 '23 00:12 Amelie-Schreiber

@Amelie-Schreiber I ran into the same error and found that you have to download uniref50 (from https://github.com/microsoft/evodiff/issues/10#issuecomment-1747536718) to run the code.
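
If you want to confirm whether the dataset is in place before launching the script, something like the sketch below might help. It only checks for `splits.json`, since that is the one file name the traceback confirms; the data may need other files too. `missing_uniref_files` is a hypothetical helper, not part of evodiff or sequence_models, and the default path assumes you run from the repo root as in the original command.

```python
from pathlib import Path


def missing_uniref_files(data_dir="data/uniref50/", required=("splits.json",)):
    """Return the names in `required` that are not present under `data_dir`.

    `required` defaults to the single file named in the traceback;
    extend it if the dataset turns out to need more (assumption).
    """
    root = Path(data_dir)
    return [name for name in required if not (root / name).is_file()]


# Run from the repo root, the same directory generate.py is launched from.
missing = missing_uniref_files()
if missing:
    print(f"Missing under data/uniref50/: {missing}")
else:
    print("splits.json is present")
```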

That said, I believe the download isn't necessary: you can hack the code to bypass it.

Comment out the code at these three places:
https://github.com/microsoft/evodiff/blob/32e3bd8d1ada1d786795e3ba1b84c855b22b4702/evodiff/generate.py#L40-L41
https://github.com/microsoft/evodiff/blob/32e3bd8d1ada1d786795e3ba1b84c855b22b4702/evodiff/generate.py#L128-L129
https://github.com/microsoft/evodiff/blob/32e3bd8d1ada1d786795e3ba1b84c855b22b4702/evodiff/generate.py#L148

Then add one line of code at:
https://github.com/microsoft/evodiff/blob/32e3bd8d1ada1d786795e3ba1b84c855b22b4702/evodiff/generate.py#L130

+ # the sequence length you want to sample from, for example (30, 200)
+ seq_len = np.random.choice(np.arange(30, 200))
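
To see what that added line does in isolation, here is a minimal standalone sketch. `sample_seq_len` is a hypothetical helper for illustration only, not something in `generate.py`. Note that `np.arange(30, 200)` excludes the upper bound, so lengths run from 30 to 199, and they are drawn uniformly. That is presumably different from whatever length distribution the commented-out dataset code provided, so treat it as a workaround rather than a faithful replacement.

```python
import numpy as np


def sample_seq_len(min_len=30, max_len=200):
    """Draw one integer length uniformly from [min_len, max_len)."""
    # np.arange stops before max_len, so max_len itself is never returned.
    return int(np.random.choice(np.arange(min_len, max_len)))


# Draw a batch of lengths and look at the observed range.
lengths = [sample_seq_len() for _ in range(1000)]
print(min(lengths), max(lengths))
```

Change the two bounds to whatever range of sequence lengths you want to generate.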

Then run the bash command:

export AMLT_OUTPUT_DIR=YOUR_OUTPUT_DIR; python evodiff/generate.py --model-type oa_dm_38M --num-seqs 10 --amlt

Mar 20 '24 09:03 chAwater