Where can I get valid JSON?
I tried running this on the JSONL files:
python -m language.question_answering.bert_joint.prepare_nq_data \
--logtostderr \
--input_jsonl ~/data/nq-train-??.jsonl.gz \
--output_tfrecord ~/output_dir/nq-train.tfrecords-00000-of-00001 \
--max_seq_length=512 \
--include_unknowns=0.02 \
--vocab_file=bert-joint-baseline/vocab-nq.txt
but I get this error:
Traceback (most recent call last):
  File "/home/achaptykov/anaconda3/envs/py36-test/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/achaptykov/anaconda3/envs/py36-test/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data1/achaptykov/model/googlebert/language/question_answering/bert_joint/prepare_nq_data.py", line 90, in <module>
    tf.app.run()
  File "/data1/achaptykov/model/googlebert/env/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/data1/achaptykov/model/googlebert/language/question_answering/bert_joint/prepare_nq_data.py", line 69, in main
    for example in get_examples(FLAGS.input_jsonl):
  File "/data1/achaptykov/model/googlebert/language/question_answering/bert_joint/prepare_nq_data.py", line 59, in get_examples
    for line in input_file:
  File "/home/achaptykov/anaconda3/envs/py36-test/lib/python3.6/gzip.py", line 374, in readline
    return self._buffer.readline(size)
  File "/home/achaptykov/anaconda3/envs/py36-test/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/achaptykov/anaconda3/envs/py36-test/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/home/achaptykov/anaconda3/envs/py36-test/lib/python3.6/gzip.py", line 406, in _read_gzip_header
    magic = self._fp.read(2)
  File "/home/achaptykov/anaconda3/envs/py36-test/lib/python3.6/gzip.py", line 91, in read
    self.file.read(size-self._length+read)
TypeError: can't concat str to bytes
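I think you can download the data with gsutil (the last argument is a placeholder for your own destination directory): gsutil -m cp -R gs://natural_questions/v1.0 <local destination directory>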
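Separately from where the data comes from: the last frame of the traceback is inside gzip.py, and "can't concat str to bytes" suggests that the file object handed to gzip returned str instead of bytes. One possible cause (an assumption on my part, not confirmed in this thread) is that the underlying file was opened in text mode under Python 3. The sketch below reproduces that situation with in-memory streams only. The helper name read_jsonl_gz and the records are made up for illustration and are not part of prepare_nq_data.py.

```python
import gzip
import io
import json


def read_jsonl_gz(fileobj):
    """Yield one parsed JSON object per line of a gzipped JSONL stream.

    `fileobj` is expected to be opened in binary mode, so reads return bytes.
    """
    with gzip.GzipFile(fileobj=fileobj) as gz:
        for line in gz:
            yield json.loads(line.decode("utf-8"))


# Build a tiny gzipped JSONL payload in memory (made-up records, not NQ data).
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
    gz.write(b'{"example_id": 1}\n{"example_id": 2}\n')
payload = raw.getvalue()

# Binary stream: gzip receives bytes and the records parse normally.
records = list(read_jsonl_gz(io.BytesIO(payload)))

# Text stream: gzip receives str where it expects bytes and refuses to read.
text_mode_error = None
try:
    list(read_jsonl_gz(io.StringIO(payload.decode("latin-1"))))
except (TypeError, OSError) as err:
    text_mode_error = err

print(records)
print(type(text_mode_error).__name__, text_mode_error)
```

If this is what is happening in your run, opening the input in binary mode ("rb") before wrapping it in gzip.GzipFile would be the thing to try.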