How to run the example?
Hi, I'm opening a new issue for this problem, since I could not find any information.
I've managed to run sampling.py with the wang dataset; it runs and generates the pairs. So far I've just copy-pasted the entry from the example's README.
Now I would like to run distance.py, but according to the documentation:
python distance.py \
--distance_pairs 1M_nysiis_balanced.json \
--distance_model linkage.dat \
--input_signatures input/signatures.json \
--input_records input/records.json \
--input_ethnicity_estimator ethnicity_estimator.pickle \
--verbose 3
What should I use as ethnicity_estimator.pickle?
The result of: https://github.com/inspirehep/beard/blob/master/examples/applications/author-disambiguation/ethnicity.py
Please note that the data needed to train the ethnicity estimator is not easy to obtain. However, a pretty good disambiguation can be run without it, simply by omitting the --input_ethnicity_estimator parameter.
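For reference, here is a minimal sketch of that invocation. It assumes the other arguments from the README command stay exactly as they are, and it adds the estimator only when a pickle file happens to exist in the working directory. The `estimator` variable and the `echo` dry run are illustrative choices, not part of the example itself.

```shell
# Arguments taken from the README command, without the ethnicity estimator.
set -- \
    --distance_pairs 1M_nysiis_balanced.json \
    --distance_model linkage.dat \
    --input_signatures input/signatures.json \
    --input_records input/records.json \
    --verbose 3

# Hypothetical local path; only used if you trained an estimator yourself.
estimator="ethnicity_estimator.pickle"
if [ -f "$estimator" ]; then
    set -- "$@" --input_ethnicity_estimator "$estimator"
fi

# Dry run: print the command. Remove `echo` to actually run distance.py.
echo python distance.py "$@"
```

Printing the command first makes it easy to check which arguments will be passed before starting a long run.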
Where is ethnicity_estimator.pickle?
The ethnicity estimator was trained on data that is not publicly available, and thus we could not make the trained estimator publicly available in the repo.