SuperStyl
Documentation
- State clearly at the beginning which steps are compulsory and which are not.
- State clearly what kind of data is needed: a. a reference set and a test set? b. one file per author? Or are multiple files OK if they start with the same name?
- Number the three steps to make the order clear (and the fact that there are three steps, the second being optional).
- Give an example of `debug_authors.csv`, `feature_list.json`, `feats_tests.csv`, `langcert_revised.csv`… so that we know what kind of data you expect (what is a column, what is a row…).
- Move "Alternatively, you can choose to do not specific split, but to use a leave-one-out approach." just under the title so that it is clear that this is not a compulsory step.
- Add a couple of lines on how to choose the `--sampling` options.
- Provide an example to play with, so that people can check that everything works and observe the structure of the data.
With that you should solve a lot of problems (and avoid a lot of emails like mine).
Here is my script:
```shell
python main.py -s train/* -t chars -n 3
mv feats_tests_n3_k_5000.csv train.csv
python main.py -s test/* -t chars -n 3 -f feature_list_chars3grams5000mf.json
mv feats_tests_n3_k_5000.csv test.csv
python train_svm.py train.csv --test_path test.csv --norms --final
```
Note that for the first `main.py` run I get "K Limit ignored because the size of the list is lower (3302 < 5000)".
Then I get this error from `svm.py`, line 190:
```python
myclasses = pipe.classes_
decs = pipe.decision_function(test)
dists = {}
for myclass in enumerate(myclasses):
    dists[myclass[1]] = [d[myclass[0]] for d in decs]
```

```
--> dists[myclass[1]] = [d[myclass[0]] for d in decs]
IndexError: invalid index to scalar variable.
```
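One plausible cause, assuming the training data happens to contain only two classes (authors): for binary problems, scikit-learn's `decision_function` returns a 1-D array of shape `(n_samples,)` instead of `(n_samples, n_classes)`. Each `d` in the loop is then a NumPy scalar, and `d[myclass[0]]` raises exactly this `IndexError`. The sketch below handles both shapes. `decision_distances` is a hypothetical helper, not part of SuperStyl, and giving the negated score to `classes_[0]` in the binary case is an assumed convention. The toy data is invented for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC


def decision_distances(pipe, test):
    """Hypothetical helper: map each class label to its decision scores.

    Works with any fitted scikit-learn estimator or Pipeline exposing
    `classes_` and `decision_function`.
    """
    myclasses = pipe.classes_
    decs = np.asarray(pipe.decision_function(test))
    dists = {}
    if decs.ndim == 1:
        # Binary case: one score per sample; a positive score favours classes_[1].
        # Assumed convention: classes_[0] receives the negated score.
        dists[myclasses[1]] = decs.tolist()
        dists[myclasses[0]] = (-decs).tolist()
    else:
        # Multiclass, assuming one column per class (one-vs-rest shape),
        # in the same order as classes_.
        for i, myclass in enumerate(myclasses):
            dists[myclass] = decs[:, i].tolist()
    return dists


if __name__ == "__main__":
    # Invented two-author toy data, for illustration only.
    X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]])
    y = np.array(["A", "A", "B", "B"])
    clf = LinearSVC().fit(X, y)
    print(clf.decision_function(X).shape)  # (4,): 1-D in the binary case
    print(decision_distances(clf, X))
```

With three or more classes the original loop works unchanged, which would explain why the error only shows up on some datasets.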