New phase assessModel
Write a Python script which will expose the model stats: the confusion matrix and the number of records marked, unmarked, matches, non-matches, and not sure.
We will use the Labeller class. The python script takes the conf and passes it to the Client. Client will invoke the Labeller. Refer to the python api example at https://github.com/zinggAI/zingg/blob/main/api/scala/FebrlExample.py.
The script calls getMarkedRecords, getMarkedRecordsStat, and getUnmarkedRecords on the Client and reports the stats. You can convert the df returned by the Client to a pandas DataFrame. To build the confusion matrix, the following can be used:
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

confusion_matrix = pd.crosstab(markedRecords['z_isMatch'], markedRecords['z_prediction'], rownames=['Actual'], colnames=['Predicted'])

sn.heatmap(confusion_matrix, annot=True)
plt.show()
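To try the crosstab call without a running Zingg job, here is a minimal sketch on a hand-made pandas DataFrame. The six rows are made-up illustration data, not real labeller output, and the sketch assumes 1 means match and 0 means non-match in both columns. Plotting is left out so the snippet runs headless.

```python
import pandas as pd

# Hypothetical marked records; in the real script these would come from
# the Client and be converted to a pandas DataFrame first.
marked_records = pd.DataFrame({
    "z_isMatch":    [1, 1, 0, 0, 0, 1],   # label given by the user (assumed 1 = match, 0 = non-match)
    "z_prediction": [1, 0, 0, 0, 1, 1],   # label predicted by the model
})

# Rows are the user's labels, columns are the model's predictions.
confusion = pd.crosstab(
    marked_records["z_isMatch"],
    marked_records["z_prediction"],
    rownames=["Actual"],
    colnames=["Predicted"],
)
print(confusion)
```

Each cell can then be read with `confusion.loc[actual, predicted]`, which is handy if the script should print the counts as text as well as draw the heatmap.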
I have added new methods on the Client: getMatchedMarkedRecordsStat(Dataset<Row> markedRecords), getUnmatchedMarkedRecordsStat(Dataset<Row> markedRecords), getUnsureMarkedRecordsStat, and getMarkedRecords().
You can use them to build the logic.
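If it helps to prototype the stats on the Python side before wiring in the Client methods, a rough sketch follows. It does not call the Zingg API. `label_stats` is a hypothetical helper, the label encoding (1 = match, 0 = non-match, 2 = not sure) is an assumption, and so is the idea that each labelled pair shows up as two rows in the marked records.

```python
import pandas as pd

# Assumed encoding of z_isMatch; adjust if the labeller uses other values.
MATCH, NON_MATCH, NOT_SURE = 1, 0, 2


def label_stats(marked: pd.DataFrame, unmarked: pd.DataFrame) -> dict:
    """Summarise labelling progress from two pandas DataFrames (hypothetical helper)."""
    counts = marked["z_isMatch"].value_counts()
    # Assumption: every labelled pair contributes two rows, so halve the row counts.
    return {
        "marked": len(marked),
        "unmarked": len(unmarked),
        "matches": int(counts.get(MATCH, 0)) // 2,
        "non_matches": int(counts.get(NON_MATCH, 0)) // 2,
        "not_sure": int(counts.get(NOT_SURE, 0)) // 2,
    }


# Made-up data: two matching pairs, one non-matching pair, three unmarked rows.
marked = pd.DataFrame({"z_isMatch": [1, 1, 1, 1, 0, 0]})
unmarked = pd.DataFrame({"z_isMatch": [-1, -1, -1]})

stats = label_stats(marked, unmarked)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Counting pairs rather than rows is a guess about how the match counts are meant to relate to the record counts, so drop the `// 2` if the Client stats are row-based.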
@RavirajBaraiya
Confusion Matrix looks like below

Generated Config File from Arguments object ArgumentsToFile.txt
Statistics for model 100
No. of Records Marked : 76
No. of Records UnMarked : 72
No. of Matches : 14
No. of Non-Matches : 24
No. of Not Sure : 0
Need to look at the right model internally for this: should we expose the label model, or should we expose the actual model?