zingg: New phase assessModel

Write a Python script that exposes the model stats: the confusion matrix, and the counts of marked records, unmarked records, matches, non-matches and not sure.

We will use the Labeller class. The Python script takes the conf and passes it to the Client, and the Client invokes the Labeller. Refer to the Python API example at https://github.com/zinggAI/zingg/blob/main/api/scala/FebrlExample.py.
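A minimal sketch of the script's entry point, assuming the conf is passed as the path to a Zingg JSON config on the command line. `parse_args`, `load_conf`, `build_client` and `client_factory` are hypothetical names. The real Client construction should follow the Zingg Python API as used in FebrlExample.py, so it is left as a callback here rather than guessed.

```python
import argparse
import json
from typing import Any, Callable, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Read the path of the Zingg JSON config from the command line (assumed CLI shape)."""
    parser = argparse.ArgumentParser(description="assessModel: expose Zingg model stats")
    parser.add_argument("--conf", required=True, help="path to the Zingg JSON config")
    return parser.parse_args(argv)


def load_conf(path: str) -> Dict[str, Any]:
    """Load the config so it can be handed to the Client unchanged."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def build_client(conf: Dict[str, Any], client_factory: Callable[[Dict[str, Any]], Any]) -> Any:
    """Pass the conf to the Client; the Client is what invokes the Labeller.

    client_factory is a hypothetical hook standing in for the real Zingg
    Client construction (see FebrlExample.py).
    """
    return client_factory(conf)
```

Wired together, the script would run `client = build_client(load_conf(parse_args().conf), client_factory)` with the real factory plugged in. Keeping the factory as a parameter also lets the stats logic be exercised with a fake client.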

The script calls getMarkedRecords, getMarkedRecordsStat and getUnmarkedRecords on the Client and reports the stats. You can convert the DataFrame returned by the Client to a pandas DataFrame. To build the confusion matrix, the following can be used:

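```python
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

# markedRecords: pandas DataFrame of the marked records, holding the human
# label (z_isMatch) and the model's prediction (z_prediction)
confusion_matrix = pd.crosstab(markedRecords['z_isMatch'], markedRecords['z_prediction'], rownames=['Actual'], colnames=['Predicted'])

# Draw the matrix as an annotated heatmap
sn.heatmap(confusion_matrix, annot=True)
plt.show()
```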
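The counts the script should report can also be derived directly from the marked records. This is a stdlib-only sketch under several assumptions that should be checked against the Zingg source: each marked record is a dict with `z_cluster` (the pair id) and `z_isMatch`; each labelled pair contributes two records sharing a `z_cluster`; and `z_isMatch` uses 1 for match, 0 for non-match and 2 for not sure. `marked_stats` is a hypothetical helper. In the statistics posted later in this thread, 14 + 24 + 0 = 38 labels against 76 marked records, which fits the two-records-per-pair reading.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Assumed label codes for z_isMatch; verify against the Zingg source.
MATCH, NON_MATCH, NOT_SURE = 1, 0, 2


def marked_stats(marked: Iterable[Mapping], unmarked_count: int) -> Dict[str, int]:
    """Count marked records and the labels given to them.

    Each pair is assumed to appear as two records sharing z_cluster, so
    labels are counted once per distinct z_cluster.
    """
    records = list(marked)
    label_per_pair = {}
    for rec in records:
        # Both records of a pair carry the same label; keep one per cluster.
        label_per_pair[rec["z_cluster"]] = rec["z_isMatch"]
    labels = Counter(label_per_pair.values())
    return {
        "marked": len(records),
        "unmarked": unmarked_count,
        "matches": labels[MATCH],
        "non_matches": labels[NON_MATCH],
        "not_sure": labels[NOT_SURE],
    }
```

With a pandas DataFrame, `markedRecords.to_dict("records")` produces the list of dicts this function expects.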

May 26 '22 07:05 sonalgoyal

I have added new methods to the Client: getMatchedMarkedRecordsStat(Dataset<Row> markedRecords), getUnmatchedMarkedRecordsStat(Dataset<Row> markedRecords), getUnsureMarkedRecordsStat and getMarkedRecords().

You can use them to build the logic.
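A sketch of how the script might use these methods to print the report. `client` is any object exposing the method names listed above; how the Python side reaches the Java Client is not covered here. The sketch assumes that getUnsureMarkedRecordsStat takes the marked records like the other two (its signature is not given above), that each of the three stat methods returns a count, and that getMarkedRecords() and getUnmarkedRecords() return DataFrames with a `count()` method, as Spark DataFrames have. `format_model_stats` is a hypothetical helper; its layout mirrors the statistics posted in this thread.

```python
from typing import Any


def format_model_stats(client: Any, model_id: str) -> str:
    """Collect the stats through the Client and render them as a text report."""
    marked = client.getMarkedRecords()
    unmarked = client.getUnmarkedRecords()
    rows = [
        ("No. of Records Marked", marked.count()),
        ("No. of Records UnMarked", unmarked.count()),
        ("No. of Matches", client.getMatchedMarkedRecordsStat(marked)),
        ("No. of Non-Matches", client.getUnmatchedMarkedRecordsStat(marked)),
        # Assumed to take the marked records like the two methods above.
        ("No. of Not Sure", client.getUnsureMarkedRecordsStat(marked)),
    ]
    # Left-align the labels so the colons line up.
    width = max(len(label) for label, _ in rows)
    lines = [f"Statistics for model {model_id}", ""]
    lines += [f"{label:<{width}} :  {value}" for label, value in rows]
    return "\n".join(lines)
```

Delegating the counts to the Client keeps the label semantics in one place (the Java side), so the Python script only formats the numbers.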

May 26 '22 16:05 sonalgoyal

@RavirajBaraiya

May 26 '22 16:05 sonalgoyal

The confusion matrix looks like this:

Jun 06 '22 19:06 navinrathore

Generated config file from the Arguments object: ArgumentsToFile.txt

Jun 06 '22 19:06 navinrathore

Statistics for model 100

No. of Records Marked   :  76
No. of Records UnMarked :  72
No. of Matches          :  14
No. of Non-Matches      :  24
No. of Not Sure         :  0

Jun 06 '22 19:06 navinrathore

We need to look at the right model internally for this: should we expose the label model, or the actual model?

Jun 15 '22 18:06 sonalgoyal