
New phase assessModel

Open sonalgoyal opened this issue 3 years ago • 6 comments

Write a Python script that exposes the model stats: the confusion matrix and the number of records marked, unmarked, matches, non-matches, and not sure.

We will use the Labeller class. The Python script takes the conf and passes it to the Client, and the Client invokes the Labeller. Refer to the Python API example at https://github.com/zinggAI/zingg/blob/main/api/scala/FebrlExample.py.

The script calls getMarkedRecords, getMarkedRecordsStat, and getUnmarkedRecords on the Client and reports the stats. You can convert the df returned by the Client to a pandas DataFrame. To build the confusion matrix, the following can be used:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

confusion_matrix = pd.crosstab(markedRecords['z_isMatch'], markedRecords['z_prediction'], rownames=['Actual'], colnames=['Predicted'])

sn.heatmap(confusion_matrix, annot=True)
plt.show()
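The per-label counts (matches, non-matches, not sure) could be derived from the same z_isMatch column. The following is only a rough sketch: the helper name `label_counts` is hypothetical, the sample labels are made up, and the encoding (1 = match, 0 = non-match, 2 = not sure) is an assumption that should be checked against the Labeller before relying on it.

```python
from collections import Counter

# Assumed encoding of z_isMatch; verify against the Labeller before use.
MATCH, NON_MATCH, NOT_SURE = 1, 0, 2


def label_counts(labels):
    """Tally labels from any iterable, e.g. a list or a pandas Series."""
    tally = Counter(int(value) for value in labels)
    return {
        "matches": tally[MATCH],
        "non_matches": tally[NON_MATCH],
        "not_sure": tally[NOT_SURE],
    }


# Made-up labels standing in for markedRecords['z_isMatch'].
sample_labels = [1, 1, 0, 0, 0, 2]
counts = label_counts(sample_labels)
print(counts)
```

Because the helper only iterates over its input, it should also accept `markedRecords['z_isMatch']` directly once the records are in a pandas DataFrame.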

sonalgoyal, May 26 '22 07:05

I have added new methods on the Client: getMatchedMarkedRecordsStat(Dataset<Row> markedRecords), getUnmatchedMarkedRecordsStat(Dataset<Row> markedRecords), getUnsureMarkedRecordsStat, and getMarkedRecords().

You can use them to build the logic.

sonalgoyal, May 26 '22 16:05

@RavirajBaraiya

sonalgoyal, May 26 '22 16:05

The confusion matrix looks like the image below.

navinrathore, Jun 06 '22 19:06

Config file generated from the Arguments object: ArgumentsToFile.txt

navinrathore, Jun 06 '22 19:06

Statistics for model 100

No. of Records Marked   :  76
No. of Records UnMarked :  72
No. of Matches          :  14
No. of Non-Matches      :  24
No. of Not Sure         :  0
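
A report in this layout could be produced by a small formatting helper. This is just a sketch, not the script that generated the output above: `format_report` and its parameters are hypothetical names, and the numbers passed in are the ones shown in this comment.

```python
def format_report(model_id, marked, unmarked, matches, non_matches, not_sure):
    """Return the model statistics as an aligned text block (hypothetical helper)."""
    rows = [
        ("No. of Records Marked", marked),
        ("No. of Records UnMarked", unmarked),
        ("No. of Matches", matches),
        ("No. of Non-Matches", non_matches),
        ("No. of Not Sure", not_sure),
    ]
    # Pad every label to the longest one so the colons line up.
    width = max(len(label) for label, _ in rows)
    lines = [f"Statistics for model {model_id}", ""]
    lines += [f"{label:<{width}} :  {value}" for label, value in rows]
    return "\n".join(lines)


report = format_report(100, 76, 72, 14, 24, 0)
print(report)
```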

navinrathore, Jun 06 '22 19:06

We need to look at the right model internally for this: should we expose the label model, or should we expose the actual model?

sonalgoyal, Jun 15 '22 18:06