Incorrect data in training
Hi
This code looks wrong. It prints the same count for True and False in each split:
print("Training True : {0} ({1:0.2f}%)".format(len(y_train[y_train[:] == 1]), (len(y_train[y_train[:] == 1])/len(df.index) * 100.0)))
print("Training False : {0} ({1:0.2f}%)".format(len(y_train[y_train[:] == 0]), (len(y_train[y_train[:] == 0])/len(df.index) * 100.0)))
print("Test True : {0} ({1:0.2f}%)".format(len(y_test[y_test[:] == 1]), (len(y_test[y_test[:] == 1])/len(df.index) * 100.0)))
print("Test False : {0} ({1:0.2f}%)".format(len(y_test[y_test[:] == 0]), (len(y_test[y_test[:] == 0])/len(df.index) * 100.0)))
Training True : 537 (69.92%)
Training False : 537 (69.92%)
Test True : 231 (30.08%)
Test False : 231 (30.08%)
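When counting the occurrences of 1 with len(y_train[y_train[:] == 1]), it returns every row, not just the rows that match. In fact, if you change the condition to == 5, it still returns the full length of y_train. The cause is that y_train is a DataFrame here (which is why y_train['diabetes'] works below): indexing a DataFrame with a boolean DataFrame replaces the non-matching cells with NaN instead of dropping rows, so len() always gives the full row count.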
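A minimal reproduction of this behaviour, assuming y_train is a one-column DataFrame named "diabetes" (the toy values are made up, not taken from the real dataset):

```python
import pandas as pd

# Hypothetical toy labels standing in for y_train (a one-column DataFrame).
y_train = pd.DataFrame({"diabetes": [1, 0, 1, 1, 0]})

# Boolean-DataFrame indexing masks non-matching cells with NaN but keeps every row.
masked = y_train[y_train[:] == 1]
print(len(masked))  # 5: all rows survive
print(masked["diabetes"].isna().sum())  # 2: the zeros became NaN

# A value that never occurs still leaves the length unchanged.
print(len(y_train[y_train[:] == 5]))  # 5

# Comparing the column gives a boolean Series, which actually filters rows.
print(len(y_train[y_train["diabetes"] == 1]))  # 3
```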
I was able to get the code to work properly by changing [:] to ['diabetes'], so the comparison runs on a single column (a Series) and the mask filters rows:
print("Training True : {0} ({1:0.2f}%)".format(len(y_train[y_train['diabetes'] == 1]), (len(y_train[y_train['diabetes'] == 1])/len(df.index) * 100.0)))
and the same change for the other three lines.
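One way to avoid repeating the expression eight times is a small helper; this is only a sketch, and the function name split_counts and the toy data are hypothetical. Like the original code, it reports each percentage relative to the total row count (len(df.index) in the original), not relative to the split size:

```python
import pandas as pd


def split_counts(y: pd.DataFrame, split_name: str, total_rows: int, col: str = "diabetes") -> list[str]:
    """Return report lines for one split; percentages are relative to total_rows."""
    lines = []
    for label, value in (("True", 1), ("False", 0)):
        # Compare the column (a Series) so the result is a per-row boolean; sum() counts the Trues.
        n = int((y[col] == value).sum())
        lines.append("{0} {1} : {2} ({3:0.2f}%)".format(split_name, label, n, n / total_rows * 100.0))
    return lines


# Hypothetical toy data: 10 rows in total, 7 in the training split.
y_train = pd.DataFrame({"diabetes": [1, 0, 1, 1, 0, 0, 1]})
y_test = pd.DataFrame({"diabetes": [0, 1, 0]})
total = len(y_train) + len(y_test)

for line in split_counts(y_train, "Training", total) + split_counts(y_test, "Test", total):
    print(line)
```

Summing the boolean Series avoids building a filtered copy just to take its length; y["diabetes"].value_counts() would be another option.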
Thanks guys! I fixed the qualifier error. Good catch!