
Potential Error in Resampling Notebook: shuffle_experiment function

Open ramseyb4 opened this issue 7 years ago • 0 comments

In the shuffle_experiment function, I believe the indexing of the experiment_data array used to calculate the mean is off (see the .mean() calls below). The boolean condition correctly selects the rows labeled 0 or 1, but the mean is then taken over both columns of those rows. We need to select only the second column of each row so that the label value is excluded from the mean calculation.
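To make the effect concrete, here is a minimal sketch on a made-up four-row array (the values are invented for illustration and are not taken from the notebook). Calling .mean() on the filtered rows averages the label column together with the data column:

```python
import numpy as np

# Hypothetical data: column 0 is the 0/1 label, column 1 is the value.
experiment_data = np.array([
    [1, 10.0],
    [0, 20.0],
    [1, 30.0],
    [0, 40.0],
])

rows_labeled_1 = experiment_data[experiment_data[:, 0] == 1]

# Averages all four entries of the filtered rows: labels 1, 1 and values 10, 30.
mean_with_labels = rows_labeled_1.mean()

# Averages only the second column: values 10 and 30.
mean_values_only = rows_labeled_1[:, 1].mean()

print(mean_with_labels)   # 10.5
print(mean_values_only)   # 20.0
```

Because each filtered row contributes one label and one value, the mean over both columns works out to (label + mean of values) / 2, so the label-1 group is pulled toward 0.5 and the label-0 group is simply halved.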

Current code:

    def shuffle_experiment(number_of_times):
        experiment_diff_mean = np.empty([number_of_times,1])
        for times in np.arange(number_of_times):
            experiment_label = np.random.randint(0,2,shoe_sales.shape[0])
            experiment_data = np.array([experiment_label, shoe_sales[:,1]]).T
            experiment_diff_mean[times] = experiment_data[experiment_data[:,0]==1].mean() \
                - experiment_data[experiment_data[:,0]==0].mean()
        return experiment_diff_mean

Proposed code:

    def shuffle_experiment(number_of_times):
        experiment_diff_mean = np.empty([number_of_times,1])
        for times in np.arange(number_of_times):
            experiment_label = np.random.randint(0,2,shoe_sales.shape[0])
            experiment_data = np.array([experiment_label, shoe_sales[:,1]]).T
            experiment_diff_mean[times] = experiment_data[experiment_data[:,0]==1][:,1].mean() \
                - experiment_data[experiment_data[:,0]==0][:,1].mean()
        return experiment_diff_mean
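As a rough check of how much the two versions differ, the sketch below runs both calculations on the same random relabelings. Everything in it is an assumption made for illustration: the shoe_sales array is synthetic, the helper name shuffle_diffs is hypothetical, and np.random.default_rng is used in place of the notebook's np.random.randint so the run is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the notebook's shoe_sales array (values invented):
# column 0 holds a 0/1 label, column 1 holds the measurement.
shoe_sales = np.column_stack([
    rng.integers(0, 2, size=40),
    rng.normal(loc=50.0, scale=10.0, size=40),
])

def shuffle_diffs(data, number_of_times, rng):
    """Return (current, proposed) mean differences for each random relabeling."""
    current = np.empty(number_of_times)
    proposed = np.empty(number_of_times)
    for i in range(number_of_times):
        labels = rng.integers(0, 2, size=data.shape[0])
        experiment_data = np.column_stack([labels, data[:, 1]])
        group1 = experiment_data[experiment_data[:, 0] == 1]
        group0 = experiment_data[experiment_data[:, 0] == 0]
        # Current code: mean over both columns, labels included.
        current[i] = group1.mean() - group0.mean()
        # Proposed code: mean over the data column only.
        proposed[i] = group1[:, 1].mean() - group0[:, 1].mean()
    return current, proposed

current, proposed = shuffle_diffs(shoe_sales, 1000, rng)

# With labels included, each difference is half the true difference plus 0.5.
print(np.allclose(current, proposed / 2 + 0.5))
```

If both groups are non-empty in every relabeling, the final line should print True: the current code produces a distribution that is compressed by a factor of two and shifted by 0.5 relative to the proposed one.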

The same issue exists in this block, shown here with the same column selection applied:

    experiment_diff_mean = experiment_data[experiment_data[:,0]==1][:,1].mean() \
        - experiment_data[experiment_data[:,0]==0][:,1].mean()

ramseyb4, May 24 '18 03:05