
How to split FakeNewsNet into the training and testing sets?

Open lenhhoxung86 opened this issue 8 years ago • 3 comments

Hello, I've read your paper but I don't know how to split your dataset the same way you did in your paper. Could you please provide the training and testing sets separately? Thanks.

lenhhoxung86 avatar Jan 11 '18 10:01 lenhhoxung86

@lenhhoxung86 Sir, you can try splitting the dataset using the following code:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(new_name, z, test_size=0.2)

This will split the dataset in an 80:20 ratio: 80% for training and 20% for testing.

sailee18dalvi avatar Sep 05 '18 14:09 sailee18dalvi

The problem is that I know how to split the dataset, but I don't know how to reproduce the exact split they used in their paper. Without the same split, comparing results would not be fair.

lenhhoxung86 avatar Sep 05 '18 15:09 lenhhoxung86

@lenhhoxung86 For our recent experiments, we used the sample code from the sklearn example to split the data into train and test sets.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

For most experiments, we used a random state of 42; if you use the same random state, you will most likely get the same train/test sets. In future versions of the dataset, we'll add the train/test sets used for the experiments.

mdepak avatar Sep 05 '18 17:09 mdepak
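
The "most likely" in the reply above matters: with a fixed `random_state`, `train_test_split` reproduces the same row positions, so the split only matches if the rows are loaded in the same order. The sketch below illustrates this on toy data. The `ids` and `labels` arrays and the `split_ids` helper are hypothetical stand-ins, not part of FakeNewsNet, and the sketch does not claim to reproduce the split used in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the dataset (hypothetical): 100 article ids with binary labels.
ids = np.arange(100)
labels = np.array([0, 1] * 50)

def split_ids(ids, labels, seed=42):
    """Return (train_ids, test_ids) for an 80/20 split with a fixed seed."""
    train_ids, test_ids, _, _ = train_test_split(
        ids, labels, test_size=0.2, random_state=seed
    )
    return train_ids, test_ids

# Same rows, same order, same seed: the two splits are identical.
train_a, test_a = split_ids(ids, labels)
train_b, test_b = split_ids(ids, labels)
same_split = np.array_equal(test_a, test_b)

# Same rows in reversed order: the seed picks the same positions,
# but those positions now hold different ids (id 99 - p at position p).
_, test_c = split_ids(ids[::-1], labels[::-1])
same_positions = np.array_equal(test_c, 99 - test_a)

print(same_split, same_positions, len(train_a), len(test_a))
```

If an exact match with someone else's experiments is needed, sharing the resulting train/test id lists is more robust than sharing only the seed, since the lists do not depend on row order or on the library version.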