
Error: ValueError: '"continuous"' is not in list

Open niklaszantner opened this issue 9 years ago • 2 comments

I'm experiencing an interesting error:

message from Python: *********************************************************************
message from Python: 
message from Python: Warning, we have received a value in the first row that is not valid:
message from Python: "continuous"
message from Python: Please remember that the first row must contain information describing that column of data
message from Python: Acceptable values are: "ID", "Output Category", "Output Multi-Category", "Output Regression", "Continuous", "Categorical", "Date", "IGNORE", "Validation Split", and "NLP", though they are not case sensitive.
message from Python: 
message from Python: The column index of this unexpected value is:
message from Python: 0
message from Python: The entire row that we received is:
message from Python: ['"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"output category"', ' ']
message from Python: *********************************************************************
message from Python: This is an error that prevents the rest of the prorgram from running. Please fix and run machineJS again.

I know that this error occurs when I'm running machineJS but it also occurs with data-formatter installed via npm install -g data-formatter.

The .csv files I'm using look like this (only the first few lines, and yes, those are from the current numer.ai):

training data (numerai_training_data.csv):

"Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Continuous","Output Category", 
"feature1","feature2","feature3","feature4","feature5","feature6","feature7","feature8","feature9","feature10","feature11","feature12","feature13","feature14","feature15","feature16","feature17","feature18","feature19","feature20","feature21","target"
0.86864091396194,0.506736891211661,0.612936323346674,0.938594725439847,0.497599118270575,0.666396780090536,0.39187077660641,0.727678764938448,0.150861110163878,0.772584165855727,0.689308577276422,0.860138928538823,0.214899033241811,0.629553604161714,0.242945084419877,0.0733669816596503,0.275066846363697,0.445346760764699,0.508648861192553,0.230880938243752,0.594808578272208,1
0.187006901664161,0.830565721067219,0.50777134657673,0.346875532490266,0.41332329728442,0.470310621632215,0.948287627170581,0.253222134175955,0.825946417585556,0.596589174930343,0.579960501169059,0.763485328236244,0.723233338462968,0.298057535056738,0.729876115901926,0.808066360958326,0.364541079113559,0.573676732740155,0.561999610847449,0.395796299453029,0.337658911516832,0

here is also the tournament data (numerai_tournament_data.csv):

'ID','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous','Continuous'
"t_id","feature1","feature2","feature3","feature4","feature5","feature6","feature7","feature8","feature9","feature10","feature11","feature12","feature13","feature14","feature15","feature16","feature17","feature18","feature19","feature20","feature21"
19778,0.652450941772408,0.454574228014018,0.270804859628407,0.161097880875707,0.905690304463858,0.295944221546047,0.163038610622454,0.853296442428432,0.181040164079697,0.524846886624367,0.405589768854382,0.300021012452414,0.942145182118905,0.332197669804339,0.763536894453461,0.673533824569794,0.524846098121362,0.180003787962373,0.929883021866594,0.54392047168455,0.543158600587601
21465,0.560970703270285,0.51002996638114,0.51939683127632,0.113067422344801,0.183254285878405,0.45550499202652,0.845135463201596,0.411922967229636,0.77756563362742,0.900449910639058,0.915049964295094,0.996133302998115,0.316080113778575,0.313864881224719,0.802118173321136,0.84367550559471,0.637792172884529,0.574702376141301,0.25118322548224,0.611590593003519,0.919855546585812

Having a look at validation.py line 69, it tells me that the expected values must be one of these:

['id','continuous','groupby continuous','categorical','groupby categorical','date','groupby date','ignore', 'validation split', 'nlp']

So I don't understand why this row (from validation.py line 81):

 ['"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"continuous"', '"output category"', ' ']

doesn't match. The joinDataDescription = [x.lower() for x in row] in join.py works fine, as we know from the error message, but why is every string wrapped in " " and ' ', like this: '"continuous"'? Shouldn't it be just 'continuous' to match the expected values?
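For what it's worth, here is a minimal standalone sketch (my own, not machineJS code) of how csv.reader can leave the quote characters inside each field when the dialect's quoting ends up disabled, which would explain values like '"continuous"':

```python
import csv
import io

line = '"Continuous","Output Category"\n'

# With the default 'excel' dialect, csv.reader strips the quotes:
row = next(csv.reader(io.StringIO(line)))
print(row)  # ['Continuous', 'Output Category']

# But if the dialect has quoting disabled, the quote characters
# survive inside each field -- lowercasing then yields values like
# '"continuous"', which are not in the accepted list:
row2 = next(csv.reader(io.StringIO(line), quoting=csv.QUOTE_NONE))
print(row2)  # ['"Continuous"', '"Output Category"']
```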

I also tried it with "Continuous" instead of 'Continuous', but that didn't work either. I guess there is something wrong in join.py:


try:
    dialect = csv.Sniffer().sniff(joinFile.read(2048))
    joinFile.seek(0)
except:
    dialect = 'excel'
joinRows = csv.reader(joinFile, dialect)

I guess the dialect selection could go wrong here. Does anyone else have an idea, or a similar problem?
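A sketch of what a more defensive version of that snippet could look like (open_rows is a name I made up for illustration, not a function in join.py): catch only csv.Error instead of a bare except, and fall back to csv's built-in default dialect:

```python
import csv


def open_rows(path):
    # Hypothetical helper, not from join.py: sniff the dialect from a
    # 2048-byte sample, then rewind and fall back to csv's default
    # 'excel' dialect if sniffing fails.
    f = open(path, newline='')
    sample = f.read(2048)
    f.seek(0)
    try:
        dialect = csv.Sniffer().sniff(sample)
    except csv.Error:
        dialect = csv.excel
    return csv.reader(f, dialect)
```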

PS: I'm new to Python, sorry! And thanks for your great npm packages, all of them; they make it really easy to get started with machine learning!

EDIT: found it, the error was hiding somewhere else:

adding

if name.startswith('"') and name.endswith('"'):
    name = name[1:-1]

to validation.py line 24 fixed it. So the error is in validation.dataDescription(), not, as I thought, in validation.joinDataDescription().

Of course, what I did is only a dirty patch, but at least it finally gets things running on my setup :D. I will not close this yet, as it is not cleanly fixed.
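A slightly more general version of the patch could strip one matching pair of either double or single quotes before lowercasing (this is my own sketch, not what validation.py currently does):

```python
def normalize_header(name):
    # Strip one matching pair of surrounding quotes, then lowercase,
    # so '"Continuous"' and "'ID'" become 'continuous' and 'id'.
    name = name.strip()
    if len(name) >= 2 and name[0] == name[-1] and name[0] in ('"', "'"):
        name = name[1:-1]
    return name.lower()
```

Handling single quotes too would also cover the tournament file's header row, which uses 'Continuous' rather than "Continuous".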

niklaszantner avatar Jun 29 '16 12:06 niklaszantner

Also, someone should document the required Python version in the README.

niklaszantner avatar Jun 29 '16 13:06 niklaszantner

The same issue occurs in concat.py line 164, as

idHeader = testingHeader[ testingDataDescription.index('id') ]

can't find '"id"' (note the quotes).

niklaszantner avatar Jun 29 '16 16:06 niklaszantner