Question: Programmatically creating splines and applying knots to new data
I have found that I can create a spline on training data and then apply it to test data like this:
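from patsy import dmatrix, build_design_matrices
# create the basis on the training data
x1 = dmatrix("cr(x, df=3) - 1", {"x": TRAIN_DATA.VARIABLE.values})
# apply the same basis (same knots) to the test data; build_design_matrices returns a list
xx1 = build_design_matrices([x1.design_info], {"x": TEST_DATA.VARIABLE.values})[0]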
This works, but it requires either creating a variable for each column by hand or programmatically building formula strings.
Is there any way to do something like patsy.cr(x, df=5) and grab the knots, so I can apply them to new data with the same cr() function?
Answer: I'm not really an expert, so I may be overlooking something.
First, do you actually need to know the knots? If not, I think the canonical way is to reuse the design_info of the training design matrix:
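import numpy as np
import patsy

# Build the design matrix on the training data (patsy memorizes the knots)
x = np.arange(100)
dm = patsy.dmatrix('cr(x, df=5)', {'x': x})
# Apply the same design, with the same knots, to new data
new_data = np.arange(25, 75)
new_dm = patsy.dmatrix(dm.design_info, {'x': new_data})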
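One way to convince yourself that the knots really are reused is to compare rows for x values that appear in both data sets. The sketch below assumes the same toy data as above; refit is a hypothetical name for a matrix built from a fresh formula, which recomputes the knots from the new data and therefore gives different rows.

```python
import numpy as np
import patsy

# Training data and its design matrix (intercept + 5 cr() columns).
x = np.arange(100)
dm = patsy.dmatrix("cr(x, df=5)", {"x": x})

# Rebuild the design on new data from the stored design_info.
new_data = np.arange(25, 75)
new_dm = patsy.dmatrix(dm.design_info, {"x": new_data})

# For comparison: a fresh formula re-learns the knots from new_data alone.
refit = patsy.dmatrix("cr(x, df=5)", {"x": new_data})

# With reused knots, rows for x = 25..74 should match the training rows.
print(new_dm.shape)
print(np.allclose(np.asarray(new_dm), np.asarray(dm)[25:75]))
# With refitted knots, they generally will not.
print(np.allclose(np.asarray(refit), np.asarray(dm)[25:75]))
```

This is why passing design_info (or using build_design_matrices) matters: calling the formula again on test data silently fits a different basis.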
If you really want to know what the knots were, you could probably dig through the dm.design_info object and find them.
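A sketch of that digging, assuming patsy internals as I understand them: each FactorInfo in design_info.factor_infos has a state dict, and for formula factors that dict seems to keep the fitted stateful-transform objects under a "transforms" key. Both that key and the _all_knots attribute are private details and may differ between patsy versions; find_cr_knots is a hypothetical helper.

```python
import numpy as np
import patsy

x = np.arange(100)
dm = patsy.dmatrix("cr(x, df=5)", {"x": x})

def find_cr_knots(design_info):
    """Hypothetical helper: collect knots from fitted cr() transforms.

    Relies on private patsy internals (FactorInfo.state["transforms"] and
    the _all_knots attribute), so treat it as fragile.
    """
    found = {}
    for factor, info in design_info.factor_infos.items():
        state = info.state if isinstance(info.state, dict) else {}
        for obj in state.get("transforms", {}).values():
            # Only spline transforms carry an _all_knots attribute.
            if hasattr(obj, "_all_knots"):
                found[factor.name()] = obj._all_knots
    return found

found = find_cr_knots(dm.design_info)
print(found)
```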
However, it may be a little easier to pull the CR class out of the cr stateful transform and fit it yourself:
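import patsy
# patsy.cr is a wrapper; __patsy_stateful_transform__ is the underlying class
cr = patsy.cr.__patsy_stateful_transform__()
cr.memorize_chunk(x, df=5)  # collect the training data and the spline arguments
cr.memorize_finish()        # compute the knots from everything memorized
cr._all_knots               # private attribute: all knots, including the boundaries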
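To see what those knots look like, here is a self-contained version of the same steps. Assuming my reading of patsy's source is right, the boundary knots default to the min and max of the training data and the inner knots sit at evenly spaced percentiles (df=5 with no constraints giving 3 inner knots); comparing against np.percentile is one way to check that on your installed version.

```python
import numpy as np
import patsy

x = np.arange(100)

# Instantiate the class behind patsy.cr and fit it by hand.
cr = patsy.cr.__patsy_stateful_transform__()
cr.memorize_chunk(x, df=5)
cr.memorize_finish()

knots = cr._all_knots  # private attribute, may change between patsy versions
print(knots)

# Assumed default placement: bounds at min/max, inner knots at quantiles.
print(np.percentile(x, [0, 25, 50, 75, 100]))
```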
You can then apply the fitted transform to the new data with:
cr.transform(new_data)
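Because the fitted transform is an ordinary object, it can be stored per column, which avoids building a variable or formula string for each one. The sketch below uses hypothetical helpers (fit_cr_transforms, apply_cr_transforms) and plain dicts of NumPy arrays standing in for TRAIN_DATA and TEST_DATA; a pandas DataFrame indexed by column name should work the same way.

```python
import numpy as np
import patsy

def fit_cr_transforms(train, columns, df=5):
    """Hypothetical helper: fit one cr() stateful transform per column."""
    fitted = {}
    for col in columns:
        tr = patsy.cr.__patsy_stateful_transform__()
        tr.memorize_chunk(np.asarray(train[col], dtype=float), df=df)
        tr.memorize_finish()  # knots are fixed from the training column here
        fitted[col] = tr
    return fitted

def apply_cr_transforms(fitted, data):
    """Hypothetical helper: evaluate each basis on new data with the training knots."""
    return {col: tr.transform(np.asarray(data[col], dtype=float))
            for col, tr in fitted.items()}

train = {"a": np.arange(100), "b": np.linspace(-1.0, 1.0, 100)}
test = {"a": np.arange(25, 75), "b": np.linspace(-0.5, 0.5, 50)}

fitted = fit_cr_transforms(train, ["a", "b"], df=5)
bases = apply_cr_transforms(fitted, test)
for col, basis in bases.items():
    print(col, fitted[col]._all_knots, basis.shape)
```

Each basis should match what dmatrix("cr(x, df=5) - 1", ...) plus design_info would give for that column; the transform objects just make the knots directly accessible.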