Send null values down both branches
WIP
A potential means of handling missing data that avoids overfitting. See feature-request #8249 and this link for the basic concept.
I added a new boolean training parameter (better names are, of course, welcome) and took a crack at refactoring evaluate_splits.h to incorporate the algorithm when searching for splits on continuous features. The idea: when there is a discrepancy between left_sum and the total gradient and Hessian, instead of simply doing a backward sweep, do a backward sweep that adds 1/(iend - ibegin) of that discrepancy to left_sum on each iteration. This way, the residual gradients and Hessians from null values are folded into left_sum incrementally throughout the sweep, rather than being assigned wholly to one branch.
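To make the idea concrete, here is a minimal standalone sketch of that sweep. All names (`GradStats`, `BestSplitGain`) are illustrative, not XGBoost's actual API: it assumes the missing-value residual is the difference between the node total and the sum over the observed histogram bins, and spreads that residual evenly across the iterations of the sweep.

```cpp
#include <cassert>
#include <vector>

// Illustrative stand-in for a gradient/Hessian pair (not XGBoost's real type).
struct GradStats { double grad = 0.0; double hess = 0.0; };

// Sketch: sweep over split candidates while distributing the null-value
// residual (total minus observed) evenly into left_sum, 1/n per iteration.
double BestSplitGain(const std::vector<GradStats>& bins,
                     GradStats total, double lambda = 1.0) {
  // Sum the observed (non-missing) bins.
  GradStats observed;
  for (const auto& b : bins) {
    observed.grad += b.grad;
    observed.hess += b.hess;
  }
  // The discrepancy between the total and the observed sums comes from nulls.
  const std::size_t n = bins.size();
  const double step_g = (total.grad - observed.grad) / n;  // per-iteration slice
  const double step_h = (total.hess - observed.hess) / n;  // of the residual

  auto score = [lambda](const GradStats& s) {
    return s.grad * s.grad / (s.hess + lambda);
  };

  const double parent = score(total);
  double best = 0.0;
  GradStats left;
  for (std::size_t i = 0; i + 1 < n; ++i) {
    left.grad += bins[i].grad + step_g;  // fold in one slice of the residual
    left.hess += bins[i].hess + step_h;
    GradStats right{total.grad - left.grad, total.hess - left.hess};
    const double gain = score(left) + score(right) - parent;
    if (gain > best) best = gain;
  }
  return best;
}
```

Because the residual is added incrementally, every candidate split sees a proportional share of the null statistics on its left side, rather than the all-or-nothing assignment the forward/backward sweeps make today.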
If the refactoring approach seems acceptable, I can repeat the pattern in the methods that handle categorical and one-hot-encoded features, then move on to inference and testing.
@trivialfis
Thank you for working on this. I will look deeper into the PR.