Final report:

random_forest_hyperparamter_tuning.pdf

https://github.com/utat-ss/FINCH-Science/blob/main/Unmixing_Methods/Unconventional_Unmixing_Methods/RandomForest/hyperparameter_tuning/random_forest_hyperparameter_tuning.ipynb

The best validation R^2 I have managed to achieve is below 0.63. Random Forest is consistently overfitting (roughly R^2 > 0.90 on the training data), and nothing I have tried so far has been able to deal with it. All of scikit-learn's hyperparameter tuning methods (GridSearchCV/RandomizedSearchCV) produce overfitting models and push the parameters to their maximum values.

We are using https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
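One way I could try to keep the tuner from simply maxing out model capacity is to restrict the RandomizedSearchCV search space to shallower trees and larger leaves, and to score candidates by cross-validation rather than training fit. The sketch below is illustrative only: the parameter ranges are assumptions, and spectra/npv_fractions are the same arrays used in the tuning snippet further down.

import sklearn.ensemble
import sklearn.model_selection

train_X, validate_X, train_y, validate_y = sklearn.model_selection.train_test_split(
    spectra, npv_fractions, test_size=0.2, random_state=42)

# Illustrative, deliberately constrained search space (assumed values, not the
# ranges used in the report): capped depth and larger leaves regularize trees.
param_distributions = {
    "n_estimators": [100, 200, 400],
    "max_depth": [4, 8, 12, 16],
    "min_samples_leaf": [2, 5, 10, 20],
    "max_features": ["sqrt", 0.3, 0.5],
}

search = sklearn.model_selection.RandomizedSearchCV(
    sklearn.ensemble.RandomForestRegressor(random_state=42),
    param_distributions,
    n_iter=30,
    cv=5,             # candidates are ranked by cross-validated R^2
    scoring="r2",
    random_state=42,
    n_jobs=-1,
)
search.fit(train_X, train_y)
print("Best CV R^2:", round(search.best_score_, 4))
print("Validation R^2:", round(search.best_estimator_.score(validate_X, validate_y), 4))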

Best tuning found so far:

# random state 42
# simpler_data
# 900 - 1700nm
import sklearn.ensemble
import sklearn.model_selection

train_X, validate_X, train_y, validate_y = sklearn.model_selection.train_test_split(
    spectra, npv_fractions, test_size=0.2, random_state=42)
rf = sklearn.ensemble.RandomForestRegressor(
    n_estimators=200,
    max_depth=18,
    max_features=13,
    min_samples_split=3,
    min_samples_leaf=1,
    min_impurity_decrease=0.0,
    min_weight_fraction_leaf=0.0,
    random_state=42,
)
rf.fit(train_X, train_y)
print("Training R^2:", round(rf.score(train_X, train_y), 4))        # 0.9392
print("Validation R^2:", round(rf.score(validate_X, validate_y), 4))  # 0.6309

Overview of Tunable Hyperparameters