Introduction

Traditional machine learning models, such as a linear regressor or a random forest regressor, predict one output at a time. A multi-output regressor extends this: it takes a single-output estimator and fits one copy of it per target, so several targets can be predicted simultaneously from the same inputs. Note that because each target gets its own fitted estimator, the targets are modelled independently of one another.
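A minimal sketch of the idea, using synthetic data and a linear base estimator (the data and estimator choice here are illustrative, not the project's):

```python
# Minimal sketch: wrap a single-output regressor so it predicts
# several targets at once (one fitted copy per target column).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))      # 100 samples, 4 input features
Y = X @ rng.normal(size=(4, 3))    # 3 target columns

model = MultiOutputRegressor(LinearRegression()).fit(X, Y)
print(model.predict(X[:2]).shape)  # (2, 3): one prediction per target
```

After fitting, `model.estimators_` holds one fitted `LinearRegression` per target column.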

How It Works

Imported from scikit-learn, the multi-output regressor requires a base estimator, which can be almost any single-output regressor, such as linear regression, Bayesian ridge regression, or Poisson regression. The estimator used in this code is a random forest, which builds an ensemble of decision trees, each trained on a random sample of the data. The trees are independent of each other, and for regression the forest averages the predicted values from each tree, which reduces overfitting and variance. As with other supervised algorithms, the random-forest multi-output regressor splits the data into training and testing sets.
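The pipeline described above can be sketched as follows; the synthetic data and the specific parameter values are assumptions for illustration, not the project's actual configuration:

```python
# Sketch: random forest as the base estimator, wrapped in a
# MultiOutputRegressor, with a train/test split. Data is synthetic;
# the real project loads it from a CSV file.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(42)
X = rng.uniform(size=(200, 5))
Y = np.column_stack([X.sum(axis=1), X[:, 0] * X[:, 1]])  # two targets

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

# Each tree predicts independently; the forest averages the trees'
# predictions per target, lowering variance vs. a single tree.
forest = RandomForestRegressor(n_estimators=100, random_state=42)
model = MultiOutputRegressor(forest).fit(X_train, Y_train)
print(model.score(X_test, Y_test))  # uniform-average R^2 over targets
```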

The code also contains functions for preparing the data for the multi-output regressor, graphing, and running everything. The data-preparation function takes the file path and the target columns, converts the file to a DataFrame, and returns the split dataset. Graphing is optional: pass graph = False when calling the master function, run_rfmo_model(). To run the whole code, simply call this function with its optional parameters: the target abundances, the graphing flag, and the number of estimators.

Link to the code: https://github.com/utat-ss/FINCH-Science/blob/main/Unmixing_Methods/Unconventional_Unmixing_Methods/Multi-Output Regressor.py

Results


Mean Squared Error: 0.026715565542984293

R-squared: 0.722817031505144

R-squared for npv only: 0.6155036984351795
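Metrics of this kind can be computed with sklearn.metrics; the arrays and the column order below are stand-ins, and the assumption is that npv is one of the target columns:

```python
# Overall MSE/R^2 across all target columns, plus R^2 for one
# column (npv) in isolation. y_true/y_pred are illustrative only.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([[0.2, 0.5], [0.4, 0.3], [0.1, 0.8]])   # columns: npv, gv
y_pred = np.array([[0.25, 0.45], [0.35, 0.35], [0.15, 0.75]])

mse = mean_squared_error(y_true, y_pred)       # averaged over all targets
r2 = r2_score(y_true, y_pred)                  # uniform-average R^2
r2_npv = r2_score(y_true[:, 0], y_pred[:, 0])  # npv column only
```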

This code was tested with simpler_data.csv from the UTAT Space Systems Division Google Drive.