Cross-validation and other model validation tools. Part 1 – A view on some model validation tools

In multivariate data analysis (MVDA) it is crucial to realize the difference between model fit and model predictivity. The fit, or R2, tells how well we are able to mathematically reproduce the data of the training set. The predictivity, or Q2, tells how well we are able to predict new data that were not used to fit the model, such as a test set. Estimating “model validity” to avoid overfitting is a mandatory step for all kinds of multivariate models.
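
To make the distinction between fit and predictivity concrete, here is a minimal sketch in plain Python. It is not how SIMCA computes these quantities; it assumes a one-variable least-squares line, leave-one-out cross-validation, and the common convention Q2 = 1 - PRESS/SS, where PRESS is the sum of squared cross-validated prediction errors and SS is the total sum of squares of y around its mean. The function names and the small data set are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_ss(ys):
    """Total sum of squares of y around its mean."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r2(xs, ys):
    """Fit: how well the model reproduces the data it was trained on."""
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / total_ss(ys)


def q2_leave_one_out(xs, ys):
    """Predictivity: each point is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1 - press / total_ss(ys)


# Hypothetical data: a nearly linear relationship with some noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]

r2_value = r2(xs, ys)
q2_value = q2_leave_one_out(xs, ys)
print(f"R2 = {r2_value:.3f}, Q2 = {q2_value:.3f}")
```

For a least-squares model evaluated this way, Q2 cannot exceed R2, because each left-out point is predicted by a model that never saw it. A large gap between the two is a typical symptom of overfitting.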

What, then, do we mean by the term “model validity”? A valid model is one that predicts much better than chance. Its parameters should be estimated with little bias: they should have the correct sign, and be large for important variables and small for unimportant variables. The interpretation of the model should be in line with the existing chemical, biological and engineering knowledge. A valid model also has a well-defined applicability domain, which has been thoroughly assessed and found relevant for the scope of the model.

Attend this webinar to learn which tools are available in SIMCA to estimate model validity for PCA, PLS, OPLS and O2PLS models. An introduction to cross-validation is also given.