Selection of useful predictors in multivariate calibration

Authors:

  1. Forina, M. (2)
  2. Lanteri, S. (2)
  3. Oliveros, M.C.C. (2)
  4. Millan, C.P. (1)

Affiliations:

  1. Universidad de La Rioja, Logroño, Spain (ROR: https://ror.org/0553yr311)
  2. Dept. Pharmaceutical Food Chem. T., University of Genova, Via Brigata Salerno (s/n), 16147 Genova, Italy
Journal:
Analytical and Bioanalytical Chemistry

ISSN: 1618-2642

Year of publication: 2004

Volume: 380

Issue: 3 (Special Issue)

Pages: 397-418

Type: Article

DOI: 10.1007/S00216-004-2768-X

Scopus: 2-s2.0-9944230061

Web of Science: WOS:000224863000008

Institutional repository: open access (publisher version)

Abstract

Ten techniques used for selection of useful predictors in multivariate calibration and in other cases of multivariate regression are described and discussed in terms of their performance (ability to detect useless predictors, predictive power, number of retained predictors) with real and artificial data. The techniques studied include classical stepwise ordinary least squares (SOLS), techniques based on genetic algorithms, and a family of methods based on partial least-squares (PLS) regression and on the optimization of predictive ability. A short introduction presents the evaluation strategies, a description of the quantities used to evaluate the regression model, and the criteria used to define the complexity of PLS models. The selection techniques can be divided into conservative techniques, which try to retain all the informative, useful predictors, and parsimonious techniques, whose objective is to select a minimum but sufficient number of useful predictors. Some combined techniques, in which a conservative technique is used to perform a preliminary selection before the use of parsimonious techniques, are also presented. Among the conservative techniques, the Westad-Martens uncertainty test (MUT) used in Unscrambler and uninformative variable elimination (UVE), developed by Massart et al., seem to be the most efficient. The old SOLS can be improved to become the most efficient parsimonious technique by plotting the F-statistic values of the entered predictors and comparing them with parallel results obtained on a data matrix of random values. This procedure correctly indicates how many predictors can be accepted and substantially reduces the possibility of overfitting. A possible alternative to SOLS is iterative predictor weighting (IPW), which automatically selects a minimum set of informative predictors. The use of an external evaluation set, with objects never used in the elimination of predictors, or of "complete validation" is suggested to avoid overestimation of the prediction ability. © Springer-Verlag 2004.
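The F-statistic comparison described for SOLS can be illustrated with a minimal sketch, not the authors' implementation: forward stepwise OLS records the F-to-enter value of each predictor as it is entered, once on the real data matrix and once on a random matrix of the same size, and predictors are accepted only while the real F values stay clearly above the random baseline. The simulated data, function name, and the simple acceptance rule below are illustrative assumptions.

```python
import numpy as np

def forward_stepwise_f(X, y, max_steps=None):
    """Forward stepwise OLS: at each step enter the predictor with the
    largest F-to-enter value; return entered indices and their F values."""
    n, p = X.shape
    max_steps = max_steps or min(p, n - 2)
    selected, f_values = [], []
    resid = y - y.mean()                 # residual of the intercept-only model
    rss = float(resid @ resid)
    for _ in range(max_steps):
        best_f, best_j, best_rss = -np.inf, None, None
        for j in range(p):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            r = y - A @ beta
            rss_j = float(r @ r)
            df = n - A.shape[1]          # residual degrees of freedom
            if df <= 0 or rss_j <= 0:
                continue
            f = (rss - rss_j) / (rss_j / df)   # F-to-enter for predictor j
            if f > best_f:
                best_f, best_j, best_rss = f, j, rss_j
        if best_j is None:
            break
        selected.append(best_j)
        f_values.append(best_f)
        rss = best_rss
    return selected, np.array(f_values)

# Run the same procedure on the real X and on a random matrix of the same
# shape; predictors are accepted only while the real F-to-enter values stay
# above the random baseline (simple illustrative rule).
rng = np.random.default_rng(0)
n, p = 60, 40
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 5] + X[:, 12] + rng.normal(scale=0.5, size=n)

sel_real, f_real = forward_stepwise_f(X, y, max_steps=10)
_, f_rand = forward_stepwise_f(rng.normal(size=(n, p)), y, max_steps=10)

n_keep = int(np.sum(f_real > f_rand.max()))
print("entered (real data):", sel_real, "F:", np.round(f_real, 1))
print("F-to-enter (random data):", np.round(f_rand, 1))
print("predictors accepted:", sel_real[:n_keep])
```

Plotting `f_real` and `f_rand` against the step number reproduces, in spirit, the plots the abstract refers to: the point where the real curve drops to the level of the random curve suggests how many predictors can be accepted without overfitting.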