Searching parsimonious solutions with GA-PARSIMONY and XGBoost in high-dimensional databases

  1. Martinez-de-Pison, F.J. 1
  2. Fraile-Garcia, E. 1
  3. Ferreiro-Cabello, J. 1
  4. Gonzalez, R. 1
  5. Pernia, A. 1
  1 Universidad de La Rioja, Logroño, Spain. ROR: https://ror.org/0553yr311

Book:
Advances in Intelligent Systems and Computing

ISSN: 2194-5357

ISBN: 978-3-319-47363-5

Year of publication: 2017

Volume: 527

Pages: 201-210

Type: Book chapter

DOI: 10.1007/978-3-319-47364-2_20

SCOPUS: 2-s2.0-84992465117

WoS: WOS:000405330000020

Abstract

EXtreme Gradient Boosting (XGBoost) has become one of the most successful techniques in machine learning competitions. It is computationally efficient and scalable, supports a wide variety of objective functions, and includes several mechanisms to avoid overfitting and improve accuracy. Because XGBoost has so many tuning parameters, soft computing (SC) offers an alternative to classical hyperparameter tuning methods for finding accurate and robust models. In this context, we present a preliminary study in which an SC methodology, named GA-PARSIMONY, is used to find accurate and parsimonious XGBoost solutions. The methodology was designed to optimize the search for parsimonious models through feature selection, parameter tuning, and model selection. In this work, experiments are conducted with four complexity metrics on six high-dimensional datasets. Although XGBoost performs well with high-dimensional databases, preliminary results indicated that GA-PARSIMONY with feature selection slightly improved the testing error. Therefore, choosing solutions with fewer inputs among those with similar cross-validation errors can help to obtain more robust solutions with better generalization capabilities. © Springer International Publishing AG 2017.
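
The closing idea of the abstract, preferring the solution with fewer inputs among those with similar cross-validation errors, can be illustrated with a minimal sketch. This is not the GA-PARSIMONY implementation: the `Candidate` class, the `pick_parsimonious` function, the `tolerance` parameter, and the sample values are all hypothetical and made up for illustration, and a simple error tolerance stands in for whatever ranking rule and complexity metrics the chapter actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one trained model."""
    name: str
    cv_error: float   # cross-validation error (lower is better)
    n_features: int   # number of selected inputs (lower is more parsimonious)


def pick_parsimonious(candidates, tolerance):
    """Return the candidate with the fewest inputs among those whose
    cross-validation error is within `tolerance` of the best error.

    `tolerance` is an illustrative assumption, not a value from the chapter.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    best_error = min(c.cv_error for c in candidates)
    # Keep only models whose error is "similar" to the best one.
    near_best = [c for c in candidates if c.cv_error - best_error <= tolerance]
    # Prefer fewer inputs; break ties by lower error.
    return min(near_best, key=lambda c: (c.n_features, c.cv_error))


# Illustrative, made-up values (not results from the chapter).
candidates = [
    Candidate("a", 0.100, 120),
    Candidate("b", 0.101, 45),
    Candidate("c", 0.130, 10),
]
chosen = pick_parsimonious(candidates, tolerance=0.005)
```

With these illustrative values, candidate "b" is chosen: its error is within the tolerance of the best model while using far fewer inputs, whereas "c" is simpler still but too inaccurate to count as similar.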