Fused stagewise regression - A waveband selection algorithm for spectroscopy

B. Malli, T. Natschläger. Fused stagewise regression - A waveband selection algorithm for spectroscopy. Chemometrics and Intelligent Laboratory Systems, volume 149, pages 53-65, DOI 10.1016/j.chemolab.2015.09.004, 12, 2015.

  • Birgit Malli
  • Thomas Natschläger
Journal: Chemometrics and Intelligent Laboratory Systems
Series: Part B

While partial least squares (PLS) and principal component regression (PCR), the most popular regression techniques in chemometrics, may theoretically be able to deal with large numbers of possibly correlated variables, as occur in the analysis of spectroscopic data, the importance of performing some form of variable selection in practical applications has been widely discussed and acknowledged. In this work we address this problem by proposing a sparse regression algorithm, referred to as Fused Stagewise Regression (FSR), which iteratively selects connected regions of variables (wavelengths) while remaining easy to implement and interpret, owing to its resemblance to typical steps in iterative manual feature selection procedures. We evaluate the proposed variable selection technique on a publicly available benchmark data set and compare the performance of PLS models built on the resulting selection with that of models built on selections yielded by state-of-the-art feature selection methods from the fields of chemometrics and machine learning. To ensure robust feature selection, we integrate the individual selection methods into an extensive repeated cross-validation procedure. For the data set under investigation, FSR is shown to perform at least as well as state-of-the-art approaches and well within the range of variable selections provided by experts.
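
The idea of wrapping a selection method in repeated cross-validation can be illustrated with a minimal sketch. It is not the FSR algorithm and not the procedure used in the paper: the selector below is a hypothetical stand-in that picks the contiguous window of variables most correlated with the response, and the toy data, the function names (`select_band`, `selection_frequency`, `make_toy_spectra`) and the 50% stability threshold are all assumptions made for illustration.

```python
import random
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_band(X, y, width):
    """Placeholder selector (NOT FSR): return the contiguous window of
    `width` variables with the largest summed absolute correlation with y."""
    n_vars = len(X[0])
    scores = [abs(correlation([row[j] for row in X], y)) for j in range(n_vars)]
    best_start = max(range(n_vars - width + 1),
                     key=lambda s: sum(scores[s:s + width]))
    return set(range(best_start, best_start + width))


def selection_frequency(X, y, width, n_repeats=10, n_folds=5, seed=0):
    """Fraction of training folds in which each variable was selected."""
    rng = random.Random(seed)
    n, n_vars = len(X), len(X[0])
    counts = [0] * n_vars
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        for f in range(n_folds):
            held_out = set(idx[f::n_folds])
            train = [i for i in idx if i not in held_out]
            # Selection only ever sees the training part of the fold.
            chosen = select_band([X[i] for i in train],
                                 [y[i] for i in train], width)
            for j in chosen:
                counts[j] += 1
    total = n_repeats * n_folds
    return [c / total for c in counts]


def make_toy_spectra(n_samples=60, n_vars=30, band=(10, 15), seed=1):
    """Synthetic 'spectra': only variables in `band` carry the signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        c = rng.uniform(0, 1)  # hypothetical analyte concentration
        X.append([rng.gauss(0, 0.1) + (c if band[0] <= j < band[1] else 0.0)
                  for j in range(n_vars)])
        y.append(c)
    return X, y


X, y = make_toy_spectra()
freq = selection_frequency(X, y, width=5)
# Keep variables selected in at least half of all training folds (assumed threshold).
stable = [j for j, f in enumerate(freq) if f >= 0.5]
print(stable)
```

Counting how often each variable is chosen across many resampled training sets gives a simple measure of selection stability. In a setting like the one the abstract describes, a regression model such as PLS would then be fitted on the stable variables and assessed on the held-out folds.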