Partial least square regression versus domain invariant partial least square regression with application to near-infrared spectroscopy of fresh fruit

P. Mishra, R. Nikzad-Langerodi. Partial least square regression versus domain invariant partial least square regression with application to near-infrared spectroscopy of fresh fruit. Infrared Physics & Technology, DOI 10.1016/j.infrared.2020.103547, 10, 2020.

Authors
  • Puneet Mishra
  • Ramin Nikzad-Langerodi
Type: Article
Journal: Infrared Physics & Technology
DOI: 10.1016/j.infrared.2020.103547
Month: 10
Year: 2020
Abstract

Calibration models required for near-infrared (NIR) spectroscopy-based analysis of fresh fruit frequently fail to extrapolate adequately to conditions not encountered during initial data acquisition. Such differing conditions can arise from physical, chemical or environmental effects and may be encountered, for instance, when measurements are carried out on a new instrument, at different sensor operating temperatures, or when the model is applied to samples harvested under different seasonal conditions. To cope with such changes efficiently, this study investigates the application of domain-invariant partial least square (di-PLS) regression to obtain calibration models that maintain their performance when applied under a new condition. In particular, di-PLS allows unsupervised adaptation of a calibration model to a new condition, i.e. without the need for reference measurements (e.g. dry matter content) for the samples analyzed under the new condition. The potential of di-PLS to compensate for changes in instrument, season and sensor temperature is demonstrated on four different use cases in the realm of NIR-based fruit quality assessment. The results showed that di-PLS regression outperformed standard PLS regression when tested on data affected by the aforementioned factors. The prediction R² increased by up to 67 %, with decreases of 46 % and 80 % in RMSEP and prediction bias, respectively. The main limitation of di-PLS is that, to operate efficiently, it requires the distribution of the response variable to be similar across the data from the different conditions.