Analytica Chimica Acta (v.544, #1-2)

Contents (vii-viii).

Editorial CAC 2004 by Lutgarde Buydens; Anna de Juan (1).

Analysis of linear dynamic systems of low rank by Satu-Pia Reinikainen; Kari Aaljoki; Agnar Höskuldsson (2-14).
The objective of this paper is to show how the procedures of traditional chemometrics, such as stepwise evaluation of the model and graphic analysis of the latent structure, can be applied to common modeling methods in chemical engineering, for instance Kalman filtering. Procedures for finding stable solutions for linear dynamic systems are presented. Different types of models are considered. The procedures are based on the H-principle of mathematical modeling. The basic idea is to approximate the solution by rank 1 parts. Each of them is found by optimizing the estimation and prediction part of the model. The approximations stop when the prediction ability of the model cannot be improved for the present data. Therefore, the present methods give better prediction results than traditional methods that are based on exact solutions. The vectors used in the approximations can be used to carry out graphic analysis of the dynamic systems. It is shown how score vectors can display the low dimensional variation in data, how the loading vectors display the correlation structure, and how the transformation vectors show the way the variables generate the resulting variation in data; these graphic analyses have proven their importance in traditional chemometric methods. These graphic methods are important in supervising and controlling the process in light of the variation in data. The algorithms can provide solutions for models having hundreds or thousands of variables. It is shown here how these algorithms can provide solutions involving NIR data for process control having over 1000 variables.
Keywords: Dynamic models; H-principle; Low rank solutions; Supervision of processes; Graphics display of processes;

This work reports the application of multivariate curve resolution (MCR) and two-dimensional correlation spectroscopy (2DCOS) to the study of temperature-induced changes of conformational multiequilibria in water. The phenomena were simultaneously monitored by mid-infrared (MIR) and near-infrared (NIR) spectra. Ordinary methods of band shape analysis are not sufficient to distinguish sub-bands that can be unambiguously assigned to specifically hydrogen-bonded assemblies. To achieve the best possible resolution of the multicomponent water system, the MCR analysis was performed on an augmented data matrix. The 2DCOS calculations were carried out in hetero-spectral mode. The simultaneous analysis of the intensity changes for bands assigned to the fundamental stretching vibration and to the vibration that combines the fundamental stretching and bending vibrations of the OH groups pointed to a two-component structure in terms of O–H coordination, manifested by three spectral components that are involved in hydrogen bonding in different ways. The use of MCR and 2DCOS for exploring the NIR and MIR spectra of water made it possible to postulate one common picture of the temperature-induced structural changes of water, which should be very useful in monitoring the conformational transitions of proteins embedded in an aqueous environment.
Keywords: Water; Infrared spectroscopy; Multivariate curve resolution; Two-dimensional correlation spectroscopy;

A gas chromatographic method with mass spectrometry detection (GC/MS) has been developed for the analysis of eight hormones: diethylstilbestrol, hexestrol, dienestrol, nor-testosterone, methyl-testosterone, 17-α-estradiol, 17-β-estradiol and 17-α-ethynylestradiol. As the diastereoisomers α- and β-estradiol coelute and have the same mass fragments, experimental design methodology has been applied to obtain their chromatographic separation. In this paper, the temperature programme of the column, with two ramps, has been optimized to improve not only the resolution between the peaks of both isomers but also the quality of the peaks (sharp and symmetric peaks). Firstly, a 16-experiment screening design was carried out to determine, with a reduced number of experiments, which factors affect both the resolution and the peak width. The results showed that the final temperature of the first ramp mainly affects the resolution, whereas the final temperature of the second ramp influences both the resolution and the peak width. A two-factor Doehlert design was subsequently performed to fit a second-order model and jointly optimize the resolution and the peak width through a global desirability function. The resulting final temperatures of the first and the second ramp are 260 and 300 °C, respectively. The method developed has been validated according to the European Decision 2002/657/EC, including the estimation of the capability of detection (CCβ, X0 = 0, for banned substances) with evaluation of the probability of false positive, α, and of false negative, β.
Keywords: Estrogens; Androgens; α-Estradiol; β-Estradiol; Screening design; Doehlert design; Desirability; CCβ;
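
The abstract above does not say which desirability function was used. As a minimal sketch, assuming the common Derringer-Suich form (each response mapped onto [0, 1], global desirability taken as the geometric mean), the joint treatment of resolution and peak width could look like the following. The function names, the acceptance limits and the two response values are hypothetical and are not taken from the paper.

```python
import numpy as np

def d_larger_is_better(y, low, high, weight=1.0):
    """Desirability that is 0 at or below `low` and 1 at or above `high`."""
    return float(np.clip((y - low) / (high - low), 0.0, 1.0) ** weight)

def d_smaller_is_better(y, low, high, weight=1.0):
    """Desirability that is 1 at or below `low` and 0 at or above `high`."""
    return float(np.clip((high - y) / (high - low), 0.0, 1.0) ** weight)

def global_desirability(ds):
    """Geometric mean of the individual desirabilities (0 if any of them is 0)."""
    ds = np.asarray(ds, dtype=float)
    return float(np.prod(ds) ** (1.0 / ds.size))

# Hypothetical responses predicted by a fitted second-order model at one
# candidate pair of final ramp temperatures.
d_res = d_larger_is_better(1.2, low=0.5, high=1.5)        # peak resolution
d_width = d_smaller_is_better(0.08, low=0.05, high=0.15)  # peak width
D = global_desirability([d_res, d_width])
```

In practice `D` would be evaluated over a grid of factor settings and the setting with the largest value retained; because of the geometric mean, a setting that fails completely on either response is rejected outright.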

Application of independent component analysis to 1H MR spectroscopic imaging exams of brain tumours by F. Szabo de Edelenyi; A.W. Simonetti; G. Postma; R. Huo; L.M.C. Buydens (36-46).
The low spatial resolution of clinical 1H MRSI leads to partial volume effects. To overcome this problem, we applied independent component analysis (ICA) to a set of 1H MRSI exams of brain tumours. With this method, tissue types that yield statistically independent spectra can be separated. Up to three components, corresponding to necrosis, tumoral tissue and healthy tissue, have been detected inside tumours. In non-aggressive tumours, the “necrotic” component was absent, confirming that only aggressive tumours exhibit high levels of lipids. In conclusion, the ICA algorithm makes it possible to find useful hidden components in tumours. The reliability and robustness of the results have also been investigated by means of bootstrapping combined with unsupervised clustering. A comparison of ICA with a method of curve resolution, MCR-ALS, has also been performed.
Keywords: MRS; Brain tumours; Independent component analysis; MCR;

The presented work is concerned with spectroscopy solutions as a means for feedback and feedforward control in the production of wet-processed hardboards. For this purpose, online UV–vis–NIR spectroscopy was installed on a full-scale industrial hardboard production line to collect spectra from fibre materials, intermediate fibremats and final hardboards. The acquired spectra were subjected to the following chemometric methods: (i) principal component analysis (PCA) to characterise raw materials and process variables; and (ii) partial least squares (PLS) regression to investigate the linkages between spectral information and technological data that were determined on the final hardboards. The groupings according to different raw material types, fine and coarse fibre furnish, and the “H-factor”, a measure that is indicative of the softening of lignin and the hydrolysis of hemicelluloses, clearly demonstrated that UV–vis–NIR spectroscopy is instrumental in feedback and feedforward control throughout the processing stages. Strong relationships were observed between the online UV–vis spectra of the fibremats and the water uptake of the corresponding fibreboards. Additional plant trials are needed to investigate further applications that lead to full feedback and feedforward control of the industrial manufacturing of wood-based panels.
Keywords: Feedback; Feedforward control; Chemometrics; Natural fibre; Wood composite; Hardboard; PCA; PLS;

Outliers in partial least squares regression by R. Lletí; E. Meléndez; M.C. Ortiz; L.A. Sarabia; M.S. Sánchez (60-70).
The process control of the elaboration of wines, as well as the final quality of the product, is at present incorporating non-destructive methods of analysis so that they can be systematically applied anywhere in the process. MIR spectroscopy is an easy, fast and reproducible technique that allows several parameters to be obtained from the same spectrum and even allows calibration transfer among different instruments. Therefore, MIR spectroscopy is being routinely used in many oenological stations and wine cellars. As it provides non-specific signals, a multivariate calibration is mandatory, usually by partial least squares regression (PLSR). However, wine samples present high variability due to their origin (varieties of vineyard, land, climate, etc.) and to the elaboration process, so that it is necessary to work with a large number of samples, possibly with important dissimilarities among them. This work studies some of these aspects by using 816 samples of wine representative of seven Spanish Denominations of Origin and measured at the Oenological Station of Haro, the official laboratory of the Qualified Denomination of Origin ‘Rioja’. In the stage of building and validating a PLS regression model it is possible to identify samples that present spectral “abnormalities” and/or abnormalities in the response, but when using the built model to predict from new spectra it is only possible to detect samples that present spectral abnormalities. In this context, it seems obvious that what we call “spectral abnormality” depends on the set used for training but also on the calibration itself, i.e. on the analytical response being modelled with the spectra.
Thus, the paper is devoted to studying the possibility of detecting, by using only the (dis)similarity among spectra, those samples declared abnormal in the successive steps of construction and validation of a partial least squares regression. The results shown are obtained by modelling only the alcoholic grade of the wines, but the proposed solution is methodological and can be extended to calibrate any other parameter. The main conclusion is that, on the studied data set, the samples declared abnormal (outliers) in the calibration step are not detected by using only the spectral (dis)similarity. However, the analysis of spectra shows the presence of a set of wines, not related to the outliers detected by PLS, whose presence in the calibration set is necessary to guarantee that the PLSR model built can be applied to future samples.
Keywords: Partial least squares regression; Outlier; Contingence table; Cluster analysis; Genetic algorithm; Mid infrared spectroscopy; MIR; Wine; Alcoholic grade;

This paper describes a novel approach to multivariate curve resolution (MCR) for rapid resolution of low-resolution chromatographic data from complex mixtures collected using a liquid-core Raman waveguide detector. Pure spectra with individual chromatographic profiles are extracted from HPLC–Raman chromatograms. In order to extract the pure components accurately and quickly, the data must be resolved by a series of smaller analyses and then recombined to form whole chromatograms. Discrete segments of individual resolved chemical and spectral features are combined based on correlation coefficients calculated across segments of the chromatogram. The method described here differs from traditional MCR by applying a series of small local models in which the rank of each limited data segment can be more readily determined. The piecewise method is also much faster than a traditional MCR analysis of the full data set because it requires fewer iterations in an alternating least-squares optimization. The resulting estimated spectral profiles are highly correlated with pure analyte spectra, making this an excellent method for rapid qualitative analysis for the identification of chemical species separated via low-resolution chromatography.
Keywords: Multivariate curve resolution; Window factor analysis; Liquid chromatography; Raman spectroscopy;

The development of a broad-based near infrared spectroscopy (NIRS) calibration can be considered as an alternative approach to overcome the narrow concentration range of some analytical parameters in fresh cheese, such as protein, fat and total solids content. This approach consists of combining samples from different types of fresh cheese in one single calibration. The broad-based NIRS calibration developed in this study to predict the total solids content of various fresh cheeses shows better performance characteristics than the specific calibrations. For instance, the validation set for fresh cheese with low total solids (MQ), when applied to the specific calibration for MQ fresh cheese, gives a relative standard error of prediction (RSEP) of 2.21%, whereas the broad-based calibrations using three and five types of fresh cheese give RSEP values of 1.90% and 1.72%, respectively. The better performance of the broad-based calibration is linked to a better coverage of the calibration range and an increase in the number of samples. In addition, this approach brings some further benefits for the calibration model, such as increased robustness and reliability. The other major advantage is the reduced maintenance of the calibration, since there is only one broad-based calibration instead of many specific ones. It also eliminates the risk of the operator selecting the wrong calibration model.
Keywords: Near infrared spectroscopy; Fresh cheese; Total solids content; Broad-based calibration; PLS calibration; Robustness;
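
The abstract above quotes RSEP values without giving the formula. A minimal sketch follows, assuming one widely used definition, RSEP(%) = 100 · sqrt(Σ(ŷ − y)² / Σy²); the paper may define it differently, and the function name is hypothetical.

```python
import numpy as np

def rsep(y_ref, y_pred):
    """Relative standard error of prediction in percent (assumed definition)."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * float(np.sqrt(np.sum((y_pred - y_ref) ** 2) / np.sum(y_ref ** 2)))
```

Because the error is scaled by the size of the reference values, RSEP allows calibrations covering different total solids ranges to be compared on the same footing.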

This paper presents modifications to our recently introduced pre-processing method, orthogonal WAVElet correction (OWAVEC), based on the combination of wavelet analysis and an orthogonal correction algorithm, described in detail in a former paper [I. Esteban-Díez, J.M. González-Sáiz, C. Pizarro, OWAVEC: a combination of wavelet analysis and an orthogonalization algorithm as a pre-processing step in multivariate calibration, Anal. Chim. Acta 515 (2004) 31–41]. The modifications are aimed at extending its applicability and improving its performance through an additional use of OWAVEC as an effective data compression tool. The OWAVEC method uses the discrete wavelet transform (DWT) to decompose each individual signal into the wavelet domain, and then an orthogonalization algorithm is applied to the resulting wavelet coefficient matrix to remove the information not related to a considered response variable. Next, the corrected wavelet coefficients are ranked by their variance or by their correlation coefficient with the response variable, and the subset providing the most stable and reliable calibration model is finally selected (data compression). The new version of OWAVEC has been applied to two NIR data sets to test its performance. For both regression problems studied, high quality calibration models with very high compression ratios were obtained, providing improved predictive results and considerably less overfitting than other orthogonal signal correction methods. The generalized OWAVEC method presented here may be used as a global tool for simultaneous noise suppression, data compression and orthogonal correction of signals.
Keywords: Wavelet analysis; Orthogonalization; Data compression; Multivariate calibration; OWAVEC;

Classification of bread wheat flours in different quality categories by a wavelet-based feature selection/classification algorithm on NIR spectra by Marina Cocchi; Maria Corbellini; Giorgia Foca; Mara Lucisano; M. Ambrogina Pagani; Lorenzo Tassi; Alessandro Ulrici (100-107).
In the Italian context, bread wheat flour is commercially classified in different quality categories on the basis of a Synthetic Index of Quality (Indice Sintetico di Qualità, ISQ), which is defined by means of specific parameters, i.e., hectolitric weight, falling number, protein content, alveographic indexes (W, P/L) and farinograph stability. The analyses involved in the determination of these parameters are expensive, time consuming and require specialized personnel, so there is interest in developing alternative methods to be applied during commercial transactions, when the products need to be characterized in a very short time. For this reason, a fast technique such as an automated classification on the basis of NIR spectra acquired on the wheat flour samples could be a very useful tool. In this work, various wheat flour samples belonging to four different ISQ classes have been analysed by means of NIR spectroscopy, and the spectra obtained have been classified both by SIMCA applied to the signals subjected to different pretreatment methods, and by using a wavelet-based feature selection/classification algorithm called WPTER. Due to the high overlap of the two intermediate quality classes, it was not possible to classify all the data set signals. However, when considering only the two extreme categories, an acceptable degree of class separation can be gained after feature selection by WPTER. Moreover, this approach allowed us to locate the NIR spectral regions that are mainly involved in the assignment of the wheat flour samples to these two quality categories.
Keywords: Bread wheat flour; NIR; Classification; SIMCA; Wavelet transform; WPTER;

DRIFT-IR for quantitative characterization of polymorphic composition of sulfathiazole by Kati Pöllänen; Antti Häkkinen; Mikko Huhtanen; Satu-Pia Reinikainen; Milja Karjalainen; Jukka Rantanen; Marjatta Louhi-Kultanen; Lars Nyström (108-117).
The quality assurance of the product during the whole development and manufacturing cycle of pharmaceuticals, through an increased level of process understanding, is the main aspect to be considered within process analytical technology (PAT). Therefore, the development of new tools using, e.g., multivariate methods for quality control in different steps of the manufacturing process is of great importance. In this context, diffuse reflectance Fourier transform infrared (DRIFT-IR) spectroscopy together with multivariate statistical process control (MSPC) analysis, soft independent modeling of class analogy (SIMCA), orthogonal signal correction (OSC) preprocessing and partial least squares (PLS) regression methods were applied to the polymorphic characterization of a crystalline bulk product. X-ray powder diffraction (XRPD) was used as a reference technique. The use of several multivariate techniques in the rapid evaluation of the purity and polymorphic composition of bulk samples is demonstrated. This offers an additional method for quality monitoring of the bulk product after the crystallization process within the PAT concept.
Keywords: Polymorph characterization; DRIFT-IR; SIMCA; MSPC; OSC; PLS;

Analysis of variance–principal component analysis: A soft tool for proteomic discovery by Peter de B. Harrington; Nancy E. Vieira; Jimmy Espinoza; Jyh Kae Nien; Roberto Romero; Alfred L. Yergey (118-127).
A soft tool for the detection of biomarkers in high dimensional data sets has been developed. The tool combines analysis of variance (ANOVA) and principal component analysis (PCA). Covariations are separated using ANOVA into main effects and interaction. The covariances for each effect are combined with the pure error and subjected to PCA. If the main effect is significant compared to the residual error, the first principal component will span this source of variation. This technique avoids rotation of the principal components and, when the effect is significant, the variable loadings are amenable to interpretation. ANOVA–PCA is demonstrated as a tool for optimization of a proteomic assay for biomarkers. Two independent sets of matrix assisted laser desorption/ionization-mass spectra (MALDI-MS) were collected from amniotic fluids. These studies gave consistent biomarkers for premature delivery.
Keywords: ANOVA–PCA; Analysis of variance–principal component analysis; Amniotic fluid; Premature delivery; MALDI-MS; Proteomic biomarker; Hotelling T2; Mass spectrometry; Matrix-assisted laser desorption/ionization;
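
The decomposition described in the abstract above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: it handles two factors in a balanced design, and it leaves the interaction inside the residual instead of separating it from the pure error. All function and variable names, and the synthetic data, are hypothetical.

```python
import numpy as np

def level_means(X, labels):
    """Replace every row by the mean row of its factor level."""
    M = np.zeros_like(X, dtype=float)
    for level in np.unique(labels):
        mask = labels == level
        M[mask] = X[mask].mean(axis=0)
    return M

def anova_pca(X, factor_a, factor_b):
    """Split centred data into two main-effect matrices plus a residual,
    then run PCA (via SVD) on each effect matrix added back to the residual."""
    Xc = X - X.mean(axis=0)
    A = level_means(Xc, factor_a)
    B = level_means(Xc - A, factor_b)
    E = Xc - A - B  # residual; in this sketch it still contains the interaction
    out = {}
    for name, effect in (("A", A), ("B", B)):
        U, s, Vt = np.linalg.svd(effect + E, full_matrices=False)
        out[name] = (U[:, 0] * s[0], Vt[0])  # PC1 scores and loadings
    return out, E

# Synthetic 2 x 2 design with 5 replicates and 6 variables; only factor A
# has a real effect.
rng = np.random.default_rng(0)
factor_a = np.repeat([0, 1], 10)
factor_b = np.tile(np.repeat([0, 1], 5), 2)
X = rng.normal(scale=0.1, size=(20, 6))
X[factor_a == 1] += 5.0
pcs, residual = anova_pca(X, factor_a, factor_b)
scores_a, loadings_a = pcs["A"]
```

When an effect is large compared with the residual, the PC1 scores of the corresponding submatrix separate the factor levels, which is the behaviour the abstract describes.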

Multivariate modelling of fish freshness index based on ion mobility spectrometry measurements by Olavi Raatikainen; Ville Reinikainen; Pentti Minkkinen; Tiina Ritvanen; Petri Muje; Juha Pursiainen; Teri Hiltunen; Paula Hyvönen; Atte von Wright; Satu-Pia Reinikainen (128-134).
Ion mobility spectrometry based analysis has been found to provide a powerful and reliable tool as a portable gas detector for different chemical compounds and mixtures. The suitability of the spectrometer for food quality assays was studied in a case study in which fish quality changes during cold storage were modelled. Hexane extracts of gills of vendace (Coregonus albula L.) were analysed by ion mobility spectrometry. For reference, microbial counts of gills, ATP-breakdown products (K-value) of fish flesh, electrical properties (Torrymeter scores) of fish skin and sensory evaluation scores were determined. Principal component analysis (PCA) was utilised to create an index for quality by compressing the reference data. The relationship between the quality index and the spectral data was modelled with N-mode partial least squares (PLS). The ion mobility data have a three-dimensional nature, since signals from six channels are recorded simultaneously. The reliability of the predictions and of the ion mobility measurements was of great interest in the development of this novel analytical method. Multivariate statistical process control (MSPC) charts were used to determine the reliability of the ion mobility measurements. The chemical composition of gill extracts changed during the storage period, as indicated by the ion mobility spectrum profiles. The dominant variation in the data was related to the length of storage of the fish and to seasonal variation. The results of the case study revealed that the spectrometer, together with the analytical method and chemometric tools, forms a promising tool for food quality assays.
Keywords: Fish freshness; Ion mobility spectrometry; Food quality; Modelling; Multivariate;

Paper superficial waviness: Conception and implementation of an industrial statistical measurement system by Raquel Costa; Dina Angélico; Marco S. Reis; José M. Ataíde; Pedro M. Saraiva (135-142).
The development of proper measurement methodologies for product evaluation is a critical issue for papermakers, since their customers are increasingly demanding with regard to new product development and product quality. This paper addresses the conception of a measurement system to assess paper superficial waviness objectively and systematically in industrial practice. The system is based on mechanical stylus profilometry. The measurement system conception process is presented in this article, considering all of its stages: (i) gage selection and creation of auxiliary components, (ii) drawing up of a measurement procedure, (iii) assessment of the system capabilities (through a repeatability and reproducibility (R&R) study), (iv) design of an appropriate categorical scale for paper waviness classification, and (v) validation of the classification model. The definition of the categorical scale encompassed the sensorial and instrumental characterization of several sheets of paper. The corresponding classification model relies strongly on the quality of judgments made by a panel of experts, and therefore the definition of a gold standard was carefully conducted. Two distinct methodologies were used to assess the perceptiveness of the judges regarding paper superficial waviness, and linear discriminant analysis with stepwise variable selection for dimensional reduction was then applied to build a final classification model. The system conceived can be very helpful in the field of product design and process development, besides its obvious application to the monitoring of paper superficial quality. In fact, it can play an important role as an instrument used to define process–structure and structure–properties relationships, which may help in achieving faster product design cycles.
Keywords: Paper superficial waviness; Measurement system; Product design; Process quality control;

Extra virgin olive oil (EVOO) is the highest-quality type of olive oil. This also makes it the most expensive. For this reason, it is sometimes adulterated with cheaper oils. One of these is olive–pomace oil (OPO). The protected denomination of origin (PDO) “Siurana” distinction is given to the EVOO produced in a specific area of the south of Catalonia. Here we study the potential of excitation–emission fluorescence spectroscopy (EEFS) and three-way methods of analysis to detect OPO adulteration in PDO “Siurana” olive oils at low levels (5%). First, we apply unfold principal component analysis (unfold-PCA) and parallel factor analysis (PARAFAC) for exploratory analysis. Then, we use the Hotelling T² and Q statistics as a fast screening method for detecting adulteration. We show that discrimination between non-adulterated and adulterated samples can be improved using Fisher's linear discriminant analysis (LDA) and discriminant multi-way partial least squares (N-PLS) regression, the latter giving 100% correct classification. Finally, we quantify the level of adulteration using N-PLS.
Keywords: Olive oils; Adulteration; Fluorescence; Three-way methods; Discrimination;
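
The screening step named in the abstract above relies on two standard PCA diagnostics. The sketch below shows how Hotelling T² (distance within the model plane) and Q (squared residual off the plane) can be computed for already unfolded data; it is an illustration on synthetic numbers, not the authors' procedure, and it omits the control limits. Function names and data are hypothetical.

```python
import numpy as np

def fit_pca(X, n_comp):
    """PCA by SVD; returns the column mean, loadings and score variances."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:n_comp].T
    var = s[:n_comp] ** 2 / (X.shape[0] - 1)
    return mean, P, var

def t2_q(X_new, mean, P, var):
    """Hotelling T2 and Q (squared prediction error) for each row of X_new."""
    Xc = X_new - mean
    T = Xc @ P
    t2 = np.sum(T ** 2 / var, axis=1)
    resid = Xc - T @ P.T
    q = np.sum(resid ** 2, axis=1)
    return t2, q

# Synthetic training set: two latent directions in five variables plus noise.
rng = np.random.default_rng(1)
X_train = np.zeros((30, 5))
X_train[:, :2] = rng.normal(size=(30, 2))
X_train += rng.normal(scale=0.01, size=X_train.shape)
mean, P, var = fit_pca(X_train, n_comp=2)
t2_train, q_train = t2_q(X_train, mean, P, var)

# A sample that departs from the model plane, as an adulterated oil might.
x_odd = X_train[:1].copy()
x_odd[0, 4] += 5.0
t2_odd, q_odd = t2_q(x_odd, mean, P, var)
```

A sample is flagged when either statistic exceeds a limit estimated from the non-adulterated training samples; here the perturbed sample shows up in Q because the change lies outside the model plane.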

The bootstrap is a successful technique for obtaining confidence limits for estimates for which it is theoretically impossible to establish an exact expression. Trilinear partial least squares regression (tri-PLS) is an estimator for which this is the case; in the current paper we therefore propose to apply the bootstrap in order to obtain confidence intervals for the predictions made by tri-PLS. By means of an extensive simulation study, we show that bootstrap confidence intervals have a desirable coverage. Finally, we apply the method to a micro-organism identification problem and show that, from the bootstrap confidence intervals, the organisms can be correctly identified (up to a misclassification probability of 3.5%).
Keywords: Bootstrap; Confidence interval; Uncertainty; Trilinear partial least squares regression; Tri-PLS; Prediction;
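
The resampling idea in the abstract above can be illustrated with a percentile bootstrap. To keep the sketch self-contained, ordinary least squares is swapped in for tri-PLS, and cases are resampled with replacement; the abstract does not say which bootstrap variant the authors used, so both choices are assumptions. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def bootstrap_prediction_ci(X, y, x_new, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the predicted response at x_new.
    The regression is plain OLS with an intercept, standing in for tri-PLS."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])
    xn = np.concatenate([[1.0], np.atleast_1d(x_new)])
    preds = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        coef, *_ = np.linalg.lstsq(Xd[idx], y[idx], rcond=None)
        preds[b] = xn @ coef
    lo, hi = np.quantile(preds, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Synthetic calibration data: y = 2 + 3x plus noise.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=40)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=40)
lo, hi = bootstrap_prediction_ci(x.reshape(-1, 1), y, x_new=0.5)
```

Replacing the `lstsq` call by any other estimator, tri-PLS included, leaves the resampling loop unchanged, which is what makes the bootstrap attractive when no closed-form interval exists.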

Thermal-induced unfolding of α-chymotrypsin has been monitored with circular dichroism spectroscopy, which shows a far-UV-CD region sensitive to changes in the protein secondary structure and a near-UV-CD region, which gives information at the tertiary structure level. Changes in CD signals in both the far-UV and the near-UV are used to monitor comprehensively the loss of protein structure during unfolding. The application of the chemometric method multivariate curve resolution–alternating least-squares (MCR–ALS) to the spectroscopic measurements allowed for the recovery of the concentration profiles and spectra of three different protein conformations, one of them not obtainable experimentally. Joining the resolved information about the evolution of the tertiary structure and the results coming from methods devoted to the elucidation of the protein secondary structure, the three protein conformations can be characterised as: a native conformation, with both secondary and tertiary structure organized as in the natural active protein; a second conformation, with a modified secondary structure richer in β-sheet and a native-like tertiary structure; and a third conformation, with a secondary structure very similar to the second conformation and with the tertiary structure unfolded.
Keywords: Protein folding; Multivariate curve resolution; α-Chymotrypsin; MCR–ALS; Circular Dichroism;

Application of boosting to classification problems in chemometrics by M.H. Zhang; Q.S. Xu; F. Daeyaert; P.J. Lewi; D.L. Massart (167-176).
The application of boosting to both two-class and multi-class classification problems is studied. Five real chemical data sets are used. Each data set is randomly divided into two subsets, one for training and the other for prediction. For two-class classification, each data set is separated into a high response level class and a low response level class according to a threshold value. Three data sets (wheat data, cream data and HIV data) show that boosting using classification and regression trees (CART) as a base learner may decrease the misclassification rate in prediction with respect to using a single CART. However, boosting for the green tea data indicates that overfitting may occur when boosting is applied. For the chromatographic retention data, boosting performs worse than a single CART. The cream data and the HIV data are also used for multi-class classification. Both data sets demonstrate that boosting performs better than CART in multi-class classification. Variable importance analysis suggests that the improvement made by boosting may be due to the use of more variables, which give more information on special types of samples in the training data.
Keywords: Boosting; Classification; AdaBoost; CART (classification and regression trees); Variable importance;
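
As an illustration of the boosting scheme named in the keywords above, here is a minimal two-class AdaBoost. It is a sketch under simplifying assumptions: one-split decision stumps replace the full CART base learner used in the paper, class labels are coded as -1 and +1, and the one-dimensional data set is synthetic. All names are hypothetical.

```python
import numpy as np

def stump_predict(X, j, thr, sign):
    """One-split tree: -sign where X[:, j] <= thr, +sign elsewhere."""
    return np.where(X[:, j] <= thr, -sign, sign)

def fit_stump(X, y, w):
    """Exhaustive search for the stump with the smallest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                err = np.sum(w[stump_predict(X, j, thr, sign) != y])
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost_fit(X, y, n_rounds):
    """Discrete AdaBoost for labels in {-1, +1}."""
    w = np.full(X.shape[0], 1.0 / X.shape[0])
    model = []
    for _ in range(n_rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = w * np.exp(-alpha * y * stump_predict(X, j, thr, sign))
        w = w / w.sum()
        model.append((alpha, j, thr, sign))
    return model

def adaboost_predict(model, X):
    score = sum(a * stump_predict(X, j, thr, s) for a, j, thr, s in model)
    return np.where(score >= 0, 1, -1)

# Synthetic problem that no single split can solve: class +1 lies in an interval.
x = ((np.arange(20) + 0.5) / 20).reshape(-1, 1)
y = np.where((x[:, 0] > 0.3) & (x[:, 0] < 0.7), 1, -1)
acc_single = np.mean(adaboost_predict(adaboost_fit(x, y, 1), x) == y)
acc_boosted = np.mean(adaboost_predict(adaboost_fit(x, y, 3), x) == y)
```

The accuracies here are training accuracies only; as the abstract notes for the green tea data, a separate prediction set is needed to see whether boosting overfits.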

A biofuel data set is used to demonstrate multivariate calibration and prediction of the vital parameter moisture content from near-infrared spectra. At-line sampling of heterogeneous biofuel materials and near-infrared spectroscopy (1050 wavelengths) gave prediction errors of 3 and 1% for the moisture reference. The calibration methodology and prediction diagnostics are presented in the paper, with an emphasis on using replicates and duplicates for outlier detection.
Keywords: Near-infrared spectroscopy; Multivariate calibration; Prediction; Moisture; Natural variability; At-line; Off-line; Outlier statistics;

Finding and checking the robustness of analytical methods are two different problems which must be treated with different tools. Finding robustness means discovering an experimental region where nothing happens, that is, where the response of interest is not influenced by significant changes in the levels of the various operating factors. Such a discovery is not obvious and requires response surface designs associated with canonical analysis. As there are several types of response surface designs, the question is what the advantages and drawbacks of each of them are. In general, the solution is rarely totally satisfactory. An analytical method could be robust for some factors and not for others. At this stage, the concept of total and partial robustness may be introduced. Checking robustness means verifying the recommended settings of the analytical method. This verification also makes it possible to assess the response variations near the operational point for each factor. Plackett and Burman designs are often proposed for checking analytical method robustness, but these designs must be used carefully. The first precaution is to add several measurements at the central point to estimate the method repeatability and to be sure that the response surface has no curvature. If the response surface presents an important curvature, Plackett and Burman designs are not suitable for checking robustness and other experimental designs must be used. In this case, star designs, which are “one factor at a time” designs, seem, strangely, to be the best way to verify the robustness of an analytical method.
Keywords: Robustness; Ruggedness; Design of experiment; Analytical method;

Prediction of heating values of biomass fuel from elemental composition by A. Friedl; E. Padouvas; H. Rotter; K. Varmuza (191-198).
The heating value of biomass is an important parameter for the design and the control of power plants using this type of fuel. The so-called higher heating value, HHV, is the enthalpy of complete combustion of a fuel including the condensation enthalpy of formed water. Numerous empirical equations have been published to relate the heating value to the elemental composition of fuels other than biomass. Data of 154 biomass samples of very different origin (for instance wood, grass, rye, rape, reed, brewery waste, and poultry litter) have been selected from the database BIOBIB. Each sample has been characterized by the contents (in mass% of dry material) of carbon, hydrogen, nitrogen, oxygen, sulfur, chlorine and ash. PCA of these data shows a clustering according to the origin of the samples. A subset of 122 samples, all consisting of plant materials, has been used to develop regression models for a prediction of HHV from the elemental composition. Models with the best predictive ability have been obtained using the contents of carbon, C; hydrogen, H; and nitrogen, N, and applying the methods OLS and PLS with the variables C, C², H, C × H and N. The standard errors of prediction of the best new models are considerably smaller than those obtained with models found in the literature.
Keywords: PCA; PLS; Feature selection; Calorific value; Plant material;
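
As an illustration of the OLS variant described in this abstract, the following minimal sketch fits a model with the variables C, C², H, C × H and N. The function names, the synthetic data and the coefficient values are assumptions made up for the example; they are not the BIOBIB data or the coefficients reported in the paper.

```python
import numpy as np

def design_matrix(C, H, N):
    """Columns: intercept, C, C^2, H, C*H, N (contents in mass% of dry material)."""
    C, H, N = (np.asarray(v, dtype=float) for v in (C, H, N))
    return np.column_stack([np.ones_like(C), C, C**2, H, C * H, N])

def fit_ols(C, H, N, hhv):
    """Ordinary least-squares coefficients for the six-column design matrix."""
    X = design_matrix(C, H, N)
    coef, *_ = np.linalg.lstsq(X, np.asarray(hhv, dtype=float), rcond=None)
    return coef

def predict(coef, C, H, N):
    """Predicted HHV for new samples."""
    return design_matrix(C, H, N) @ coef

# Synthetic, noise-free illustration only; none of these numbers come from the paper.
rng = np.random.default_rng(0)
C = rng.uniform(40, 55, 50)
H = rng.uniform(5, 7, 50)
N = rng.uniform(0.1, 3, 50)
made_up_coef = np.array([1.0, 0.2, 0.003, 0.5, 0.01, -0.1])
hhv = design_matrix(C, H, N) @ made_up_coef
coef = fit_ols(C, H, N, hhv)
```

Because C and C² are strongly correlated over a narrow range of carbon contents, the individual coefficients can be poorly determined even when the predictions are good, which is one reason a latent-variable method such as PLS is a natural companion to OLS here.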

Alcoholic fermentation runs with Saccharomyces cerevisiae yeasts were conducted on culture medium batches containing glucose as carbon source. Spectral changes during the process were monitored in-line with a near infrared (NIR) immersion probe. Data were analysed by using a multivariate curve resolution–alternating least-squares (MCR–ALS) method. Different regions of the NIR spectrum were examined in order to ensure optimum application of the ALS algorithm and elucidation of the chemical rank of the system. The ambiguity inherent in the ALS algorithm was resolved by using various combinations of inequality and equality constraints. Some combinations were found to perform quite well in terms of explained variance and lack of fit, even in the absence of information in the form of equality constraints. The resulting model revealed a highly significant relationship between the ALS response and the reference concentration. Application of the model to alcoholic fermentation runs performed under similar conditions gave similar analytical figures of merit. This allows the MCR–ALS method to be used to obtain the analyte profiles as a function of time and the spectral profiles in evolving alcoholic fermentations.
Keywords: Alcoholic fermentation; Alternating least-squares; Saccharomyces cerevisiae; NIR; PLS; In-line monitoring;

Split-plot designs and normal probability graphs for the optimization of chemical systems by João A. Bortoloti; Cleber N. Borges; Roy E. Bruns (206-212).
An approximate procedure based on normal probability graphs for selecting significant parameters of models calculated from the results of split-plot designs is proposed. Its application can result in a substantial reduction in the number of experiments that need to be performed. The method is applied to three split-plot design results for real data reported in the literature: (1) three plasticizer mixture components with different extrusion rates and drying temperatures, (2) three fish patty ingredients at different cooking and frying temperatures and times and (3) Cr(VI) catalytic determinations employing three reagents of varying concentrations and three solvent components of varying proportions. Approximate models determined from the proposed procedure are compared with those determined using complete split-plot ANOVA analyses. The robustness of the procedure is tested for one of the split-plot design results using replication, main-plot error and sub-plot error variance estimates that change according to a 2³ factorial design.
Keywords: Split-plot design; Normal probability graphs; Factorial design; Mixture designs;
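
A normal probability graph pairs the ordered effect estimates with the quantiles expected from a standard normal distribution; effects that fall far from the straight line through the bulk of the points are candidates for significance. The sketch below shows only this generic construction, not the split-plot procedure of the paper. The plotting position (rank − 0.5)/n, the function name and the effect values are assumptions chosen for illustration.

```python
from statistics import NormalDist

def normal_scores(effects):
    """Pair each effect (sorted ascending) with its expected standard normal quantile.

    `effects` maps an effect name to its estimated value.
    The plotting position (rank - 0.5) / n is one common convention.
    """
    ordered = sorted(effects.items(), key=lambda kv: kv[1])
    n = len(ordered)
    return [
        (name, value, NormalDist().inv_cdf((rank - 0.5) / n))
        for rank, (name, value) in enumerate(ordered, start=1)
    ]

# Made-up effect estimates, for illustration only.
effects = {"A": 0.3, "B": -0.2, "AB": 0.1, "C": 5.4, "AC": -0.4}
for name, value, z in normal_scores(effects):
    print(f"{name:>3} {value:6.2f} {z:6.2f}")
```

Plotting the second column against the third gives the graph; in this made-up set the effect "C" would stand well apart from the line defined by the other four.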

Calibration of near infrared spectroscopy for solid fat content of fat blends analysis using nuclear magnetic resonance data by J.C. Rodrigues; A.C. Nascimento; A. Alves; N.M. Osório; A.S. Pires; J.H. Gusmão; M.M.R. da Fonseca; S. Ferreira-Dias (213-218).
The functional properties of fats are determined by the distribution pattern of fatty acid residues in their acylglycerols, which may be modified by ester interchange (transesterification). In the margarine industry, the time course of the transesterification of fat blends is monitored by assaying for the amount of the solid fraction at different temperatures (SFC, solid fat content), currently measured by nuclear magnetic resonance (NMR). The aim of this study was to evaluate the feasibility of near infrared spectroscopy (NIRS) to quantify the SFC of different fat blends using NMR data for calibration. SFC values of 128 samples, consisting of different blends of palm stearin, palm kernel oil and concentrates of triglycerides enriched with ω-3 polyunsaturated fatty acids, were assayed by NMR prior to (64 samples) and following inorganic (10 samples) or lipase-catalysed transesterification (54 samples). Prior to SFC measurement by NMR, sample preparation takes about 90 min. With the NIRS technique, a faster determination is achieved since NIR spectra for SFC estimations are obtained directly on the sample at room temperature. High correlations were obtained for cross-validation of the data estimated by NIRS models and NMR for SFC assays at 10 °C (R² = 0.91, RMSECV = 2.4), 20 °C (R² = 0.96, RMSECV = 1.7), 30 °C (R² = 0.96, RMSECV = 1.3) and 35 °C (R² = 0.96, RMSECV = 1.3) of the different blends tested. The results show that NIRS is a reliable technique to replace NMR for SFC estimation.
Keywords: Lipase; Near infrared spectroscopy; Nuclear magnetic resonance; Omega-3 fatty acids; Solid fat content; Transesterification;

Fast model selection for robust calibration methods by S. Engelen; M. Hubert (219-228).
One of the main issues in principal component regression (PCR) and partial least squares regression (PLSR) is the selection of the number of principal components. To this end, the curve of the root mean squared error of cross-validated prediction (RMSECV) is often described in the literature as a very helpful graphical tool. In this paper, we focus on model selection for robust calibration methods. We first propose a robust RMSECV value and then use it to define a new criterion for selecting the optimal number of components. This robust component selection (RCS) statistic combines the goodness-of-fit and the predictive power of the model. As the algorithms to compute these robust PCR and PLSR estimators are more complex and slower than the classical approaches, cross-validation becomes very time consuming. Hence, we propose fast algorithms to compute the robust RMSECV values. We evaluate the developed procedures on several data sets.
Keywords: Robustness; Model selection; Cross-validation; PCR; PLS;

Chemometric classification of olives from three Portuguese cultivars of Olea europaea L. by Paula B.M. Pinheiro; Joaquim C.G. Esteves da Silva (229-235).
Olives from three cultivars (Olea europaea Cv. Cobrançosa, Madural and Verdeal Transmontana) were subjected to a biometric characterization of the fruit and endocarp. Olives were harvested in a selected olive grove where the three cultivars are present, guaranteeing the homogeneity of the pedologic and climate conditions. Five trees from each cultivar, for a total of 15 trees, were sampled and 40 olives were picked from each tree, for a total of 200 olives per cultivar. Each olive was characterized by 26 fruit and endocarp parameters as recommended by Conseil Oléicole Internationale (COI). A linear discriminant analysis was performed in order to assess the capacity of the measured parameters to discriminate between the three cultivars. An almost full discrimination of the olives from different cultivars was achieved only when fruit and endocarp data were analyzed together.
Keywords: Olive trees; Linear discriminant analysis; Classification of olives;

On Pareto-optimal fronts for deciding about sensitivity and specificity in class-modelling problems by Ma Sagrario Sánchez; Ma Cruz Ortiz; Luis A. Sarabia; Rosa Lletí (236-245).
Sensitivity and specificity are two widely accepted parameters to qualify a model when working in class-modelling problems. Further, the trade-off between these two parameters is well known. In the present work the problem of building models taking into account both sensitivity and specificity is posed in its real nature, as a multi-objective optimisation problem, because we have two, in general, conflicting objectives. To do this, a new training algorithm for neural networks has been programmed that allows the user to find a set of Pareto-optimal solutions, i.e., different models with different values for sensitivity and specificity, so that the user may decide among models depending on the goal of the study. The procedure is applied to some real data sets to show its versatility and to help understand and interpret the resulting models.
Keywords: Pattern-recognition; Class-modelling problems; Sensitivity; Specificity; Multi-objective optimisation; Pareto-optimal solutions; Evolutionary algorithms;
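
To make the notion of a Pareto-optimal set concrete, the sketch below filters a list of candidate models, each summarised by a (sensitivity, specificity) pair, down to those that no other candidate dominates. It illustrates only the dominance test, not the neural network training algorithm of the paper; the function name and the candidate values are made up for the example.

```python
def pareto_front(models):
    """Return the (sensitivity, specificity) pairs not dominated by any other pair.

    A pair is dominated when another pair is at least as good on both
    criteria and strictly better on at least one (larger is better).
    """
    front = []
    for i, (se_i, sp_i) in enumerate(models):
        dominated = any(
            se_j >= se_i and sp_j >= sp_i and (se_j > se_i or sp_j > sp_i)
            for j, (se_j, sp_j) in enumerate(models)
            if j != i
        )
        if not dominated:
            front.append((se_i, sp_i))
    return sorted(front)

# Made-up candidate models, for illustration only.
candidates = [(0.95, 0.60), (0.90, 0.80), (0.85, 0.80), (0.70, 0.95), (0.60, 0.90)]
front = pareto_front(candidates)
print(front)  # [(0.7, 0.95), (0.9, 0.8), (0.95, 0.6)]
```

The three surviving models represent different compromises; which one to use depends on whether false negatives or false positives are more costly in the study at hand.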

The biomass present in a wastewater treatment plant was surveyed and its morphological properties were related to operating parameters such as the total suspended solids (TSS) and the sludge volume index (SVI). For that purpose, image analysis was used to provide the morphological data, which were subsequently treated by partial least squares regression (PLS), a multivariate statistical technique. The results indicated the existence of a severe bulking problem of non-zoogleal nature, and the PLS analysis revealed a strong relationship between the TSS and the total aggregates area as well as a close correlation between the filamentous bacteria per suspended solids ratio and the SVI.
Keywords: Environmental sciences; Activated sludge; Image analysis; PLS; Environmetrics;

On the equivalence between total least squares and maximum likelihood PCA by M. Schuermans; I. Markovsky; Peter D. Wentzell; S. Van Huffel (254-267).
The maximum likelihood PCA (MLPCA) method has been devised in chemometrics as a generalization of the well-known PCA method in order to derive consistent estimators in the presence of errors with known error distribution. For similar reasons, the total least squares (TLS) method has been generalized in the field of computational mathematics and engineering to maintain consistency of the parameter estimates in linear models with measurement errors of known distribution. The basic motivation for TLS is the following. Let a set of multidimensional data points (vectors) be given. How can one obtain a linear model that explains these data? The idea is to modify all data points in such a way that some norm of the modification is minimized subject to the constraint that the modified vectors satisfy a linear relation. Although the name “total least squares” appeared in the literature only 25 years ago, this method of fitting is certainly not new and has a long history in the statistical literature, where the method is known as “orthogonal regression”, “errors-in-variables regression” or “measurement error modeling”. The purpose of this paper is to explore the tight equivalences between MLPCA and element-wise weighted TLS (EW-TLS). Despite their seemingly different problem formulation, it is shown that both methods can be reduced to the same mathematical kernel problem, i.e. finding the closest (in a certain sense) weighted low rank matrix approximation where the weight is derived from the distribution of the errors in the data. Different solution approaches, as used in MLPCA and EW-TLS, are discussed. In particular, we will discuss the weighted low rank approximation (WLRA), the MLPCA, the EW-TLS and the generalized TLS (GTLS) problems. These four approaches tackle an equivalent weighted low rank approximation problem, but different algorithms are used to come up with the best approximation matrix. We will compare their computation times on chemical data and discuss their convergence behavior.
Keywords: TLS; MLPCA; Rank reduction; Measurement errors;
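
The basic TLS idea stated in this abstract (modify all data points as little as possible so that they satisfy a linear relation) can be sketched for the simplest case, a straight line through two-dimensional points with equal, uncorrelated errors in both coordinates. This is the classical unweighted problem, not the element-wise weighted variant studied in the paper; the function name and the data are assumptions for illustration.

```python
import numpy as np

def tls_line(x, y):
    """Fit y = a + b*x by orthogonal (total least squares) regression.

    The centred data matrix is approximated by its closest rank-1 matrix;
    the right singular vector of the smallest singular value is the normal
    to the fitted line. A vertical line (normal with zero y-component)
    is not handled in this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    centred = np.column_stack([x - mx, y - my])
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    nx, ny = vt[-1]          # unit normal to the best-fit line
    b = -nx / ny             # slope
    a = my - b * mx          # the line passes through the centroid
    return a, b

# Points lying exactly on y = 1 + 2x, for illustration only.
a, b = tls_line([0, 1, 2, 3], [1, 3, 5, 7])
```

With errors of unequal or correlated variance the plain SVD no longer gives the maximum likelihood solution, which is where the weighted low rank approximation problems named in the abstract come in.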

The variance of screening and supersaturated design results as a measure for method robustness by B. Dejaegher; J. Smeyers-Verbeke; Y. Vander Heyden (268-279).
Screening designs are factorial designs to evaluate the importance of factors in a number of experiments that is at least one higher than the number of factors examined. Supersaturated designs are factorial designs with more factors than experiments. These designs do not allow a correct estimation of the factor effects due to a confounding of main effects. Therefore, it is evaluated whether, in robustness testing, the variance of a response can be used as a measure for the robustness of a method. A number of potential reference criteria (reference variances estimating reproducibility and limit values) are also evaluated for their applicability to decide whether the examined factors cause non-robustness. Finally, it is examined which statistical conclusions can be drawn from comparing the variances from the design experiments with the reference criteria. Two approaches are considered for the reference variances: a classical F-test and interval hypothesis testing. The use of some limit values is also discussed. It was found that the variance of a response could be used as a measure for robustness, but that this variance could not be interpreted statistically in an acceptable way. Because of the small number of degrees of freedom available to examine a given number of factors, there is a large probability either of accepting a non-robust method or of rejecting a robust one, especially when supersaturated designs are applied.
Keywords: Screening designs; Supersaturated designs; Variance; Robustness; F-test; Interval hypothesis testing;

Caterpillar—an adaptive algorithm for detecting process changes from acoustic emission signals by Geir Rune Flåten; Ron Belchamber; Mike Collins; Anthony D. Walmsley (280-291).
An adaptive algorithm for detecting process changes in a process monitored by acoustic emission is presented as an alternative to traditional modelling techniques based on fixed or static models. This approach significantly reduces the need to remodel the process as operating conditions change. The central idea is that two moving windows are moved through the data side by side. The signal variation in one of them is modelled by a principal component analysis (PCA) model, and the samples in the other window are compared to the critical borders of the PCA model. Significant differences are interpreted as a process change, i.e. the acoustic emission from the process has changed. In this work, acoustic emission data from a fluidised bed are analysed. Optimal settings for the algorithm are proposed and the robustness towards noise and other signal degradations is shown to be good. The algorithm seems to be batch independent, which means that the regular re-calibration (blank runs) needed by reference model approaches can be avoided.
Keywords: Acoustic emission; Fluidised bed; Adaptive; Process monitoring; Chemometrics; SIMCA; Caterpillar;
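
The two-window idea can be sketched as follows. The abstract does not specify how the critical borders of the PCA model are defined, so this sketch assumes a squared residual (Q) statistic with an empirical quantile of the reference window as the limit, and flags a change when more than half of the samples in the second window exceed it. The function names, window width, number of components and thresholds are all assumptions, and the data are synthetic.

```python
import numpy as np

def q_residuals(samples, mean, loadings):
    """Squared distance of each sample to the PCA subspace (Q statistic)."""
    centred = samples - mean
    residual = centred - centred @ loadings @ loadings.T
    return (residual**2).sum(axis=1)

def pca_model(window, n_components, quantile=0.99):
    """Fit PCA on the reference window; return mean, loadings and an empirical Q limit."""
    mean = window.mean(axis=0)
    _, _, vt = np.linalg.svd(window - mean, full_matrices=False)
    loadings = vt[:n_components].T
    limit = np.quantile(q_residuals(window, mean, loadings), quantile)
    return mean, loadings, limit

def detect_changes(data, width, n_components=2, fraction=0.5):
    """Slide two adjacent windows through the data; return the start indices of
    test windows in which more than `fraction` of the samples exceed the Q limit."""
    flags = []
    for start in range(data.shape[0] - 2 * width + 1):
        reference = data[start:start + width]
        test = data[start + width:start + 2 * width]
        mean, loadings, limit = pca_model(reference, n_components)
        if (q_residuals(test, mean, loadings) > limit).mean() > fraction:
            flags.append(start + width)
    return flags

# Synthetic signal with an abrupt, made-up change at sample 40.
rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=(80, 6))
data[40:] += 20.0
flags = detect_changes(data, width=15)
```

Because the model is refitted at every position, no fixed reference model is needed; the price is that a limit estimated from a short window is noisy, so the window width and the flagging rule need tuning for real signals.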

Determination of optimal support vector regression parameters by genetic algorithms and simplex optimization by B. Üstün; W.J. Melssen; M. Oudenhuijzen; L.M.C. Buydens (292-305).
Traditionally, the partial least squares (PLS) regression technique is most commonly used for quantitative analysis of near-infrared spectroscopic data. However, the use of support vector regression (SVR), a recently introduced alternative regression technique, for quantitative spectral analysis has increased over the past few years, especially due to its high generalization performance and its ability to model non-linear relationships as well. Unfortunately, the practical use of SVR is limited because of its set of parameters that must be defined by the user. For this reason, it was necessary to find an automated, reliable, accurate and robust optimization approach to select the optimal SVR parameter settings. This paper presents an SVR parameter optimization approach based on genetic algorithms and simplex optimization, which satisfies all of the above-mentioned points. Furthermore, a comparison is made between the performance of SVR and PLS on various (noisy) data sets. From these results, it can be concluded that SVR is less sensitive to spectral noise and, hence, more robust with respect to spectral variations due to experimental circumstances. Generally, in terms of performance and robustness, the results demonstrate that SVR is a good, well-performing alternative to the commonly applied PLS technique for the analysis and modelling of NIR data.
Keywords: Support vector regression (SVR); Partial least squares (PLS); Near-infrared (NIR) spectroscopy; Genetic algorithms (GA); Simplex optimization;

Class-modeling using Kohonen artificial neural networks by Federico Marini; Jure Zupan; Antonio L. Magrì (306-314).
In this paper, a class-modeling technique based on Kohonen artificial neural networks is presented. In particular, in order for the Kohonen self-organizing map to operate as a class-modeling device, two main issues are identified: integrating the training set (composed of samples from a single category) with a set of uniformly distributed random vectors and computing a suitable probability distribution associated with the positions on the 2D layer of neurons. Both features contribute to defining an appropriate class space. When used to analyze a real-world data set (classification of rice varieties), the proposed technique provided comparable and in some cases better results than the traditional chemometric techniques SIMCA and UNEQ.
Keywords: Class-modeling; Kohonen self-organizing maps; Artificial neural networks; Chemometrics; Pattern recognition;

Prediction of enantioselectivity using chirality codes and Classification and Regression Trees by S. Caetano; J. Aires-de-Sousa; M. Daszykowski; Y. Vander Heyden (315-326).
In this paper a new application of Classification and Regression Trees, concerning the prediction of enantioselectivity, is presented. The data consist of the elution order of enantiomers separated by High-Performance Liquid Chromatography with two different chiral stationary phases. The enantiomers of both datasets were classified into two groups, named First and Last, depending on their elution order, prior to the construction of the models. The Classification and Regression Trees methodology was then applied to build classification trees that allowed the prediction of the elution order of the compounds by using chirality codes as explanatory variables. The chirality codes are a set of molecular descriptors that combine different parameters and are able to distinguish between enantiomers. This new approach yielded quite simple models and achieved good predictions for both datasets considered. Finally, the models obtained with Classification and Regression Trees were compared with Kohonen Neural Network results. This methodology was also applied to predict the quality of the separation between two enantiomers on a given chiral stationary phase. Prior to the construction of the model, the molecules of one of the datasets were classified into three classes (Bad, Good and Very Good), according to their degree of separation (α), and the model was built using the absolute values of the chirality codes. The results obtained for the final classification tree were quite promising.
Keywords: Enantioselectivity; Molecular descriptors; Chirality codes; Classification and Regression Trees; Chiral stationary phase; Liquid chromatography;

Calibration maintenance from day to day is an interesting problem when an analytical method is proposed for routine daily use. In this work, the performance criteria of fluorescence in the determination of quinolones by means of the parallel factor analysis (PARAFAC) model have been established according to the ISO standards and the European Decision 2002/657/EC. The presence of these substances in animals for human consumption is regulated, and it is forbidden in poultry producing eggs for human consumption (Regulation No. 1181/2002). The paper shows that the capability of detection of the fluorescence technique is satisfactory with respect to the legal regulation. The detection limits found for ciprofloxacin without and with interferent were 33 and 27 μg l⁻¹ when the risks of false positive and false negative were fixed at 5%. The corresponding hypothesis test has shown that there is neither a constant error nor a proportional error in the models constructed with PARAFAC, that is, the method can be considered accurate.
Keywords: Fluorescence; PARAFAC; ISO 11843; Capability of detection; Quinolones; Calibration maintenance;

Chemometric strategies for the study of the complexation of Al(III) ions with model molecule of humic substances from UV–vis data sets by C. Ruckebusch; L. Duponchel; J.P. Huvenne; A. Caudron; L. Boilet; J.P. Cornard; J.C. Merlin; A. de Juan (337-344).
Multivariate curve resolution with alternating least squares (MCR-ALS) is successfully applied to UV–vis spectroscopic data of a model system of metal ion/humic acid interactions, namely Al(III)/caffeic acid. MCR-ALS provides resolution into the contributions of the different equilibrium species. Because of rank-deficiency phenomena, the pure spectra and concentration profiles of all species can only be recovered through matrix augmentation. The data set is thus generated by coupling three different experiments: two pH-dependent Al(III)/caffeic acid experiments and one acid–base caffeic acid titration. Under these conditions, the rank-deficiency is suppressed in the matrices of analytical interest and the correct number of chemical species can be detected and modelled. Furthermore, local rank information is derived from the application of evolving factor analysis to the full-rank augmented matrices. This approach provides prior knowledge about the complexation mechanism of Al(III) by caffeic acid and enables the characterisation of the different complexed species from spectrophotometric data.
Keywords: Multivariate curve resolution; Spectroscopy; Matrix augmentation; Al(III); Complexes;


Author Index (346-348).