Analytica Chimica Acta (v.705, #1-2)

CAC-2010: Twelfth international conference on chemometrics in analytical chemistry by Lutgarde Buydens; Piet Van Espen; Sarah Rutan (1).

Review of robust multivariate statistical methods in high dimension by Peter Filzmoser; Valentin Todorov (2-14).
► We explain the main concepts of robust statistics. ► We focus on robust estimation in high dimension. ► Robust regression, robust PLS, and robust PCA are considered. ► R code for computation is shown and discussed for real data examples.
General ideas of robust statistics, and specifically robust statistical methods for calibration and dimension reduction, are discussed. The emphasis is on analyzing high-dimensional data. The discussed methods are applied using the packages chemometrics and rrcov of the statistical software environment R. It is demonstrated how the functions can be applied to real high-dimensional data from chemometrics, and how the results can be interpreted.
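The authors demonstrate their methods with R code from the chemometrics and rrcov packages, which is not reproduced here. Purely as a hedged illustration of the underlying idea (replacing the classical covariance estimate by a robust one before projecting), a minimal Python sketch might look as follows; the data and all parameter choices are invented for the example:

    import numpy as np
    from sklearn.covariance import MinCovDet

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    X[:5] += 10.0                                   # a few gross outliers

    mcd = MinCovDet(random_state=0).fit(X)          # robust location and scatter (MCD)
    eigval, eigvec = np.linalg.eigh(mcd.covariance_)
    order = np.argsort(eigval)[::-1]
    loadings = eigvec[:, order]                     # "robust PCA" loadings
    scores = (X - mcd.location_) @ loadings         # robust scores

    # large robust score distances flag samples that classical PCA would absorb
    sd2 = np.sum(scores[:, :2] ** 2 / eigval[order][:2], axis=1)
    print("suspect samples:", np.where(sd2 > np.quantile(sd2, 0.95))[0])

For truly high-dimensional data (more variables than samples) the MCD itself breaks down, which is where the robust PCA and robust PLS approaches discussed in the review become relevant.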
Keywords: Robustness; Multivariate analysis; Partial least squares; Principal component analysis; Validation; Diagnostics;

Stability-based biomarker selection by Ron Wehrens; Pietro Franceschi; Urska Vrhovsek; Fulvio Mattivi (15-23).
Biomarker identification, i.e., finding those variables that indicate true differences between two or more populations, is an ever more important topic in the omics sciences. In most cases, the number of variables far exceeds the number of samples, making biomarker identification extremely difficult. We present a strategy based on the stability of putative biomarkers under perturbation of the data, and show that in several cases important gains can be achieved. The strategy is very general and can be applied with all common biomarker identification methods; it also has the advantage that it does not rely on error estimates from cross-validation, which in this setting tend to be highly variable.
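As a rough sketch of what stability-based selection can look like in general (not the authors' algorithm; the t-statistic ranking, the subsampling fraction and the cut-offs below are arbitrary choices for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, p, k_top, n_perturb = 40, 500, 20, 100
    X = rng.normal(size=(n, p))
    y = np.repeat([0, 1], n // 2)
    X[y == 1, :5] += 1.0                       # 5 "true" biomarkers in this toy set

    counts = np.zeros(p)
    for _ in range(n_perturb):
        idx = rng.choice(n, size=int(0.8 * n), replace=False)      # perturb the data
        t, _ = stats.ttest_ind(X[idx][y[idx] == 0], X[idx][y[idx] == 1])
        counts[np.argsort(-np.abs(t))[:k_top]] += 1                # top-k per resample

    stability = counts / n_perturb             # selection frequency per variable
    print("most stable candidates:", np.argsort(-stability)[:10])

Variables that are repeatedly ranked near the top across perturbed versions of the data are retained as candidate biomarkers, rather than relying on a single cross-validated error estimate.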
Keywords: Biomarker identification; Classification; Metabolomics; LC–MS;

In this work a method is proposed and demonstrated for the analysis of the macrocyclic lactones abamectin, doramectin, eprinomectin, ivermectin and moxidectin in bovine milk by liquid chromatography coupled to tandem mass spectrometry (LC–MS/MS) and liquid chromatography with fluorescence detection (LC–FL). The method is based on liquid–liquid extraction followed by a low temperature purification (LLE–LTP) step. Moreover, the proposed method was validated according to Commission Decision 2002/657/EC, using LC–MS/MS and LC–FL for confirmatory and quantitative analysis, respectively. For LC–MS/MS the observed recovery rates ranged from 101.2 to 141.6%, with coefficients of variation from 2.6 to 19.8%. For LC–FL the observed recovery rates ranged from 100.2 to 105%, with coefficients of variation from 2.9 to 8.8%. Matrix effects were negligible owing to the low temperature purification step. The quantification limits were far below the maximum limits established by the regulations of all countries consulted. The proposed method proved to be simple, easy to perform, and adequate for high-throughput analysis of a large number of samples per day at low cost.
Keywords: Avermectins; Low temperature purification; Extraction; Liquid chromatography–tandem mass spectrometry; Liquid chromatography–fluorescence detection;

Comparison of various chemometric approaches for large near infrared spectroscopic data of feed and feed products by J.A. Fernández Pierna; B. Lecler; J.P. Conzen; A. Niemoeller; V. Baeten; P. Dardenne (30-34).
In the present study, different multivariate regression techniques have been applied to two large near-infrared data sets of feed and feed ingredients in order to fulfil the regulations and laws that exist on the chemical composition of these products. The aim of this paper was to compare the performance of different linear and nonlinear multivariate calibration techniques: PLS, ANN and LS-SVM. The results obtained show that ANN and LS-SVM are very powerful methods for handling non-linearity, but LS-SVM can also perform quite well in the case of linear models. Using LS-SVM, the RMS error for independent test sets is improved on average by 10% compared to ANN and by 24% compared to PLS.
Keywords: NIR; Feed; Chemometrics; PLS; ANN; LS-SVM;

Raman spectroscopy and control charts based on the net analyte signal (NAS) were applied to the polymorphic characterization of carbamazepine. Carbamazepine presents four polymorphic forms: I–IV (dihydrate). X-ray powder diffraction was used as a reference technique. Three control charts were built: the NAS chart, which corresponds to the analyte of interest (form III in this case); the interference chart, which corresponds to the contribution of the other compounds in the sample; and the residual chart, which corresponds to nonsystematic variations. For each chart, statistical limits were established using samples within the quality specifications. It was possible to identify the different polymorphic forms of carbamazepine present in pharmaceutical formulations. Thus, an alternative method for the quality monitoring of the carbamazepine polymorphic forms after the crystallization process is presented.
Keywords: Raman spectroscopy; Net analyte signal; Multivariate control charts; Carbamazepine;

On the increase of predictive performance with high-level data fusion by T.G. Doeswijk; A.K. Smilde; J.A. Hageman; J.A. Westerhuis; F.A. van Eeuwijk (41-47).
The combination of different data sources for classification purposes, also called data fusion, can be done at different levels: low-level, i.e. concatenating data matrices; medium-level, i.e. concatenating data matrices after feature selection; and high-level, i.e. combining model outputs. In this paper the predictive performance of high-level data fusion is investigated. Partial least squares is used on each of the data sets, with dummy variables representing the classes as response variables. Based on the estimated responses ŷ_j for data set j and class k, a Gaussian distribution p(g_k | ŷ_j) is fitted. A simulation study is performed that shows the theoretical performance of high-level data fusion for two classes and two data sets. Within-group correlations of the predicted responses of the two models, and differences between the predictive ability of each of the separate models and the fused models, are studied. Results show that the error rate is always less than or equal to that of the best performing subset and can theoretically approach zero. Negative within-group correlations always improve the predictive performance. However, if the data sets have a joint basis, as with metabolomics data, this is not likely to happen. For equally performing individual classifiers the best results are expected for small within-group correlations. Fusion of a non-predictive classifier with a classifier that exhibits discriminative ability leads to increased predictive performance if the within-group correlations are strong. An example with real-life data shows the applicability of the simulation results.
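A minimal sketch of this high-level fusion scheme, assuming two synthetic data blocks and per-class Gaussians fitted to the PLS-predicted responses (in practice the Gaussians would be fitted to cross-validated predictions; all names and numbers are illustrative):

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    n = 120
    y = np.repeat([0.0, 1.0], n // 2)
    X1 = rng.normal(size=(n, 30)) + y[:, None] * 0.8     # synthetic data block 1
    X2 = rng.normal(size=(n, 40)) + y[:, None] * 0.5     # synthetic data block 2

    def class_gaussians(X, y, ncomp=2):
        yhat = PLSRegression(n_components=ncomp).fit(X, y).predict(X).ravel()
        # fit p(yhat | class k) as a Gaussian for each class k
        return yhat, [(yhat[y == k].mean(), yhat[y == k].std()) for k in (0, 1)]

    yh1, g1 = class_gaussians(X1, y)
    yh2, g2 = class_gaussians(X2, y)

    # high-level fusion: multiply the class-conditional likelihoods of both models
    lik1 = np.array([norm.pdf(yh1, m, s) for m, s in g1])
    lik2 = np.array([norm.pdf(yh2, m, s) for m, s in g2])
    fused = np.argmax(lik1 * lik2, axis=0)
    print("training error rate of the fused classifier:", np.mean(fused != y))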
Keywords: Data fusion; Metabolomics; Classification; Error rate;

Random projection for dimensionality reduction—Applied to time-of-flight secondary ion mass spectrometry data by Kurt Varmuza; Cécile Engrand; Peter Filzmoser; Martin Hilchenbach; Jochen Kissel; Harald Krüger; Johan Silén; Mario Trieloff (48-55).
Random projection (RP) is a simple and fast linear method for dimensionality reduction of high-dimensional multivariate data, independent of the data. The method is briefly described and a new memory-saving algorithm is presented for the generation of random projection vectors. Application of RP to data from scanning experiments with a time-of-flight secondary ion mass spectrometer (TOF-SIMS) showed that data reduced by RP have satisfactory discriminating power for separating target material and minerals without using any knowledge about the composition of the sample. A selection method – based on low-dimensional RP data – is described and successfully tested for automatic recognition of characteristic, diverse locations of a sample surface. RP is demonstrated to be an unbiased, powerful method, especially for large data sets, severe hardware restrictions (such as in space experiments) or the need for fast evaluation of hyperspectral data.
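The memory-saving generation algorithm itself is given in the paper; as a generic, hypothetical illustration of random projection in which the projection vectors are regenerated column-by-column from a seed instead of being stored as one large matrix, one could write:

    import numpy as np

    def random_project(X, k, seed=0):
        """Project rows of X (n x p) onto k random directions without storing
        the full p x k projection matrix: each column is regenerated from a seed."""
        n, p = X.shape
        Z = np.empty((n, k))
        for j in range(k):
            rng = np.random.default_rng(seed + j)          # reproducible column j
            r = rng.choice([-1.0, 0.0, 1.0], size=p, p=[1/6, 2/3, 1/6])  # sparse (Achlioptas-type) entries
            Z[:, j] = X @ (r * np.sqrt(3.0 / k))           # scaled sparse projection
        return Z

    X = np.random.default_rng(3).normal(size=(50, 10000))  # stand-in for TOF-SIMS spectra
    Z = random_project(X, k=25)
    print(Z.shape)                                         # (50, 25) reduced data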
Keywords: Chemometrics; Time-of-flight secondary ion mass spectrometry; Minerals; Projection; Simulation;

Data integration and network reconstruction with ∼omics data using Random Forest regression in potato by Animesh Acharjee; Bjorn Kloosterman; Ric C.H. de Vos; Jeroen S. Werij; Christian W.B. Bachem; Richard G.F. Visser; Chris Maliepaard (56-63).
► High-throughput technologies have led to data collection in fields like transcriptomics, metabolomics and proteomics and, as a result, large amounts of data have become available; however, the integration of these ∼omics data sets in relation to phenotypic traits is still problematic. ► We have obtained population-wide gene expression and metabolite (LC–MS) data from tubers of a diploid potato population and present a novel approach to study the various ∼omics data sets, allowing the construction of networks integrating gene expression, metabolites and phenotypic traits using Random Forest regression. ► Network reconstruction has led to the integration of known and uncharacterized metabolites with genes associated with the carotenoid biosynthesis pathway. Such an approach enables the construction of meaningful networks with regard to known and unknown components and metabolite pathways.
In the post-genomic era, high-throughput technologies have led to data collection in fields like transcriptomics, metabolomics and proteomics and, as a result, large amounts of data have become available. However, the integration of these ∼omics data sets in relation to phenotypic traits is still problematic for advancing crop breeding. We have obtained population-wide gene expression and metabolite (LC–MS) data from tubers of a diploid potato population and present a novel approach to study the various ∼omics data sets that allows the construction of networks integrating gene expression, metabolites and phenotypic traits. We used Random Forest regression to select subsets of the metabolites and transcripts which show association with potato tuber flesh color and enzymatic discoloration. Network reconstruction has led to the integration of known and uncharacterized metabolites with genes associated with the carotenoid biosynthesis pathway. We show that this approach enables the construction of meaningful networks with regard to known and unknown components and metabolite pathways.
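As a hedged sketch of the core selection step (Random Forest regression of a trait on combined ∼omics variables, followed by ranking on variable importance), with entirely synthetic data standing in for the metabolite and transcript matrices:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(4)
    n, p = 80, 300                               # samples x (metabolites + transcripts)
    X = rng.normal(size=(n, p))
    flesh_color = X[:, 0] * 2 + X[:, 1] - X[:, 2] + rng.normal(scale=0.5, size=n)

    rf = RandomForestRegressor(n_estimators=500, random_state=0)
    rf.fit(X, flesh_color)

    # rank variables by Random Forest importance and keep a subset for the network
    top = np.argsort(-rf.feature_importances_)[:20]
    print("selected variables:", top[:10])

The retained subset can then serve as the node set for importance- or correlation-based network reconstruction together with the phenotypic trait.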
Keywords: Data integration; Random Forest; Network reconstruction; Tuber flesh color; Potato;

Baseline correction methods to deal with artifacts in femtosecond transient absorption spectroscopy by Olivier Devos; Nicolas Mouton; Michel Sliwa; Cyril Ruckebusch (64-71).
► Strong signal distortions due to artifact contributions in femtosecond spectroscopy. ► Data pre-processing using baseline correction methods for artifact removal. ► Asymmetric least squares smoothing performed well for artifact removal. ► Only mild discrepancies in the artifact-corrected spectra. ► Very good recovery of the femtosecond kinetics.
In femtosecond transient absorption spectroscopy, artifact contributions are usually observed at ultra-short time scales. These complex signals are very challenging because of their nature, related to ultrafast phenomena, and because they strongly distort the structure of the spectrokinetic data. The purpose of this work is to evaluate the potential of baseline correction methods for the pre-processing of femtosecond transient absorption spectroscopy data. Indeed, artifact removal should ideally be performed before multivariate data analysis. The work is thus mainly focused on two different approaches: filtering by discrete wavelet transform on the one hand, and smoothing by asymmetric least squares on the other. The results obtained both on simulated data and on femtosecond pump–probe spectroscopy data are discussed. It can be concluded that the asymmetric least squares smoothing procedure performs satisfactorily for artifact removal. Indeed, only mild discrepancies are observed in the transient spectra and, most importantly, good recovery of the kinetics is obtained at the ultra-short time scale.
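A minimal sketch of asymmetric least squares smoothing in the spirit of the second approach (not the authors' implementation; the penalty weight lam, the asymmetry p and the toy transient below are arbitrary):

    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import spsolve

    def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
        """Asymmetric least squares smoothing: a smooth baseline z minimising
        sum w_i (y_i - z_i)^2 + lam * sum (second differences of z)^2, with
        small weights where y lies above the current baseline estimate."""
        m = len(y)
        D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(m - 2, m))
        w = np.ones(m)
        for _ in range(n_iter):
            W = sparse.diags(w)
            z = spsolve(W + lam * D.T @ D, w * y)
            w = np.where(y > z, p, 1.0 - p)       # asymmetric reweighting
        return z

    t = np.linspace(-1, 1, 400)
    signal = np.exp(-((t - 0.2) / 0.05) ** 2)           # sharp transient band
    artifact = 0.5 * np.exp(-(t / 0.3) ** 2)            # broad artifact/baseline term
    y = signal + artifact + 0.01 * np.random.default_rng(5).normal(size=t.size)
    corrected = y - asls_baseline(y)                    # artifact-corrected trace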
Keywords: Time-resolved spectroscopy; Artifact; Pre-processing; Wavelet; Asymmetric least squares;

Non-linear modeling of 1H NMR metabonomic data using kernel-based orthogonal projections to latent structures optimized by simulated annealing by Judith M. Fonville; Max Bylesjö; Muireann Coen; Jeremy K. Nicholson; Elaine Holmes; John C. Lindon; Mattias Rantalainen (72-80).
► Non-linear modeling of metabonomic data using K-OPLS. ► Automated optimization of the kernel parameter by simulated annealing. ► K-OPLS provides improved prediction performance for exemplar spectral data sets. ► Software implementation available for R and Matlab under the GPL v2 license.
Linear multivariate projection methods are frequently applied for predictive modeling of spectroscopic data in metabonomic studies. The OPLS method is a commonly used computational procedure for characterizing spectral metabonomic data, largely due to its favorable model interpretation properties providing separate descriptions of predictive variation and response-orthogonal structured noise. However, when the relationship between descriptor variables and the response is non-linear, conventional linear models will perform sub-optimally. In this study we have evaluated to what extent a non-linear model, kernel-based orthogonal projections to latent structures (K-OPLS), can provide enhanced predictive performance compared to the linear OPLS model. Just like its linear counterpart, K-OPLS provides separate model components for predictive variation and response-orthogonal structured noise. The improved model interpretation afforded by this separate modeling is a property unique to K-OPLS in comparison to other kernel-based models. Simulated annealing (SA) was used for effective and automated optimization of the kernel-function parameter in K-OPLS (SA-K-OPLS). Our results reveal that the non-linear K-OPLS model provides improved prediction performance in three separate metabonomic data sets compared to the linear OPLS model. We also demonstrate how response-orthogonal K-OPLS components provide valuable biological interpretation of model and data. The metabonomic data sets were acquired using proton Nuclear Magnetic Resonance (NMR) spectroscopy, and include a study of the liver toxin galactosamine, a study of the nephrotoxin mercuric chloride and a study of Trypanosoma brucei brucei infection. Automated and user-friendly procedures for the kernel optimization have been incorporated into version 1.1.1 of the freely available K-OPLS software package for both R and Matlab to enable easy application of K-OPLS for non-linear prediction modeling.
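The K-OPLS software itself is distributed for R and Matlab; purely to illustrate the simulated-annealing loop around a kernel-parameter choice, a generic Python sketch (with kernel ridge regression standing in for K-OPLS, and an invented data set and cooling schedule) could be:

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(13)
    X = rng.normal(size=(120, 60))               # stand-in for NMR spectral descriptors
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=120)

    def cv_error(log_gamma):
        model = KernelRidge(kernel="rbf", gamma=np.exp(log_gamma), alpha=0.1)
        return -cross_val_score(model, X, y, cv=5,
                                scoring="neg_mean_squared_error").mean()

    # simulated annealing over the (log) RBF kernel width
    state = best = 0.0
    e_state = e_best = cv_error(state)
    T = 1.0
    for _ in range(100):
        cand = state + rng.normal(scale=0.5)
        e_cand = cv_error(cand)
        if e_cand < e_state or rng.random() < np.exp((e_state - e_cand) / T):
            state, e_state = cand, e_cand        # accept (possibly worse) move
        if e_state < e_best:
            best, e_best = state, e_state
        T *= 0.95                                 # cool down
    print("optimised gamma:", np.exp(best))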
Keywords: Metabonomics; Kernel-based orthogonal projections to latent structures; Kernel models; Simulated annealing; Non-linear; Classification; K-OPLS; OPLS;

Calibration transfer for excitation–emission fluorescence measurements by Jonas Thygesen; Frans van den Berg (81-87).
► New methods for three-way calibration transfer are presented. ► Modifications of two-way methods are also investigated. ► A linear pixel-based model is developed; its performance is similar to methods found in the literature.
Most of the wide array of calibration transfer methods found in the literature is dedicated to two-way data arrangements (m × n matrices). Less work has been done in the area of calibration transfer for three-way data structures (m × n × l tensors), such as calibrations made for excitation–emission-matrix (EEM) fluorescence spectra. There are two possible ways to attack the problem of EEM transfer: either the tensors are unfolded to two-way data, whereby the existing methods can be applied, or new methods dedicated to three-way calibration transfer have to be developed. This paper presents and compares both. It was possible to make a local linear pixel-based model that could be used for the transfer of EEMs. This new method has a performance similar to the classical methods found in the literature, direct and piecewise direct standardization. The three-way advantages made it possible to use as few as four samples to build useable transfer models; care has to be taken, though, when choosing the samples. When subset recalibration of the systems is compared to calibration transfer, better performance is seen for the transferred calibrations. Overall the three-way calibration transfer methods have slightly better performance than the two-way methods.
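For reference, the classical two-way benchmark used here, direct standardization, amounts to a single least-squares transform between the instruments; a minimal sketch (with simulated standards, and with unfolded EEMs taking the place of the spectra in the three-way case) is:

    import numpy as np

    rng = np.random.default_rng(6)
    n_std, n_var = 10, 200
    S_master = rng.normal(size=(n_std, n_var))             # standards on the master instrument
    S_slave = S_master * 1.05 + 0.02 + 0.01 * rng.normal(size=(n_std, n_var))

    # direct standardization: find F such that S_slave @ F approximates S_master
    F = np.linalg.pinv(S_slave) @ S_master

    X_new_slave = rng.normal(size=(5, n_var))
    X_transferred = X_new_slave @ F                        # usable with the master-domain model
    print("transfer residual:", np.linalg.norm(S_slave @ F - S_master))

Piecewise direct standardization restricts F to local spectral windows, while the pixel-based three-way method described above works on the excitation–emission grid itself.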
Keywords: Calibration transfer; Fluorescence spectroscopy; Parallel factor analysis; Vitamin B2;

► The potential of two sampling techniques for FT-IR spectroscopy, as well as the effect of homogenization as sample pre-treatment, was examined in order to measure the composition of raw milk. ► μATR IR spectroscopy resulted in excellent predictions for the crude protein, lactose and urea content of raw milk, while homogenization was necessary to obtain good measurements for milk fat. ► Although the results for HTT IR spectroscopy were not as good, they were still acceptable. For HTT, homogenization was essential for good fat and urea measurements.
Milk production is a dominant factor in the metabolism of dairy cows, involving a very intensive interaction with the blood circulation. As a result, the extracted milk contains valuable information on the metabolic status of the cow. On-line measurement of milk components during milking, two or more times a day, would promote early detection of systemic and local alterations, thus providing a great input for strategic and management decisions. The objective of this study was to investigate the potential of mid-infrared (mid-IR) spectroscopy to measure the milk composition using two different measurement modes: micro attenuated total reflection (μATR) and high throughput transmission (HTT). Partial least squares (PLS) regression was used for prediction of fat, crude protein, lactose and urea after preprocessing the IR data and selecting the most informative wavenumber variables. The prediction accuracies were determined separately for raw and homogenized copies of a wide range of milk samples in order to estimate the possibility of on-line analysis of the milk. For fat content, both measurement modes resulted in excellent prediction for homogenized samples (R² > 0.92) but in poor results for raw samples (R² < 0.70). Homogenization was, however, not mandatory to achieve good predictions for crude protein and lactose with both μATR and HTT, and for urea with μATR spectroscopy. Excellent results were obtained for prediction of crude protein, lactose and urea content (R² > 0.99, 0.98 and 0.86, respectively) in raw and homogenized milk using μATR IR spectroscopy. These results were significantly better than those obtained by HTT IR spectroscopy. The prediction performance of HTT was nevertheless still good for crude protein and lactose content (R² > 0.86 and 0.78, respectively) in raw and homogenized samples, whereas the detection of urea in milk with HTT spectroscopy was significantly better (R² = 0.69 versus 0.16) after homogenization of the milk samples. Based on these observations it can be concluded that the μATR approach is most suitable for rapid at-line or even on-line milk composition measurement, although homogenization is crucial to achieve good prediction of the fat content.
Keywords: Fourier transform infrared spectroscopy; Attenuated total reflection; High throughput transmission; Ultrasonic homogenization; Milk analysis; Health monitoring;

Classification models for neocryptolepine derivatives as inhibitors of the β-haematin formation by B. Dejaegher; L. Dhooghe; M. Goodarzi; S. Apers; L. Pieters; Y. Vander Heyden (98-110).
► Classification of various neocryptolepine derivatives according to their anti-malarial activity. ► Use of LDA, QDA, CART, PLS-DA, OPLS-DA, OAO-SVM-C, and OAA-SVM-C for classification. ► CART model preferred for three-class classification according to activity. ► LDA and QDA models preferred for two-class classification according to activity.
This paper describes the construction of a QSAR model to relate the structures of various derivatives of neocryptolepine to their anti-malarial activities. QSAR classification models were built using Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Classification and Regression Trees (CART), Partial Least Squares – Discriminant Analysis (PLS-DA), Orthogonal Projections to Latent Structures – Discriminant Analysis (OPLS-DA), and Support Vector Machines for Classification (SVM-C), using four sets of molecular descriptors as explanatory variables. Prior to classification, the molecules were divided into a training and a test set using the duplex algorithm. The different classification models were compared regarding their predictive ability, simplicity, and interpretability. Both binary and multi-class classification models were constructed. For classification into three classes, CART and One-Against-One (OAO)-SVM-C were found to be the best predictive methods, while for classification into two classes LDA, QDA and CART performed best.
Keywords: β-Haematin inhibition; Classification models; Linear Discriminant Analysis; Quadratic Discriminant Analysis; Classification and Regression Trees; Partial Least Squares – Discriminant Analysis; Orthogonal Projection to Latent Structures – Discriminant Analysis; Support Vector Machines for Classification;

Quality control of Citri reticulatae pericarpium: Exploratory analysis and discrimination by Christophe Tistaert; Line Thierry; Andrzej Szandrach; B. Dejaegher; Guorong Fan; Michel Frédérich; Yvan Vander Heyden (111-122).
► Quality control of Citri reticulatae pericarpium based on entire fingerprint analysis. ► Update of the official quality control criterion. ► Discrimination based on probabilistic Discriminant Partial Least Squares (p-DPLS). ► Evaluation of the model, discovering six discriminating constituents.
Extracts of Citri reticulatae pericarpium (PCR) are commonly used in Traditional Chinese Medicine. The quality control of PCR is currently performed by single marker analysis, which can hardly describe the complexity of such natural samples. In this study, a fingerprint methodology for PCR based on high-performance liquid chromatography (HPLC) was developed and validated. A total of 69 fingerprints of authenticated PCR samples, commercial PCR samples, mixed peel samples, and other Citrus peels were recorded. Exploratory data analysis made it possible to optimize the extraction procedure and to detect mixed peel samples. Once the optimizations were performed and the method validated, discrimination between the authentic PCR samples and all other samples was performed by probabilistic Discriminant Partial Least Squares. The established model was able to differentiate between the classes with high reliability for each sample. Furthermore, evaluation of the score and loading plots of the model indicated nobiletin, tangeretin, naringin and hesperidin as important markers for the quality control of PCR.
Keywords: Herbal fingerprinting; Quality control; Partial least squares; High-performance liquid chromatography (HPLC);

Opening the kernel of kernel partial least squares and support vector machines by G.J. Postma; P.W.T. Krooshof; L.M.C. Buydens (123-134).
► We provide a solution to visualize the contribution of variables to kernel-based regression methods. ► This variable information is lost in methods like KPLS and support vector regression due to the kernel. ► The influence and non-linearity of the variables are visualized using so-called pseudo-sample trajectories. ► We have tested the method on several artificial and real linear and non-linear data sets. ► Our method clearly indicates the important variables.
Kernel partial least squares (KPLS) and support vector regression (SVR) have become popular techniques for regression of complex non-linear data sets. The modeling is performed by mapping the data into a higher dimensional feature space through the kernel transformation. The disadvantage of such a transformation is, however, that information about the contribution of the original variables to the regression is lost. In this paper we introduce a method which can retrieve and visualize the contribution of the variables to the regression model and the way the variables contribute to the regression of complex data sets. The method is based on the visualization of trajectories using so-called pseudo samples representing the original variables in the data. We test and illustrate the proposed method on several synthetic and real benchmark data sets. The results show that for linear and non-linear regression models the important variables were identified, with corresponding linear or non-linear trajectories. The results were verified by comparison with ordinary PLS regression and by selecting those variables which were indicated as important and rebuilding a model with only those variables.
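A hedged sketch of the pseudo-sample idea for a generic kernel regression model (kernel ridge is used here as a stand-in for KPLS/SVR; the data, the kernel width and the use of zero as the "background" level for the other variables are illustrative assumptions):

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(7)
    X = rng.uniform(-1, 1, size=(100, 4))
    y = X[:, 0] ** 2 + 0.5 * X[:, 1] + 0.05 * rng.normal(size=100)   # variable 0 acts non-linearly

    Xc = X - X.mean(axis=0)                       # work with centred data
    model = KernelRidge(kernel="rbf", gamma=1.0, alpha=0.1).fit(Xc, y)

    def pseudo_trajectory(j, n_points=25):
        """Predictions for pseudo samples that vary only variable j
        (all other variables held at zero, i.e. at the column mean)."""
        grid = np.linspace(Xc[:, j].min(), Xc[:, j].max(), n_points)
        P = np.zeros((n_points, Xc.shape[1]))
        P[:, j] = grid
        return grid, model.predict(P)

    for j in range(4):
        g, traj = pseudo_trajectory(j)
        print(f"variable {j}: trajectory range {np.ptp(traj):.2f}")

A flat trajectory suggests an unimportant variable; a curved one reveals a non-linear contribution, which is exactly the information the kernel transformation normally hides.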
Keywords: Kernel partial least squares; Support vector regression; Kernel transformation; Variable selection; Pseudo-samples; Trajectories;

► FSD and PLS regression are integrated for the quantitative analysis of NIR spectroscopy. ► The grid search optimization method is used to select the optimal parameters of the model. ► The performance of the model is compared with the second derivative and PLS models. ► The model provides a significant improvement in the prediction ability of the PLS model. ► A modification of the FSD–PLS method was introduced to suppress the baseline variations.
In this paper a new model based on frequency self deconvolution (FSD) is proposed for the quantitative analysis of near infrared (NIR) spectra. The model couples FSD and partial least squares (PLS) regression. The grid search optimization method is used to select the optimal values of the full width at half height (FWHH) and the truncation point of the apodization function. The proposed FSD–PLS provides a significant improvement in the prediction ability of the PLS model. Furthermore, a modification of the new FSD–PLS method is introduced to enable the removal of baseline variations from the NIR spectra. The proposed models were validated using absorbance spectra of mixtures composed of glucose, urea and triacetin in a phosphate buffer solution, where the concentrations of the components were selected to be within their physiological range in blood. All experiments were carried out in a non-controlled environment to show that the model can effectively suppress most of the experimental variations. The results show that the standard error of prediction (SEP) decreases from 35.58 mg dL⁻¹ using 8 factors for the PLS model to 15.53 mg dL⁻¹ using 12 factors for the modified FSD–PLS model. The proposed models are also shown to yield slightly better performance than a newly developed second-derivative PLS model, without incurring the shortcomings associated with the derivative approach, namely not providing interpretable results and degrading the SNR of the spectra at a faster rate.
Keywords: Frequency self deconvolution; Derivative; NIR; PLS;

► Optimum conditions for the pigment dyeing process of high performance fibers are proposed. ► The feed-forward bottleneck neural network (FFBN) is implemented as a mapping technique. ► We show the influence of different factors (parameters of pigment dyeing) on different output responses (quality of pigment dyeing).
Process optimization involves the minimization (or maximization) of an objective function, which can be established from a technical and/or economic viewpoint, taking process safety into account. The basic idea of the optimization method using a neural network (NN) is to replace the model equations (traditionally obtained using, for example, response surface designs or other methods) by an equivalent NN. The feed-forward bottleneck neural network (FFBN) as a mapping technique is described and evaluated. From the 2D maps, the optimal parameters of pigment dyeing of high performance fibers based on poly-amide benzimidazole (PABI) and polyimide (arimid) are discussed. The studied fibers were treated in 32 experiments under the conditions proposed by Design of Experiments (DOE), varying five influencing factors. The neural network mapping method enables visualization of the process and shows the influence of different factors on different output responses. Optimum parameters were selected as a compromise decision.
Keywords: Optimization methods; Feed-forward bottleneck (FFBN) neural network; Pigment dyeing; High performance fibers;

Detecting outlying samples in a parallel factor analysis model by Sanne Engelen; Mia Hubert (155-165).
► Classical PARAFAC is sensitive to outlying samples. ► A robust PARAFAC method is presented which can cope with outlying samples. ► An outlier map identifies the outliers in a graphical way.
To explore multi-way data, different methods have been proposed. Here, we study the popular PARAFAC (Parallel factor analysis) model, which expresses multi-way data in a more compact way, without ignoring the underlying complex structure. To estimate the score and loading matrices, an alternating least squares procedure is typically used. It is however well known that least squares techniques suffer from outlying observations, making the models useless when outliers are present in the data. In this paper, we present a robust PARAFAC method. Essentially, it searches for an outlier-free subset of the data, on which we can then perform the classical PARAFAC algorithm. An outlier map is constructed to identify outliers. Simulations and examples show the robustness of our approach.
Keywords: Robustness; Parallel factor analysis; Multi-way data; Outliers;

► Linear and non-linear quantitative structure–activity relationship models for predicting non-nucleoside HIV-1 reverse transcriptase inhibitors were built. ► Ant colony optimization, a swarm intelligence feature selection technique, was used to select relevant descriptors. ► The absence of multicollinearity between the selected descriptors was checked. ► External (test set) and internal (cross-validation) validation were used to validate the models.
For a series of thiocarbamates, non-nucleoside HIV-1 reverse transcriptase inhibitors, a few descriptors were selected from a large pool of theoretical molecular descriptors by means of the ant colony optimization (ACO) feature selection method. The selected descriptors were correlated with the bioactivities of the molecules using the well known multiple linear regression (MLR) and partial least squares (PLS) regression techniques and, to account for nonlinearity, also PLS coupled to radial basis functions (RBF) on the one hand and a radial basis function neural network (RBFNN) on the other. In this case study, the RBF/PLS results were better than those from the other modeling techniques applied. The prediction ability of the ACO/RBF/PLS-based quantitative structure–activity relationship (QSAR) model was found to be significantly superior to the comparative molecular field analysis (CoMFA) and comparative molecular similarity index analysis (CoMSIA) models previously established for this series of compounds. It was also demonstrated that RBF as a nonlinear approach is useful in deriving simple and predictive QSAR models, without the need to resort to expeditious 3D methodologies.
Keywords: Thiocarbamates; HIV-1 reverse transcriptase; Ant colony optimization; Radial basis function; Partial least squares;

Chemometric methods applied to the calibration of a Vis–NIR sensor for gas engine's condition monitoring by Alberto Villar; Eneko Gorritxategi; Deitze Otaduy; Jose I. Ciria; Luis A. Fernandez (174-181).
► We describe the calibration process of a Visible–Near Infrared sensor for the condition monitoring of a gas engine's lubricating oil. ► Chemometric techniques were applied to determine Base Number (BN), Acid Number (AN), the amount of insolubles in pentane and the viscosity at 40 °C. ► In order to improve the sensor data, different preprocessing methods were applied, taking into account both the oil parameters and the sensor data. ► Although the results are promising, the models should be improved in order to decrease the prediction error.
This paper describes the calibration process of a Visible–Near Infrared sensor for the condition monitoring of a gas engine's lubricating oil, correlating transmittance oil spectra with the degradation of the oil via a regression model. Chemometric techniques were applied to determine different parameters: Base Number (BN), Acid Number (AN), insolubles in pentane and viscosity at 40 °C. A Visible–Near Infrared (400–1100 nm) sensor developed at the Tekniker research center was used to obtain the spectra of artificial and real gas engine oils. In order to improve the sensor data, different preprocessing methods, such as smoothing by Savitzky–Golay or moving average, with Multivariate Scatter Correction or Standard Normal Variate to eliminate the scatter effect, were applied. A combination of these preprocessing methods was applied for each parameter. The regression models were developed by Partial Least Squares Regression (PLSR). In the end, it was shown that only some models were valid, fulfilling a set of quality requirements. The paper shows which models achieved the established validation requirements and which preprocessing methods perform better. A discussion follows regarding the potential improvement of the robustness of the models.
Keywords: Oil condition monitoring; Partial least squares; Gas engine; On-line sensor; Calibration; Visible–Near Infrared;

The article summarizes the usefulness of Multivariate Curve Resolution to provide distribution maps (C matrix) and pure spectra (S^T matrix) of compounds from raw biomedical hyperspectral images. An additional interesting aspect is the possibility of obtaining interpretable segmentation schemes by using the resolved MCR scores (C matrix) as starting point.
► MCR-ALS provides distribution maps and pure spectra of compounds in biomedical images. ► MCR scores (C matrix) can be used as starting point for image segmentation purposes. ► Use of MCR scores offers fast computation and gives interpretable segmentation schemes. ► Use of MCR scores allows selecting compound contributions for segmentation. ► Use of MCR scores allows profile pretreatment for segmentation (e.g. autoscaling).
MCR-ALS is a resolution method that has been applied in many different fields, such as process analysis, environmental data and, recently, hyperspectral image analysis. In this context, the algorithm provides the distribution maps and the pure spectra of the image constituents from the sole information in the raw image measurement. Based on the distribution maps and spectra obtained, additional information can be easily derived, such as identification of constituents when libraries are available or quantitation within the image, expressed as constituent signal contribution. This work first summarizes the protocol followed for the resolution of two examples of kidney calculi, taken as representative of images with major and minor compounds, respectively. Image segmentation allows separating regions of images according to their pixel similarity and is also relevant in the biomedical field to differentiate healthy from non-healthy regions in tissues or to identify sample regions with distinct properties. Information on pixel similarity is enclosed not only in pixel spectra, but also in other smaller pixel representations, such as PCA scores. In this paper, we propose the use of MCR scores (concentration profiles) for segmentation purposes. K-means results obtained from different pixel representations of the data set are compared. The main advantages of the use of MCR scores are the interpretability of the class centroids and the compound-wise selection and preprocessing of the input information in the segmentation scheme.
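A toy sketch of the proposed pipeline, with a deliberately minimal home-made MCR-ALS step followed by K-means on the resolved concentration profiles; the image size, the number of constituents and the crude non-negativity handling are all illustrative simplifications:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(8)
    # toy "image": 30 x 30 pixels, 50 spectral channels, 2 constituents
    C_true = np.abs(rng.normal(size=(900, 2)))
    S_true = np.abs(rng.normal(size=(2, 50)))
    D = C_true @ S_true + 0.01 * rng.normal(size=(900, 50))   # unfolded hyperspectral image

    # minimal MCR-ALS: alternate least-squares updates with non-negativity by clipping
    S = np.abs(rng.normal(size=(2, 50)))                      # crude initial spectra
    for _ in range(50):
        C = np.clip(D @ np.linalg.pinv(S), 0, None)           # concentration profiles (scores)
        S = np.clip(np.linalg.pinv(C) @ D, 0, None)           # pure spectra

    # segmentation on the MCR scores instead of the raw pixel spectra
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(C)
    segmented = labels.reshape(30, 30)                        # interpretable class map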
Keywords: Image segmentation; MCR-ALS; K-means; Hyperspectral images; Asymmetric least-squares;

Evaluating the reliability of analytical results using a probability criterion: A Bayesian perspective by Eric Rozet; Bernadette Govaerts; Pierre Lebrun; Karim Michail; Eric Ziemons; Reinhold Wintersteiger; Serge Rudaz; Bruno Boulanger; Philippe Hubert (193-206).
► Estimation of analytical methods' reliability over the whole concentration range. ► The analyst can see how far the results will be reliable for their future intended use. ► A detailed description of the Bayesian algorithm is given for easy implementation. ► The Bayesian reliability profile improves the estimation of the reliability probability. ► Application to the validation of a novel SPE-HPLC-UV bioanalytical method.
Method validation is mandatory in order to assess the fitness for purpose of a developed analytical method. Of core importance at the end of the validation is the evaluation of the reliability of the individual results that will be generated during the routine application of the method. Regulatory guidelines provide a general framework to assess the validity of a method, but none address the issue of results reliability. In this study, a Bayesian approach is proposed to address this concern. Results reliability is defined here as "the probability (π) of an analytical method to provide analytical results (X) within predefined acceptance limits (±λ) around their reference or conventional true concentration values (μ_T) over a defined concentration range and under given environmental and operating conditions." By providing the minimum reliability probability (π_min) needed for the subsequent routine application of the method, as well as specifications or acceptance limits (±λ), the proposed Bayesian approach provides the effective probability of obtaining reliable future analytical results over the whole concentration range investigated. This is summarised in a single graph: the reliability profile. This Bayesian reliability profile is also compared to two frequentist approaches, the first one derived from the work of Dewé et al. [W. Dewé, B. Govaerts, B. Boulanger, E. Rozet, P. Chiap, Ph. Hubert, Chemometr. Intell. Lab. Syst. 85 (2007) 262–268] and the second proposed by Govaerts et al. [B. Govaerts, W. Dewé, M. Maumy, B. Boulanger, Qual. Reliab. Eng. Int. 24 (2008) 667–680]. Furthermore, to illustrate the applicability of the Bayesian reliability profile, the approach is also applied here to a bioanalytical method dedicated to the determination of ketoglutaric acid (KG) and hydroxymethylfurfural (HMF) in human plasma by SPE-HPLC-UV.
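As a much-simplified, single-concentration-level analogue of the reliability computation (a plain normal model with a vague prior rather than the model used in the paper; all numbers are invented):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(9)
    mu_T, lam = 100.0, 15.0                      # true concentration and acceptance limit
    results = rng.normal(103.0, 6.0, size=12)    # validation results at this level

    n = results.size
    xbar, s2 = results.mean(), results.var(ddof=1)

    # posterior draws for (mu, sigma^2) under a normal model with a vague prior
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=20000)
    mu = rng.normal(xbar, np.sqrt(sigma2 / n))

    # reliability: posterior probability that a future result X falls within mu_T +/- lam
    pi = np.mean(stats.norm.cdf(mu_T + lam, mu, np.sqrt(sigma2))
                 - stats.norm.cdf(mu_T - lam, mu, np.sqrt(sigma2)))
    print(f"estimated reliability at this level: {pi:.3f}")

Repeating this at each validation level and plotting π against concentration, together with the π_min line, gives the kind of reliability profile the paper describes.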
Keywords: Results reliability; Validation; Reliability profile; Bayesian approach;

► A selection of the best features for multivariate forensic glass classification using SEM–EDX was performed. ► The feature selection process was carried out by means of an exhaustive search, with an Empirical Cross-Entropy objective function. ► Results show remarkable accuracy of the best variables selected following the proposed procedure for the task of classifying glass fragments into windows or containers.
In this work, a selection of the best features for multivariate forensic glass classification using Scanning Electron Microscopy coupled with an Energy Dispersive X-ray spectrometer (SEM–EDX) has been performed. This has been motivated by the fact that the databases available for forensic glass classification are sparse nowadays, and the acquisition of SEM–EDX data is both costly and time-consuming for forensic laboratories. The database used for this work consists of 278 glass objects for which 7 variables, based on their elemental compositions obtained with SEM–EDX, are available. Two categories are considered for the classification task, namely containers and car/building windows, both of them typical in forensic casework. A multivariate model is proposed for the computation of the likelihood ratios. The feature selection process is carried out by means of an exhaustive search, with an Empirical Cross-Entropy (ECE) objective function. The ECE metric takes into account not only the discriminating power of the model in use, but also its calibration, which indicates whether or not the likelihood ratios are interpretable in a probabilistic way. Thus, the proposed model is applied to all the 63 possible univariate, bivariate and trivariate combinations taken from the 7 variables in the database, and its performance is ranked by its ECE. Results show remarkable accuracy of the best variables selected following the proposed procedure for the task of classifying glass fragments into windows (from cars or buildings) or containers, obtaining high (almost perfect) discriminating power and good calibration. This allows the proposed models to be used in casework. We also present an in-depth analysis which reveals the benefits of the proposed ECE metric as an assessment tool for classification models based on likelihood ratios.
Keywords: Glass classification; Feature selection; Empirical Cross-Entropy; Forensic evaluation of the evidence; Likelihood ratio; Physico-chemical multivariate data; SEM–EDX;

Deconvolution of pulse trains with the L₀ penalty by Johan de Rooi; Paul Eilers (218-226).
► Deconvolution of pulse trains is performed using penalized regression. ► We propose the use of an L₀ penalty and compare it with the more common L₂ and L₁ penalties. ► The model is extended with a smooth component to handle drifting baselines.
The output of many instruments can be modeled as a convolution of an impulse response and a series of sharp spikes. Deconvolution considers the inverse problem: estimate the input spike train from an observed (noisy) output signal. We approach this task as a linear inverse problem, solved using penalized regression. We propose the use of an L₀ penalty and compare it with the more common L₂ and L₁ penalties. In all cases a simple and iterative weighted regression procedure can be used. The model is extended with a smooth component to handle drifting baselines. Application to three different data sets shows excellent results.
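One common way to approximate an L₀ penalty with a simple iterative weighted regression (in the spirit of, but not necessarily identical to, the authors' procedure) is adaptive ridge reweighting; a self-contained sketch with an invented impulse response and spike train:

    import numpy as np
    from scipy.linalg import convolution_matrix, solve

    rng = np.random.default_rng(10)
    m = 200
    a_true = np.zeros(m)
    a_true[[40, 90, 95, 150]] = [3.0, 2.0, 1.5, 2.5]                  # sparse spike train
    irf = np.exp(-0.5 * (np.arange(-15, 16) / 4.0) ** 2)              # impulse response
    C = convolution_matrix(irf, m, mode="same")                       # convolution as a matrix
    y = C @ a_true + 0.05 * rng.normal(size=m)                        # observed output

    # approximate L0 by iteratively reweighted ridge:
    # minimise ||y - C a||^2 + kappa * sum_j a_j^2 / (a_j^2 + beta)
    kappa, beta = 1.0, 1e-4
    a = np.full(m, 0.1)
    for _ in range(30):
        w = 1.0 / (a ** 2 + beta)                  # weights shrink where spikes emerge
        a = solve(C.T @ C + kappa * np.diag(w), C.T @ y)
    print("candidate spike positions:", np.nonzero(np.abs(a) > 0.5)[0])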
Keywords: Spike trains; Spectra; Blind deconvolution; Ill-conditioned; Penalized regression;

Characterisation of heavy oils using near-infrared spectroscopy: Optimisation of pre-processing methods and variable selection by Jérémy Laxalde; Cyril Ruckebusch; Olivier Devos; Noémie Caillol; François Wahl; Ludovic Duponchel (227-234).
► Quantitative determination of heavy petroleum products from NIR spectroscopy. ► Global optimization of wavelength selection and pre-processing. ► Model accuracy evaluated with a statistical randomisation test.
In this study, chemometric predictive models were developed from near infrared (NIR) spectra for the quantitative determination of saturates, aromatics, resins and asphaltenes (SARA) in heavy petroleum products. Model optimisation was based on adequate pre-processing and/or variable selection. In addition to classical methods, the potential of a genetic algorithm (GA) optimisation, which allows the co-optimisation of pre-processing methods and variable selection, was evaluated. The prediction results obtained with the different models were compared, and a decision regarding their statistical significance was taken by applying a randomisation t-test. Finally, the root mean square errors of prediction obtained (and the corresponding concentration ranges), expressed in %(w/w), are 1.51 (14.1–99.1) for saturates, 1.59 (0.7–61.1) for aromatics, 0.77 (0–34.5) for resins and 1.26 (0–14.7) for asphaltenes. In addition, the usefulness of the proposed optimisation method for global interpretation is shown, in accordance with the known chemical composition of the SARA fractions.
Keywords: Genetic algorithm; Variable selection; Spectral pre-processing; Near infrared spectroscopy; Partial least squares regression; Heavy oils;

► Image analysis and PCA techniques were applied to an activated sludge system. ► Aggregated and filamentous biomass, Gram status and viability were characterized. ► Several abnormalities were identified combining image analysis and PCA. ► Monitoring of activated sludge improved using chemometric methods and image analysis.
This work focuses on the use of chemometric techniques for identifying activated sludge process abnormalities. Chemometric methods combined with image analysis can improve the monitoring of activated sludge systems and minimize the need for analytical measurements. For that purpose data was collected on aggregated and filamentous biomass, biomass composition in terms of Gram-positive/Gram-negative and viable/damaged bacteria, and operational parameters. Principal component analysis (PCA) was subsequently applied to identify activated sludge abnormalities, allowing the identification of several disturbances, namely filamentous bulking, pinpoint flocs formation, and zoogleal bulking, as well as normal conditions, by grouping the collected samples into corresponding clusters.
Keywords: Activated sludge; Image analysis; Morphology; Physiology; Chemometric techniques;

Modelling spatial and temporal variations in the water quality of an artificial water reservoir in the semiarid Midwest of Argentina by Fabricio D. Cid; Rosa I. Antón; Rafael Pardo; Marisol Vega; Enrique Caviedes-Vidal (243-252).
► Water quality of an Argentinean reservoir has been investigated by N-way PCA. ► A PARAFAC model described the spatial and seasonal variations of water composition. ► Two factors, related to organic and lead pollution, have been identified. ► The most polluted areas of the reservoir were located, and polluting sources identified.
Temporal and spatial patterns of water quality of an important artificial water reservoir located in the semiarid Midwest of Argentina were investigated using chemometric techniques. Surface water samples were collected at 38 points of the water reservoir during eleven sampling campaigns between October 1998 and June 2000, covering the warm wet season and the cold dry season, and analyzed for dissolved oxygen (DO), conductivity, pH, ammonium, nitrate, nitrite, total dissolved solids (TDS), alkalinity, hardness, bicarbonate, chloride, sulfate, calcium, magnesium, fluoride, sodium, potassium, iron, aluminum, silica, phosphate, sulfide, arsenic, chromium, lead, cadmium, chemical oxygen demand (COD), biochemical oxygen demand (BOD), viable aerobic bacteria (VAB) and total coliform bacteria (TC). Concentrations of lead, ammonium, nitrite and coliforms were higher than the maximum allowable limits for drinking water in a large proportion of the water samples. To obtain a general representation of the spatial and temporal trends of the water quality parameters at the reservoir, the three-dimensional dataset (sampling sites × parameters × sampling campaigns) has been analyzed by matrix augmentation principal component analysis (MA-PCA) and N-way principal component analysis (N-PCA) using Tucker3 and PARAFAC (Parallel Factor Analysis) models. MA-PCA produced a component accounting for the general behavior of parameters associated with organic pollution. The Tucker3 models were not appropriate for modelling the water quality dataset. The two-factor PARAFAC model provided the best picture for understanding the spatial and temporal variation of the water quality parameters of the reservoir. The first PARAFAC factor contains useful information regarding the relation of organic pollution with seasonality, whereas the second factor also encloses information concerning lead pollution. The most polluted areas in the reservoir and the polluting sources were identified by plotting PARAFAC loadings as a function of the UTM (Universal Transverse Mercator) coordinates.
Keywords: Water quality; Water reservoir; Modelling; N-way principal component analysis; Parallel Factor Analysis; Tucker3;

• We compare five interpolation methods for the first dimension of LC × LC data. • We align the data by a linear shift after using these methods for PARAFAC analysis. • Gaussian fitting gave the best results with PARAFAC for simulated data. • Data aligned by the interpolation methods gave improved %RSDs.
Simulated and experimental data were used to measure the effectiveness of common interpolation techniques during chromatographic alignment of comprehensive two-dimensional liquid chromatography–diode array detector (LC × LC–DAD) data. Interpolation was used to generate a sufficient number of data points in the sampled first chromatographic dimension to allow for alignment of retention times from different injections. Five different interpolation methods, linear interpolation followed by cross correlation, piecewise cubic Hermite interpolating polynomial, cubic spline, Fourier zero-filling, and Gaussian fitting, were investigated. The fully aligned chromatograms, in both the first and second chromatographic dimensions, were analyzed by parallel factor analysis to determine the relative area for each peak in each injection. A calibration curve was generated for the simulated data set. The standard error of prediction and percent relative standard deviation were calculated for the simulated peak for each technique. The Gaussian fitting interpolation technique resulted in the lowest standard error of prediction and average relative standard deviation for the simulated data. However, upon applying the interpolation techniques to the experimental data, most of the interpolation methods were not found to produce statistically different relative peak areas from each other. While most of the techniques were not statistically different, the performance was improved relative to the PARAFAC results obtained when analyzing the unaligned data.
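The Gaussian-fitting interpolation of a coarsely sampled first-dimension profile can be sketched as follows (hypothetical sampling rate and peak parameters; in the real data each first-dimension profile is sampled once per second-dimension run):

    import numpy as np
    from scipy.optimize import curve_fit

    def gauss(t, h, mu, sigma):
        return h * np.exp(-0.5 * ((t - mu) / sigma) ** 2)

    # coarsely sampled first-dimension profile (one point per second-dimension run)
    t_sampled = np.arange(0.0, 10.0, 1.5)
    profile = gauss(t_sampled, 1.0, 4.3, 1.1) \
              + 0.01 * np.random.default_rng(11).normal(size=t_sampled.size)

    popt, _ = curve_fit(gauss, t_sampled, profile,
                        p0=[profile.max(), t_sampled[np.argmax(profile)], 1.0])

    # evaluate the fitted peak on a fine grid; aligning injections then reduces to
    # shifting this grid so that the fitted apex (popt[1]) coincides across runs
    t_fine = np.linspace(0, 10, 201)
    profile_interp = gauss(t_fine, *popt)
    print(f"fitted first-dimension apex: {popt[1]:.2f} min")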
Keywords: Data alignment; Interpolation; Comprehensive; Two dimensional liquid chromatography; Chemometrics; Parallel factor analysis;

Decision trees in selection of featured determined food quality by B. Dębska; B. Guzowska-Świder (261-271).
► The developed decision tree model classifies beer samples according to their quality grade. ► Classification is based on the beer chemical and sensory dataset. ► The model indicates which features are the most discriminating for classification.
The determination of food quality and authenticity and the detection of adulteration are problems of increasing importance in food chemistry. Recently, chemometric classification techniques and pattern recognition analysis methods for wine and other alcoholic beverages have received great attention and have been widely used. Beer is a complex mixture of components: on one hand a volatile fraction, which is responsible for its aroma, and on the other hand a non-volatile fraction, or extract, consisting of a great variety of substances with distinct characteristics. The aim of this study was to consider parameters which contribute to beer differentiation according to quality grade. Chemical (e.g. pH, acidity, dry extract, alcohol content, CO2 content) and sensory features (e.g. bitter taste, color) were determined in 70 beer samples and used as variables in decision tree techniques. The pattern recognition techniques applied to this dataset were able to extract information useful for obtaining a satisfactory classification of the beer samples according to their quality grade. Feature selection procedures indicated which features are the most discriminating for the classification.
Keywords: Principal component analysis; ID3 algorithm; Decision tree; Classification; Feature selection;

Time series hyperspectral chemical imaging data: Challenges, solutions and applications by A.A. Gowen; F. Marini; C. Esquerre; C. O’Donnell; G. Downey; J. Burger (272-282).
► Time series hyperspectral chemical imaging data. ► Overview of chemometric methods. ► Application of multivariate curve resolution and parallel factor analysis.
Hyperspectral chemical imaging (HCI) integrates imaging and spectroscopy, resulting in three-dimensional data structures, hypercubes, with two spatial and one wavelength dimension. Each spatial image pixel in a hypercube contains a spectrum with >100 datapoints. While HCI facilitates enhanced monitoring of multi-component systems, time series HCI offers the possibility of a more comprehensive understanding of the dynamics of such systems and processes. This implies a need for modeling strategies that can cope with the large multivariate data structures generated in time series HCI experiments. The challenges posed by such data include dimensionality reduction, temporal morphological variation of samples and instrumental drift. This article presents potential solutions to these challenges, including multiway analysis, object tracking, multivariate curve resolution and non-linear regression. Several real world examples of time series HCI data are presented to illustrate the proposed solutions.
Keywords: Hyperspectral chemical imaging; Time series;

Application of artificial neural network in food classification by B. Dębska; B. Guzowska-Świder (283-291).
► The developed ANN models classify beer samples according to their quality grade. ► Classification is based on the beer chemical and sensory dataset. ► The models indicate which features are the most discriminating for classification.
Artificial neural network (ANN) classifiers have been successfully implemented for various quality inspection and grading tasks of diverse food products. ANNs are very good pattern classifiers because of their ability to learn patterns that are not linearly separable and to handle concepts involving uncertainty, noise and random events. In this research, an ANN was used to build a classification model based on the relevant features of beer. Samples of the same brand of beer but with varying manufacturing dates, originating from miscellaneous manufacturing lots, were represented in multidimensional space by data vectors, each an assembly of 12 features (% of alcohol, pH, % of CO2, etc.). The classification was performed for two subsets, the first including samples of good-quality beer and the other containing samples of unsatisfactory quality. The ANN techniques allowed discrimination between the beer sample qualities with up to 100% correct classification.
Keywords: Neural networks; Classification; Food;

► Three new variable reduction methods were developed, called Predictive-Property-Ranked Variable Reduction with Complexity Adapted Models (PPRVR-CAM) methods. ► PPRVR-CAM methods include the possibility of decreasing the PLS model complexity during variable reduction. ► The methods are able to retain significantly smaller numbers of informative variables than the existing methods based on predictive-property-ranked variables, UVE-GA-PLS and UVE-iPLS, without loss of prediction ability. ► Important variables, with a chemical meaning relevant to the response, are not excluded in the stepwise backward variable selection procedures.
The calibration performance of partial least squares for one response variable (PLS1) can be improved by elimination of uninformative variables. Many methods are based on so-called predictive variable properties, which are functions of various PLS-model parameters and which may change during the variable reduction process. In these methods variable reduction is performed on the variables ranked in descending order of a given variable property. The methods start with full-spectrum modelling. Iteratively, until a specified number of remaining variables is reached, the variable with the smallest property value is eliminated, a new PLS model is calculated, and the variables are ranked anew. The Stepwise Variable Reduction methods using Predictive-Property-Ranked Variables are denoted SVR-PPRV. In the existing SVR-PPRV methods the PLS model complexity is kept constant during the variable reduction process. In this study, three new SVR-PPRV methods are proposed, in which the possibility of decreasing the PLS model complexity during the variable reduction process is built in. We therefore denote our methods PPRVR-CAM methods (Predictive-Property-Ranked Variable Reduction with Complexity Adapted Models). The selective and predictive abilities of the new methods are investigated and tested, using the absolute PLS regression coefficients as predictive property. They were compared with two modifications of existing SVR-PPRV methods (with constant PLS model complexity) and with two reference methods: uninformative variable elimination followed by either a genetic algorithm for PLS (UVE-GA-PLS) or an interval PLS (UVE-iPLS). The performance of the methods is investigated in conjunction with two data sets from near-infrared (NIR) sources and one simulated set. The selective and predictive performances of the variable reduction methods are compared statistically using the Wilcoxon signed rank test. The three newly developed PPRVR-CAM methods were able to retain significantly smaller numbers of informative variables than the existing SVR-PPRV, UVE-GA-PLS and UVE-iPLS methods, without loss of prediction ability. Contrary to UVE-GA-PLS and UVE-iPLS, there is no variability in the number of retained variables in each PPRV(R) method. Renewed variable ranking after deletion of a variable, followed by remodelling, combined with the possibility to decrease the PLS model complexity, is beneficial. A preferred PPRVR-CAM method is proposed.
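A hedged sketch of the general backward-elimination idea with predictive-property ranking and a complexity check after each deletion (not the authors' exact PPRVR-CAM procedure; the data, the stopping rule and the cross-validation criterion are illustrative):

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(12)
    X = rng.normal(size=(60, 150))
    y = X[:, :8] @ rng.normal(size=8) + 0.2 * rng.normal(size=60)   # 8 informative variables

    keep = np.arange(X.shape[1])
    ncomp = 5
    while keep.size > 10:                                           # stop at 10 retained variables
        pls = PLSRegression(n_components=min(ncomp, keep.size - 1)).fit(X[:, keep], y)
        coef = np.abs(pls.coef_).ravel()                 # predictive property: |b|
        keep = np.delete(keep, np.argmin(coef))          # drop the least informative variable
        # complexity-adapted step: re-check whether fewer latent variables now suffice
        scores = [cross_val_score(PLSRegression(a), X[:, keep], y, cv=5).mean()
                  for a in range(1, min(ncomp, keep.size - 1) + 1)]
        ncomp = int(np.argmax(scores)) + 1

    print("retained variables:", np.sort(keep))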
Keywords: Variable reduction; PLS1; PPRVR-CAM; UVE-GA-PLS; UVE-iPLS; Wilcoxon signed rank test;
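A minimal Python sketch of the stepwise backward scheme summarized above, assuming scikit-learn's PLSRegression, the absolute regression coefficients as the ranking property, and cross-validated R2 as an assumed criterion for re-selecting the number of latent variables after every elimination. The data, the stopping rule and the function names are illustrative assumptions, not the published algorithm.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

def select_n_components(X, y, max_lv=10, cv=5):
    """Re-select the number of latent variables by cross-validated R2 (assumed criterion)."""
    best_lv, best_score = 1, -np.inf
    for lv in range(1, min(max_lv, X.shape[1]) + 1):
        score = cross_val_score(PLSRegression(n_components=lv), X, y, cv=cv).mean()
        if score > best_score:
            best_lv, best_score = lv, score
    return best_lv

def pprv_reduction(X, y, n_keep):
    """Backward elimination on |PLS regression coefficient| with renewed ranking and
    renewed selection of the model complexity after every elimination."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        lv = select_n_components(X[:, keep], y)          # complexity may decrease here
        pls = PLSRegression(n_components=lv).fit(X[:, keep], y)
        coefs = np.abs(pls.coef_).ravel()                # the predictive property
        keep.pop(int(np.argmin(coefs)))                  # drop the least informative variable
    return keep

# Simulated example: 30 variables, of which only the first three carry the signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 30))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)
print("retained variables:", pprv_reduction(X, y, n_keep=5))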

Diprotonation process of meso-tetraphenylporphyrin derivatives designed for Photodynamic Therapy of cancers: From Multivariate Curve Resolution to predictive QSPR modeling by Benoît Chauvin; Athena Kasselouri; Pierre Chaminade; Rita Quiameso; Ioannis Nicolis; Philippe Maillard; Patrice Prognon (306-314).
► Diprotonation of 17 meso-tetraphenylporphyrin derivatives. ► MCR-ALS resolution of multi-component mixtures. ► Determination of stepwise protonation constants. ► Prediction of protonation constants from E-State indices. Tetrapyrrole rings possess four nitrogen atoms, two of which act as Brønsted bases in acidic media. The two protonation steps occur over a narrow pH range, particularly in the case of meso-tetraphenylporphyrin (TPP) derivatives. While the cause of this phenomenon is well known – a protonation-induced distortion of the porphyrin ring – data on stepwise protonation constants and on the electronic absorption spectra of monoprotonated TPPs are sparse. A multivariate approach has been systematically applied to a series of glycoconjugated and hydroxylated TPPs, potential anticancer drugs usable in Photodynamic Therapy. The dual purpose was the determination of protonation constants and the linking of substitution with basicity. The hard-modelling version of MCR-ALS (Multivariate Curve Resolution Alternating Least Squares) gave access to the spectra and distribution profiles of the pure components. The spectra of the monoprotonated species (H3TPP+) in solution resemble those of the diprotonated species (H4TPP2+), differing mainly by a slight blue-shift of the bands. The overlap of the H3TPP+ and H4TPP2+ spectra compounds the difficulty of evidencing an intermediate form present only in low relative abundance. Depending on macrocycle substitution, pK values ranged from 3.5 ± 0.1 to 5.1 ± 0.1 for the first protonation and from 3.2 ± 0.2 to 4.9 ± 0.1 for the second one. The basicity of the inner nitrogens is affected by the position, number and nature of the peripheral substituents, depending on their electron-donating character. The pK values were used to establish a predictive Multiple Linear Regression (MLR) model relying on atom-type electrotopological state (E-State) indices. This model accurately describes our results and could be applied to new TPP derivatives from a drug-design perspective.
Keywords: Meso-tetraphenylporphyrin; Diprotonation; Electronic absorption spectroscopy; Multivariate Curve Resolution; Quantitative Structure–Property Relationship;
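The hard-modelling step can be pictured as fitting the two stepwise protonation constants so that equilibrium-constrained concentration profiles, combined with linearly estimated pure spectra, reproduce the measured absorbance matrix. The Python sketch below does this on simulated spectra; the Gaussian band shapes, the pH grid and the SciPy least-squares refinement are illustrative assumptions, not the authors' MCR-ALS implementation.

import numpy as np
from scipy.optimize import least_squares

def conc_profiles(pH, pK1, pK2):
    """Fractions of free base (H2TPP), mono- (H3TPP+) and diprotonated (H4TPP2+) forms."""
    h = 10.0 ** (-pH)
    # K1 governs the first protonation (H2TPP to H3TPP+), K2 the second (H3TPP+ to H4TPP2+)
    K1, K2 = 10.0 ** (-pK1), 10.0 ** (-pK2)
    denom = 1.0 + h / K1 + h * h / (K1 * K2)
    return np.column_stack([1.0 / denom, (h / K1) / denom, (h * h / (K1 * K2)) / denom])

def residuals(pks, pH, D):
    C = conc_profiles(pH, *pks)                 # hard (equilibrium) model for the profiles
    S, *_ = np.linalg.lstsq(C, D, rcond=None)   # pure spectra by linear least squares
    return (D - C @ S).ravel()

# Simulated data set: three Gaussian pure spectra measured over a pH gradient.
pH = np.linspace(2.0, 7.0, 25)
wl = np.linspace(380.0, 700.0, 120)
S_true = np.vstack([np.exp(-((wl - c) / 12.0) ** 2) for c in (417.0, 426.0, 438.0)])
D = conc_profiles(pH, 4.8, 4.1) @ S_true
D += 1e-3 * np.random.default_rng(2).normal(size=D.shape)

fit = least_squares(residuals, x0=[5.0, 3.5], args=(pH, D))
print("estimated pK1, pK2:", fit.x)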

Simulated chromatogram (gradient ion chromatography) obtained with the experimental parameters from the simplex optimization procedure regarding the minimal distance between adjacent peaks (below), in comparison with the experimental chromatogram (above). Gradient program used: isocratic elution at 11.4 mM KOH for 1.5 min, linear gradient from 11.4 mM to 58.8 mM KOH in 15 min, then isocratic elution. An optimization procedure for gradient separations in ion-exchange chromatography, combining the simplex optimization method with a computer simulation program for ion-exchange chromatography, is presented. The optimization of the parameters describing the gradient profile is based on an optimization criterion obtained from calculated chromatograms. The criterion depends on the parameters used for the calculations and thus reflects the quality of the gradient conditions for the separation of the analytes. The simplex method is used to calculate new gradient profiles in order to reach optimum separation for the selected set of analytes. The simplex algorithm works stepwise: for each new combination of parameters describing the gradient profile, a new calculation is performed and the optimization criterion is determined from the calculated chromatogram. The proposed method is efficient and may reduce the time and cost of analyses of complex samples by ion-exchange chromatography.
Keywords: Simplex optimization; Ion-exchange chromatography; Resolution; Optimization criterion; Gradient separation; Computer simulation;
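The simplex loop itself can be sketched in a few lines: a candidate gradient (here reduced to an initial KOH concentration and a gradient slope, both hypothetical parameters) is passed to a chromatogram simulator, the minimal distance between adjacent peaks is turned into a criterion, and the Nelder–Mead simplex proposes the next candidate. The toy retention model and the penalty terms below are stand-ins for the ion-chromatography simulation program, not the authors' software.

import numpy as np
from scipy.optimize import minimize

ANALYTE_K = np.array([2.0, 2.3, 3.1, 4.0, 4.2])   # hypothetical retention factors

def simulated_retention_times(c0, slope):
    """Toy stand-in for the chromatogram simulator: a stronger eluent and a steeper
    gradient shorten retention for every analyte."""
    return 1.0 + ANALYTE_K / (0.05 * c0 + 0.5 * slope)

def criterion(params):
    c0, slope = params                            # initial KOH concentration (mM), slope (mM/min)
    if c0 <= 0.0 or slope <= 0.0:
        return 1e6                                # keep the simplex in a physically meaningful region
    t = np.sort(simulated_retention_times(c0, slope))
    min_gap = np.min(np.diff(t))                  # minimal distance between adjacent peaks
    runtime_penalty = 0.05 * t[-1]                # discourage needlessly long runs
    early_penalty = 10.0 * max(0.0, 2.0 - t[0])   # discourage elution near the void volume
    return -min_gap + runtime_penalty + early_penalty

result = minimize(criterion, x0=[20.0, 6.0], method="Nelder-Mead")
print("optimized initial concentration and gradient slope:", result.x)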

Experimental determination and prediction of bilitranslocase transport activity by Špela Župerl; Stefano Fornasaro; Marjana Novič; Sabina Passamonti (322-333).
Predicted versus experimental pK I values of bilitranslocase inhibitors obtained with the counter-propagation neural network model for the compounds from the training, test and validation sets. ► We report new experimental results on inhibitors of bilitranslocase transport activity. ► We develop classification and prediction models for new small molecules. ► The applicability domain of the models is assessed. ► The interpretation of influential descriptors provides insight into the transport mechanism. The transport activity of a membrane protein, bilitranslocase (T.C. # 2.A.65.1.1), which acts as a transporter of bilirubin from blood to liver cells, was experimentally determined for a large set of endogenous compounds, drugs, and purine and pyrimidine derivatives. On these grounds, structure–activity models were developed following the OECD principles for QSAR models, and their predictive ability for new chemicals was evaluated. The applicability domain of the models was estimated by Euclidean distance criteria appropriate to the applied modelling method. The selection of the most influential structural variables was an important stage of the adopted modelling methodology, and the interpretation of the selected variables was performed in order to gain insight into the mechanism of transport through the cell membrane via bilitranslocase. Validation of the optimized models was performed with a previously determined validation set. A classification model was built to separate active from inactive compounds; the resulting accuracy, sensitivity and specificity were 0.73, 0.89 and 0.64, respectively. Only the active compounds were used to develop a predictive model for the bilitranslocase inhibition constants. The model showed good predictive ability, with a root mean squared error for the validation set of RMSV = 0.29 log units.
Keywords: Drug target; Bilitranslocase transport activity; Inhibition constants; Classification; Predictive models; Artificial neural network; Variable selection; Applicability domain;
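A counter-propagation network is not part of the common Python libraries, but the Euclidean-distance applicability-domain check mentioned above is easy to sketch: a query compound is considered inside the domain when its distance to the nearest training compound, in autoscaled descriptor space, does not exceed a threshold taken from the training-set nearest-neighbour distances. The 95th-percentile threshold, the descriptor values and the scaling below are assumptions for illustration, not the criteria used in the paper.

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.preprocessing import StandardScaler

def applicability_domain(X_train, X_query, percentile=95):
    """Flag query compounds whose nearest training neighbour is farther away than a
    threshold derived from the training-set nearest-neighbour distances."""
    scaler = StandardScaler().fit(X_train)
    Xt, Xq = scaler.transform(X_train), scaler.transform(X_query)
    d_train = cdist(Xt, Xt)
    np.fill_diagonal(d_train, np.inf)                 # ignore self-distances
    threshold = np.percentile(d_train.min(axis=1), percentile)
    d_query = cdist(Xq, Xt).min(axis=1)
    return d_query <= threshold, d_query, threshold

rng = np.random.default_rng(3)
X_train = rng.normal(size=(50, 8))                    # placeholder molecular descriptors
X_query = np.vstack([rng.normal(size=(3, 8)), 5.0 + rng.normal(size=(1, 8))])
inside, distances, threshold = applicability_domain(X_train, X_query)
print("inside the applicability domain:", inside)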

Detection and chemical profiling of medicine counterfeits by Raman spectroscopy and chemometrics by Klara Dégardin; Yves Roggo; Frederic Been; Pierre Margot (334-341).
► Analysis of counterfeit medicines from a forensic perspective. ► Raman spectroscopy combined with chemometrics (SVM, PCA, distance measures). ► A robust and quick method, validated and used in routine analysis with excellent results. Raman spectroscopy combined with chemometrics has recently become a widespread technique for the analysis of pharmaceutical solid forms. The application presented in this paper is the investigation of counterfeit medicines. This increasingly serious issue involves networks that are an integral part of industrialized organized crime, and efficient analytical tools are consequently required to fight it. Quick and reliable means of authentication are needed so that the company and the authorities can deploy countermeasures. For this purpose, a two-step method has been implemented here. The first step enables the identification of pharmaceutical tablets and capsules and the detection of their counterfeits: a nonlinear classification method, support vector machines (SVM), is combined with a correlation against the database and the detection of active pharmaceutical ingredient (API) peaks in the suspect product. If a counterfeit is detected, the second step allows its chemical profiling among former counterfeits from a forensic intelligence perspective; for this step, a classification based on Principal Component Analysis (PCA) and correlation distance measurements is applied to the Raman spectra of the counterfeits.
Keywords: Raman spectroscopy; Chemometrics; Counterfeit medicines; Chemical identification; Profiling; Forensic intelligence;
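The two-step logic can be sketched with simulated spectra: a support vector machine gives the genuine/counterfeit call, a correlation distance against the reference spectrum backs it up, and spectra flagged as counterfeit are then compared among themselves with PCA and correlation distances. Everything below (the Gaussian band, the shifted "counterfeit" spectra, scikit-learn's SVC and PCA) is an illustrative assumption rather than the validated routine.

import numpy as np
from scipy.spatial.distance import correlation
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(4)
n_channels = 300
reference = np.exp(-((np.arange(n_channels) - 150) / 20.0) ** 2)   # reference Raman band
genuine = reference + 0.02 * rng.normal(size=(40, n_channels))
fakes = np.roll(genuine[:10], 25, axis=1)            # crude stand-in for counterfeit spectra

# Step 1: SVM identification plus a correlation check against the reference spectrum.
X = np.vstack([genuine, fakes])
y = np.array([1] * 40 + [0] * 10)                    # 1 = genuine, 0 = counterfeit
svm = SVC(kernel="rbf", gamma="scale").fit(X, y)
suspect = fakes[0]                                   # for illustration, drawn from the pool above
print("SVM call (1 = genuine):", svm.predict(suspect[None, :])[0])
print("correlation distance to reference:", round(correlation(suspect, reference), 4))

# Step 2: profiling of detected counterfeits: PCA scores for grouping plus
# pairwise correlation distances between counterfeit spectra.
scores = PCA(n_components=2).fit_transform(fakes)
print("PCA scores of the counterfeits:", scores.shape)
print("correlation distance between two counterfeits:", round(correlation(fakes[0], fakes[1]), 4))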