On 28 August at 10.00 Geven Piir will defend his PhD thesis in Molecular Engineering, titled "Environmental risk assessment of chemicals using QSAR methods", in the Faculty of Science and Technology of the University of Tartu (UT Chemicum, Ravila 14a - 1021, Tartu).
Supervisor: Sulev Sild (PhD), Institute of Chemistry, University of Tartu
Opponent: Prof. Alexandre Varnek, University of Strasbourg (France)
Bioconcentration is an important endpoint for determining the fate and behaviour of chemicals in the environment, and the bioconcentration factor (BCF) is used extensively in environmental risk assessment. However, measuring the BCF of a single chemical experimentally can take up to six months, cost around 100,000 euros, and require about one hundred animals. As a result, thousands of chemicals have no experimentally measured BCF. This creates a need for faster and more economical quantitative structure-activity relationship (QSAR) models that predict BCF for chemicals with no experimental data. Many theoretical models have been developed to fill these gaps, but the breadth of chemical space makes it hard to apply one universal model to all chemicals. In risk assessment, the applicability of the chosen model is therefore assessed for each chemical, and multiple models are used together to obtain more reliable results.
The goal of this thesis is to provide an outline of risk assessment procedures, the bioconcentration factor, and different QSAR methodologies. The modelling part of the thesis is divided into two parts: the first focuses on regression analysis and the second on classification problems. First, a global regression model was proposed for predicting BCF. The global model could predict a wide variety of chemicals and provide information about the model’s applicability domain. Its creation laid the foundation for exploring whether prediction quality could be improved by using smaller, more focused data sets. Most of the focused models built on these subsets showed better predictive power than the global model. Additionally, a consensus model was compared against the global model and the local models, and it outperformed all of them. To separate bioaccumulative and non-bioaccumulative chemicals, three classification models with different training set compositions were proposed. All three models had their strengths in different classification scenarios, but the most all-purpose one was the model in which the classes were distributed evenly. To identify whether a chemical fits within the boundaries of a model, a new approach was proposed for assigning an applicability domain (AD) to Random Forest-based models. Applying the AD shows how many similar chemicals were used to develop the model and how well they were predicted. The information provided by the AD scheme allows a more confident final decision about the correctness of a prediction.
Building a QSAR model is not a trivial task. The purpose of the model determines which aspects should receive special attention. For risk assessment, it is important to use relevant endpoints and unambiguous algorithms. All the models built during this work use well-defined algorithms and BCF as the endpoint. Attention was paid to the requirements for model validation and a defined applicability domain. In addition, all the descriptors used have a sound mechanistic interpretation in relation to BCF. Therefore, all of these models can be used in environmental risk assessment to obtain additional information about the bioaccumulation potential of chemicals.