Thesis supervisors: Professor Juhan Sedman (University of Tartu) and Professor Jaak Vilo (University of Tartu).
Opponent: Professor Boris Lenhard, Imperial College London, United Kingdom
Summary
To understand the basic principles of how organisms function, and to be able to influence biological processes, we need to understand the relationships between genes and proteins. Modern high-throughput technology makes it possible to study different aspects of biological processes rapidly. This, however, has led to a steady growth in the amount of data available. As a result, a need has emerged for more sophisticated methods for analysing raw data, combining different data sources and visualising the results. Additionally, computational modelling is required to test whether our understanding of biological processes is supported by the available data.
In this doctoral thesis, a variety of bioinformatics methods are used to demonstrate how different types of high-throughput data can be combined to identify relationships between genes. The thesis shows that combining various data types from different sources adds value to already published data. Data from publications about embryonic stem cell regulation were collected and made available through the Embryonic Stem Cell Database (ESCDb). The complementary data in the database allow researchers to find relationships between genes that could not be found by analysing only one dataset at a time. One of the main findings of this study illustrates how computational modelling of data from the ESCDb led to the discovery of a novel pluripotency regulator, IL11.
Additionally, integration of different data types led to the identification of alternative gene regulatory modules of the core pluripotency regulator OCT4. Similarly, a combination of conservation data and regulatory motif analysis led to the identification of three new regulators of adipocyte differentiation. This thesis also covers a novel methodology, VisHiC, for the automatic identification and visualisation of functionally related gene sets. This methodology makes it possible to find relevant gene sets in large high-throughput datasets for further characterisation.
This doctoral thesis demonstrates that integrating different high-throughput datasets makes it possible to establish gene-gene relationships that could not be found by looking at a single data type in isolation.