Thesis supervisors: Professor Jaak Vilo and Professor Juhan Sedman (University of Tartu).
Opponent: Dr. Gabriella Rustici, EMBL-EBI, Cambridge University.
Summary
Over the last decades, a large volume of high-throughput expression data has been generated across the globe and collected into large databases such as GEO and ArrayExpress. Information about relations between proteins, genes, metabolites and enzymes has been characterised and systematised in pathway databases such as KEGG and Reactome.
By combining high-throughput expression data with pathway information, we can better understand the cellular processes that the pathways depict. In this thesis we describe the KEGGanim tool, which combines high-throughput expression data and KEGG pathway images for better interpretation of experimental results. KEGGanim generates interactive animations across the conditions of a high-throughput expression dataset, making it possible to observe both the temporal and the spatial effects of expression dynamics. Animations created with this tool are suitable for use in slide presentations, on the web or in publications.
The large volume of public data can be used to infer connections between genes based on the similarity of their expression profiles across many biological conditions. This makes it possible to identify shared regulatory mechanisms, common functions and involvement in similar biological processes.
We have developed a methodology for query-based co-expression analysis across hundreds of publicly available datasets. Gene co-expression is calculated in each individual dataset, and the results are combined into a global prioritised gene list by a rank aggregation method. This makes it possible to re-use existing expression data and to discover signals that would otherwise be difficult to find in a single dataset. The implemented web tool, Multi Experiment Matrix (MEM), allows interactive data visualisation and downstream analysis, such as further characterisation of the resulting gene lists, and provides additional information about individual genes and datasets. The proposed rank aggregation method is also suitable for use in meta-analysis pipelines other than MEM.
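The query-based workflow described above can be illustrated with a minimal sketch. It is not the rank aggregation method proposed in the thesis, nor MEM's implementation: it assumes Pearson correlation as the co-expression measure and swaps in a simple mean of length-normalised ranks as the aggregation step. The function names (`pearson`, `rank_by_coexpression`, `aggregate_ranks`) and the toy expression values are hypothetical and exist only for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_by_coexpression(dataset, query):
    """Rank all other genes in one dataset by similarity to the query gene.

    dataset maps gene name -> list of expression values.
    Returns gene -> rank, where rank 1 is the most similar gene.
    """
    query_profile = dataset[query]
    scores = {
        gene: pearson(query_profile, profile)
        for gene, profile in dataset.items()
        if gene != query
    }
    ordered = sorted(scores, key=lambda gene: -scores[gene])
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def aggregate_ranks(datasets, query):
    """Combine per-dataset rankings into one prioritised gene list.

    Assumption: aggregation is the mean of ranks normalised by list length,
    a simple stand-in for the method developed in the thesis.
    """
    per_dataset = [rank_by_coexpression(d, query) for d in datasets if query in d]
    genes = set().union(*per_dataset)
    combined = {}
    for gene in genes:
        normalised = [ranks[gene] / len(ranks) for ranks in per_dataset if gene in ranks]
        combined[gene] = mean(normalised)
    # Lower combined score means more consistently co-expressed with the query.
    return sorted(combined, key=lambda gene: (combined[gene], gene))


# Toy data: two small "datasets" with different numbers of conditions.
datasets = [
    {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.1], "C": [4, 3, 2, 1], "D": [1, 3, 2, 4]},
    {"A": [5, 1, 3], "B": [10, 2, 6.5], "C": [1, 5, 3], "D": [3, 3, 4]},
]
result = aggregate_ranks(datasets, "A")
print(result)
```

In this toy example gene B follows the query A closely in both datasets and gene C is anti-correlated in both, so B is placed first and C last in the combined list. Normalising ranks by list length is one simple way to keep datasets with different numbers of genes comparable.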