Reanalysis of publicly available bioinformatics datasets can be an important source of new knowledge. The aim of this project is to classify endocrine-resistant tumours from publicly available datasets using known multi-gene signatures relevant to endocrine resistance. Such a classification can suggest the molecular mechanisms associated with resistance in individual tumours and show how different mechanisms are represented across datasets.

The analysis presented on this web page was carried out by Dr. Alexey Larionov for his MSc project in applied bioinformatics. The project was completed in summer 2012 at the Edinburgh Cancer Research Centre. The project co-supervisors were Prof. David Cameron (Edinburgh University) and Dr. Sarah Morgan (Cranfield University). The full version of the thesis is available via the link at the bottom left and from the Cranfield University library.

The project objectives included:

1) Search for endocrine-resistant cases in available public transcriptomic datasets
2) Search for transcriptional signatures associated with endocrine resistance
3) Pre-process the selected datasets for later analysis
4) Translate signatures to the namespaces of the selected datasets
5) Design the classification algorithm
6) Classify the resistant cases according to the molecular mechanisms represented by the signatures
7) Present the pipeline and results on a dedicated web site
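
Objective 4 (translating signatures to the namespaces of the selected datasets) can be illustrated with a minimal sketch. It assumes that a signature is a list of gene symbols and that each platform provides an annotation table mapping probe IDs to gene symbols. The function name, the probe IDs and the toy gene list are hypothetical and are not taken from the thesis; the project's actual translation procedure may differ.

```python
from collections import defaultdict


def translate_signature(signature_genes, probe_annotation):
    """Map signature gene symbols to the probe IDs of one platform.

    probe_annotation: dict of probe ID -> gene symbol (hypothetical table).
    Returns (dict of gene -> sorted probe IDs, list of genes with no probe).
    """
    probes_by_gene = defaultdict(list)
    for probe_id, symbol in probe_annotation.items():
        # Compare symbols case-insensitively.
        probes_by_gene[symbol.upper()].append(probe_id)

    mapped, missing = {}, []
    for gene in signature_genes:
        probes = probes_by_gene.get(gene.upper())
        if probes:
            mapped[gene] = sorted(probes)
        else:
            missing.append(gene)
    return mapped, missing


# Placeholder probe IDs, not real platform identifiers.
toy_annotation = {
    "probe_001": "ESR1",
    "probe_002": "ESR1",
    "probe_003": "PGR",
}
mapped, missing = translate_signature(["ESR1", "PGR", "ERBB2"], toy_annotation)
```

Returning the unmapped genes separately makes it easy to see how much of a signature is lost on a given platform before classification.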

The pipeline was designed and successfully applied to classify 9 datasets using 7 transcriptional signatures.

The designed pipeline consists of:

1) Procedures for a manually curated selection of relevant datasets and signatures;
2) Procedures for semi-automatic data pre-processing, allowing cross-platform analysis;
3) A new, fully automated, classification algorithm (Iterative Consensus PAM).

The main features of the developed pipeline include:

1) It is based on unsupervised partitioning;
2) It allows for "non-classifiable" samples;
3) The procedure does not require a training set;
4) The procedure can be used in a cross-platform context (Affymetrix & Illumina).
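
The features listed above (unsupervised partitioning, no training set, "non-classifiable" samples) can be illustrated with a small sketch. This is not the Iterative Consensus PAM algorithm itself, which is described on the algorithm page and in the thesis. It assumes that PAM stands for partitioning around medoids, replaces the classic PAM swap search with a simpler alternating k-medoids update, and builds a consensus by repeating a two-cluster partition on random subsets of signature genes. All function names, parameters and thresholds are hypothetical.

```python
import numpy as np


def k_medoids(dist, k, rng, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix (not the PAM swap search)."""
    n = dist.shape[0]
    # Farthest-point initialisation: random first medoid, then the most distant samples.
    medoids = [int(rng.integers(n))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        updated = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance to its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                updated[c] = members[np.argmin(within)]
        if np.array_equal(updated, medoids):
            break
        medoids = updated
    return np.argmin(dist[:, medoids], axis=1)


def consensus_calls(expr, n_runs=50, gene_fraction=0.8, agreement=0.9, seed=0):
    """Classify samples (rows of expr) for one signature; columns are signature genes."""
    rng = np.random.default_rng(seed)
    n_samples, n_genes = expr.shape
    n_pick = min(n_genes, max(2, int(round(gene_fraction * n_genes))))
    high_votes = np.zeros(n_samples)
    used = 0
    for _ in range(n_runs):
        genes = rng.choice(n_genes, size=n_pick, replace=False)
        sub = expr[:, genes]
        dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
        labels = k_medoids(dist, 2, rng)
        if labels.min() == labels.max():
            continue  # degenerate partition, skip this run
        # Orient labels so that 1 is the cluster with higher mean signature expression.
        score = sub.mean(axis=1)
        if score[labels == 1].mean() < score[labels == 0].mean():
            labels = 1 - labels
        high_votes += labels
        used += 1
    calls = np.full(n_samples, "non-classifiable", dtype=object)
    if used:
        frac = high_votes / used
        calls[frac >= agreement] = "signature-high"
        calls[frac <= 1 - agreement] = "signature-low"
    return calls


# Toy data: 10 samples with low and 10 with high expression of 20 signature genes.
toy_rng = np.random.default_rng(1)
toy = np.vstack([toy_rng.normal(0.0, 0.5, (10, 20)),
                 toy_rng.normal(5.0, 0.5, (10, 20))])
calls = consensus_calls(toy)
```

Samples that do not reach the agreement threshold in either direction keep the "non-classifiable" label, which is one simple way to avoid forcing every tumour into a class.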

A summary of the selected datasets and signatures, the core of the classification algorithm, and the classification results are presented on the respective pages of this web site.