Integration of functional information of genes in fuzzy clustering of short time series gene expression data


Recent studies have shown that incorporating available biological information often leads to more biologically relevant results. Motivated by these studies, we extend a template-based clustering algorithm to incorporate the functional annotation information available for genes. The functional similarity between two genes is calculated from their annotations in the Gene Ontology (GO) database. To this end, three methods of calculating functional similarity are explored. To check the validity of the assumption that biologically and functionally related genes are similar both in their expression profiles and in their GO functional annotations, we measure the correlation between the average pairwise similarity score and the average membership function value. We observe that Jiang and Conrath's measure is highly correlated with the average membership function value of genes, so we use this measure for further analysis. Incorporating the functional similarity score gives us more choices of objective function for finding the best clustering of gene expression data. We perform a comparative study to find the combination of objective functions that leads to more biologically relevant information. We find that different choices of objective function lead to different sets of templates, although some common templates are identified by all of them. Depending on the aim of the study, we suggest using either all three objectives or the two objectives related to functional similarity and quantization error.
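
For readers unfamiliar with Jiang and Conrath's measure, the following is a minimal sketch of its commonly cited term-level form, d(c1, c2) = IC(c1) + IC(c2) - 2 IC(MICA), where IC(c) = -log p(c) and MICA is the most informative common ancestor. The toy ontology, the term probabilities, and the 1/(1 + d) conversion from distance to similarity are illustrative assumptions, not details taken from this study; aggregating term-level scores into a gene-level score is also left out.

```python
import math

# Hypothetical toy ontology (NOT real GO terms): term -> set of parent terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Hypothetical probabilities p(term): fraction of genes annotated
# to the term or to any of its descendants.
PROB = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25, "B1": 0.5}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(c) = -log p(c); rarer terms carry more information."""
    return -math.log(PROB[term])


def jc_distance(t1, t2):
    """Jiang-Conrath distance: IC(t1) + IC(t2) - 2 * IC(MICA)."""
    common = ancestors(t1) & ancestors(t2)
    ic_mica = max(information_content(t) for t in common)
    return information_content(t1) + information_content(t2) - 2.0 * ic_mica


def jc_similarity(t1, t2):
    """Assumed conversion of the distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + jc_distance(t1, t2))


if __name__ == "__main__":
    for pair in [("A1", "A1"), ("A1", "A2"), ("A1", "B1")]:
        print(pair, round(jc_similarity(*pair), 4))
```

In this toy setting, two sibling terms under the same specific parent score higher than two terms whose only shared ancestor is the root, which is the behaviour a functional similarity score is expected to show.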