Functional networks

Microarray data was processed as described in Huttenhower et al. Standards for Bayesian integration were constructed as previously described in Huttenhower et al., with the important addition that homolog pairs were excluded from the positive examples.

Functional neighborhoods

Functional interaction networks are used to define a query gene's functional neighborhood, which is the set of genes with strong connections to the query. We use TreeFam B families to map gene neighborhoods from different organisms onto a species-independent space of meta-genes. The TreeFam system defines families that evolved from a single gene in the last common ancestor of all animals (with closely related plant and fungal genes included). A small number of genes that appeared in more than one TreeFam family were excluded from consideration. To map a gene's neighborhood onto the meta-gene set, a meta-gene is considered present if any of its member genes is in the neighborhood; thus, for multi-gene families, a connection to any one member is sufficient.
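The mapping from a gene neighborhood to meta-genes can be sketched as follows. This is a minimal illustration, not the original implementation: the function names, the dictionary layout, and the gene and family identifiers are hypothetical, and the neighborhood itself is assumed to have been extracted from the functional network already.

```python
def build_gene_to_family(family_members):
    """Invert family -> genes into gene -> family.

    Genes that appear in more than one family are dropped, mirroring the
    exclusion described in the text.
    """
    gene_families = {}
    for family, genes in family_members.items():
        for gene in genes:
            gene_families.setdefault(gene, set()).add(family)
    return {
        gene: next(iter(families))
        for gene, families in gene_families.items()
        if len(families) == 1
    }


def neighborhood_to_meta_genes(neighborhood, gene_to_family):
    """A meta-gene (family) is present if any of its member genes is in the neighborhood."""
    return {gene_to_family[g] for g in neighborhood if g in gene_to_family}


# Hypothetical toy data: geneB belongs to two families and is therefore excluded.
family_members = {
    "TF1": ["geneA", "geneB"],
    "TF2": ["geneC"],
    "TF3": ["geneB", "geneD"],
}
gene_to_family = build_gene_to_family(family_members)
meta_genes = neighborhood_to_meta_genes({"geneA", "geneB", "geneX"}, gene_to_family)
print(meta_genes)  # {'TF1'}
```

Genes without any family assignment (geneX here) simply do not contribute to the meta-gene neighborhood.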

Similarity score

To determine the functional similarity of genes from different organisms, we compute the hypergeometric p-value of their meta-gene neighborhood overlap. The background set of TreeFam families used for the p-value computation is specific to the organism pair considered and is defined as all TreeFam families that contain at least one gene from each organism that is also present in our microarray compendium. Likewise, for the purpose of the p-value calculation, the size of each gene's TreeFam neighborhood is taken to be the number of TreeFam families that are present both in the gene's neighborhood and in the organism-pair-specific background set.
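As an illustration, the similarity score could be computed along the following lines. This sketch assumes the p-value is the upper tail of the hypergeometric distribution (the probability of an overlap at least as large as the one observed); the text does not state the tail explicitly, and the function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n items are drawn from N, of which K are 'successes'."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def neighborhood_similarity(meta_a, meta_b, background):
    """Hypergeometric p-value of the meta-gene overlap of two neighborhoods.

    Both neighborhoods are first restricted to the organism-pair-specific
    background set, so their sizes count only background families.
    """
    a = set(meta_a) & set(background)
    b = set(meta_b) & set(background)
    overlap = len(a & b)
    return hypergeom_upper_tail(overlap, len(background), len(a), len(b))


# Hypothetical example: 10 background families, two neighborhoods sharing 2 of them.
background = {f"TF{i}" for i in range(10)}
p = neighborhood_similarity({"TF1", "TF2", "TF3", "TF99"}, {"TF1", "TF2", "TF4"}, background)
print(p)
```

Here TF99 is not in the background and is ignored, so the computation uses two neighborhoods of three families each with an overlap of two, giving 22/120.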

Enrichment computations

To compute GO enrichments for TreeFam families, we consider a family to be annotated to a particular term if any of its member genes has an experimental annotation for that term. GO enrichment is computed as a hypergeometric p-value, with the background count taken from the organism-pair-specific background families defined above. Thus, while the annotations are not organism-specific, the enrichment computation does depend on the organism pair being considered. All p-values are FDR-adjusted, with a significance cutoff of 0.05.
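A minimal sketch of this enrichment procedure is given below. Two points are assumptions rather than statements from the text: the FDR adjustment is taken to be the Benjamini-Hochberg procedure, and the p-value is taken to be the upper hypergeometric tail. All function names, family names, and term labels are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n items are drawn from N, of which K are 'successes'."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def family_annotations(family_members, gene_terms):
    """A family carries a term if any member gene has an (experimental) annotation to it."""
    return {
        family: set().union(*(gene_terms.get(g, set()) for g in genes))
        for family, genes in family_members.items()
    }


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def go_enrichment(query_families, background, fam_terms, alpha=0.05):
    """Terms enriched among query families relative to the pair-specific background."""
    background = set(background)
    query = set(query_families) & background
    terms = sorted(set().union(*(fam_terms.get(f, set()) for f in query)))
    pvals = []
    for term in terms:
        annotated = {f for f in background if term in fam_terms.get(f, set())}
        pvals.append(
            hypergeom_upper_tail(len(query & annotated), len(background), len(annotated), len(query))
        )
    adjusted = benjamini_hochberg(pvals)
    return {t: q for t, q in zip(terms, adjusted) if q <= alpha}
```

Because the background is passed in explicitly, the same family-level annotations yield different enrichment results for different organism pairs, as described above.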