Microarray data was processed as described in Huttenhower et al.
Standards for Bayesian integration were constructed as previously described in Huttenhower et al.,
with the important addition that homolog pairs were excluded from the positive examples.
Functional interaction networks are used to define a query gene's functional neighborhood, that is, the set of genes with strong
connections to the query. We use TreeFam
to map gene neighborhoods from different organisms onto a species-independent space of meta-genes.
The TreeFam system defines families that evolved from a single gene in
the last common ancestor of all animals (closely related plant and fungal genes are also included).
A small number of genes that appeared in more than one TreeFam family were excluded from consideration.
When a gene's neighborhood is mapped onto the meta-gene set, a meta-gene is considered present if any of its member genes is present;
thus, for multi-gene families, a connection to any one of the members is sufficient.
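The exclusion of multi-family genes and the "any member" rule for meta-gene presence can be sketched as follows. This is a minimal illustration, not the authors' code: the input format (a dict from family ID to member genes) and the family and gene names are hypothetical, and genes with no unique family are simply ignored.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def build_gene_to_family(families: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map each gene to its TreeFam family (meta-gene).

    `families` maps a family ID to its member genes (hypothetical format).
    Genes that appear in more than one family are dropped.
    """
    fam_sets = {fam: set(members) for fam, members in families.items()}
    membership = Counter(g for members in fam_sets.values() for g in members)
    return {
        g: fam
        for fam, members in fam_sets.items()
        for g in members
        if membership[g] == 1
    }


def to_metagene_neighborhood(neighborhood: Iterable[str],
                             gene_to_family: Dict[str, str]) -> Set[str]:
    """Return the families with at least one member in `neighborhood`.

    A connection to any single member marks the whole family as present.
    Genes without a unique family assignment are ignored.
    """
    return {gene_to_family[g] for g in neighborhood if g in gene_to_family}


# Hypothetical toy data: "a2" belongs to two families and is excluded.
toy_families = {"TF_A": ["a1", "a2"], "TF_B": ["b1", "a2"], "TF_C": ["c1"]}
g2f = build_gene_to_family(toy_families)
print(sorted(to_metagene_neighborhood(["a1", "a2", "c1", "x9"], g2f)))  # ['TF_A', 'TF_C']
```

Representing a neighborhood as a set of family IDs makes the later overlap computation a plain set intersection.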
To determine the functional similarity
of genes from different organisms, we compute the hypergeometric p-value of the overlap between their meta-gene neighborhoods.
The background set of TreeFam families used for the p-value computation is specific to the organism pair considered
and is defined as all TreeFam families that contain, for each organism in the pair, at least one gene that is also
present in our microarray compendium. Likewise, for the purpose of the p-value calculation, the size of each gene's
TreeFam neighborhood is taken to be the number of TreeFam families that are present both in the gene's
neighborhood and in the organism-pair-specific background set.
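A minimal sketch of the organism-pair background and the neighborhood-overlap p-value is given below. It assumes the upper-tail hypergeometric test, P(X >= k), which is the usual choice for overlap enrichment but is not spelled out above; the input formats (`gene_organism`, `compendium`) and all names are hypothetical. The tail is computed with `math.comb` so the sketch has no third-party dependencies.

```python
from math import comb
from typing import Dict, Iterable, Set


def pair_background(families: Dict[str, Iterable[str]],
                    gene_organism: Dict[str, str],
                    compendium: Set[str],
                    org1: str, org2: str) -> Set[str]:
    """Families that have, for each of org1 and org2, at least one member
    gene present in the expression compendium."""
    background = set()
    for fam, members in families.items():
        orgs = {gene_organism[g] for g in members
                if g in compendium and g in gene_organism}
        if org1 in orgs and org2 in orgs:
            background.add(fam)
    return background


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Upper tail P(X >= k) for X ~ Hypergeometric(N total, K marked, n drawn)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def neighborhood_overlap_pvalue(meta1: Set[str], meta2: Set[str],
                                background: Set[str]) -> float:
    """Overlap p-value of two meta-gene neighborhoods; both neighborhood
    sizes are counted only over families in the organism-pair background."""
    a, b = meta1 & background, meta2 & background
    return hypergeom_sf(len(a & b), len(background), len(a), len(b))


# Ten hypothetical background families; "TF_X" lies outside the background
# and is dropped before the neighborhood size is counted.
bg = {f"TF_{i}" for i in range(10)}
p = neighborhood_overlap_pvalue({"TF_0", "TF_1", "TF_2", "TF_X"},
                                {"TF_0", "TF_1", "TF_2"}, bg)
print(p == 1 / 120)  # True
```

Restricting both neighborhoods to the background before counting keeps the population size and the draw sizes consistent, which the hypergeometric model requires.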
To compute GO enrichments for TreeFam families, we consider a family to be annotated to a particular term if any of
its member genes has an experimental annotation for that term. GO enrichment is computed as hypergeometric p-values,
with the background count taken from the organism-pair-specific background families defined above. Thus, while the
annotations are not organism-specific, the enrichment computation does depend on the organism pair being considered.
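The family-level GO annotation and enrichment step might look like the sketch below. Assumptions, all hypothetical: annotations arrive as `(gene, go_term, evidence_code)` tuples; "experimental" is taken to mean the GO evidence codes EXP, IDA, IPI, IMP, IGI and IEP (whether the high-throughput codes were also counted is not stated); and the test is the same upper-tail hypergeometric used for neighborhood overlap, redefined here so the block stands alone.

```python
from math import comb
from typing import Dict, Iterable, Set, Tuple

# Assumed set of experimental GO evidence codes (high-throughput codes such
# as HTP or HDA are deliberately left out of this sketch).
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def family_terms(families: Dict[str, Iterable[str]],
                 annotations: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """A family carries a GO term if any member gene has an experimental
    annotation to it."""
    gene_terms: Dict[str, Set[str]] = {}
    for gene, term, code in annotations:
        if code in EXPERIMENTAL_CODES:
            gene_terms.setdefault(gene, set()).add(term)
    return {fam: set().union(*(gene_terms.get(g, set()) for g in members))
            for fam, members in families.items()}


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Upper tail P(X >= k) for X ~ Hypergeometric(N total, K marked, n drawn)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def go_enrichment(meta_neighborhood: Set[str],
                  fam_terms: Dict[str, Set[str]],
                  background: Set[str]) -> Dict[str, float]:
    """Enrichment p-value for each GO term carried by the neighborhood's
    families, counted against the organism-pair background families."""
    query = meta_neighborhood & background
    terms = set().union(*(fam_terms.get(f, set()) for f in query))
    pvals = {}
    for term in terms:
        annotated = {f for f in background if term in fam_terms.get(f, set())}
        pvals[term] = hypergeom_sf(len(query & annotated), len(background),
                                   len(annotated), len(query))
    return pvals
```

Because `fam_terms` is computed once from all organisms' genes while `background` changes with the organism pair, the same family annotations can yield different p-values for different pairs, as the text notes.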
All p-values are FDR-adjusted and thresholded at 0.05.
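The text does not name the FDR procedure; the sketch below assumes the Benjamini-Hochberg adjustment, the most common choice, and treats "cut-off at 0.05" as keeping adjusted values at or below 0.05. The function names are hypothetical.

```python
from typing import Dict, Hashable


def benjamini_hochberg(pvals: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Benjamini-Hochberg adjusted p-values, keyed like the input."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted = {}
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        key, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[key] = running_min
    return adjusted


def significant(pvals: Dict[Hashable, float], alpha: float = 0.05) -> Dict[Hashable, float]:
    """Keep entries whose adjusted p-value is at or below `alpha`."""
    return {k: q for k, q in benjamini_hochberg(pvals).items() if q <= alpha}


print(significant({"a": 0.01, "b": 0.04, "c": 0.03, "d": 0.5}))  # {'a': 0.04}
```

Here "b" and "c" both pass an unadjusted 0.05 cut-off but are dropped after adjustment, illustrating why the adjustment matters when many gene pairs or GO terms are tested.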