About autworks - Methods
Diseases and gene lists
We downloaded a complete set of 433 neurological disorders from the National Institute of Neurological Disorders and Stroke (NINDS) online database. We generated a gene list for each disorder by taking the union of the genes returned from OMIM [6] and GeneCards [6]. We then computed the intersection of each disease gene list with the list for Autism and ranked the diseases in descending order of the number of shared genes. This allowed us to circumscribe the diseases with the greatest number of genes in common with Autism, resulting in a set of 40 diseases with possible molecular similarities to Autism.
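The union-then-intersect procedure described above can be sketched in a few lines. This is a minimal illustration, not the pipeline actually used: the disease names, gene symbols, and the helper names `build_gene_lists` and `rank_by_overlap` are hypothetical placeholders.

```python
# Hypothetical toy data: placeholder gene symbols, not results from the study.
omim = {
    "Autism": {"G1", "G2", "G3"},
    "Disease A": {"G1", "G4"},
    "Disease B": {"G5"},
}
genecards = {
    "Autism": {"G3", "G6"},
    "Disease A": {"G2", "G6"},
    "Disease B": {"G3"},
}


def build_gene_lists(*sources):
    """Union the gene sets reported for each disease across all sources."""
    merged = {}
    for source in sources:
        for disease, genes in source.items():
            merged.setdefault(disease, set()).update(genes)
    return merged


def rank_by_overlap(gene_lists, reference="Autism"):
    """Rank every other disease by the number of genes shared with the reference."""
    ref_genes = gene_lists[reference]
    overlaps = [
        (disease, len(genes & ref_genes))
        for disease, genes in gene_lists.items()
        if disease != reference
    ]
    # Descending by overlap size; ties broken alphabetically for a stable order.
    return sorted(overlaps, key=lambda item: (-item[1], item[0]))


gene_lists = build_gene_lists(omim, genecards)
ranking = rank_by_overlap(gene_lists)
print(ranking)
```

Taking the top entries of `ranking` would correspond to selecting the diseases with the most genes in common with Autism.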
Disease relationship tree
The seed lists provided by OMIM and GeneCards were combined and transformed into a binary matrix of gene presence/absence. The matrix was then analyzed using maximum parsimony in PAUP* [7] to reconstruct the relationships among the 41 diseases. Distance-based clustering approaches (neighbor joining and UPGMA) produced similar results.
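As an illustration of the input to this step, the sketch below builds a binary presence/absence matrix from per-disease gene sets and derives pairwise distances of the kind a distance-based method would consume. The Jaccard distance is an assumption chosen for simplicity; the text does not state which distance measure was used, and the parsimony analysis itself was run in PAUP*, which is not reproduced here. All names and data are hypothetical.

```python
from itertools import combinations

# Hypothetical toy gene lists: placeholder symbols only.
gene_lists = {
    "Autism": {"G1", "G2", "G3"},
    "Disease A": {"G1", "G2"},
    "Disease B": {"G4"},
}


def presence_absence_matrix(gene_lists):
    """Return the sorted gene columns and one 0/1 row per disease."""
    genes = sorted(set().union(*gene_lists.values()))
    matrix = {
        disease: [1 if gene in members else 0 for gene in genes]
        for disease, members in gene_lists.items()
    }
    return genes, matrix


def jaccard_distance(row_a, row_b):
    """1 - |shared genes| / |genes present in either row| (assumed measure)."""
    both = sum(a & b for a, b in zip(row_a, row_b))
    either = sum(a | b for a, b in zip(row_a, row_b))
    return 1.0 - both / either if either else 0.0


genes, matrix = presence_absence_matrix(gene_lists)
distances = {
    (a, b): jaccard_distance(matrix[a], matrix[b])
    for a, b in combinations(sorted(matrix), 2)
}
for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

The rows of `matrix` are the character data a parsimony program would read, while `distances` is the kind of pairwise table that neighbor joining or UPGMA starts from.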
Molecular network reconstruction
We selected a cluster of the most closely related diseases from the disease tree (methods above) for subsequent gene and process network analysis. The gene lists for the 12 most closely related diseases were extracted from the complete set of 433 neurological disorders. These lists were then submitted to the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) [8] to build networks consisting of edges from five separate lines of evidence: Conserved neighborhoods, Co-occurrence, Co-expression, Databases, and Textmining. Briefly, these lines of evidence consist of the following:
Neighborhoods (N): synteny derived from SwissProt and Ensembl
Co-occurrence (PhyPro): phylogenetic profiles derived from COG database [9, 10]
Co-expression (Ex): co-regulation of genes measured using microarrays imported from ArrayProspector [11]
Databases (Db): validated small-scale interactions, protein complexes, and annotated pathways from BIND [12], KEGG [13] and MIPS [14]
Text (Txt): co-mention of gene names from PubMed abstracts
The networks were assembled to include links only among the original set of genes, i.e., no additional nodes were added to increase the connectedness of the networks. We used the default value for edge confidence (0.4, ‘medium’ confidence as defined by STRING). STRING’s edge confidence is calibrated using KEGG as a benchmark: any predicted association for which both proteins are assigned to the same ‘KEGG pathway’ is counted as a true positive [8]. The resulting edge lists were then imported into a relational database for subsequent analysis. Set analysis of the graphs was conducted in SQL.
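The filtering and set analysis described above might look like the following sketch, which uses Python's built-in `sqlite3` module as a stand-in relational database. The table layout, the 0 to 1 score scale, the single shared seed set, and all gene and disease names are assumptions made for illustration; the original schema and queries are not given in the text.

```python
import sqlite3

MIN_CONFIDENCE = 0.4  # 'medium' confidence cutoff quoted in the text

# Hypothetical seed genes and edges: (disease, gene_a, gene_b, confidence).
seed_genes = {"G1", "G2", "G3", "G4"}
raw_edges = [
    ("Autism", "G1", "G2", 0.9),
    ("Autism", "G2", "G3", 0.3),     # dropped: below the confidence cutoff
    ("Autism", "G1", "G9", 0.8),     # dropped: G9 is not in the seed set
    ("Disease A", "G2", "G1", 0.7),  # same edge as G1-G2, opposite order
    ("Disease A", "G3", "G4", 0.5),
]

# Keep only confident edges whose endpoints are both seed genes, and store
# each undirected edge with its endpoints in sorted order.
kept = [
    (disease, *sorted((a, b)), score)
    for disease, a, b, score in raw_edges
    if score >= MIN_CONFIDENCE and a in seed_genes and b in seed_genes
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edges (disease TEXT, gene_a TEXT, gene_b TEXT, confidence REAL)"
)
conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?)", kept)

# Set analysis in SQL: edges present in both disease networks.
shared = conn.execute(
    """
    SELECT gene_a, gene_b FROM edges WHERE disease = ?
    INTERSECT
    SELECT gene_a, gene_b FROM edges WHERE disease = ?
    """,
    ("Autism", "Disease A"),
).fetchall()
print(shared)
```

Sorting the endpoints before insertion means an undirected edge compares equal regardless of the order in which the two genes were reported, which keeps `INTERSECT` (and likewise `UNION` or `EXCEPT`) meaningful across networks.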
REFERENCES
| 1. | Belmonte, M.K. and T. Bourgeron, Fragile X syndrome and autism at the intersection of genetic and neural networks. Nat Neurosci, 2006. 9(10): p. 1221-5. |
| 2. | Marcotte, L. and P.B. Crino, The neurobiology of the tuberous sclerosis complex. Neuromolecular Med, 2006. 8(4): p. 531-46. |
| 3. | Wong, V., Study of the relationship between tuberous sclerosis complex and autistic disorder. J Child Neurol, 2006. 21(3): p. 199-204. |
| 4. | Moretti, P. and H.Y. Zoghbi, MeCP2 dysfunction in Rett syndrome and related disorders. Curr Opin Genet Dev, 2006. 16(3): p. 276-81. |
| 5. | Manning, M.A., et al., Terminal 22q deletion syndrome: a newly recognized cause of speech and language disability in the autism spectrum. Pediatrics, 2004. 114(2): p. 451-7. |
| 6. | http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM. |
| 7. | Swofford, D.L., PAUP* Phylogenetic Analysis Using Parsimony (*and Other Methods). 2002, Sinauer Associates: Sunderland, Massachusetts. |
| 8. | von Mering, C., et al., STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res, 2005. 33(Database issue): p. D433-7. |
| 9. | Tatusov, R.L., E.V. Koonin, and D.J. Lipman, A genomic perspective on protein families. Science, 1997. 278(5338): p. 631-7. |
| 10. | Tatusov, R.L., et al., The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res, 2000. 28(1): p. 33-6. |
| 11. | Jensen, L.J., et al., ArrayProspector: a web resource of functional associations inferred from microarray expression data. Nucleic Acids Res, 2004. 32(Web Server issue): p. W445-8. |
| 12. | Bader, G.D., et al., BIND--The Biomolecular Interaction Network Database. Nucleic Acids Res, 2001. 29(1): p. 242-5. |
| 13. | Kanehisa, M., et al., The KEGG resource for deciphering the genome. Nucleic Acids Res, 2004. 32(Database issue): p. D277-80. |
| 14. | http://mips.gsf.de/. |