Background:

Disease-specific genetic information has been increasing at a rapid rate due to major technological advances. Numerous systems designed to capture and organize this mounting sea of information have emerged, but these resources differ dramatically in their disease coverage and genetic depth. With few exceptions, researchers must manually search a variety of sites to assemble a complete set of genetic evidence for a particular disease of interest. We therefore designed a real-time aggregation tool that provides both comprehensive coverage and reliable gene-to-disease rankings for any disease. Our tool, called Genotator, automatically integrates data from 11 publicly accessible clinical genetics resources and uses these data in a straightforward formula to rank genes in order of disease relevance. We tested the accuracy of Genotator's coverage in two separate disease use cases, Autism Spectrum Disorder and Parkinson Disease, for which specialty curated databases exist.

Results:

Genotator demonstrated that most of the 11 selected databases contain unique information about the genetic composition of disease, with 1186 genes found in only one of the 11 databases. These findings confirm that integrating these databases provides a more complete picture than would be possible from any one database alone. Genotator successfully identified over 75% of the top 50 ranked genes for Autism, and over 75% of the top-ranked Parkinson Disease candidates.
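The "found in only one database" figure comes down to counting, for each gene, how many sources report it. The sketch below illustrates that count under stated assumptions: each database's results are already normalized to a set of official gene symbols, and the database names and gene identifiers are hypothetical placeholders, not Genotator's actual sources or output.

```python
from collections import Counter


def genes_unique_to_one_source(sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each source, return the genes that no other source reports."""
    # Count how many sources mention each gene; using sets means a gene
    # listed twice within one source is still counted once for that source.
    counts = Counter(gene for genes in sources.values() for gene in genes)
    return {name: {g for g in genes if counts[g] == 1} for name, genes in sources.items()}


# Hypothetical toy input: placeholder database names and gene identifiers.
sources = {
    "db_a": {"GENE1", "GENE2", "GENE3"},
    "db_b": {"GENE1", "GENE4"},
    "db_c": {"GENE3", "GENE5"},
}

unique = genes_unique_to_one_source(sources)
total_unique = sum(len(genes) for genes in unique.values())
print(total_unique)
```

Here GENE1 and GENE3 are shared, so only GENE2, GENE4, and GENE5 count as single-source genes. The same per-gene source count could also feed a simple ranking, though the abstract does not specify Genotator's actual scoring formula.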

Conclusions:

As a meta-query engine, Genotator provides high coverage of both historical genetic research and recent advances in the genetic understanding of specific diseases. As such, Genotator provides a real-time aggregation of ranked data that keeps pace with research in the disease fields. Genotator's algorithm appropriately transforms query terms to match the input requirements of each targeted database and accurately resolves named synonyms to ensure full coverage of the genetic results with official nomenclature. Genotator generates an Excel-style output that is consistent across disease queries and readily importable into other applications.