Welcome to GeneConnect v1.0

The NCI caBIG™ project is creating a common, extensible informatics platform that integrates diverse data types and supports interoperable analytic tools. This platform will allow research groups to tap into the rich collection of emerging cancer research data while supporting their individual investigations. However, many software applications use non-overlapping sets of genomic identifiers in their object models, which prevents them from interoperating. GeneConnect is a caBIG™ mapping service that makes this interoperability possible by interlinking approved genomic identifiers. These include:

  • Ensembl Gene ID
  • Ensembl Transcript ID
  • Ensembl Protein ID
  • Entrez Gene ID
  • UniGene ID
  • GenBank mRNA Accession Number
  • GenBank Protein Accession Number
  • RefSeq mRNA Accession Number
  • RefSeq Protein Accession Number
  • UniProtKB Primary Accession Number

To interlink these identifiers, pairwise connections were constructed from database annotations (either direct or inferred) and an alignment engine. All-to-all relationships were then calculated by traversing every possible combination of edges in the graph (see Figure), using each node in turn as the starting point. A query consists of one or more input identifiers and a set of paths that may be traversed. For each query, the Path Score and Frequency are calculated. These are defined as:

  • Path Score: calculated for each set of genomic identifiers in the result set. It is the frequency with which a given set of genomic identifiers was obtained across all traversed paths for the query.
  • Frequency: calculated for each genomic identifier in the result set. It denotes how often a given genomic identifier was obtained from a given data source across all traversed paths for the query.
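
The traversal and the two measures above can be illustrated with a small sketch. This is not GeneConnect's implementation: the three-node graph, the placeholder identifiers (GENE_A, MRNA_1, MRNA_2), and the function names are all hypothetical, and the sketch assumes that "frequency" means the fraction of traversed paths that produced a given result, which the text above does not state explicitly.

```python
from collections import Counter

# Hypothetical toy graph: data sources are nodes, pairwise links are edges.
LINKS = {
    "EntrezGene": ["UniGene", "RefSeqmRNA"],
    "UniGene": ["EntrezGene", "RefSeqmRNA"],
    "RefSeqmRNA": ["EntrezGene", "UniGene"],
}


def simple_paths(graph, start):
    """Yield every path of two or more nodes from start that repeats no node."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            yield tuple(path)
        for nxt in graph[path[-1]]:
            if nxt not in path:
                stack.append(path + [nxt])


def all_paths(graph):
    """Enumerate paths using every node in turn as the starting point."""
    return [p for node in graph for p in simple_paths(graph, node)]


def score(results):
    """Compute Path Score and Frequency from per-path results.

    results holds one dict per traversed path, mapping data source -> identifier.
    Assumption: both measures are normalised by the number of traversed paths.
    """
    n = len(results)
    set_counts = Counter(frozenset(r.items()) for r in results)
    id_counts = Counter(item for r in results for item in r.items())
    path_score = {ids: c / n for ids, c in set_counts.items()}
    frequency = {item: c / n for item, c in id_counts.items()}
    return path_score, frequency


# Made-up results for four traversed paths (placeholder identifiers only).
RESULTS = [
    {"EntrezGene": "GENE_A", "RefSeqmRNA": "MRNA_1"},
    {"EntrezGene": "GENE_A", "RefSeqmRNA": "MRNA_1"},
    {"EntrezGene": "GENE_A", "RefSeqmRNA": "MRNA_2"},
    {"EntrezGene": "GENE_A", "RefSeqmRNA": "MRNA_1"},
]

PATH_SCORE, FREQUENCY = score(RESULTS)
```

In this toy data, the identifier set (GENE_A, MRNA_1) was obtained on three of the four paths, so under the stated assumption it receives a higher Path Score than (GENE_A, MRNA_2), while GENE_A was obtained from EntrezGene on every path.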

GeneConnect Build Information

  • Number of pairwise links: 42
  • Number of distinct genomic identifier sets: 22162231
  • Number of possible paths through the GeneConnect graph: 4106

Database Version Information

  • Ensembl: Version 40
  • UniGene: Homo sapiens Build #194 (26-July-2006)
  • EntrezGene: Homo sapiens Build (1-August-2005)
  • GenBank Nucleotide: data currently not available
  • GenBank Protein: data currently not available
  • UniProtKB: Version 8.0 Release (30-May-2006)
  • RefSeq: Release 18