Abstract GeneCards® is a compendium of human genes and their encoded proteins, with a major focus on functional genomics and medical aspects, including involvement in diseases. GeneCards offers concise information about the structure and function of human genes. It extracts and integrates a carefully selected subset of gene information obtained from major public and proprietary data sources, overcoming barriers of data format heterogeneity.

GeneCards combines a user-friendly interface with an organized display that balances links and detailed information. Data are divided into “fields” such as “Aliases”, “Location”, “Ontologies”, “Expression” and “SNPs”, all searchable by keywords. Because GeneCards employs automatic data mining from dozens of sources, it provides a comprehensive summary for each gene.

Note: GeneCards was developed over the last 10 years by a world-leading bioinformatics team at the Weizmann Institute of Science in Israel.

GeneCards features/capabilities include:

1) GeneCards has a major focus on medical aspects and roles in disease. It offers information about the functions of all human genes that have an approved symbol, as well as selected others.

2) GeneCards extracts and integrates a subset of the information stored in major data sources dealing with human genes and their products. Information is divided into “fields” such as “chromosomal location”, “ontologies”, “pathways” and “expression”.

3) Because GeneCards draws on information from many data sources, it provides a comprehensive summary for each gene.

4) GeneCards is distinguished by its user-friendly interface and organization. It displays a balanced mix of links and detailed information on each gene (a GeneCard for each gene).

5) Searching GeneCards finds more genes associated with a disease or protein characteristic in comparison to other databases.

6) GeneCards data are available in text and Extensible Markup Language (XML) formats.

GeneNote, GeneAnnot, GeneLoc and GeneTide are a suite of specialized databases that are integrated with GeneCards.

These databases concentrate on gene expression in normal human tissues, gene microarray annotation, gene location and organization, and analysis of Expressed Sequence Tags (ESTs). The suite of GeneCards-related databases helps researchers overcome the bottleneck of data analysis in biology and offers a deeper understanding of the role of individual genes and of the way genes function together.

GeneNote -- GeneNote is a database of human genes and their expression profiles in healthy tissues. It is based on Weizmann Institute of Science DNA array experiments, which were performed on the Affymetrix HG-U95 set A-E.

It offers:

1) An expression profile (tissue vector) for each gene in the human genome.

2) Gene and tissue clustering based on expression profiles.

3) A full genome ranking procedure according to the gene's tendency for tissue specificity, from tissue-specific to housekeeping genes.

GeneAnnot -- GeneAnnot provides revised and improved annotation of probe sets, obtained by direct sequence comparison of probes to GenBank, RefSeq and Ensembl mRNA sequences.

Whenever possible, probe sets are related to GeneCards genes, and each probe-set-to-gene match is assigned sensitivity and specificity scores. In the remaining cases, probe sets are annotated by their relation to GenBank mRNA sequences and UniGene clusters. The results are integrated with the GeneCards, GeneLoc and GeneNote databases.
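The idea of scoring a probe-set-to-gene match can be illustrated with a small sketch. The text above does not define how GeneAnnot computes its scores, so the definitions here are assumptions chosen for illustration only: sensitivity is taken as the fraction of probes in the set that match the gene, and specificity as the average exclusivity of those matching probes (a probe matching one gene counts as 1, a probe matching two genes as 1/2). The function name `score_probe_set` and the input layout are hypothetical.

```python
def score_probe_set(probe_hits):
    """Score every gene matched by a probe set.

    probe_hits maps a probe ID to the set of gene symbols whose mRNA
    sequences the probe matches (an empty set means no match).
    Returns {gene: (sensitivity, specificity)}.

    Both score definitions are illustrative assumptions, not the
    published GeneAnnot formulas.
    """
    n_probes = len(probe_hits)
    genes = set().union(*probe_hits.values()) if probe_hits else set()
    scores = {}
    for gene in genes:
        # Probes in this set that match the gene at all.
        matching = [hits for hits in probe_hits.values() if gene in hits]
        # Assumed sensitivity: share of the set's probes that hit the gene.
        sensitivity = len(matching) / n_probes
        # Assumed specificity: mean exclusivity of the matching probes.
        specificity = sum(1 / len(hits) for hits in matching) / len(matching)
        scores[gene] = (sensitivity, specificity)
    return scores


# Hypothetical probe set with four probes and two candidate genes.
example = {
    "probe1": {"GENE_A"},
    "probe2": {"GENE_A", "GENE_B"},
    "probe3": {"GENE_A"},
    "probe4": set(),
}
for gene, (sens, spec) in sorted(score_probe_set(example).items()):
    print(f"{gene}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Under these assumed definitions, a gene hit by most probes in the set, and only by probes that match no other gene, scores high on both measures and would be the preferred annotation.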

GeneLoc -- The GeneLoc algorithm creates an integrated map of the human genome. GeneLoc unifies gene collections, eliminates redundancies, and assigns each gene a meaningful location-based identifier, which also serves as its GeneCards ID. GeneLoc currently uses gene sets from NCBI and Ensembl.

It compares these collections, deciding which entries should be consolidated and which are discrete. Since the gene annotations use the same assembly and coordinate scheme, GeneLoc effects this gene integration by comparing genomic locations. The resulting GeneLoc 'gene territory' reflects the range of the unified genes, taking into account every exon.
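Location-based integration of this kind can be sketched as follows. The text does not give GeneLoc's actual consolidation criteria, so this example assumes a deliberately simple rule: entries from different collections are merged when they lie on the same chromosome and strand and their exon-derived spans overlap. The `GeneEntry` class, the `unify` function, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneEntry:
    source: str   # collection the entry comes from, e.g. "NCBI" or "Ensembl"
    name: str     # identifier within that collection
    chrom: str
    strand: str   # "+" or "-"
    exons: tuple  # (start, end) pairs on a shared coordinate scheme

    @property
    def span(self):
        """Range covered by the entry, taking every exon into account."""
        return (min(s for s, _ in self.exons), max(e for _, e in self.exons))


def unify(entries):
    """Merge overlapping entries into 'gene territories'.

    Assumed rule (not GeneLoc's published criteria): entries on the same
    chromosome and strand whose spans overlap are treated as one gene.
    """
    territories = []
    ordered = sorted(entries, key=lambda e: (e.chrom, e.strand, e.span[0]))
    for entry in ordered:
        start, end = entry.span
        last = territories[-1] if territories else None
        if (last and last["chrom"] == entry.chrom
                and last["strand"] == entry.strand
                and start <= last["end"]):
            # Overlap: extend the territory and record the member entry.
            last["end"] = max(last["end"], end)
            last["members"].append(entry)
        else:
            # No overlap: this entry starts a new, discrete territory.
            territories.append({"chrom": entry.chrom, "strand": entry.strand,
                                "start": start, "end": end,
                                "members": [entry]})
    return territories


# Hypothetical entries: two describe the same gene, one is discrete.
entries = [
    GeneEntry("NCBI", "GENE_A", "chr1", "+", ((100, 200), (300, 400))),
    GeneEntry("Ensembl", "ENSG_A", "chr1", "+", ((150, 250), (350, 500))),
    GeneEntry("NCBI", "GENE_B", "chr1", "+", ((1000, 1200),)),
]
for t in unify(entries):
    names = ", ".join(m.name for m in t["members"])
    print(f'{t["chrom"]}:{t["start"]}-{t["end"]} ({t["strand"]}) <- {names}')
```

Sorting by chromosome, strand and start position means each entry only needs to be compared with the most recent territory, which keeps the merge to a single pass. A real integration would need stricter rules, for example to keep apart distinct genes that overlap on the same strand.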

Additionally, DNA segments classified by category (such as STSs mapped by various methods and EST clusters) are presented alongside the genes on a megabase-scale map, with further information and links to relevant databases.

GeneTide -- GeneTide is an automated system for annotating human transcripts (mRNAs and ESTs) and elucidating de novo genes.

GeneTide integrates various data resources in order to create a comprehensive list of human genes. It does so by associating the more than 5.5 million human ESTs currently available from dbEST, together with mRNA sequences from GenBank, with the set of ~35,000 human genes defined in GeneCards.

Using GeneTide, transcripts (mRNA & EST) can be:

1) Proven to belong to an existing GeneCards gene.

2) Used to define de novo genes.

3) Demonstrated to be an artifact or to be contaminated (genomic DNA, vector, etc.).

