PhenoGen Informatics

Category Genomics>Gene Expression Databases/Tools, Genomics>Gene Expression Analysis/Profiling/Tools, and Genomics>Genetic Data Analysis/Tools

Abstract The PhenoGen Informatics website is a comprehensive toolbox for storing, analyzing, and integrating microarray data and related genotype and phenotype data.

The site is particularly suited for combining Quantitative Trait Loci (QTL) and microarray data to search for candidate genes contributing to complex traits.

In addition, the site allows, if desired by the investigators, storage and sharing of data. Investigators can conduct in-silico microarray experiments using their own and/or shared data.

The PhenoGen toolbox was originally created to facilitate interactions within the INIA consortium of investigators.

In brief, the goal of the Integrative Neuroscience Initiative on Alcoholism (INIA) consortium is to identify the molecular, cellular, and behavioral neuroadaptations that occur in the ‘brain reward circuits’ associated with the extended amygdala and its connections as a result of exposure to ethanol.

Although PhenoGen web tools were initially created for the consortium members, the integrated tools described here can be used by the global scientific community.

The PhenoGen website consolidates many data analysis and interpretation tools in an easy point-and-click command format and it can also facilitate the sharing of data between investigators across the globe.

The manufacturer has extensive and up-to-date transcriptome databases for whole brain gene expression for BXD RI and inbred strains of mice and RI rats that can be used in an in-silico analysis of correlation with phenotypes arising from the functions of the central nervous system, and to identify “candidate genes” using behavioral and expression QTL data.

In addition to “QTL Tools” the manufacturer provides a number of tools for promoter/upstream sequence analysis, literature search, and tools to obtain annotation for a given list of genes.

Although there are a number of other web-based tools available to carry out many of these analyses individually, PhenoGen provides ‘one-stop access’ to most of these tools and to ‘gene expression’ databases necessary to identify “candidate genes” for complex traits.

Though at present the majority of the data available at PhenoGen is related to gene expression, the tools on the website can be adapted to handle other types of high throughput data, such as data derived from proteomic analysis.

Another advantage of using the PhenoGen database and its associated analytical tools is that users do not need expensive computational hardware or extensive knowledge of programming languages.

PhenoGen Database Content/Access --

The PhenoGen website is currently constructed to allow data to be classified as “Semi-public” or “Open Access”. All of the information about the data uploaded at the PhenoGen website is visible to every registered user.

At present PhenoGen has microarray data from over 1,000 samples from different categories. Each sample represents mRNA obtained from an individual (animal, human or insect) sample and hybridized to an array (either Affymetrix, CodeLink™ or a Custom oligonucleotide array).

Data from any of these well-identified individual arrays can be used to conduct an in-silico experiment.

Registered users have full access to data that are classified as “Open Access” and do not need to obtain permission from the curator of the data. However, users cannot access or download the “Semi-public” data unless the curator of the data (the Principal Investigator) grants permission to do so.

Registered users can use the data for in-silico experiments on the website or can download the data for use with their own statistical software.

At PhenoGen the curator(s) of the data (the Principal Investigators) also have an option to submit the data to microarray data repositories, such as ArrayExpress (see G6G Abstract Number 20012), as required by a number of journals.

PhenoGen Utility and Tools --

Once the user has selected the arrays for an in-silico experiment, a series of quality control steps can be carried out to ensure the compatibility and overall quality of the arrays.

The data from arrays in an in-silico experiment can be normalized, filtered and statistically analyzed utilizing several normalization and statistical procedures available on-site.

At numerous points in this process the user can download data, raw or normalized, from experiments being performed on site, for use with other statistical packages of his/her choice.

As with some other databases, PhenoGen offers a range of options for microarray data normalization, filtering and statistical analyses, including corrections for multiple comparisons.
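
The abstract does not say which multiple-comparison corrections PhenoGen implements. As an illustration only, the sketch below applies the Benjamini-Hochberg false discovery rate adjustment, one widely used correction, to a set of hypothetical p-values. The function name and the input values are assumptions made for this example and are not taken from PhenoGen.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five probe sets (illustrative only).
p_values = [0.01, 0.04, 0.03, 0.005, 0.20]
adjusted = benjamini_hochberg(p_values)
```

Probe sets whose adjusted value falls below a chosen threshold would then be carried forward as differentially expressed.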

Users can compare gene expression profiles in two (2) groups using any one of the available options, or can use one-way or two-way ANOVA models to check for overall differences when comparing more than two groups.
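
For readers unfamiliar with the one-way ANOVA mentioned above, the following minimal sketch computes its F statistic for a single transcript measured in three groups. It is written from the textbook definition and is not PhenoGen's implementation. The group values are hypothetical, and converting F to a p-value (which requires the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        # Variation of the group mean around the grand mean.
        ss_between += len(g) * (mean - grand_mean) ** 2
        # Variation of the samples around their own group mean.
        ss_within += sum((x - mean) ** 2 for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical log-scale expression values for one transcript in three groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
```

A large F value indicates that the group means differ more than the within-group scatter would suggest.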

Furthermore, once an in-silico experiment has been created, and the microarray data normalized, the user can search (query) the database to determine the expression levels of any particular transcript(s).

After choosing the correct (created) experiment, the user enters the probe-set ID, or gene name or symbol (or any other annotation ID from the most popular genomic databases) and clicks “search”, leading to display of expression data for the gene or genes in the chosen experiment. These data can be downloaded.

In addition to the standard statistical tools for assessing differential gene expression between or among groups, users can analyze the correlation between ‘gene expression’ levels and ‘phenotype’ (behavioral, biochemical or physiological).

Users can upload phenotype data (as a “.txt” file) for evaluating the correlation of gene expression with the phenotype.
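
The abstract does not specify which correlation measure is used or how the phenotype file is laid out. As a rough sketch, assuming one phenotype value and one expression value per strain, a Pearson correlation over the strains present in both data sets could be computed as follows. The strain names and values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical strain means: expression of one transcript and one phenotype.
expression = {"BXD1": 7.0, "BXD2": 8.0, "BXD5": 9.0, "BXD6": 10.0}
phenotype = {"BXD1": 1.0, "BXD2": 3.0, "BXD5": 2.0, "BXD6": 4.0, "BXD8": 5.0}

# Only strains measured in both data sets can contribute to the correlation.
shared = sorted(set(expression) & set(phenotype))
r = pearson_r([expression[s] for s in shared], [phenotype[s] for s in shared])
```

Repeating this for every transcript yields a ranked list of genes whose expression tracks the phenotype across strains.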

Another distinguishing feature of PhenoGen is its multiple offerings for further data analysis, once a list of differentially expressed or correlated genes is generated on site or uploaded de novo.

The availability of techniques of genetic mapping and statistical analysis has allowed association of complex behavioral traits with genomic loci (QTL analysis).

In short, QTLs are the genomic regions on the chromosomes that can explain a portion of the genetic variation within a given complex trait. Most complex traits are also significantly susceptible to environmental influences.

A premise of QTL analysis is that the genetic material that contributes to the variance in the trait of interest is located in the area of the genome defined by the QTL(s) for the trait.

The PhenoGen website allows users to access information for ‘gene location’ in the genomes of mouse, rat and human, to access QTL data from MGI (Mouse Genome Informatics) for a number of traits, and to analyze whether the location of genes falls within relevant phenotypic QTLs.
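
Checking whether a gene falls within a QTL comes down to an interval overlap test on the same chromosome. The sketch below shows that idea under simple assumptions: every gene and QTL is described by a chromosome, a start and an end position in the same coordinate system. All names and coordinates are hypothetical placeholders, not PhenoGen or MGI records.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


def overlaps(a, b):
    """True if two intervals share a chromosome and at least one position."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_in_qtls(genes, qtls):
    """Map each gene symbol to the names of the QTLs it overlaps."""
    return {
        symbol: [name for name, qtl in qtls.items() if overlaps(location, qtl)]
        for symbol, location in genes.items()
    }


# Hypothetical gene locations and QTL intervals (illustrative only).
genes = {
    "GeneA": Interval("1", 100, 200),
    "GeneB": Interval("1", 900, 950),
    "GeneC": Interval("2", 150, 180),
}
qtls = {
    "QtlX": Interval("1", 150, 500),
    "QtlY": Interval("2", 100, 300),
}
hits = genes_in_qtls(genes, qtls)
```

Genes with a non-empty list are positional candidates for the trait associated with that QTL.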

In addition to the ‘QTL Query tools’, the PhenoGen website offers a wide variety of tools to ‘interpret’ a gene list derived on site or uploaded by the user. Such a list can include a few or hundreds of differentially expressed genes derived from a typical microarray experiment.

At PhenoGen, users have access to tools, including annotation (basic and advanced), promoter analysis (to understand transcriptional regulation) and literature searches (including “co-citation” searches) for the entries in a list of differentially expressed genes.

To understand the transcriptional regulation of differentially expressed genes, users can use either oPOSSUM or MEME on the PhenoGen site.

oPOSSUM uses ‘human-mouse orthologs’ in calculating the over-representation of conserved transcription factor binding sites.

On the other hand, MEME explores the occurrences of previously uncharacterized transcriptional motifs.

Alternatively, the user can download the ‘upstream sequences’ of genes of interest using the PhenoGen site and carry out similar analysis using other tools.

PhenoGen Literature search --

The ‘literature search’ option on PhenoGen is an automated literature search that can be tailored to particular area(s) of interest by selecting a set of query terms.

The automated literature search looks for articles in PubMed that mention any of the genes, including synonyms, in the gene list generated on site or uploaded by the user, and one or more of the chosen query terms.

The results of the search are organized by the user-defined categories and by gene name, and contain direct links to PubMed citations.

Also included in the results of a search is a list of articles where two or more of the genes from the gene list are cited in the same article (‘co-citation results’). This allows the user to easily identify established relationships between genes.
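
The co-citation step can be pictured as inverting a gene-to-article mapping and keeping the articles that mention at least two genes. The sketch below assumes the per-gene search results are already available as sets of article identifiers. It does not query PubMed, and the gene names and article identifiers are hypothetical placeholders.

```python
from collections import defaultdict


def co_citations(gene_to_articles):
    """Return {article_id: sorted gene list} for articles citing 2+ genes."""
    article_to_genes = defaultdict(set)
    for gene, articles in gene_to_articles.items():
        for article in articles:
            article_to_genes[article].add(gene)
    return {
        article: sorted(genes)
        for article, genes in article_to_genes.items()
        if len(genes) >= 2
    }


# Hypothetical search results: gene -> identifiers of articles mentioning it.
gene_to_articles = {
    "GeneA": {"art1", "art2"},
    "GeneB": {"art2", "art3"},
    "GeneC": {"art2", "art3", "art4"},
}
shared_articles = co_citations(gene_to_articles)
```

Each remaining article points to a set of genes that the literature already discusses together.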

System Requirements

Web-based.

Manufacturer

Manufacturer Web Site PhenoGen Informatics

Price Contact manufacturer.

G6G Abstract Number 20512

G6G Manufacturer Number 104131