The Chromatin Database (ChromDB)

Category Cross-Omics>Knowledge Bases/Databases/Tools and Cross-Omics>Sequence Analysis/Tools

Abstract The Chromatin Database (ChromDB) website displays chromatin-associated proteins, including RNAi-associated proteins, for a broad range of organisms.

The manufacturer’s primary focus is to display sets of highly curated plant genes predicted to encode proteins associated with chromatin remodeling.

The manufacturer’s intent is to make this intensively curated sequence information available to the research and teaching communities in support of comparative analyses toward understanding the chromatin proteome in plants, especially in important crop species such as corn and rice.

Model animal and fungal proteins are included in the database to facilitate a complete, comparative analysis of the chromatin proteome and to make the database applicable to all chromatin researchers and educators.

Chromatin biology and chromatin remodeling are complex processes involving a multitude of proteins that regulate the dynamic changes in chromatin structure that either repress or activate transcription.

The manufacturers strive to organize ChromDB data in a straightforward and comparative manner to help users understand the complement of proteins involved in packaging DNA into chromatin.

ChromDB Database contents --

ChromDB sequences fall into two (2) categories: genomic-based and transcript-based.

Genomic-based sequences are limited to plant genomes [Arabidopsis thaliana, Oryza sativa (japonica and indica cultivar-groups), Medicago truncatula, Populus trichocarpa, Physcomitrella patens (moss) and Zea mays] and algal and diatom genomes (Chlamydomonas reinhardtii, Ostreococcus lucimarinus and Phaeodactylum tricornutum).

The plant genomes are highlighted on the left-side toolbar on the ChromDB home page. Other important plant species are included in the database as transcript-based sequences, which are derived from EST contigs or singlets. The use of EST contigs results in partial sequences, especially for larger proteins.

Partial protein sequences, usually protein domains or the C-termini, are used as BLAST queries when identifying EST contigs. The use of a limited span of protein, rather than the entire sequence, limits redundancy that could result from the inclusion of multiple, non-overlapping contigs representing different regions of the same transcript.
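The redundancy argument above can be illustrated with a toy model. The sketch below is not ChromDB's pipeline: it stands in for BLAST matching with simple interval overlap, and the contig names, coordinates and the 150-residue window are hypothetical values chosen only for illustration.

```python
from typing import List, Tuple

# (name, start, end) in protein coordinates, 0-based, end exclusive.
# All values are hypothetical.
Contig = Tuple[str, int, int]


def contigs_hit_by_query(contigs: List[Contig], query_start: int, query_end: int) -> List[str]:
    """Return names of contigs that overlap the query span.

    Interval overlap is used here as a stand-in for a BLAST match.
    """
    return [
        name
        for name, start, end in contigs
        if start < query_end and end > query_start
    ]


protein_length = 1200
# Three non-overlapping contigs covering different regions of one transcript.
contigs = [
    ("contig_A", 0, 300),
    ("contig_B", 450, 800),
    ("contig_C", 950, 1200),
]

# Querying with the entire protein picks up every fragment of the transcript.
full_length_hits = contigs_hit_by_query(contigs, 0, protein_length)

# Querying with only the C-terminal span picks up a single contig.
c_terminal_hits = contigs_hit_by_query(contigs, protein_length - 150, protein_length)
```

In this toy case the full-length query collects all three fragments of the same transcript, while the C-terminal query collects only the one contig that covers that region.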

Transcript-based plant sequences are converted to genome-based as sequencing projects produce sufficient data to make a conversion worthwhile.

ChromDB does not display whole chromosomes; thus, for genomic-based organisms, the genomic sequence is limited to a span of nucleotides containing the predicted transcript splice model and the 5' and 3' untranslated regions.

Plant sequences are obtained from a variety of sources, e.g. NCBI databases, the Department of Energy Joint Genome Institute, The Arabidopsis Information Resource, the J. Craig Venter Institute [formerly The Institute for Genomic Research (TIGR)] and PlantGDB.

All plant sequences are curated by ChromDB staff members to provide the best transcript models.

Important animal and fungal model organisms, such as Homo sapiens, Drosophila melanogaster and Saccharomyces cerevisiae, are available as transcript-based sequences and are obtained from the NCBI Reference Sequence (RefSeq) database.

The manufacturers focus on sequenced genomes and do not derive EST contigs for non-plant organisms. ChromDB staff rarely curate these non-plant transcripts; the exceptions are predicted transcript models that multiple sequence analysis shows to need substantial improvement, and then only when the RefSeq accession affects the quality of a phylogenetic tree.

All database sequences are assigned a ChromDB ID (identifier) that denotes both the transcript and the protein. These identifiers, as well as formal gene names, loci and aliases, are included in the database and can be used to search for gene records.

ChromDB Database protein groups --

ChromDB displays over 90 protein groups, which are organized into parent categories reflecting different functional aspects of chromatin biology. The next, lower level of organization is the individual protein group, identified by a three- to five-letter designation.

The more complex groups, such as the CHR proteins (SWI/SNF chromatin remodeling ATPase superfamily), can be broken down further into distinct phylogenetic groups, e.g. SNF2, CHD1, and RAD16. This protein group classification scheme forms the basis for advanced searching and generating reports.
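The three-level scheme (parent category, protein group, phylogenetic group) can be pictured as a small nested structure. This is a minimal sketch, not ChromDB's actual schema: the class and function names are hypothetical, and placing CHR under "Nucleosome Organization" is an assumption made only so the example has a parent category to show.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProteinGroup:
    code: str                 # three- to five-letter designation
    description: str
    phylogenetic_groups: List[str] = field(default_factory=list)


# Parent category -> protein groups.
# ASSUMPTION: the parent category chosen for CHR is illustrative only.
hierarchy: Dict[str, List[ProteinGroup]] = {
    "Nucleosome Organization": [
        ProteinGroup(
            code="CHR",
            description="SWI/SNF chromatin remodeling ATPase superfamily",
            phylogenetic_groups=["SNF2", "CHD1", "RAD16"],
        ),
    ],
}


def find_parent(tree: Dict[str, List[ProteinGroup]], code: str) -> Optional[str]:
    """Return the parent category that contains the given protein group code."""
    for parent, groups in tree.items():
        if any(group.code == code for group in groups):
            return parent
    return None
```

A lookup such as `find_parent(hierarchy, "CHR")` walks from a protein group back to its parent category, which is the kind of traversal a group-based search or report would need.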

The major divisions of protein groups are as follows:

Histones and Histone Linker Proteins, Nucleosome Organization (includes assembly and displacement), Histone Modifications, Histone Modification Binding-Proteins, Modified-Histone-Binding Proteins, DNA Modifying Proteins, Non-Histone DNA-Binding Proteins, RNAi Components and Chromosome Dynamics.

ChromDB Database access and interface --

The contents of the database can be searched, compared, and visualized using a variety of search functions, viewers and report tools. A user manual is available through a ‘Help’ link at the top of each web page.

Two (2) search options are provided, a limited ‘Quick Search’ text box and a menu-driven ‘Advanced Search’ that provides the means for comparative searching.

The ‘Quick Search’ text box is located at the top of every web page and accepts a single entry: a ChromDB ID, a formal gene name, an alias or a locus.

In those cases where the same formal gene name is used for multiple organisms, a list is generated showing each gene name and organism, as well as the ChromDB ID. Additionally, this text box accepts an organism name (either the scientific or common name) or an NCBI accession.
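The Quick Search behaviour described above can be sketched as an exact, case-insensitive match over several record fields. This is an illustrative model only, not ChromDB's implementation: the `GeneRecord` fields, the `quick_search` function and every identifier in the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GeneRecord:
    chromdb_id: str
    gene_name: str
    organism: str            # scientific name
    common_name: str = ""
    aliases: Tuple[str, ...] = ()
    locus: str = ""
    accession: str = ""


def quick_search(term: str, records: List[GeneRecord]) -> List[Tuple[str, str, str]]:
    """Return (gene name, organism, ChromDB ID) for every record matching the term."""
    key = term.strip().lower()
    if not key:
        return []
    hits = []
    for rec in records:
        searchable = (
            rec.chromdb_id, rec.gene_name, rec.organism,
            rec.common_name, rec.locus, rec.accession,
        ) + rec.aliases
        if any(key == value.lower() for value in searchable if value):
            hits.append((rec.gene_name, rec.organism, rec.chromdb_id))
    return hits


# Hypothetical records: the same formal gene name used in two organisms.
records = [
    GeneRecord("XYZ101", "GENE1", "Arabidopsis thaliana", aliases=("ALT1",), locus="LOCUS_A"),
    GeneRecord("XYZ201", "GENE1", "Zea mays", common_name="corn"),
]
```

With these sample records, searching for `"gene1"` returns both entries as a list of gene name, organism and ChromDB ID, while `"corn"` or `"LOCUS_A"` returns a single record.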

An ‘Advanced Search’ is available from the link on the left-side menu. This link brings up a menu-driven format that allows the user to customize a search in a variety of ways using three (3) different criteria: organisms, protein groups, and the type of report.

The first two criteria have alternative options. For the organisms, the default is an alphabetical list of scientific names, and a link is provided to switch to a taxon classification (e.g. plants, animals, fungi).

For the Protein Group selection, the default is the functional groups and a link is provided to display an alphabetical list of all protein groups. Alternatively, a link is provided that displays a text box for entering a list of gene names as well as the report selection.

The ‘gene record page’ is the central navigation portal for accessing information relating to each database gene.

ChromDB uses the GMOD tool GBrowse as an individual gene-based visualization tool, not as a genome-wide or chromosome visualization tool.

For genomic-based organisms, the GBrowse view is based on the genomic sequence and the display includes the transcript splice model, protein domains (aligned against the transcript model) and NCBI accessions.

The inclusion of the protein domains aligned to exons is useful in discerning the effect of alternate transcript splicing on protein domain structure.

Each plant can have specialized tracks; for example, Arabidopsis displays have a track for Agrobacterium T-DNA insertion events.

For transcript-based plant organisms, the GBrowse display is based on the transcript and tracks include NCBI accessions and protein domains. For non-plant organisms, the GBrowse display is limited to the transcript, protein domains, and the RefSeq accession.

ChromDB also provides a local BLAST server. In addition to similarity searching, this tool is useful in determining if a gene is present in the database.

Users can select the standard BLAST programs as well as preset databases such as plants, animals, or fungi. On the results page, each match is linked back to that gene’s Gene Record Page where more information can be obtained about that gene.

External links are provided to users in several places: on the left-side toolbar of the home page and within individual web pages. For example, each plant genome page has a list of appropriate links, e.g. TAIR (The Arabidopsis Information Resource), among others, for A. thaliana, and the J. Craig Venter Institute (formerly TIGR) for a number of organisms.

ChromDB Comparative tools --

Part of the manufacturer’s mission is to provide the community with the means to make comparative analyses of chromatin-associated proteins among a diverse group of organisms. The links for these comparative tools and viewers are provided on the left-side toolbar on each web page.

Most of these features use the same menu-driven interface discussed above for the ‘Advanced Search’ feature.

Examples of these features are the ability to generate FASTA files (for an entire sequence or a single protein domain) and viewers for Pfam and SMART protein domains and for exon structure.
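For readers unfamiliar with the FASTA output mentioned above, the following minimal sketch shows how a record could be written for either an entire sequence or a single domain. It is not ChromDB's exporter: the function name, the header layout and the 1-based inclusive domain coordinates are assumptions made for illustration.

```python
from typing import Optional, Tuple


def to_fasta(seq_id: str, sequence: str,
             domain: Optional[Tuple[int, int]] = None, width: int = 60) -> str:
    """Format a sequence, or one domain of it, as a FASTA record.

    `domain` is a hypothetical (start, end) pair, 1-based and inclusive.
    """
    header = seq_id
    if domain is not None:
        start, end = domain
        if not (1 <= start <= end <= len(sequence)):
            raise ValueError("domain coordinates fall outside the sequence")
        sequence = sequence[start - 1:end]
        header = f"{seq_id}:{start}-{end}"
    # Wrap the sequence at a fixed line width, as is conventional for FASTA.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{header}"] + lines) + "\n"
```

Calling `to_fasta("XYZ101", protein)` would export the whole sequence, while passing `domain=(start, end)` would export only that span, with the coordinates recorded in the header line.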

The manufacturers encourage potential users to explore the website home page to discover all database features. Information about the selection menus and the tools can be found on the general Help page.

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site The Chromatin Database (ChromDB)

Price Contact manufacturer.

G6G Abstract Number 20764

G6G Manufacturer Number 104342