Abstract - The Sequence Analysis Collection offers essential bioinformatics tools and algorithms for creating practical sequence analysis workflows that complement laboratory research. With over 100 different component functions, you can analyze and annotate DNA and protein sequences using a variety of industry-accepted methods. You can also combine components into logical networks that automate the analysis and assembly of sequence families.

Sequence Similarity Searching - For uncomplicated sequence similarity searching, use the Smith-Waterman component or any of the product's standard Basic Local Alignment Search Tool (BLAST) components, including BLASTn, BLASTp, BLASTx, tBLASTn, and tBLASTx. Display the results of these searches with the product's Similarity Search Viewer, and extract or fetch individual hits for further analysis. Pipeline Pilot (see Note 1) also makes it easy to build custom BLAST databases on the fly, providing more targeted and intelligent sequence similarity searches.
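
To illustrate the kind of computation behind a Smith-Waterman search, the following is a minimal Python sketch of local alignment scoring. It is not the product's component: the function name `smith_waterman_score`, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration only.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative sketch: uses a simple match/mismatch score and a linear
    gap penalty (assumed values), keeping only two rows of the matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Because every cell is floored at zero, the score reflects only the best-matching local region, which is why this style of search can find a short shared segment inside otherwise unrelated sequences.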

Manipulation and Annotation - An assortment of sequence manipulation and annotation components is available. For DNA sequences, these components include functions such as primer identification, guanine-cytosine (GC) content, six-frame translations, reverse complement, and small interfering RNA (siRNA) target site prediction. For protein sequences, the components cover back translation, secondary structure prediction, and isoelectric point calculation.
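
Three of the DNA operations named above (GC content, reverse complement, and six-frame translation) can be sketched in a few lines of Python. This is an illustrative sketch, not the product's implementation; the function names are hypothetical, the standard genetic code is assumed, and any codon containing a non-ACGT character is reported as `X`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Translate three offsets on the forward strand and three on the reverse."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    dna = "ATGGCCTAA"
    print(gc_content(dna), reverse_complement(dna))
    print(six_frame_translation(dna))
```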

Alignment and Profile Searching - To identify potential homologs among a collection of different organisms and quickly select variants among nucleotide sequence regions, you can use the product's alignment and profiling components. For multiple sequence alignments of either DNA or protein sequences, ClustalW (a general-purpose multiple sequence alignment program) is included. For profiling tasks, hidden Markov model (HMM) Build, Align, Search, and Pfam (a database containing information about protein domains and families) are available.

Pattern Matching - A variety of standard tools are implemented so you can search for interesting patterns or motifs within a biological sequence. These algorithms enable the identification of potential PROSITE (a database of protein families and domains) regions, GC-rich regions, proteolytic cleavage sites, restriction enzyme sites, signal peptide cleavage sites, open reading frames, or regular expression patterns.
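
As a rough illustration of regular-expression pattern matching on sequences, the Python sketch below locates restriction enzyme sites and simple open reading frames. The function names are hypothetical, the EcoRI recognition site GAATTC is used as the example, and the open reading frame definition (an ATG followed by whole codons up to the first in-frame stop codon, forward strand only) is a simplifying assumption rather than the product's algorithm.

```python
import re


def find_sites(seq, site):
    """Return 0-based start positions of every (possibly overlapping) site."""
    # A lookahead matches without consuming, so overlapping sites are found
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]


# ATG, then as few whole codons as possible, then the first in-frame stop
ORF_PATTERN = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")


def find_orfs(seq):
    """Return (start, end, orf) tuples for non-overlapping forward-strand ORFs."""
    return [(m.start(), m.end(), m.group()) for m in ORF_PATTERN.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_sites("AAGAATTCTTGAATTC", "GAATTC"))
    print(find_orfs("CCATGAAATAGGG"))
```

A fuller search would typically also scan the reverse complement and apply a minimum length to the reported open reading frames.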

Result Viewers - You can also visualize sequence information and related features with the product's sequence viewers. These viewers include a custom report [portable document format (PDF) or hypertext markup language (HTML)], plain text view, or Artemis viewer (for nucleotide sequences). Multiple sequence alignments can likewise be displayed in a custom report (PDF or HTML), plain text, or JalView (a multiple sequence alignment editor and viewer written in the Java programming language).

Third-Party Tool Integration - The Sequence Analysis Collection includes examples of integration with BioPerl, NCBI BLAST, GCG programs (an integrated package featuring a comprehensive collection of DNA-, RNA-, and protein-sequence-analysis tools), EMBOSS tools, and BioJava. You can use these examples as templates to extend the available functionality to include other programs of interest. Your Pipeline Pilot integration options include Java, Perl, Simple Object Access Protocol (SOAP), Visual Basic Script (VBScript), or simple command-line wrappers.
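
The general idea of a command-line wrapper can be sketched as follows. This is a generic Python illustration, not a Pipeline Pilot component or API: the helper name `run_tool` is hypothetical, and the demonstration simply calls the Python interpreter itself as a stand-in for an external sequence analysis program.

```python
import subprocess
import sys


def run_tool(args, input_text=None, timeout=60):
    """Run an external program and return its standard output as text.

    args is the full argument list (program name first). A non-zero exit
    status raises subprocess.CalledProcessError.
    """
    result = subprocess.run(
        args,
        input=input_text,       # text passed to the program on stdin
        capture_output=True,    # collect stdout and stderr
        text=True,              # decode bytes to str
        timeout=timeout,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a real tool: upper-case a sequence read from stdin
    demo = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
    print(run_tool(demo, input_text="acgt"))
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems when sequence identifiers or file names contain spaces.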

Note 1: Pipeline Pilot Overview - Pipeline Pilot streamlines the integration and analysis of the large volumes of data produced in research informatics. It makes the most of your information resources through industrial-scale data flow control and advanced mining capabilities. You can graphically compose data processing networks, known as 'protocols', using hundreds of different configurable components for operations such as data retrieval, manipulation, computational filtering, and display. These protocols are automatically captured as you create them, and you can publish them for project or enterprise use. From a Web interface, your colleagues can invoke your protocols and run them using their own data.

