CLC Genomics Workbench
Category Cross-Omics>Next Generation Sequence Analysis/Tools
Abstract CLC Genomics Workbench is a new solution for analyzing and visualizing Next Generation Sequencing (NGS) data. It incorporates cutting-edge technology and algorithms, while also supporting and integrating with the rest of your typical NGS workflow.
CLC Genomics Workbench includes all features of 'CLC Main Workbench' (see G6G Abstract Number 20096A) and the following additional functionalities:
1) De novo assembly -- The de novo assembler of CLC Genomics Workbench supports both short and long reads as well as paired-end reads, and it accepts Sanger, 454, Illumina Genome Analyzer, Helicos, and SOLiD sequencing data.
The de novo assembly process has two stages: First, contig sequences are created by aligning all the reads. Second, all the reads are assembled using the contig sequences as the reference.
2) Reference assembly -- The reference assembler of CLC Genomics Workbench supports both short and long reads as well as paired-end reads, and it accepts Sanger, 454, Solexa, Helicos, and SOLiD sequencing data.
3) Reference assembly of mixed datasets (e.g. 454 and Illumina Genome Analyzer).
4) Reference assembly of genomes of any size.
5) Assembly of standard read data and support for assembly of paired-end or mate-pair reads from any sequencing technology.
6) Advanced graphical tools for the detection of large scale mutations and rearrangements:
- a) Single reads - coverage and conflicts - When you have only single-read data, coverage is one of the main resources for interpretation.
- To assist in this interpretation, CLC Genomics Workbench displays a 'coverage graph' along the contig when you select the corresponding checkbox in the Side Panel.
- b) Paired-end reads - graphical overview - Paired-end data allows for much more advanced approaches to detecting genome rearrangements than single reads, and CLC Genomics Workbench therefore offers several ways of analyzing such paired-end data.
- c) Paired-end reads - insertions and deletions - CLC Genomics Workbench includes a number of graphical options for identifying genomic insertions and deletions when the sequencing produces paired-end reads.
- d) Paired-end reads - duplications and inversions - CLC Genomics Workbench includes a number of graphical options for identifying genomic duplications and inversions when the sequencing produces paired-end reads.
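The paired-end approaches in 6b) to 6d) all rest on comparing how a read pair maps to the reference with what the library preparation predicts. The following is a minimal sketch of that idea, not CLC Genomics Workbench's own method: the names `MappedPair` and `classify_pair`, the default read length, and the distance limits are hypothetical, and the labels assume a library in which a normal pair maps with the left read on the '+' strand and the right read on the '-' strand.

```python
from typing import NamedTuple


class MappedPair(NamedTuple):
    """One read pair mapped to a reference (hypothetical layout)."""
    start1: int         # leftmost reference position of the first read
    strand1: str        # '+' or '-'
    start2: int         # leftmost reference position of the second read
    strand2: str        # '+' or '-'
    read_len: int = 35  # assumed read length, used to measure the outer distance


def classify_pair(pair, min_dist, max_dist):
    """Return a coarse label for one mapped pair.

    Assumes a normal pair has the left read on '+' and the right read on '-',
    with an outer distance between min_dist and max_dist.
    """
    left, right = sorted([(pair.start1, pair.strand1),
                          (pair.start2, pair.strand2)])
    # Both reads on the same strand: the sample may be inverted here.
    if left[1] == right[1]:
        return "possible inversion"
    # Reads pointing away from each other: may indicate a tandem duplication.
    if left[1] == "-" and right[1] == "+":
        return "possible duplication"
    distance = right[0] + pair.read_len - left[0]
    # Pair spans more reference than expected: sequence missing in the sample.
    if distance > max_dist:
        return "possible deletion"
    # Pair spans less reference than expected: extra sequence in the sample.
    if distance < min_dist:
        return "possible insertion"
    return "consistent"
```

A single anomalous pair is weak evidence; in practice one would look for clusters of pairs with the same label at the same locus, which is what a graphical overview makes visible.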
7) Multiplex Sequencing by Name - When you do batch sequencing of different samples, you can use multiplexing techniques to run different samples in the same run.
Separating the sequencing reads afterwards, so that the reads from each sample are assembled together, is often a data analysis challenge.
8) Support for Multiplex Sequencing by Tag - With many of the new high-throughput processes there is a need to load several different samples into the same sequencing run.
One method is to tag the sequences with a unique identifier during the preparation of the sample for sequencing [Meyer et al., 2007].
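Sorting tagged reads back into their samples can be sketched as follows. This is an illustration only and does not reflect how CLC Genomics Workbench implements it: the function `demultiplex`, the sample names, and the tag sequences are hypothetical, and the sketch assumes the tag sits at the very start of each read, matches exactly, and that no tag is a prefix of another.

```python
from collections import defaultdict


def demultiplex(reads, tags):
    """Group reads by their leading tag and strip the tag.

    reads: iterable of read sequences (strings)
    tags:  dict mapping tag sequence -> sample name
    Reads that start with no known tag are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        for tag, sample in tags.items():
            if read.startswith(tag):
                # Remove the tag so it does not take part in the assembly.
                by_sample[sample].append(read[len(tag):])
                break
        else:
            by_sample["unassigned"].append(read)
    return dict(by_sample)
```

For example, `demultiplex(["ACGTTTTT", "TGCAGGGG"], {"ACGT": "sample_A", "TGCA": "sample_B"})` places `"TTTT"` under `sample_A` and `"GGGG"` under `sample_B`. Real data usually needs some tolerance for sequencing errors within the tag, which this sketch leaves out.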
9) Masking of reference assembly based on annotations (e.g. exons).
10) Integration with CLC bio’s High Performance computing solutions (see below), making assemblies very fast.
11) Interactive and zoomable viewing of genome assemblies, including sequencing reads, quality data, and reference sequences. The viewers are fully integrated with the downstream analyses.
12) Quality reporting and statistics on raw data - Reporting of assembly output - CLC Genomics Workbench allows for three (3) types of output reporting:
- a) Assembly report: This will generate a summary report.
- b) List of non-assembled sequences: This will put all the reads that could not be assembled into a sequence list.
- c) Table including all contigs: de novo assembly can potentially generate a lot of contigs, and this option creates a table which makes it easier to get an overview of all the contigs.
- The table includes the following information: Length of consensus sequence; Number of reads; Average coverage; and Total number of conflicts.
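The four columns of the contig table can be derived from very little information per contig. The sketch below is an illustration under stated assumptions, not the program's own calculation: the function `contig_row` and its field names are hypothetical, and average coverage is taken to be the total number of aligned read bases divided by the consensus length.

```python
def contig_row(name, consensus_length, read_lengths, conflicts):
    """Build one row of a contig overview table.

    name:             contig identifier
    consensus_length: length of the consensus sequence in bases
    read_lengths:     aligned length of every read in the contig
    conflicts:        number of positions where reads disagree with the consensus
    """
    if consensus_length <= 0:
        raise ValueError("consensus_length must be positive")
    total_bases = sum(read_lengths)
    return {
        "contig": name,
        "consensus_length": consensus_length,
        "reads": len(read_lengths),
        # Assumed definition: aligned bases per consensus position.
        "average_coverage": total_bases / consensus_length,
        "conflicts": conflicts,
    }
```

Sorting a list of such rows by `consensus_length` or `average_coverage` gives the kind of overview that is hard to get by opening contigs one at a time.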
13) Trimming and filtering sequences - CLC Genomics Workbench offers a number of ways to trim and filter out sequence reads prior to assembly:
- a) Trim using quality scores;
- b) Trim using ambiguous nucleotides;
- c) Trim contamination from vectors in UniVec database;
- d) Trim contamination from saved sequences;
- e) Hit limit; and
- f) Discard reads below a certain length.
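Options a), b), and f) can be combined into a single pass over each read, as sketched below. This is a simple end-trimming stand-in and not the trimming algorithm used by CLC Genomics Workbench: the function `trim_read` and its default thresholds are hypothetical, quality scores are assumed to be Phred-style integers, and sequences are assumed to be upper case.

```python
def trim_read(seq, quals, min_qual=20, min_length=15):
    """Trim low-quality or ambiguous bases from both ends of a read.

    Returns (trimmed_seq, trimmed_quals), or None if the trimmed read is
    shorter than min_length and should be discarded.
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")

    def bad(i):
        # A base is trimmed if its quality is low or it is not A, C, G, or T.
        return quals[i] < min_qual or seq[i] not in "ACGT"

    start, end = 0, len(seq)
    while start < end and bad(start):
        start += 1
    while end > start and bad(end - 1):
        end -= 1
    if end - start < min_length:
        return None
    return seq[start:end], quals[start:end]
```

Only the ends are inspected, so a low-quality base in the middle of an otherwise good read is kept; a stricter scheme would score the whole read.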
14) Single Nucleotide Polymorphism (SNP) detection - Instead of manually checking all the conflicts of a contig to discover significant single-nucleotide variations, CLC Genomics Workbench offers automated SNP detection.
The SNP detection in CLC Genomics Workbench is based on the Neighborhood Quality Standard (NQS) algorithm of [Altshuler et al., 2000] (also see [Brockman et al., 2008] for more information).
Based on your specifications on what you consider a valid SNP, the SNP detection will scan through the entire contig and report all the SNPs that meet the requirements.
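The core idea of a neighborhood quality check is that a mismatching base only counts as evidence when both the base itself and the bases around it have good quality scores. The sketch below illustrates that idea and is not the program's implementation: `passes_nqs` and `call_snps` are hypothetical names, all thresholds (central quality 20, neighborhood quality 15, radius 5, minimum coverage 4, minimum variant fraction 0.35) are illustrative defaults rather than values taken from this abstract, reads are assumed to be aligned without gaps, and any limit on mismatches within the neighborhood is omitted.

```python
from collections import Counter, defaultdict


def passes_nqs(quals, pos, radius=5, min_central=20, min_neighbor=15):
    """True if the base at pos and its flanking bases meet the quality limits."""
    if pos - radius < 0 or pos + radius >= len(quals):
        return False  # too close to the read end to judge the neighborhood
    if quals[pos] < min_central:
        return False
    neighbors = quals[pos - radius:pos] + quals[pos + 1:pos + radius + 1]
    return all(q >= min_neighbor for q in neighbors)


def call_snps(reference, reads, min_coverage=4, min_fraction=0.35):
    """Report positions where enough high-quality bases differ from the reference.

    reads: iterable of (start, seq, quals) with quals as a list of integers,
           each read aligned to the reference without gaps (an assumption).
    Returns a list of (position, reference_base, variant_base, count, coverage).
    """
    counts = defaultdict(Counter)
    for start, seq, quals in reads:
        for i, base in enumerate(seq):
            if passes_nqs(quals, i):
                counts[start + i][base] += 1

    snps = []
    for ref_pos in sorted(counts):
        coverage = sum(counts[ref_pos].values())
        if coverage < min_coverage:
            continue
        for base, n in counts[ref_pos].items():
            if base != reference[ref_pos] and n / coverage >= min_fraction:
                snps.append((ref_pos, reference[ref_pos], base, n, coverage))
    return snps
```

The `min_coverage` and `min_fraction` parameters play the role of the user's specification of what counts as a valid SNP; only bases that pass the quality check contribute to either number.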
15) Support for integration with the CLC Bioinformatics Database.
16) CLC Genomics Workbench is fully integrated with 'CLC NGS Cell', CLC bio’s command line solution for ‘super fast assembly’ of Next Generation Sequencing data.
The command-line interface of CLC NGS Cell enables the functionalities to be included in scripts and other Next Generation Sequencing workflows.
CLC NGS Cell uses SIMD instructions to parallelize and accelerate the assembly algorithms, making the program one of the fastest Next Generation Sequencing assemblers at present.
Note: SIMD (Single Instruction, Multiple Data) is a technique employed to achieve data level parallelism, as in a vector processor.
System Requirements
CLC Genomics Workbench is available on Windows, Mac OS X, and Linux platforms.
Manufacturer
- CLC bio A/S
- Finlandsgade 10-12
- Katrinebjerg
- 8200 Aarhus N
- Denmark
- Main telephone: +45 70 22 32 44
- Sales: +45 70 22 55 09
- Fax: +45 70 22 55 19
- Main email: info@clcbio.com
- Support: support@clcbio.com
- Sales: sales@clcbio.com
Manufacturer Web Site CLC Genomics Workbench
Price Academic license US $4,995; Industrial license US $9,990. VAT number: DK 28 30 50 87
G6G Abstract Number 20277
G6G Manufacturer Number 100520