Online Quantitative Transcriptome Analysis (Oqtans)

Category Cross-Omics>Next Generation Sequence Analysis/Tools

Abstract Oqtans (Online Quantitative Transcriptome Analysis) is one of the first integrative online platforms for quantitatively analyzing RNA-Seq experiments.

It is based on the Galaxy-framework and provides tools for read mapping, transcript reconstruction and quantitation as well as differential expression analysis.

The current revolution in sequencing technologies allows the user to obtain a much more detailed picture of transcriptomes.

Studying them under different conditions or in mutants will lead to a considerably improved understanding of the underlying mechanisms of gene expression and processing.

An important prerequisite is to be able to accurately determine the full complement of RNA transcripts and to infer their abundance in the cell.

However, the analysis is made considerably more difficult by various limitations and biases in next-generation sequencing (NGS) technologies.

This machine-learning-powered platform for quantitatively analyzing RNA-Seq experiments is integrated into the easy-to-use Galaxy framework (see below...) and builds on recent methods for NGS sequence analysis developed on the Max Planck Campus in Tübingen (Germany):

1) PALMapper - PALMapper is a short read mapper which efficiently computes both unspliced and spliced alignments at high accuracy by taking advantage of base quality information and computational splice site predictions (see below...).

2) mTIM - mTIM is a machine learning-based transcript reconstruction method, which exploits features derived from RNA-seq read alignments and from computational splice site predictions to infer the exon-intron structure of the corresponding transcripts (see below...).

3) rQuant - rQuant is a method based on ‘quadratic programming’ that simultaneously estimates biases inherent in library preparation, sequencing, and read mapping and accurately determines the abundances of given transcripts (see below...).

4) rDiff - rDiff is a set of statistical test techniques that determine significant differences between two RNA-Seq experiments to find differentially expressed regions with or without knowledge of the transcripts.
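As a rough illustration of the question such a test answers, the sketch below applies an exact binomial test to the read counts of one region in two samples. This is not rDiff's actual procedure, which is not specified in this abstract; the function name, the arguments, and the null model (reads split between samples in proportion to sequencing depth) are assumptions made for the example.

```python
from math import comb

def binomial_two_sided_p(count_a, count_b, depth_a, depth_b):
    """Exact two-sided binomial test for one region (illustrative only).

    Null hypothesis: each of the n = count_a + count_b reads comes from
    sample A with probability p0 = depth_a / (depth_a + depth_b).
    Intended for small counts; large counts would need log-space arithmetic.
    """
    n = count_a + count_b
    p0 = depth_a / (depth_a + depth_b)
    pmf = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    observed = pmf[count_a]
    # Sum the probabilities of all outcomes at most as likely as the observed one.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)

# Hypothetical counts: 10 reads in sample A, none in sample B, equal depth.
p_unbalanced = binomial_two_sided_p(10, 0, 1000, 1000)
p_balanced = binomial_two_sided_p(5, 5, 1000, 1000)
```

A small p-value (as for the unbalanced counts) suggests the region is covered differently in the two experiments, while balanced counts give a p-value near 1.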

Galaxy framework --

Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational biomedical research.

1) Accessible - Users without programming experience can easily specify parameters and run tools and workflows.

2) Reproducible - Galaxy captures information so that any user can repeat and understand a complete computational analysis.

3) Transparent - Users share and publish analyses via the web and create Pages, interactive, web-based documents that describe a complete analysis.

PALMapper --

PALMapper is a method that combines the spliced alignment method QPALMA with the short read alignment tool GenomeMapper.

PALMapper efficiently computes both spliced and unspliced alignments at high accuracy while taking advantage of base quality information and splice site predictions.

QPALMA, which relies on a machine learning strategy, is highly sensitive, but its alignment step is time-consuming, which can make it impractical for large genomes or extremely large introns.

To speed this up, the manufacturer of PALMapper combined QPALMA with GenomeMapper, which quickly carries out an initial read mapping. This mapping then guides a banded semi-global and spliced alignment algorithm that allows for long gaps corresponding to introns.

PALMapper considerably reduces time consumption compared to QPALMA without decreasing accuracy. According to the manufacturer, it runs around 50 times faster and can therefore align around 7 million reads per hour on a single AMD CPU core.

The manufacturer's study of PALMapper on C. elegans furthermore showed that it predicts introns with high sensitivity (72%) and specificity (82%) when using the annotation as ground truth.
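The idea of letting an initial mapping restrict the alignment to a band can be sketched as follows. This is a simplified illustration, not PALMapper's implementation: it ignores introns, base qualities, and splice site predictions, and the function name, the scoring values, and the `seed_offset` parameter (the reference position where the seed suggests the read starts) are assumptions.

```python
def banded_semiglobal_score(read, ref, seed_offset, band=3,
                            match=1, mismatch=-1, gap=-2):
    """Best score for aligning all of `read` somewhere inside `ref`.

    Only cells with |j - (i + seed_offset)| <= band are filled, where i
    counts read bases and j counts reference bases. Reference bases before
    and after the read are free (semi-global alignment).
    """
    neg = float("-inf")
    n, m = len(read), len(ref)
    # Row 0: the read may start at any reference position inside the band.
    prev = [0 if abs(j - seed_offset) <= band else neg for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [neg] * (m + 1)
        lo = max(0, i + seed_offset - band)
        hi = min(m, i + seed_offset + band)
        for j in range(lo, hi + 1):
            best = prev[j] + gap  # read base aligned to a gap
            if j > 0:
                sub = match if read[i - 1] == ref[j - 1] else mismatch
                best = max(best,
                           prev[j - 1] + sub,  # match or mismatch
                           cur[j - 1] + gap)   # reference base aligned to a gap
            cur[j] = best
        prev = cur
    return max(prev)

# Toy example: the seed says the read starts at reference position 2.
exact_score = banded_semiglobal_score("ACGT", "TTACGTTT", seed_offset=2, band=1)
one_mismatch_score = banded_semiglobal_score("ACGA", "TTACGTTT", seed_offset=2, band=1)
```

Because only a strip of width 2 * band + 1 around the seed diagonal is filled, the work per read grows with the read length times the band width rather than with the full reference length.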
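For reference, the two accuracy figures can be computed from sets of intron coordinates as in the sketch below. It assumes the convention common in gene finding, where sensitivity is the fraction of annotated introns that were predicted and specificity is the fraction of predicted introns that are annotated; the abstract does not define the terms, and the function name and coordinates are hypothetical.

```python
def intron_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) for predicted introns.

    Introns are compared as exact (chromosome, start, end) tuples.
    Assumed definitions: sensitivity = TP / annotated,
    specificity = TP / predicted.
    """
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity

# Hypothetical coordinates: 4 annotated introns, 3 predictions, 2 of them correct.
annotated = [("I", 100, 200), ("I", 300, 450), ("I", 600, 700), ("II", 50, 90)]
predicted = [("I", 100, 200), ("I", 300, 450), ("I", 610, 700)]
sensitivity, specificity = intron_accuracy(predicted, annotated)
```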

mTiM (margin-based transcript mapping) --

mTiM is a machine learning-based transcript reconstruction method that exploits features derived from spliced and unspliced RNA-seq read alignments and from computational splice site predictions to infer the exon-intron structure of the corresponding transcripts.

The inference technique used to train mTiM on RNA-seq data aligned in regions of well-annotated transcript structures is based on Hidden Markov Support Vector Machines (HMSVMs).

This machine learning technique is related to Hidden Markov Models, which are employed in many gene finding systems, but HMSVMs are trained using a discriminative, large-margin approach with a novel Bundle method for efficient parameter optimization.

Parameter learning in general and the discriminative training algorithm in particular have been shown to confer high noise tolerance in related applications.
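Like Hidden Markov Models, sequence labeling models of this kind are typically decoded by dynamic programming over per-position scores and transition scores. The sketch below shows only such a decoding step on a toy example; it does not show large-margin training, and the state names, scores, and allowed transitions are assumptions for illustration rather than mTiM's actual model.

```python
def viterbi(position_scores, transition_scores, states):
    """Return the highest-scoring state path (max-sum dynamic programming).

    position_scores: list of dicts mapping state -> score at that position
    transition_scores: dict mapping (previous_state, state) -> score;
                       pairs that are missing are treated as forbidden
    """
    neg = float("-inf")
    best = {s: position_scores[0][s] for s in states}
    back = []
    for scores in position_scores[1:]:
        new_best, pointers = {}, {}
        for s in states:
            candidates = [(best[p] + transition_scores.get((p, s), neg), p)
                          for p in states]
            value, pointers[s] = max(candidates)
            new_best[s] = value + scores[s]
        best = new_best
        back.append(pointers)
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

states = ["intergenic", "exon", "intron"]
# Introns may only be entered from and left into exons.
allowed = {("intergenic", "intergenic"): 0, ("intergenic", "exon"): 0,
           ("exon", "exon"): 0, ("exon", "intron"): 0,
           ("exon", "intergenic"): 0, ("intron", "intron"): 0,
           ("intron", "exon"): 0}
# Toy scores; position 2 has low coverage but a splice-site-like signal.
toy_scores = [
    {"intergenic": 1.0, "exon": 0.0, "intron": 0.0},
    {"intergenic": 0.0, "exon": 2.0, "intron": 0.0},
    {"intergenic": 0.5, "exon": 0.0, "intron": 1.0},
    {"intergenic": 0.0, "exon": 2.0, "intron": 0.0},
    {"intergenic": 1.0, "exon": 0.0, "intron": 0.0},
]
path = viterbi(toy_scores, allowed, states)
```

In an HMSVM the scores would be learned functions of features such as read coverage and splice site predictions; here they are fixed numbers chosen by hand.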

In contrast to most gene finding systems, mTiM is strongly evidence-based and models only very few genic sequence motifs (only splice sites), whereas most gene finders are more strongly sequence-based with a much more complex model of genic sequence characteristics.

Most importantly, mTiM does not require an open reading frame (it does not model coding sequence at all) and is thus able to predict non-coding transcripts as well.

Unlike purely alignment-based methods, it can fill gaps in the read coverage, an advantage for predicting complete transcripts, in particular for weakly expressed genes.

For instance, since read coverage is only one out of several features used to detect transcript boundaries, other features (such as splice site predictions) can help to distinguish introns that lack strong alignment support from intergenic regions.

rQuant --

rQuant is a method based on ‘quadratic programming’ that simultaneously estimates biases inherent in library preparation, sequencing, and read mapping and accurately determines the abundances of given transcripts.

Given a gene annotation and position-wise exon/intron read coverage from read alignments, rQuant determines the abundance of each annotated transcript by minimizing a suitable loss function.

rQuant penalizes the deviation of the observed from the expected read coverage given the transcript weights.
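A minimal sketch of this kind of objective is shown below. It assumes a plain squared loss, uniform expected coverage along each transcript (no bias profiles), and projected gradient descent in place of a quadratic programming solver; rQuant's actual loss function and solver are not specified in this abstract, and all names and numbers are hypothetical.

```python
def estimate_abundances(observed, profiles, steps=5000, learning_rate=0.01):
    """Minimize sum_p (observed[p] - sum_t w[t] * profiles[t][p])**2, w >= 0.

    observed: read coverage at each position of the locus
    profiles: for each transcript, the expected coverage per unit abundance
              at each position (here 1 inside exons, 0 elsewhere)
    """
    n_transcripts = len(profiles)
    weights = [0.0] * n_transcripts
    for _ in range(steps):
        expected = [sum(weights[t] * profiles[t][p] for t in range(n_transcripts))
                    for p in range(len(observed))]
        residual = [e - o for e, o in zip(expected, observed)]
        for t in range(n_transcripts):
            gradient = 2 * sum(r * x for r, x in zip(residual, profiles[t]))
            # Gradient step, then projection onto non-negative abundances.
            weights[t] = max(0.0, weights[t] - learning_rate * gradient)
    return weights

# Two hypothetical transcripts: the second one skips positions 2 and 3.
profiles = [[1, 1, 1, 1, 1, 1],
            [1, 1, 0, 0, 1, 1]]
# Coverage generated from abundances 3 and 2, without noise.
observed = [5, 5, 3, 3, 5, 5]
weights = estimate_abundances(observed, profiles)
```

On this noise-free toy locus the estimate recovers the abundances used to generate the coverage. With real data the residual does not vanish, which is where the position-dependent biases discussed next become relevant.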

The observed read coverage is typically non-uniformly distributed over the transcript due to several biases in the generation of the sequencing libraries and in the sequencing itself. If not corrected properly, this distorts the estimated transcript abundances.

The manufacturer of rQuant therefore extended their approach to jointly optimize transcript profiles, modeling the coverage deviations depending on the position in the transcript.

This method can be applied without knowledge of the underlying transcript abundances and equally benefits from loci with and without alternative transcripts.

According to the manufacturer, all of the above tools produce very accurate results and perform better than or on par with the state of the art for short read alignment, transcript identification and quantification, and differential expression analysis.

Their combination into an advanced workflow integrated in the Galaxy framework makes it possible to easily and effectively conduct a complete quantitative RNA-Seq analysis.

Oqtans can be accessed on the manufacturer’s server, locally or in the cloud.

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site Online Quantitative Transcriptome Analysis (Oqtans)

Price Contact manufacturer.

G6G Abstract Number 20775

G6G Manufacturer Number 104352