Abstract: SEQUEST is a method for protein identification and peptide sequencing that uses mass spectrometry fragmentation patterns to search protein and nucleotide databases.

SEQUEST converts the character-based representation of amino acid sequences in a protein database to fragmentation patterns, which are compared against the tandem mass spectrometry (MS/MS) spectrum acquired from the target peptide.
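As a rough illustration of turning a character string into a fragmentation pattern, the sketch below computes singly charged b- and y-ion m/z values from a peptide sequence. It assumes unmodified residues and standard monoisotopic masses; the function name `fragment_ions` is hypothetical, and SEQUEST's actual fragment model is not described in this text.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O
PROTON = 1.00728   # mass of a proton


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions runs from b1 to b(n-1); y_ions runs from y(n-1) down to y1,
    so b_ions[i] and y_ions[i] come from the same backbone cleavage.
    """
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        prefix = sum(RESIDUE_MASS[aa] for aa in peptide[:i])
        suffix = sum(RESIDUE_MASS[aa] for aa in peptide[i:])
        b_ions.append(prefix + PROTON)
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    print([round(m, 3) for m in b])
    print([round(m, 3) for m in y])
```

Each complementary b/y pair sums to the neutral peptide mass plus two protons, which is a handy sanity check on any fragment table.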

The algorithm initially identifies amino acid sequences in the database that match the measured mass of the peptide, compares fragment ions against the MS/MS spectrum, and generates a preliminary score for each amino acid sequence.
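The first two steps above can be sketched as a mass filter followed by a simple count of matched fragment ions. This is only a stand-in: the tolerance values, the function names, and the use of a plain matched-ion count as the preliminary score are assumptions, since the text does not give SEQUEST's actual scoring formula.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def candidates_by_mass(peptides, measured_mass, tolerance=1.0):
    """Keep database peptides whose mass is within `tolerance` Da of the measurement."""
    return [p for p in peptides if abs(peptide_mass(p) - measured_mass) <= tolerance]


def matched_ion_count(theoretical_mzs, observed_mzs, tolerance=0.5):
    """Count theoretical fragment m/z values that have an observed peak nearby.

    A deliberately simple placeholder for a preliminary score.
    """
    return sum(
        1
        for t in theoretical_mzs
        if any(abs(t - o) <= tolerance for o in observed_mzs)
    )


if __name__ == "__main__":
    database = ["GAS", "SAG", "GGG", "WWW"]
    print(candidates_by_mass(database, measured_mass=233.10, tolerance=0.5))
```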

A cross-correlation analysis is then performed on the 500 peptides with the highest preliminary scores by correlating theoretical, reconstructed spectra against the experimental spectrum. The results are then displayed.
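One way to picture the cross-correlation step is the sketch below: both spectra are binned to unit m/z, and the score is the correlation at zero offset minus the mean correlation over a window of nonzero offsets. The unit binning, the window of 75 bins on either side, and all function names are assumptions made for illustration; the text does not specify these details.

```python
def bin_spectrum(peaks, n_bins):
    """Place (m/z, intensity) peaks into unit-width bins, keeping the max per bin."""
    bins = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(round(mz))
        if 0 <= idx < n_bins:
            bins[idx] = max(bins[idx], intensity)
    return bins


def correlation_at(x, y, shift):
    """Dot product of x with y displaced by `shift` bins."""
    total = 0.0
    for i, xv in enumerate(x):
        j = i + shift
        if 0 <= j < len(y):
            total += xv * y[j]
    return total


def xcorr(experimental, theoretical, max_shift=75):
    """Zero-offset correlation minus the mean correlation at nonzero offsets."""
    zero = correlation_at(experimental, theoretical, 0)
    background = [
        correlation_at(experimental, theoretical, s)
        for s in range(-max_shift, max_shift + 1)
        if s != 0
    ]
    return zero - sum(background) / len(background)


if __name__ == "__main__":
    observed = bin_spectrum([(100, 1.0), (200, 1.0), (300, 1.0)], 400)
    good = bin_spectrum([(100, 1.0), (200, 1.0), (300, 1.0)], 400)
    bad = bin_spectrum([(110, 1.0), (210, 1.0), (310, 1.0)], 400)
    print(xcorr(observed, good), xcorr(observed, bad))
```

Subtracting the background term penalizes candidates whose peaks line up with the experimental spectrum only when shifted, which is why a well-aligned candidate scores higher than a displaced one.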

In short, SEQUEST performs automated peptide/protein sequencing via database searching of MS/MS spectra without the need for any manual sequence interpretation, though it can make use of interpreted sequence information if available.

The following is additional software that augments SEQUEST.

DTASelect and Contrast -- were designed to make interpretation and comparison of proteomic data faster and more effective.

DTASelect organizes and filters SEQUEST identifications, reducing the time required to interpret the results for each sample. Contrast differentiates multiple samples and comprises an advanced meta-analytical tool.

DTASelect - SEQUEST is very good at matching uninterpreted tandem mass spectra to database peptide sequences. DTASelect is designed to reassemble this peptide information into usable protein information.

The program can do more than simply report the protein content of a sample; it features many customizable filters to specify which identifications should be kept and which discarded.

It also features reports to investigate post-translational modifications, align sequences of peptides to identify poorly sequenced regions, and analyze chromatography efficiency.

The software makes the process of analyzing SEQUEST results far faster and more consistent than possible before, even for data sets containing a million spectra or more. By automating SEQUEST analysis, DTASelect enables experiments of far greater scope.
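The reassembly and filtering idea described above can be sketched as follows: peptide-level hits are filtered by score, then grouped by protein. The record fields, the threshold values, and the minimum-peptide rule are all hypothetical choices for illustration and are not taken from DTASelect itself.

```python
from collections import defaultdict

# Illustrative per-charge XCorr minimums; assumed values, not taken from the text.
MIN_XCORR = {1: 1.8, 2: 2.5, 3: 3.5}


def assemble_proteins(identifications, min_delta_cn=0.08, min_peptides=2):
    """Filter peptide hits, then group the survivors by protein.

    Each identification is a dict with hypothetical keys:
    'protein', 'peptide', 'charge', 'xcorr', 'delta_cn'.
    Returns {protein: sorted list of distinct peptides}.
    """
    proteins = defaultdict(set)
    for hit in identifications:
        threshold = MIN_XCORR.get(hit["charge"])
        if threshold is None:
            continue  # no rule for this charge state, so discard the hit
        if hit["xcorr"] >= threshold and hit["delta_cn"] >= min_delta_cn:
            proteins[hit["protein"]].add(hit["peptide"])
    return {
        name: sorted(peptides)
        for name, peptides in proteins.items()
        if len(peptides) >= min_peptides
    }


if __name__ == "__main__":
    hits = [
        {"protein": "P1", "peptide": "AAAK", "charge": 2, "xcorr": 3.0, "delta_cn": 0.2},
        {"protein": "P1", "peptide": "CCCK", "charge": 2, "xcorr": 2.6, "delta_cn": 0.1},
        {"protein": "P2", "peptide": "EEEK", "charge": 1, "xcorr": 2.0, "delta_cn": 0.1},
    ]
    print(assemble_proteins(hits))
```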

Contrast - Differentiating biological samples by protein content is an important application of proteomics. The Contrast program uses the filters present in DTASelect to highlight the most important identifications of samples and then compares them.

Unlike most relatively simple comparison algorithms, Contrast can differentiate up to 63 different samples at once. Contrast handles differential analysis between experimental and control samples simply and flexibly.

Census -- is a software tool that facilitates automated quantitative analysis using either stable isotope labeling or an isotope-free strategy.

Using high-resolution, high-mass-accuracy data from a linear trap quadrupole (LTQ)-Orbitrap hybrid mass spectrometer as input for Census, the manufacturer was able to quantify roughly three times as many peptides as with the previously used software (i.e., RelEx, developed by the Yates Lab).

While some of the increase can be attributed to the benefits inherent to the instrumentation, improvements in Census are also responsible.

One reason for the increase in accurately quantified peptides is that Census exploits the high mass accuracy of the Orbitrap: by applying a small mass tolerance to each isotopic peak, it minimizes the contributions of interfering peaks and chemical noise.
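The effect of a narrow per-isotope tolerance can be illustrated with a small sketch that sums only the peaks falling within a parts-per-million window of each expected isotopic m/z. The 10 ppm default, the three-peak envelope, and the function names are assumptions; the text does not state the values Census uses.

```python
C13_SPACING = 1.00335  # approximate mass difference between 13C and 12C, in Da


def within_ppm(observed_mz, expected_mz, tolerance_ppm):
    """True if the observed m/z lies within the ppm tolerance of the expected m/z."""
    return abs(observed_mz - expected_mz) / expected_mz * 1e6 <= tolerance_ppm


def isotope_mzs(mono_mz, charge, n_peaks=3):
    """Expected m/z of the first `n_peaks` isotopic peaks for a given charge."""
    return [mono_mz + k * C13_SPACING / charge for k in range(n_peaks)]


def isotope_intensity(peaks, mono_mz, charge, tolerance_ppm=10.0, n_peaks=3):
    """Sum intensities of peaks matching any expected isotopic m/z.

    Peaks outside every tolerance window (interfering peaks, noise) are ignored.
    """
    expected = isotope_mzs(mono_mz, charge, n_peaks)
    return sum(
        intensity
        for mz, intensity in peaks
        if any(within_ppm(mz, e, tolerance_ppm) for e in expected)
    )


if __name__ == "__main__":
    scan = [(500.0000, 100.0), (500.3000, 999.0), (500.5017, 50.0), (501.0034, 20.0)]
    print(isotope_intensity(scan, mono_mz=500.0, charge=2))
```

With a wide tolerance the interfering peak at m/z 500.3 would be swept into the sum; with a narrow one it is excluded.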

In addition, a dynamic peak finding algorithm is employed that makes use of database search results for improved accuracy and quantification efficiency.

Finally, a weighted mean of the peptide ratios is calculated to determine each protein ratio.
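A weighted mean of this kind can be written in a few lines. The weights here are hypothetical (for example, a per-peptide quality score); the text does not say how Census assigns them.

```python
def protein_ratio(peptide_ratios):
    """Weighted mean of peptide ratios.

    `peptide_ratios` is a list of (ratio, weight) pairs; the weights are a
    hypothetical per-peptide confidence measure.
    """
    total_weight = sum(weight for _, weight in peptide_ratios)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(ratio * weight for ratio, weight in peptide_ratios) / total_weight


if __name__ == "__main__":
    print(protein_ratio([(2.0, 1.0), (4.0, 3.0)]))
```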

ProLuCID -- is a fast and sensitive tandem mass spectra-based protein identification program. This algorithm uses a binomial probability as a preliminary scoring scheme to select candidate peptides for final scoring.

The binomial probability scores generated by ProLuCID have no significant molecular weight bias and are independent of database size.
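As a generic illustration of a binomial preliminary score, the sketch below computes the probability of matching at least k of n theoretical fragment peaks by chance, given a per-peak random match probability p. The parameterization and the function name are assumptions; the exact model ProLuCID uses is not given in this text.

```python
from math import comb


def binomial_tail(n, k, p):
    """Probability of k or more successes in n independent trials with success rate p."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # A candidate matching 8 of 20 fragments is far less likely to be a
    # chance hit than one matching 3 of 20 (p = 0.1 is an assumed value).
    print(binomial_tail(20, 8, 0.1), binomial_tail(20, 3, 0.1))
```

A smaller tail probability indicates a match that is harder to explain by chance, so candidates can be ranked by it before the more expensive final scoring.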

The final scores are computed using a modified cross-correlation function which models isotopic distributions of fragment ions of candidate peptides, which ultimately results in higher sensitivity and specificity than that obtained with SEQUEST.

In addition, ProLuCID takes advantage of high-resolution MS/MS data, which significantly improves specificity compared with low-resolution tandem MS data.

GutenTag -- is software to identify peptides by the 'sequence tagging' technique. SEQUEST searches a sequence database by mass, but GutenTag searches with short sequences derived directly from the spectrum.

Sequence tagging infers a short region of sequence (the tag) directly from the spectrum, then searches a sequence database for sequences that match both this tag and its flanking masses.

GutenTag automates the process of sequence tag inference with a more accurate model of fragment ions than found elsewhere. It can retain multiple tags for each spectrum, ranking them by their scores.

It can search a sequence database for all of these tags in a single pass, and it can evaluate the candidate sequences returned from the database to determine which is the correct match for each spectrum. In short, GutenTag makes the sequence tagging approach usable on real-world data.
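The database lookup step of sequence tagging (matching a tag together with its flanking masses) might look like the sketch below. It assumes unmodified residues, standard monoisotopic masses, and a single tag per query; the function names and the tolerance are hypothetical and do not describe GutenTag's own implementation.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_sum(sequence):
    """Sum of residue masses for a stretch of sequence (no water, no proton)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence)


def find_tag_matches(peptides, tag, n_flank_mass, c_flank_mass, tolerance=0.5):
    """Return peptides containing `tag` with matching flanking masses.

    n_flank_mass / c_flank_mass are the summed residue masses expected
    before and after the tag, as inferred from the spectrum.
    """
    matches = []
    for peptide in peptides:
        start = peptide.find(tag)
        while start != -1:
            prefix = residue_sum(peptide[:start])
            suffix = residue_sum(peptide[start + len(tag):])
            if (abs(prefix - n_flank_mass) <= tolerance
                    and abs(suffix - c_flank_mass) <= tolerance):
                matches.append(peptide)
                break
            start = peptide.find(tag, start + 1)
    return matches


if __name__ == "__main__":
    database = ["GASPVK", "KASPVG", "GGASPK"]
    print(find_tag_matches(database, "ASP", 57.02146, 227.16337))
```

All three example peptides contain the tag, but only one has the right mass on both sides, which shows how the flanking masses narrow the search.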

RawExtractor -- is a software program that extracts MS and MS/MS spectra from RAW files generated by Thermo mass spectrometers (such as the LTQ, LTQ-Orbitrap, and LCQ) and stores the spectra in ms1, ms2, or mzXML file format.

The spectra files generated by RawExtractor are used as input for the protein identification programs SEQUEST and ProLuCID and for the quantitation program Census.

MudPIT (Multidimensional Protein Identification Technology) -- is a technique for the separation and identification of complex protein and peptide mixtures.

Rather than use traditional 2D gel electrophoresis, MudPIT separates peptides in 2D liquid chromatography. In this way, the separation can be interfaced directly with the ion source of a mass spectrometer.

Technique - MudPIT uses columns consisting of strong cation exchange (SCX) material back-to-back with reversed phase (RP) material inside fused silica capillaries.

The chromatography proceeds in cycles, each comprising an increase in salt concentration to "bump" peptides off of the SCX followed by a gradient of increasing hydrophobicity, to progressively elute peptides from the RP into the ion source.

The mass spectrometer's data-dependent acquisition isolates peptides as they elute and subjects them to Collision-Induced Dissociation, recording the fragment ions in a tandem mass spectrum.

These spectra are matched to database peptide sequences by the SEQUEST algorithm. SEQUEST's peptide identifications are assembled and filtered into protein-level information by the DTASelect algorithm.

DFCalc (DNA Fragment Calculator) -- is software designed to assist the interpretation of tandem mass spectra from DNA molecules. The program predicts the fragment ions for known DNA sequences, producing a list to be compared against a spectrum.

DFCalc allows the user to select which classes of ions are included in the prediction (selecting among A, B, C, D, W, X, Y, and Z ions) as well as which variants of the 5' ions are included (base losses or water losses).

Ions resulting from double fragmentations can also be predicted using this software.

