Catalogue of Somatic Mutations in Cancer (COSMIC)

Category: Cross-Omics > Knowledge Bases/Databases/Tools

Abstract: The Catalogue of Somatic Mutations in Cancer (COSMIC) curates comprehensive information on somatic mutations in human cancer.

COSMIC is designed to gather, curate, organize and present the world’s information on somatic mutations in cancer, and to make it freely available in a variety of useful ways, most easily through its website.

All cancers arise as a result of the acquisition of a series of fixed DNA sequence abnormalities (mutations), many of which ultimately confer a growth advantage on the cells in which they occur. A vast amount of information about these changes is available in the published scientific literature.

COSMIC is designed to store and display this somatic mutation information, together with related details, for human cancers.

Some key features of COSMIC are:

1) It contains information on publications, samples and mutations. It includes samples that were found to be negative for mutations during screening, which enables mutation frequencies to be calculated for different genes in different cancer types.

2) Samples entered include benign neoplasms and other benign proliferations, in situ and invasive tumors, recurrences, metastases, and cancer cell lines.

The mutation data and associated information are extracted from the primary literature and entered into the COSMIC database.

To provide a consistent view of the data, a histology and tissue ontology has been created, and all mutations are mapped to a single version of each gene.

The data can be queried by tissue, histology or gene, displayed as graphs or tables, and exported in various formats.
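The query-and-export pattern can be sketched locally as below. This is a minimal illustration, not COSMIC's interface: the `MutationRecord` fields, the `query` and `to_tsv` helpers, and the example records are all hypothetical, and the real COSMIC export files may use different columns.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional


@dataclass(frozen=True)
class MutationRecord:
    """One mutation observation; this field set is hypothetical, not COSMIC's schema."""
    gene: str
    tissue: str
    histology: str
    sample: str
    mutation: str  # coding-level change, e.g. "c.1799T>A"


def query(records: Iterable[MutationRecord],
          gene: Optional[str] = None,
          tissue: Optional[str] = None,
          histology: Optional[str] = None) -> list[MutationRecord]:
    """Keep records matching every filter given; None means 'do not filter on this'."""
    return [
        r for r in records
        if (gene is None or r.gene == gene)
        and (tissue is None or r.tissue == tissue)
        and (histology is None or r.histology == histology)
    ]


def to_tsv(records: Iterable[MutationRecord]) -> str:
    """Export records as tab-delimited text with a header row."""
    buf = io.StringIO()
    names = [f.name for f in fields(MutationRecord)]
    writer = csv.DictWriter(buf, fieldnames=names, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical example records.
    records = [
        MutationRecord("BRAF", "skin", "malignant_melanoma", "S1", "c.1799T>A"),
        MutationRecord("KRAS", "large_intestine", "carcinoma", "S2", "c.35G>A"),
    ]
    print(to_tsv(query(records, gene="BRAF")))
```

Treating each filter as optional lets the same function answer "by gene", "by tissue" or combined questions.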

How does COSMIC work?

Gene selection -

The COSMIC team has assembled a list of genes that are somatically mutated in human cancer. From this list, genes are selected for entry into COSMIC, with an emphasis on genes for which no existing databases are available.

Gene sequences -

All of the mutations in COSMIC are mapped to a single version of each gene sequence. These gene sequences are held in COSMIC and are available in the Downloads section of the COSMIC website.
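Holding one reference sequence per gene means each reported change can be checked for consistency against it. The sketch below assumes a toy sequence for a hypothetical gene `GENE1` and handles only simple coding substitutions written in HGVS-style `c.` notation; it is an illustration, not COSMIC's mapping procedure.

```python
import re

# Hypothetical toy reference: one coding sequence per gene (not a real gene).
REFERENCE_CDS = {"GENE1": "ATGGCCAAGTTTGGA"}

# Simple coding substitutions only, e.g. "c.4G>A"; positions are 1-based.
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def matches_reference(gene: str, change: str) -> bool:
    """True if the reported reference base agrees with the single stored sequence."""
    m = SUBSTITUTION.match(change)
    if m is None:
        raise ValueError(f"unsupported notation: {change!r}")
    pos, ref, alt = int(m.group(1)), m.group(2), m.group(3)
    seq = REFERENCE_CDS[gene]
    # Out-of-range positions and "substitutions" that change nothing are rejected.
    if not 1 <= pos <= len(seq) or ref == alt:
        return False
    return seq[pos - 1] == ref
```

A mismatch here would flag a mutation described against a different sequence version, or a reporting error, for manual review.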

Selecting papers from the literature -

To identify papers reporting somatic mutations, PubMed is broadly searched for papers containing relevant mutation data [example search: (ras OR genes, ras) AND human AND mutation].
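A search in the style of the example can be built programmatically. This sketch uses the NCBI E-utilities `esearch` endpoint with its `db`, `term` and `retmax` parameters; the helper names are hypothetical, the URL is only constructed (not fetched), and whether COSMIC curators query PubMed this way is not stated in the source.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(gene_terms: list[str]) -> str:
    """Combine gene terms into a broad query like the example search."""
    return f"({' OR '.join(gene_terms)}) AND human AND mutation"


def esearch_url(term: str, retmax: int = 100) -> str:
    """Return an E-utilities esearch URL for a PubMed query (not fetched here)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(esearch_url(build_query(["ras", "genes, ras"])))
```

The returned PubMed IDs would still need the abstract screening and full-text curation steps described below.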

Papers whose abstracts indicate that they include somatic mutation information relating to cancer or pre-cancerous conditions are then selected for curation.

After examination of the information in the full text of the paper, the sample and mutation data are extracted.

Papers containing incomplete data (e.g. mutations that are reported but not fully described) or data of insufficient quality (e.g. errors identified in the data) are not fully curated, but are added to a list of “additional references containing somatic mutation information”.

Mutation frequency -

A central aim of COSMIC is to provide somatic mutation frequencies. These are available in the main display windows. However, it is important to understand how they are calculated and what the limitations of the data may be.
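Because samples that screened negative are recorded alongside mutated ones, a frequency can be computed as mutated samples divided by screened samples for each gene and cancer type. The sketch below is an assumption-laden illustration: the `ScreenResult` record and `mutation_frequencies` function are hypothetical, and counting each sample at most once per gene (even if it carries several mutations) is a design choice, not a documented COSMIC rule.

```python
from collections import defaultdict
from typing import NamedTuple


class ScreenResult(NamedTuple):
    sample: str
    gene: str
    cancer_type: str
    mutated: bool  # False for a sample screened and found negative


def mutation_frequencies(results: list[ScreenResult]) -> dict[tuple[str, str], float]:
    """Fraction of screened samples carrying a mutation, per (gene, cancer type)."""
    screened: dict[tuple[str, str], set[str]] = defaultdict(set)
    mutated: dict[tuple[str, str], set[str]] = defaultdict(set)
    for r in results:
        key = (r.gene, r.cancer_type)
        screened[key].add(r.sample)  # sets count each sample once
        if r.mutated:
            mutated[key].add(r.sample)
    return {key: len(mutated[key]) / len(samples) for key, samples in screened.items()}
```

Without the negative samples the denominator would be unknown, which is why recording them matters.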

Has the sample been screened before?

There are examples where the same data are reported twice, perhaps in a follow-up study that refers to further data, or as a positive control, for example when cell lines with known mutations are used. Where possible, the COSMIC curators have noted sample names and removed any redundancy within papers.

However, between papers it is not possible to confirm that two samples with the same name are in fact the same sample.

The curators have therefore included both samples and both results in COSMIC. To review this information, see the ‘Mutation Details’ view, which displays the sample name, mutation and paper reference.
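The policy of removing redundancy within a paper while keeping identically named samples from different papers can be expressed as deduplication on a (paper, sample, mutation) key. This is a hypothetical sketch; the `Observation` fields and identifiers are illustrative, not COSMIC's data model.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    paper: str     # hypothetical paper identifier
    sample: str    # sample name as reported in that paper
    mutation: str


def deduplicate_within_papers(observations: list[Observation]) -> list[Observation]:
    """Drop repeats of the same (paper, sample, mutation); keep same-named samples from other papers."""
    seen: set[Observation] = set()
    kept: list[Observation] = []
    for obs in observations:
        if obs not in seen:  # paper is part of the key, so cross-paper entries survive
            seen.add(obs)
            kept.append(obs)
    return kept
```

Including the paper in the key is what prevents two samples that merely share a name from being merged.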

What mutation detection method was employed?

Mutation screening methods differ in their sensitivity and the sensitivity of a particular method can vary from laboratory to laboratory.

Most methods identify all classes of small intragenic mutation (base substitutions and small insertions/deletions). However, the protein truncation test will not detect mutations that cause missense amino acid substitutions.

Was the whole gene screened?

Some genes are characterized by mutation hot spots, for example BRAF, RAS and TP53. These genes are often screened for somatic mutations only in the region most likely to contain mutations.

This strategy misses mutations located elsewhere in the gene, and therefore gives a distorted view of the distribution of mutations in the gene and may underestimate mutation frequency.
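The underestimate can be seen with a small worked example. The sample data and codon positions below are hypothetical; the point is only that a sample counts as mutated when at least one of its mutations falls inside the region that was actually screened.

```python
def observed_frequency(samples: list[list[int]], screened: range) -> float:
    """Fraction of samples with at least one mutation inside the screened region.

    samples[i] lists the codon positions mutated in sample i (empty if none);
    `screened` is the region that was actually sequenced.
    """
    hits = sum(1 for positions in samples if any(p in screened for p in positions))
    return hits / len(samples)


# Hypothetical data: six samples, four of which carry a mutation somewhere in the gene.
SAMPLES = [[12], [61], [], [12], [146], []]

whole_gene = observed_frequency(SAMPLES, range(1, 200))     # 4/6
hotspot_only = observed_frequency(SAMPLES, range(10, 14))   # only codon 12 is covered: 2/6
```

Screening only the hotspot halves the apparent frequency here and hides the mutations at codons 61 and 146 entirely.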

Are all the mutations real?

For many putative somatic mutations that have been reported in the published literature, definitive evidence that they are somatically acquired (through demonstration of their absence in normal DNA from the same individual as the tumor) is not available.

Therefore, occasional germline variants may have inadvertently been represented in publications as somatic mutations and entered in the database.

In addition, simple laboratory errors which result in an incorrect normal DNA sample (i.e. from a different individual) being analyzed as a control for a particular tumor sample may provide apparently persuasive, but misleading, evidence of somatic origin.
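The reasoning in this section can be summarized as a decision rule. The sketch below is hypothetical and assumes something the source does not describe: that the identity of the normal control can be verified independently (for example by genotype matching), which is exactly what a sample mix-up would otherwise conceal.

```python
from enum import Enum
from typing import Optional


class Evidence(Enum):
    CONFIRMED_SOMATIC = "confirmed somatic"
    UNCONFIRMED = "somatic status unconfirmed"
    LIKELY_GERMLINE = "likely germline"
    INVALID_CONTROL = "normal control not from the same individual"


def somatic_evidence(tumor_individual: str,
                     normal_individual: Optional[str],
                     variant_in_normal: Optional[bool]) -> Evidence:
    """Classify a tumor variant using a matched normal DNA sample, if one was analyzed."""
    if normal_individual is None or variant_in_normal is None:
        return Evidence.UNCONFIRMED      # no matched normal: somatic origin not shown
    if normal_individual != tumor_individual:
        return Evidence.INVALID_CONTROL  # e.g. a sample mix-up; evidence is misleading
    if variant_in_normal:
        return Evidence.LIKELY_GERMLINE  # present in normal DNA from the same person
    return Evidence.CONFIRMED_SOMATIC
```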

Finally, DNA amplification methods have an intrinsic error rate, and these errors may subsequently be misinterpreted as somatic mutations. There is some evidence that this may be a particular problem in analyses of archival formalin-fixed, paraffin-embedded material.

COSMIC classification system -

The classification of tumor types and subtypes with somatic mutations in the published literature is extremely variable. Classification systems and terminologies differ between reports and indeed may have changed over time.

Rather than simply entering a neoplasm using the term employed in the published report, COSMIC uses its own internal classification system to provide tissue and histology consistency within the database and reduce redundancy.

The tissue and histology information in the reviewed papers is translated using the COSMIC classification system before entry into the database.
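The translation step can be pictured as a lookup from the terms used in a paper to an internal (tissue, histology) pair. The synonym table, the term normalization and the `classify` function below are hypothetical illustrations, not the actual COSMIC classification.

```python
# Hypothetical synonym table: reported term -> (tissue, histology) in an internal scheme.
CLASSIFICATION = {
    "colorectal adenocarcinoma": ("large_intestine", "carcinoma"),
    "colon cancer": ("large_intestine", "carcinoma"),
    "cutaneous melanoma": ("skin", "malignant_melanoma"),
}


def classify(reported_term: str) -> tuple[str, str]:
    """Map a reported tumor description to (tissue, histology)."""
    key = " ".join(reported_term.lower().split())  # normalize case and whitespace
    try:
        return CLASSIFICATION[key]
    except KeyError:
        raise KeyError(f"no mapping for {reported_term!r}; needs manual curation") from None
```

Several reported terms collapsing onto one pair is how the internal scheme reduces redundancy; unmapped terms are surfaced rather than guessed.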

It is possible that in some instances the curators have misunderstood terminology and hence misclassified mutations. Moreover, some users may not favor the COSMIC classification.

In general, however, the curators have aimed to retain as much useful information as possible, while still providing a relatively simple classification that uses generally understood terminology.

The COSMIC classification system is available as a tab-delimited text file or an Excel file in the Downloads section of the COSMIC website. Every sample is defined by both tissue and histology.
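A tab-delimited copy of the classification can be loaded with the standard `csv` module while enforcing the rule that every entry has both a tissue and a histology. The column names `tissue` and `histology` are assumptions; the headers in the actual download may differ, and the Excel version is not handled here.

```python
import csv
import io


def load_classification(text: str) -> list[dict[str, str]]:
    """Parse tab-delimited classification rows, requiring tissue and histology on each."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if not row.get("tissue") or not row.get("histology"):
            raise ValueError(f"line {line_no}: entry needs both tissue and histology")
    return rows
```

For a file on disk, the same function can be applied to the result of reading it as text.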

