Abstract

MedScan is a fully automated desktop system, based on advanced natural language processing (NLP) technology, that extracts information about protein functions and pathways from Medical Literature Analysis and Retrieval System Online (MEDLINE) abstracts and other scientific text sources. It uses dictionaries to identify individual biomedical terms (proteins, cellular processes, small molecules, diseases, etc.) referred to in literature articles, and applies NLP techniques to detect the relationships among those terms within an article and to extract both the terms and the relationships. The overall process of detection, identification, extraction and assembly is termed Information Harvesting.

Information extracted by MedScan covers multiple aspects of protein function, including protein modification, cellular localization, protein-protein interactions, gene expression regulation, molecular transport and synthesis, as well as association with diseases and regulation of various cellular processes. This scope can be broadened by modifying the information extraction rules and the dictionaries. Dictionaries can be assembled on any topic or area represented in the literature you wish to harvest.
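The dictionary-based term identification described above can be illustrated with a minimal sketch. This is not MedScan's implementation or dictionary format; the dictionary contents, the function name find_terms and the tokenization rule are all assumptions made for illustration only.

```python
import re

# Hypothetical toy dictionary: lower-cased surface term -> (canonical name, entity type).
# A real dictionary would be far larger and would handle multi-word terms.
TERM_DICTIONARY = {
    "p53": ("TP53", "protein"),
    "tp53": ("TP53", "protein"),
    "mdm2": ("MDM2", "protein"),
    "apoptosis": ("apoptosis", "cellular process"),
}


def find_terms(sentence):
    """Return (canonical name, entity type) for each single-word dictionary hit, in order."""
    hits = []
    for match in re.finditer(r"[A-Za-z0-9-]+", sentence):
        entry = TERM_DICTIONARY.get(match.group().lower())
        if entry is not None:
            hits.append(entry)
    return hits


hits = find_terms("MDM2 inhibits p53 and blocks apoptosis.")
```

Broadening the scope, as the text notes, would in this sketch amount to adding entries (for example, disease names) to the hypothetical TERM_DICTIONARY.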

Product features/capabilities include:

1) The system is specifically targeted at the language of scientific abstracts, which results in a high recovery rate and high precision.

2) MedScan presents captured data as a datasheet, or as an intuitive pathway diagram - a snapshot of all information available in selected abstracts.

3) MedScan is implemented in C++ (a programming language), which makes it at least 100 times faster than other available NLP engines. It can translate 10,000 MEDLINE abstracts into a pathway diagram in less than a minute. As a result, MedScan can work in an online mode, browsing through MEDLINE abstracts and extracting information on the fly.

4) MedScan is superior to other literature mining tools because it uses a unique full sentence parsing algorithm and is therefore not limited to capturing word co-occurrence. MedScan understands the role and meaning of all words in a sentence and is capable of extracting functional associations, for example, the events of regulation between proteins, small molecules and pathways. MedScan recognizes not only the types of regulatory mechanisms involved, but also the effects of regulation and the experimental conditions.

5) MedScan offers customizable dictionaries and customizable information extraction rules and patterns.

6) MedScan accepts multiple input formats: PubMed Extensible Markup Language (XML), Hypertext Markup Language (HTML), Microsoft Word, plain text, some forms of Portable Document Format (PDF), and archives.

7) MedScan offers integration with Pathway Studio (see G6G Abstract Number 20020) software for visualization and analysis of the extracted information on a pathway diagram.

8) MedScan provides integration with the PubMed and Google search engines.
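The difference between word co-occurrence and extraction of functional associations (item 4) can be sketched as follows. The example uses a simple regular-expression pattern as a stand-in; it is not full sentence parsing and does not represent MedScan's algorithm or rule syntax. The protein set, the verb table and the function names are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical dictionary of protein names.
PROTEINS = {"MDM2", "TP53", "AKT1"}

# Hypothetical mapping from a verb to the kind of relationship it signals.
RELATION_VERBS = {
    "inhibits": "negative regulation",
    "activates": "positive regulation",
    "phosphorylates": "protein modification",
}

# Toy subject-verb-object pattern; real sentences need far richer grammar.
PATTERN = re.compile(
    r"\b(?P<subject>[A-Z0-9]+)\s+"
    r"(?P<verb>inhibits|activates|phosphorylates)\s+"
    r"(?P<object>[A-Z0-9]+)\b"
)


def cooccurrences(sentence):
    """Co-occurrence: every pair of known proteins mentioned in the same sentence."""
    found = sorted(p for p in PROTEINS if re.search(rf"\b{p}\b", sentence))
    return list(combinations(found, 2))


def extract_relations(sentence):
    """Pattern rule: keep only pairs joined by a verb, with direction and type."""
    relations = []
    for m in PATTERN.finditer(sentence):
        if m["subject"] in PROTEINS and m["object"] in PROTEINS:
            relations.append((m["subject"], RELATION_VERBS[m["verb"]], m["object"]))
    return relations


sentence = "AKT1 phosphorylates MDM2 in cells where TP53 is expressed."
pairs = cooccurrences(sentence)        # three undirected, untyped pairs
relations = extract_relations(sentence)  # one directed, typed relation
```

In this sketch, co-occurrence reports all three protein pairs without saying how they are related, while the pattern rule reports only the single relation the sentence actually states, together with its direction and type.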

