Abstract -- The BioLT Literature Mining Tool is a customizable tool for intuitive and structured text mining. BioLT combines biological and medical term dictionaries with advanced free-text querying capabilities. The tool provides comprehensive and structured answers to complex questions. Search results can be used for iterative refinement and extension of queries.

Product features/capabilities include:

Automatically generated expert knowledge -- The BioLT tool delivers clearly structured results with high recall and precision. It automatically generates comprehensive results comparable to the knowledge of expert scientists. The BioLT text-mining approach can work for many disease areas (such as cancer, cardiovascular, neurological and infectious diseases) and for additional biological research areas as well.

Text-mining technology -- In contrast to classical information retrieval systems, the BioLT software preprocesses the underlying text databases (such as scientific or patent information) with specific background information. The system first recognizes all chunks of text (phrases), special patterns for scientific notations and words belonging to terminology dictionaries. After the syntactic analysis, the system tries to determine the meaning of ambiguous terms.

To ensure the most complete results, potentially false meanings are marked but not deleted from the knowledge database. The resulting text databases are manually curated by experts to create the thematic dictionaries used by the BioLT system. The BioLT tool uses the BioRS Integration and Retrieval System to add Boolean free-text search capabilities. Several analysis parameters can be selected, including the scope of the search, the level of precision, the resolution of terms with multiple meanings, and the statistical representation of the results.
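
The dictionary-recognition step described above, with ambiguous meanings marked rather than deleted, can be illustrated with a minimal sketch. This is not the BioLT implementation: the toy dictionaries, the `Annotation` record and the `annotate` function are hypothetical names introduced only for illustration, and the sketch assumes that a term is "ambiguous" when it appears in more than one dictionary.

```python
import re
from dataclasses import dataclass

# Hypothetical toy dictionaries; real thematic dictionaries are far larger.
DICTIONARIES = {
    "gene": {"cat", "tp53"},
    "organism": {"cat", "mouse"},
    "disease": {"breast cancer"},
}


@dataclass
class Annotation:
    term: str
    start: int        # character offset of the match in the text
    dictionary: str   # dictionary that produced the match
    ambiguous: bool   # True when the same span matches several dictionaries


def annotate(text):
    """Tag dictionary terms in text, marking (not dropping) ambiguous hits."""
    lowered = text.lower()
    hits = []
    for name, terms in DICTIONARIES.items():
        for term in terms:
            # Whole-word match so that short terms do not fire inside longer words.
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, lowered):
                hits.append((match.start(), term, name))

    # Collect every dictionary that claims the same span.
    owners = {}
    for start, term, name in hits:
        owners.setdefault((start, term), set()).add(name)

    # Keep all candidate meanings; ambiguity is recorded as a flag only.
    return [
        Annotation(term, start, name, len(owners[(start, term)]) > 1)
        for start, term, name in sorted(hits)
    ]


if __name__ == "__main__":
    sample = "TP53 is mutated in breast cancer; CAT activity in mouse liver."
    for annotation in annotate(sample):
        print(annotation)
```

In this sketch both readings of "CAT" (gene and organism) are retained and flagged, so a later disambiguation or curation step can decide between them without any loss of recall.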

Integration into biological and medical project management -- The BioLT tool uses high-quality thematic dictionaries to identify relationships between research objects. The dictionaries can be extended and customized.

The following dictionaries are currently available:

1) Disease -- 260,000 entries.

2) Gene name -- 130,000 human gene names, including name variants.

3) Compound -- 82,000 entries.

4) Pathway -- 61,000 entries.

5) Organism -- 275,000 entries.

6) Other subdomains (e.g., polymorphism, therapy, tissues, cells).

These relationship data sets can be imported into the BioXM Knowledge Management Environment for further curation. With the upload, they are automatically integrated into a user-defined biological or medical context. Thus, BioLT results become part of an efficient infrastructure even for large distributed R&D projects.

Additional features/capabilities include:

1) Complete and precise data mining from free text.

2) Advanced query language (using Boolean operators, dictionaries, wildcards and topics).

3) Automatic vocabulary generation for any domain.

4) Acronym detection tool for greater precision.

5) Several ranking methods (chi-square, cosine, etc.).
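
The two ranking methods named in the list above can be sketched as follows. This is a generic illustration of the standard Pearson chi-square statistic on a 2x2 co-occurrence table and of cosine similarity between sparse term vectors; how BioLT actually computes or applies these scores is not stated in the source, and the function names and input layout are assumptions.

```python
import math


def chi_square(n11, n10, n01, n00):
    """Pearson chi-square for a 2x2 table of document counts.

    n11: documents containing both terms
    n10: documents containing only term A
    n01: documents containing only term B
    n00: documents containing neither term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # a row or column is empty, so no association can be measured
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Two terms that always occur together in 20 documents.
    print(chi_square(10, 0, 0, 10))
    # Two documents that share one of their weighted terms.
    print(cosine({"tp53": 2.0, "cancer": 1.0}, {"tp53": 1.0, "mouse": 1.0}))
```

A higher chi-square value indicates that two terms co-occur more (or less) often than expected by chance, while cosine similarity compares the overall term profiles of two documents or queries; either score can be used to sort candidate results.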

Flexible access --

1) Yearly subscription to the Biomax Web portal to extract information from the Medical Literature Analysis and Retrieval System Online (MEDLINE) database using any common Web browser.

2) Customized installation to extract information from other public or proprietary text sources.

