I2E (Interactive Information Extraction)

Abstract: Interactive Information Extraction (I2E) is a text search and mining system used in application areas ranging from systems biology, biomarker detection, safety/tox and drug repositioning in pharmaceutical R&D to key opinion leader, sentiment and competitive analysis for business intelligence.

I2E's advanced Natural Language Processing (NLP) capabilities mine large collections of documents, extracting relevant facts and relationships from unstructured and semi-structured content such as scientific papers, internal project reports, patent documents, or news feeds.

Product features/capabilities include:

Searching in Context --

I2E makes it easy to plug in domain-specific knowledge using thesauri, taxonomies or ontologies, which describe the concepts in a given field and their relationships. I2E is used with commercial, public domain and in-house knowledge bases.

The system is equally able to answer the questions natural to a ‘life scientist’ interested in biological targets, biomarkers, pathways and potential new therapeutics and those of a business analyst who wants to know the key trends and influencers in a particular market.

Versatility --

I2E is not a point solution but offers a spectrum of capabilities that can be applied flexibly to address text search and mining challenges. Strategies range from simple document retrieval using keyword search to advanced information extraction approaches, including retrieval of facts, relationships and entities using linguistic structure. Users can thus match the right approach to their specific task and combine complementary techniques to get the answers they need.

Getting Directly to the Answers --

I2E gets straight to high-value, relevant, re-usable knowledge. Results are highly structured, enabling rapid analysis and export directly to spreadsheets like Excel or network visualizers like Cytoscape (see G6G Abstract Number 20092).
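As a rough illustration of what exporting structured results can look like, the sketch below writes a few extracted relationships as CSV (which spreadsheets open directly) and as SIF, the simple tab-separated interaction format that Cytoscape can import. The row layout, the function names and the sample data are assumptions made up for this example; they do not describe I2E's actual export format or API.

```python
import csv
import io

def to_csv(rows, header):
    """Serialize result rows as CSV text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

def to_sif(rows):
    """Serialize rows as SIF lines: source <tab> relation <tab> target."""
    return "".join(f"{src}\t{rel}\t{tgt}\n" for src, rel, tgt, _doc in rows)

# Toy rows (invented): source entity, relation, target entity, source document.
rows = [
    ("GeneA", "inhibits", "ProteinB", "doc1"),
    ("GeneA", "activates", "ProteinC", "doc2"),
]

csv_text = to_csv(rows, ["source", "relation", "target", "document"])
sif_text = to_sif(rows)
```

Keeping the document identifier in the CSV but not in the SIF output mirrors the split between a tabular report with links to evidence and a network view that needs only nodes and edges.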

Researchers and information professionals get the flexibility and analytical rigor that they need. Tabular reports summarize the extracted information in a compact, clustered format with easy drill-down to supporting evidence, including direct links to source documents.

User Focus --

I2E is highly user-centric. The I2E Express interface is easy to use and similar in appearance to a conventional search engine. For more advanced use, I2E Pro lets the user view, construct, and manage sophisticated queries using an intuitive drag-and-drop interface. Optimized queries can then be saved and published for use by less experienced users, so that validated queries can be applied in a simple and repeatable way, for example to run regular queries on a growing database of literature.

Quality and Accuracy --

I2E’s advanced linguistic queries enable users to extract high-value information. Searching with entities and concepts (e.g. gene, protein) is highly flexible. Linguistic wildcard features allow users to ask open questions and search for entities, verbs and unknown relationships. Searching using positional constraints can identify stronger relationships by finding terms that appear close together, for example within “n” words of each other, in the same sentence, or in a particular region of a document.
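The idea of a positional constraint can be sketched in a few lines of Python. This is a generic illustration of "two terms within n words of each other in the same sentence", using a naive sentence splitter and tokenizer; it is not I2E's query language or implementation, and the function names and sample text are invented for the example.

```python
import re

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Lowercased alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[A-Za-z0-9]+", sentence.lower())

def cooccur_within(text, term_a, term_b, n):
    """Return sentences in which term_a and term_b are at most n tokens apart."""
    a, b = term_a.lower(), term_b.lower()
    hits = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        pos_a = [i for i, tok in enumerate(tokens) if tok == a]
        pos_b = [i for i, tok in enumerate(tokens) if tok == b]
        if any(abs(i - j) <= n for i in pos_a for j in pos_b):
            hits.append(sentence)
    return hits

# Toy text (invented): the terms are close in the first sentence, far apart in the second.
text = (
    "GeneA inhibits ProteinB in liver cells. "
    "GeneA was studied for years across many unrelated tissues "
    "before anyone measured ProteinB."
)

close_hits = cooccur_within(text, "GeneA", "ProteinB", 3)
loose_hits = cooccur_within(text, "GeneA", "ProteinB", 15)
```

Tightening n keeps only the sentence where the two terms are adjacent to a verb linking them, which is the sense in which a narrower window suggests a stronger relationship.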

Unlike other systems, I2E is not limited to searching inside a single sentence, but can also filter based on the broader context of the document. More distant and indirect relationships can be detected by combining result sets or visualizing information networks to find relationships across multiple documents.
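One simple way to picture combining result sets is a join on a shared entity: if one query yields drug-gene relationships and another yields gene-disease relationships, joining on the gene suggests indirect drug-disease links supported by different documents. The sketch below assumes each result is a (source, target, document) tuple; that layout, the function name and the data are hypothetical and are not taken from I2E.

```python
from collections import defaultdict

def indirect_links(results_ab, results_bc):
    """Join two result sets on their shared middle entity.

    Each result is a (source, target, doc_id) tuple. The output pairs every
    A-B result with every B-C result that has the same B.
    """
    by_source = defaultdict(list)
    for b, c, doc in results_bc:
        by_source[b].append((c, doc))

    links = []
    for a, b, doc_ab in results_ab:
        for c, doc_bc in by_source.get(b, []):
            links.append((a, b, c, (doc_ab, doc_bc)))
    return links

# Toy result sets (invented) from two separate queries.
drug_gene = [("DrugX", "GeneA", "doc1"), ("DrugY", "GeneB", "doc2")]
gene_disease = [("GeneA", "DiseaseZ", "doc7")]

links = indirect_links(drug_gene, gene_disease)
```

Each link keeps both document identifiers, so the indirect relationship can still be traced back to the two pieces of supporting evidence.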

Performance and Robustness --

I2E is scalable to millions of documents, such as the whole of Medline (Medical Literature Analysis and Retrieval System Online), and works with large documents such as patents. In addition, I2E is able to combine large numbers of search terms. For example, classes of genes, proteins, diseases or compounds, which may contain up to tens of thousands of terms, can be encapsulated within a single “class” search term. Search performance is fast, and customers report that discovering relevant information with I2E is at least ten times faster than with conventional keyword search-based approaches.
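To make the notion of a "class" search term concrete, the sketch below compiles a list of member terms into a single case-insensitive pattern, so that one search matches any member of the class. It is a minimal illustration using Python's standard `re` module, with invented term names; I2E's own handling of term classes is not described in this abstract and is likely very different at the scale of tens of thousands of terms.

```python
import re

def compile_class(terms):
    """Build one regex that matches any member term as a whole word or phrase."""
    # Longest terms first, so a longer synonym is preferred over its prefix.
    ordered = sorted(set(terms), key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)

def find_class_members(regex, text):
    """Return every class member mentioned in the text, in order of appearance."""
    return [m.group(0) for m in regex.finditer(text)]

# Toy class (invented): a real class could hold thousands of names and synonyms.
kinase_class = compile_class(["kinase A", "kinase A2", "KinB"])

found = find_class_members(kinase_class, "Both kinase A2 and kinb were elevated.")
```

The caller searches with `kinase_class` as if it were a single term, while the list behind it can grow without changing the query.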

Control and Interactivity --

The flexibility, transparency, speed, and structured results delivered by I2E mean that users can refine queries on the fly. I2E makes NLP-based text search and mining interactive and intuitive, putting power into the hands of the users themselves.

Integration --

I2E can play a key role in embedding information extraction into more complex workflows. This can be achieved by integrating with analytic workflow software or by exporting results in data formats compatible with standard enterprise software systems.

