KNIME

Category Cross-Omics>Workflow Knowledge Bases/Systems/Tools

Abstract KNIME (KoNstanz Information MinEr), pronounced [naim], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines or workflows), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models.

The KNIME base version already incorporates over 100 processing nodes for data Input/Output (I/O), preprocessing and cleansing, modeling, analysis and ‘data mining’ as well as various interactive views, such as scatter plots, parallel coordinates and others.

It integrates all analysis modules of the well-known Weka data mining environment (see below...), and additional plug-ins allow R scripts to be run, offering access to a vast library of statistical routines.

Weka - Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code.

Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new ‘machine learning’ schemes (see G6G Abstract Number 20534).

KNIME is based on the Eclipse platform (see below...) and, through its modular Application Programming Interface (API), is extensible.

When desired, custom nodes and types can be implemented in KNIME within hours, extending KNIME to comprehend and provide first-tier support for highly domain-specific data.

This modularity and extensibility permit KNIME to be employed in ‘commercial production environments’ as well as in teaching and research prototyping settings.

Eclipse - Eclipse is an open source community, whose projects are focused on building an open development platform comprised of extensible frameworks, tools and runtimes for building, deploying and managing software across the software development lifecycle.

KNIME Features/capabilities --

One key feature behind the success of KNIME is its inherent ‘modular workflow’ approach, which documents and stores the analysis process in the order it was conceived and implemented, while ensuring that intermediate results are always available.
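The modular workflow idea can be illustrated with a minimal sketch. It is not KNIME's API: the `Node` class and the three example steps are hypothetical. It shows only how a flow can be executed partially while each step keeps its intermediate result available for inspection.

```python
class Node:
    """One processing step; caches its output after execution (hypothetical class)."""

    def __init__(self, name, func, *inputs):
        self.name = name
        self.func = func
        self.inputs = inputs      # upstream Node objects
        self.result = None
        self.executed = False

    def execute(self):
        """Run upstream nodes if needed, then this node; reuse cached results."""
        if not self.executed:
            args = [node.execute() for node in self.inputs]
            self.result = self.func(*args)
            self.executed = True
        return self.result


# A three-step flow: read -> filter -> mean (toy data, assumed for illustration)
reader = Node("reader", lambda: [3, 8, 1, 9, 4])
row_filter = Node("filter", lambda rows: [r for r in rows if r > 3], reader)
mean = Node("mean", lambda rows: sum(rows) / len(rows), row_filter)

# Execute only up to the filter; the mean node stays unexecuted.
row_filter.execute()
print(row_filter.result)   # intermediate result is available for inspection
print(mean.executed)

# Executing the last node reuses the cached upstream results.
print(mean.execute())
```

Because each node stores its own output, a later step can be re-run or inspected without repeating the steps before it.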

Core KNIME features include:

1) Scalability through sophisticated data handling (intelligent automatic caching of data in the background while maximizing throughput performance).

2) Highly and easily extensible via a well-defined API for plug-in extensions.

3) Intuitive user interface.

4) Import/export of workflows (for exchanging with other KNIME users).

5) Parallel execution on multi-core systems.

6) Command line version for “headless” batch executions.

New KNIME feature:

KNIME Report Designer - You can now use KNIME's advanced workflow editor to load, preprocess, transform and analyze your data, and the Report Designer plug-in to create sophisticated reports including individually formatted charts, tables and other data overviews.

The KNIME Report Designer is based on BIRT, the Business Intelligence and Reporting Tools Eclipse project, and therefore gives access to the wealth of reporting tools from that environment.

Available KNIME modules cover a vast range of functionality, such as:

1) I/O: retrieves data from files or databases.

2) Data Manipulation: pre-processes your input data with filtering, group-by, pivoting, binning, normalization, aggregation, joining, sampling, partitioning, etc.

3) Views: inspects the data and results with several interactive views, supporting interactive data exploration.

4) Highlighting: ensures highlighted data points in one view are also immediately highlighted in all other views.

5) Mining: uses state-of-the-art Data Mining algorithms like Clustering, Rule induction, Decision Tree, Association Rules, Naïve Bayes, Fuzzy Rules, K Nearest Neighbor, Multi-dimensional Scaling, Neural Networks (NNs), Support Vector Machines (SVMs), etc. to better understand your data.
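The highlighting behavior listed above can be pictured as several views sharing one set of highlighted row keys. The sketch below is an assumption about the general pattern, not KNIME's implementation: `HiliteModel`, `View` and the row keys are hypothetical names.

```python
class HiliteModel:
    """Shared set of highlighted row keys; notifies every registered view."""

    def __init__(self):
        self.keys = set()
        self.views = []

    def register(self, view):
        self.views.append(view)

    def hilite(self, *row_keys):
        """Add row keys to the shared set and push the update to all views."""
        self.keys.update(row_keys)
        for view in self.views:
            view.refresh(self.keys)


class View:
    """Stand-in for a scatter plot or table view (hypothetical)."""

    def __init__(self, name, model):
        self.name = name
        self.highlighted = set()
        model.register(self)

    def refresh(self, keys):
        self.highlighted = set(keys)


model = HiliteModel()
scatter = View("scatter plot", model)
table = View("table", model)

# Highlighting rows once updates every view attached to the model.
model.hilite("Row3", "Row7")
print(sorted(scatter.highlighted), sorted(table.highlighted))
```

Keeping the highlighted keys in one shared object, rather than in each view, is what lets a selection made in one view appear in all the others.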

KNIME Applications --

A few example applications of KNIME are listed below to illustrate some of its capabilities:

1) Cell Miner -

Note: Experimental/Internal use - Not all nodes are part of the KNIME distribution.

KNIME has been used to analyze cell images. A new data cell for images has been integrated and a picture file reader node added to the repository. A segmenter node has been implemented to locate cells in the images.

Multiple feature extraction nodes were used to extract data for the classifier algorithm. The learner interactively adapts to the different cell types and subsequently classifies huge numbers of images.

For additional info see the publication: Nicolas Cebron, Michael R. Berthold, Adaptive Active Classification of Cell Assay Images, Knowledge Discovery in Databases: PKDD 2006 (PKDD/ECML, Berlin, Germany), vol. 4213, pp. 79-90, Springer Berlin / Heidelberg, 2006.

2) Virtual High Throughput Screening -

KNIME has also been successfully applied to Virtual High Throughput Screening (vHTS) data. Most KNIME nodes handle the large data volumes involved [several gigabytes (GB)] without difficulty.

The results of predicting the activity of yet untested compounds can be visualized by the Enrichment Plotter, for example, which was specially developed for this purpose.
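The source does not describe how the Enrichment Plotter computes its curve. As a rough illustration, the sketch below uses one common definition of an enrichment curve: rank compounds by predicted score, then track the fraction of known actives recovered as more of the ranked list is screened. The function name and the toy scores are assumptions.

```python
def enrichment_curve(scores, is_active):
    """Return (fraction screened, fraction of actives found) points.

    Compounds are ranked by descending predicted score.
    """
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")
    points = []
    found = 0
    for rank, (_, active) in enumerate(ranked, start=1):
        found += 1 if active else 0
        points.append((rank / len(ranked), found / total_actives))
    return points


# Toy data (assumed): predicted activity scores and known activity labels.
scores = [0.9, 0.2, 0.8, 0.4, 0.1]
actives = [True, False, True, False, False]
curve = enrichment_curve(scores, actives)
print(curve)
```

A curve that rises steeply at the start means the model ranks active compounds ahead of inactive ones, which is the property such a plot is meant to reveal.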

Another useful tool for inspecting the data is the so-called neighborgram, which shows the neighborhood of data points labeled as “active”. (Neighborgrams are available as an additional KNIME feature that can be downloaded.) The colors of the points indicate the activity of the molecules represented by the data points.

First set of Enterprise products for KNIME --

KNIME Cluster Execution -

KNIME Desktop already has built-in support for multi-core machines, but KNIME Cluster Execution now extends this feature far beyond the limits of a single machine. Using KNIME Cluster Execution, workflows can be executed on existing cluster infrastructure.

Entire workflows can be distributed, while fine-grained control allows individual nodes to be assigned to dedicated cluster computers, or the time-consuming execution of one node to be split across several cluster computers.
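Splitting the execution of one node across several machines can be sketched as chunking the input rows, processing the chunks in parallel, and reassembling the results in order. The example below is only an analogy: local threads stand in for cluster computers, and `split_rows`, `expensive_node` and `run_distributed` are hypothetical names, not part of KNIME Cluster Execution.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, n_chunks):
    """Split rows into n_chunks contiguous, nearly equal chunks."""
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def expensive_node(chunk):
    """Placeholder for a time-consuming per-row computation."""
    return [value * value for value in chunk]


def run_distributed(rows, n_workers):
    """Process chunks in parallel and reassemble results in row order."""
    chunks = split_rows(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves chunk order, so the output matches the input order.
        partial_results = list(pool.map(expensive_node, chunks))
    return [value for part in partial_results for value in part]


print(run_distributed(list(range(10)), 3))
```

This approach only works directly when each row can be processed independently of the others; a node that needs the whole table at once would need a different strategy.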

KNIME Report Server -

KNIME Report Server is based on KNIME Server technology. In combination with KNIME Server Lite, for example, power users can store KNIME workflows and report templates on a central server, and end users can access them via their web browser and launch report creation.

The final report is then exported to the user's PC in the desired format. This provides KNIME workflow and reporting capabilities to any user in your organization.

KNIME Server Lite -

KNIME Server Lite provides basic workflow-sharing functionality. Users can access a central repository to upload or download workflows. Workflows on the server can be executed remotely, and the client can be disconnected during long-running tasks.

System Requirements

Manufacturer

Manufacturer Web Site KNIME

Price Contact manufacturer.

G6G Abstract Number 20533

G6G Manufacturer Number 104149