Kepler

Category Cross-Omics>Workflow Knowledge Bases/Systems/Tools

Abstract Kepler is a software application for analyzing and modeling scientific data.

Kepler is designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines.

Using Kepler's graphical interface and components, scientists with little background in computer science can create executable models, called “scientific workflows”, for flexibly accessing scientific data (streaming sensor data, medical and satellite images, molecular biology data, simulation output, observational data, etc.) and executing complex analyses on this data.

The software builds upon the mature Ptolemy II framework, developed at the University of California, Berkeley. Ptolemy II is a software framework designed for modeling, design, and simulation of concurrent, real-time, embedded systems.

Ptolemy II is a Java-based component assembly framework with a graphical user interface (GUI) called Vergil. Kepler inherits modeling and design capabilities from Ptolemy, including the Vergil GUI and workflow scheduling and execution capabilities.

Kepler also inherits from Ptolemy the ‘actor-oriented modeling’ paradigm that separates workflow components (“actors”) from the overall workflow orchestration (conducted by “directors”), making components more easily reusable.
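The separation between actors and directors can be illustrated with a minimal Python sketch. The `Actor` and `Director` classes below are hypothetical and are not Kepler's or Ptolemy II's Java API. The director shown simply fires each actor once in dependency order and assumes the workflow graph has no cycles. The point is that the actors contain only computation, so the orchestration can change without touching the components.

```python
# Hypothetical sketch of actor-oriented modeling; NOT Kepler's or Ptolemy II's API.
from collections import defaultdict, deque
from typing import Callable, Dict, List


class Actor:
    """A reusable component: it knows how to compute, not when to run."""

    def __init__(self, name: str, fn: Callable[..., object]):
        self.name = name
        self.fn = fn

    def fire(self, *inputs: object) -> object:
        return self.fn(*inputs)


class Director:
    """Decides when each actor fires. This hypothetical director fires every
    actor exactly once, in dependency (topological) order, and assumes the
    workflow graph is acyclic."""

    def __init__(self) -> None:
        self.actors: Dict[str, Actor] = {}
        self.upstream: Dict[str, List[str]] = defaultdict(list)

    def add(self, actor: Actor) -> None:
        self.actors[actor.name] = actor

    def connect(self, src: str, dst: str) -> None:
        self.upstream[dst].append(src)

    def run(self) -> Dict[str, object]:
        # Kahn's algorithm: an actor is ready once all of its inputs exist
        waiting = {name: len(self.upstream[name]) for name in self.actors}
        downstream: Dict[str, List[str]] = defaultdict(list)
        for dst, srcs in self.upstream.items():
            for src in srcs:
                downstream[src].append(dst)
        ready = deque(name for name, n in waiting.items() if n == 0)
        results: Dict[str, object] = {}
        while ready:
            name = ready.popleft()
            args = [results[src] for src in self.upstream[name]]
            results[name] = self.actors[name].fire(*args)
            for nxt in downstream[name]:
                waiting[nxt] -= 1
                if waiting[nxt] == 0:
                    ready.append(nxt)
        return results


# Usage: a three-actor pipeline; the actors know nothing about scheduling
director = Director()
director.add(Actor("const", lambda: 3))
director.add(Actor("square", lambda x: x * x))
director.add(Actor("display", lambda x: f"result={x}"))
director.connect("const", "square")
director.connect("square", "display")
results = director.run()
print(results["display"])  # result=9
```

Because scheduling lives entirely in `Director.run`, a different director (for example, one that fires actors repeatedly on streaming input) could reuse the same actors unchanged.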

Through the actor-oriented and hierarchical modeling features built into Ptolemy, Kepler scientific workflows can operate at very different levels of granularity, from low-level “plumbing workflows” (for example, workflows that explicitly move data around or start and monitor remote jobs) to high-level “conceptual workflows” that interlink complex, domain-specific data analysis steps.

Kepler extends Ptolemy with a growing number of components aimed particularly at scientific applications, for example for remote data and metadata access, data transformation, data analysis, interfacing with legacy applications, Web service invocation and deployment, and provenance tracking.

Target application areas include bioinformatics, cheminformatics, ecoinformatics, and geoinformatics workflows, among others.

Kepler features/capabilities include:

1) Kepler is freely available under the BSD License.

2) Kepler provides a graphical user interface (GUI) and a run-time engine that can execute workflows either from within the graphical interface or from a command line.

3) Kepler workflows can be nested, allowing complex tasks to be composed from simpler components, and enabling workflow designers to build re-usable, modular sub-workflows that can be saved and used for many different applications.

4) Kepler workflows can leverage the computational power of grid technologies (e.g., Globus, SRB, Web and Soaplab Services), as well as take advantage of Kepler’s native support for parallel processing.

5) Kepler workflows and customized components can be saved, reused, and shared with colleagues using the Kepler archive format (KAR).

6) Kepler ships with a ‘searchable library’ containing over 350 ready-to-use processing components (actors) that can be easily customized, connected, and then run from a desktop environment to perform an analysis, automate data management, and integrate applications efficiently. Highlights include:

7) Kepler's Component Repository provides a centralized server where components and workflows can be uploaded, downloaded, searched and shared with the community or designated users.

8) Through several actors specifically designed to ingest and output data, workflows can access and use a wide variety of data sources.

9) Currently, Kepler has support for data described by Ecological Metadata Language (EML), data accessible using the DiGIR protocol, the OPeNDAP protocol, GridFTP, JDBC, SRB, and others.

10) Kepler provides direct access to the EarthGrid, a distributed network providing scientists access to ecological, biodiversity, and environmental data and analytic resources.

Kepler Case Studies - Molecular Biology --

1) The Kepler project in conjunction with the Scientific Process Automation (SPA) project has developed a set of special “bio-services” actors that allow the scientist to invoke standard tools such as BLAST or TRANSFAC (see G6G Abstract Number 20121) locally or remotely as web services.

2) The Promoter Identification Workflow (PIW) links genomic biology techniques such as microarrays with bioinformatics tools such as BLAST to identify and characterize eukaryotic promoters.

Starting from microarray data, cluster analysis algorithms are used to identify genes that share similar ‘gene expression profiles’; such genes are then predicted to be co-regulated as part of an interactive biochemical pathway.

Given the gene IDs, gene sequences are retrieved from a remote database (e.g., GenBank) and fed to a tool (e.g., BLAST) that finds similar sequences. In subsequent steps, transcription factor binding sites and promoters are identified to create a promoter model that can be iteratively refined.
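The clustering step can be illustrated with a toy Python sketch that groups genes whose expression profiles are strongly correlated. The gene names, expression values, 0.9 threshold, and greedy grouping rule are all hypothetical; PIW's actual cluster analysis algorithm is not described here, and Pearson correlation is used only as one common similarity measure.

```python
# Toy sketch of co-expression grouping; data, threshold and grouping rule are
# hypothetical illustrations, not the PIW's actual clustering algorithm.
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpressed_groups(profiles: Dict[str, List[float]],
                       threshold: float = 0.9) -> List[List[str]]:
    """Greedy grouping: a gene joins the first group whose seed (first) gene it
    correlates with at or above `threshold`; otherwise it starts a new group."""
    groups: List[List[str]] = []
    for gene, profile in profiles.items():
        for group in groups:
            if pearson(profiles[group[0]], profile) >= threshold:
                group.append(gene)
                break
        else:
            groups.append([gene])
    return groups


# Hypothetical expression levels across four microarray conditions
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 7.9],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 3.9, 2.0],
}
groups = coexpressed_groups(profiles)
print(groups)  # [['geneA', 'geneB'], ['geneC', 'geneD']]
```

Each resulting group is a candidate set of co-regulated genes whose IDs would then feed the sequence-retrieval step.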

Kepler Sample Demo Workflow - Web Services Workflow --

The Web Services Workflow uses Kepler's Web Service actor to invoke a genomics data web service, which queries a remote genomics database and returns a gene sequence. The name of the sequence (i.e., the gene accession number) is passed to the Web Service actor by a String Constant actor.

After the service has executed, the Web Service actor outputs the retrieved gene sequence so that it can be displayed in multiple formats using three (3) different Display actors:

One for XML (the format in which the results are returned by default); one for a sequence of elements extracted from the XML; and one for an HTML document that can be displayed on a website.

In addition, the workflow uses a fourth Display actor to display errors returned by the remote server (e.g., server down or incorrect input).
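The data routing of this demo can be sketched in Python under stated assumptions: `fake_genomics_service` is a hypothetical local stub standing in for the remote genomics web service, and its XML layout, the accession string, and the error it raises are invented for illustration. Element extraction uses the limited XPath support in Python's `xml.etree.ElementTree`, and the HTML page is built with plain string formatting instead of an XSLT transformation.

```python
# Hypothetical sketch of the demo's data routing; the service is a local stub.
import xml.etree.ElementTree as ET
from html import escape
from typing import Dict


def fake_genomics_service(accession: str) -> str:
    """Hypothetical stand-in for the remote web service: returns an XML
    document, or raises to mimic an error reported by the server."""
    if not accession.strip():
        raise ValueError("empty accession number")
    return ("<result><accession>" + escape(accession) + "</accession>"
            "<sequence>ATGC</sequence><sequence>GGTA</sequence></result>")


def run_workflow(accession: str) -> Dict[str, object]:
    """Routes the service output to the four displays described in the text."""
    displays: Dict[str, object] = {}
    try:
        xml_text = fake_genomics_service(accession)   # Web Service actor
    except ValueError as err:
        displays["errors"] = str(err)                 # fourth Display: errors
        return displays
    displays["xml"] = xml_text                        # Display 1: raw XML
    root = ET.fromstring(xml_text)
    sequences = [el.text for el in root.findall("./sequence")]
    displays["elements"] = sequences                  # Display 2: extracted elements
    items = "".join(f"<li>{escape(s)}</li>" for s in sequences)
    displays["html"] = f"<html><body><ul>{items}</ul></body></html>"  # Display 3
    return displays


ok = run_workflow("EXAMPLE-0001")   # hypothetical accession (String Constant actor)
bad = run_workflow("")              # exercises the error path
print(ok["elements"])  # ['ATGC', 'GGTA']
print(bad)             # {'errors': 'empty accession number'}
```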

Composite actors - The Sequence Getter Using XPath and the HTML Generator Using XSLT are both composite actors: each is composed of other actors that together perform a function (i.e., converting XML into a sequence of elements and into an HTML file, respectively).

Composite actors hide some of the complexity of underlying operations, and also permit the operations to be easily reused in other workflows.
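A composite actor can be sketched as an object that exposes the same `fire` interface as a primitive actor while internally running a chain of inner actors. The `Actor` and `CompositeActor` classes and the XML layout below are hypothetical Python illustrations, not Kepler's Java classes; the example mimics a sequence getter built from a parsing actor and an XPath-style selection actor.

```python
# Hypothetical sketch of a composite actor; NOT Kepler's actual classes.
import xml.etree.ElementTree as ET
from typing import Callable, List


class Actor:
    def __init__(self, name: str, fn: Callable[[object], object]):
        self.name = name
        self.fn = fn

    def fire(self, value: object) -> object:
        return self.fn(value)


class CompositeActor(Actor):
    """Hides an inner chain of actors behind the ordinary Actor interface,
    so callers cannot tell it apart from a primitive actor."""

    def __init__(self, name: str, inner: List[Actor]):
        super().__init__(name, self._run_inner)
        self.inner = inner

    def _run_inner(self, value: object) -> object:
        for actor in self.inner:      # each inner output feeds the next actor
            value = actor.fire(value)
        return value


# A hypothetical "sequence getter" assembled from two primitive actors
sequence_getter = CompositeActor("SequenceGetter", [
    Actor("ParseXML", ET.fromstring),
    Actor("SelectSequences",
          lambda root: [el.text for el in root.findall(".//sequence")]),
])

xml_text = "<result><sequence>ATGC</sequence><sequence>GGTA</sequence></result>"
print(sequence_getter.fire(xml_text))  # ['ATGC', 'GGTA']
```

Since a `CompositeActor` is itself an `Actor`, it can be placed inside another composite, which mirrors how reusable sub-workflows nest.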

System Requirements

Kepler is a large application with substantial hardware requirements: 512 MB of RAM (1 GB or more recommended), at least 300 MB of disk space, and at least a 2 GHz CPU. Kepler runs on modern Windows, Macintosh (OS X), and Linux systems using Java 1.5.x or later.

Manufacturer

Manufacturer Web Site Kepler

Price Contact manufacturer.

G6G Abstract Number 20527

G6G Manufacturer Number 104144