Galaxy

Category Cross-Omics>Workflow Knowledge Bases/Systems/Tools, Cross-Omics>Next Generation Sequence Analysis/Tools and Genomics>Genetic Data Analysis/Tools

Abstract Galaxy is an open web-based platform for genomic research. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods.

Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.

Galaxy is a popular, web-based genomic workbench that enables users to perform computational analyses of genomic data. The public Galaxy service makes analysis tools, genomic data, tutorial demonstrations, persistent workspaces, and publication services available to any scientist who has access to the Internet.

Local Galaxy servers can be set up by downloading the Galaxy application and customizing it to meet particular needs. Galaxy has established a significant community of users and developers.

Galaxy is the manufacturer's approach to building a collaborative environment for performing complex analyses, with automatic and unobtrusive provenance tracking. This environment is the basis for a system that allows transparent sharing of not only the precise computational details underlying an analysis, but also its intent, context, and narrative.

Galaxy Pages (described under "Galaxy Pages" below) are the principal means to communicate research performed in Galaxy. Pages are interactive, web-based documents that users create to describe a complete genomics experiment.

Galaxy Accessibility --

The most important feature of Galaxy's analysis workspace is what users do not need to do or learn: Galaxy users do not need to program, nor do they need to learn the implementation details of any single tool.

Galaxy enables users to perform integrative genomic analyses by providing a unified, web-based interface for obtaining genomic data and applying computational tools to analyze the data. Users can import datasets into their workspaces from many established data warehouses or upload their own datasets.

Interfaces to computational tools are automatically generated from abstract descriptions to ensure a consistent look and feel.

Galaxy analysis workspace -

The Galaxy analysis workspace is where users perform genomic analyses. The workspace has four (4) areas: the navigation bar, tool panel, detail panel, and history panel.

The navigation bar provides links to Galaxy's major components, including the analysis workspace, workflows, data libraries, and user repositories (histories, workflows, Pages).

The tool panel lists the analysis tools and data sources available to the user.

The detail panel displays interfaces for tools selected by the user.

The history panel shows data and the results of analyses performed by the user, as well as automatically tracked metadata and user-generated annotations.

Every action by the user generates a new history item, which can then be used in subsequent analyses, downloaded, or visualized.

Galaxy's history panel helps to facilitate reproducibility by showing provenance of data and by enabling users to extract a workflow from a history, rerun analysis steps, visualize output datasets, tag datasets for searching and grouping, and annotate steps with information about their purpose or importance.

The Galaxy analysis environment is made possible by the model Galaxy uses for integrating tools. A tool can be any piece of software (written in any language) for which a command line invocation can be constructed.

To add a new tool to Galaxy, a developer writes a configuration file that describes how to run the tool, including detailed specification of input and output parameters. This specification allows the Galaxy framework to work with the tool abstractly, for example, automatically generating web interfaces for tools.

Although this approach is less flexible than working in a programming language directly (for researchers who can program), the precise specification of tool behavior is what makes computation accessible and supports transparency and reproducibility. This makes Galaxy well suited to biomedical researchers who prefer not to work at the command line.
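The tool integration model described above can be illustrated with a minimal Python sketch. It is not Galaxy's configuration format or API: the classes `Param` and `ToolDescription`, the field names, and the widget names are all hypothetical, chosen only to show how one abstract description can yield both a web form layout and a command line.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Param:
    """One input parameter of a tool (hypothetical schema, not Galaxy's)."""
    name: str
    label: str
    kind: str            # "data", "integer", or "text"
    default: str = ""


@dataclass
class ToolDescription:
    """Abstract description of a command-line tool (illustrative only)."""
    tool_id: str
    command_template: str                 # str.format placeholders, one per parameter
    params: list = field(default_factory=list)

    def form_fields(self):
        """Derive a uniform form layout from the parameter list."""
        widgets = {"data": "dataset-select", "integer": "number", "text": "text"}
        return [
            {"name": p.name, "label": p.label,
             "widget": widgets[p.kind], "value": p.default}
            for p in self.params
        ]

    def command_line(self, values):
        """Fill the template, shell-quoting every supplied value."""
        merged = {p.name: p.default for p in self.params}
        merged.update(values)
        quoted = {key: shlex.quote(str(value)) for key, value in merged.items()}
        return self.command_template.format(**quoted)


head_tool = ToolDescription(
    tool_id="head",
    command_template="head -n {count} {input}",
    params=[
        Param("input", "Dataset to read", "data"),
        Param("count", "Number of lines", "integer", default="10"),
    ],
)

print(head_tool.form_fields())
print(head_tool.command_line({"input": "reads.fastq", "count": 4}))
```

Because the form and the command line both come from the same description, every tool described this way presents the same kind of interface, and the exact invocation can be recorded for later reuse.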

Galaxy Reproducibility --

Galaxy enables users to apply tools to datasets and hence perform computational analyses; the next step in supporting computational research is ensuring these analyses are reproducible.

This requires capturing sufficient metadata to repeat an analysis exactly. Metadata is descriptive information about datasets, tools, and their invocations; the number of sequences in a dataset and the version of a genome assembly are examples.

When a user performs an analysis using Galaxy, it automatically generates metadata for each analysis step. Galaxy's metadata includes every piece of information necessary to track provenance and ensure repeatability of that step: input datasets, tools used, parameter values, and output datasets.
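As a rough illustration of the per-step metadata listed above, the following Python sketch records the four pieces of information for one step. The `StepRecord` class, its field names, and the dataset identifiers are assumptions made for this example; Galaxy's internal data model is not described in this abstract.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class StepRecord:
    """Provenance of one analysis step (hypothetical layout)."""
    tool_id: str
    tool_version: str
    inputs: tuple        # identifiers of the input datasets
    parameters: tuple    # (name, value) pairs, sorted by name
    outputs: tuple       # identifiers of the output datasets


def record_step(tool_id, tool_version, inputs, parameters, outputs):
    """Build an immutable record; sorting makes equal settings compare equal."""
    return StepRecord(
        tool_id=tool_id,
        tool_version=tool_version,
        inputs=tuple(inputs),
        parameters=tuple(sorted(parameters.items())),
        outputs=tuple(outputs),
    )


step = record_step(
    "filter", "1.0",
    inputs=["dataset_1"],
    parameters={"min_length": 50, "column": 2},
    outputs=["dataset_2"],
)
print(asdict(step))
```

Making the record immutable and order-independent means two steps with the same tool, inputs, and settings compare as equal, which is a simple way to check whether an analysis was repeated exactly.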

Galaxy groups a series of analysis steps into a history, and users can create, copy, and version histories. All datasets in a history - initial, intermediate, and final - are viewable, and the user can rerun any analysis step.
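The history behavior described above (keeping every dataset, rerunning a step, copying a history) can be sketched as follows. This is a toy model under stated assumptions: the `History` class, its methods, and the `keep_longer_than` function are hypothetical and stand in for real tools and Galaxy's own implementation.

```python
from copy import deepcopy


class History:
    """Toy history: ordered steps plus every dataset they touched."""

    def __init__(self, name):
        self.name = name
        self.datasets = {}    # dataset id -> content
        self.steps = []       # one provenance dict per executed step
        self._next_id = 1

    def add_dataset(self, content):
        dataset_id = self._next_id
        self._next_id += 1
        self.datasets[dataset_id] = content
        return dataset_id

    def run(self, func, input_id, **params):
        """Run func on a stored dataset and record how the output was made."""
        output_id = self.add_dataset(func(self.datasets[input_id], **params))
        self.steps.append(
            {"func": func, "input": input_id, "params": params, "output": output_id}
        )
        return output_id

    def rerun(self, step_index):
        """Repeat a recorded step with the same input and parameters."""
        step = self.steps[step_index]
        return self.run(step["func"], step["input"], **step["params"])

    def copy(self, name):
        """Return an independent copy that can be modified freely."""
        clone = History(name)
        clone.datasets = deepcopy(self.datasets)
        clone.steps = [dict(step, params=dict(step["params"])) for step in self.steps]
        clone._next_id = self._next_id
        return clone


def keep_longer_than(sequences, min_length):
    """Stand-in for an analysis tool."""
    return [s for s in sequences if len(s) > min_length]


history = History("demo")
raw = history.add_dataset(["ACGT", "AC", "ACGTAC"])
filtered = history.run(keep_longer_than, raw, min_length=3)
repeated = history.rerun(0)
backup = history.copy("demo (copy)")
backup.run(keep_longer_than, raw, min_length=1)
print(history.datasets[filtered], len(history.datasets), len(backup.datasets))
```

In this sketch a rerun adds a new dataset rather than overwriting the old one, so initial, intermediate, and final datasets all remain viewable.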

Galaxy workflow editor -

Galaxy's workflow editor provides a graphical user interface (GUI) for creating and modifying workflows. The editor has four (4) areas: navigation bar, tool bar, editor panel, and details panel.

A user adds tools from the tool bar to the editor panel and configures each step in the workflow using the details panel.

The details panel also enables a user to add tags to a workflow and annotate a workflow and workflow steps. Workflows are run in Galaxy's analysis workspace; as with any tool executed in Galaxy, history items and provenance information are generated automatically for each tool executed via a workflow.

A workflow is located next to all other tools in Galaxy's tool menu and behaves the same as all other tools when it is run. Workflows and all Galaxy metadata are integrated. Executing a workflow generates a group of datasets and corresponding metadata, which are placed in the current history.
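A minimal sketch of this idea, assuming a workflow is simply an ordered list of steps: running it applies each step in turn and places every resulting dataset, with its metadata, in the current history. The function `run_workflow`, the two stand-in tools, and the plain-list history are hypothetical and are not Galaxy's workflow engine.

```python
def run_workflow(steps, input_data, history):
    """Apply each (name, function, params) step in order and append every
    result to `history` together with the settings that produced it."""
    current = input_data
    for name, func, params in steps:
        current = func(current, **params)
        history.append({"step": name, "params": dict(params), "dataset": current})
    return current


def drop_short(seqs, min_length):
    """Stand-in tool: keep sequences of at least min_length bases."""
    return [s for s in seqs if len(s) >= min_length]


def reverse_complement(seqs):
    """Stand-in tool: reverse-complement each DNA sequence."""
    table = str.maketrans("ACGT", "TGCA")
    return [s.translate(table)[::-1] for s in seqs]


workflow = [
    ("drop_short", drop_short, {"min_length": 4}),
    ("reverse_complement", reverse_complement, {}),
]

history = []
final = run_workflow(workflow, ["ACGT", "AC", "AACCGG"], history)
print(final)
for item in history:
    print(item["step"], item["params"], item["dataset"])
```

Because the workflow is plain data, the same list of steps can be run again on a different input dataset without any change.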

Users can add annotations and tags to workflows and workflow steps just as they can for histories.

Galaxy Transparency --

Galaxy promotes transparency via three (3) methods:

1) A sharing model for Galaxy items - datasets, histories, and workflows - and public repositories of published items;

2) A web-based framework for displaying shared or published Galaxy items; and

3) Pages - custom web-based documents that enable users to communicate their experiment at every level of detail and in such a way that readers can view, reproduce, and extend their experiment without leaving Galaxy or their web browser.

Galaxy public repositories and published items -

Galaxy's sharing model, public repositories, and display framework provide users with means to share datasets, histories, and workflows via web links. Galaxy's sharing model provides progressive levels of sharing, including the ability to publish an item.

Publishing an item generates a link to the item and lists it in Galaxy's public repository. Published items have predictable, short, and clear links in order to facilitate sharing and recall; a user can edit an item's link as well. Users can search, sort, and filter the public repository by name, author, tag, and annotation to find items of interest.
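The publishing behavior described above can be sketched in Python. The URL layout, the function names, and the repository records below are assumptions made for illustration; the abstract does not specify Galaxy's actual link format or search implementation.

```python
import re


def make_slug(title):
    """Turn an item title into a short, predictable link fragment."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def published_link(username, kind, title):
    """Hypothetical link layout: /u/<user>/<item kind>/<slug>."""
    return f"/u/{username}/{kind}/{make_slug(title)}"


def search(repository, text="", author=None, tag=None):
    """Filter by name or annotation text, author, and tag; sort by name."""
    text = text.lower()
    hits = [
        item for item in repository
        if (text in item["name"].lower() or text in item["annotation"].lower())
        and (author is None or item["author"] == author)
        and (tag is None or tag in item["tags"])
    ]
    return sorted(hits, key=lambda item: item["name"].lower())


repository = [
    {"name": "Exome variant calling", "author": "alice",
     "tags": {"variants", "human"}, "annotation": "Alignment then variant calling"},
    {"name": "ChIP-seq peaks", "author": "bob",
     "tags": {"chip-seq"}, "annotation": "Peak calling on human data"},
]

print(published_link("alice", "workflow", "RNA-seq: Differential Expression (v2)"))
print([item["name"] for item in search(repository, tag="human")])
print([item["name"] for item in search(repository, text="peak")])
```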

Galaxy displays every shared or published item as a web page that shows the item's automatic and user-supplied metadata along with additional links. An item's web page provides a link so that anyone viewing the item can import it into their analysis workspace and start using it.

The page also highlights information about the item and additional links: its author, links to related items, the item's community tags (the most popular tags that users have applied to the item), and the user's item tags. Tags link back to the public repository and show items that share the same tag.
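Community tags, defined above as the most popular tags that users have applied to an item, could be computed along the following lines. The function name, the input layout (a mapping from user to that user's tags), and the tie-breaking rule are assumptions for this sketch.

```python
from collections import Counter


def community_tags(tags_by_user, limit=3):
    """Return the `limit` most popular tags for one item.

    Each user counts at most once per tag; ties are broken alphabetically
    so the result is deterministic.
    """
    counts = Counter()
    for tags in tags_by_user.values():
        counts.update(set(tags))
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [tag for tag, _ in ranked[:limit]]


tags_by_user = {
    "alice": ["rna-seq", "mouse"],
    "bob": ["rna-seq", "qc"],
    "carol": ["rna-seq", "mouse", "mouse"],
}
print(community_tags(tags_by_user, limit=2))
```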

Galaxy Pages -

Galaxy Pages are the principal means for communicating accessible, reproducible, and transparent computational research through Galaxy.

Pages are custom web-based documents that enable users to communicate about an entire computational experiment, and Pages represent a step towards the next generation of online publication or publication supplement.

A Page, like a publication or supplement, includes a mix of text and graphs describing the experimental analyses. In addition to standard content, a Page also includes embedded Galaxy items from the experiment: datasets, histories, and workflows.

These embedded items add a layer of interactivity: each one exposes additional details and links for using the item.

Pages enable readers to understand an experiment at every level of detail. When a reader first visits a Page, they can read its text, view images, and see an overview of embedded items - an item's name, type, and annotation.

Should the reader want more detail, they can expand an embedded item and view its details. For histories and workflows, expanding the item shows each step; history steps can be individually expanded as well.

All metadata for both history and workflow steps are included as well.

Hence, a reader can view a Page in its entirety and then expand embedded items to view every detail of every step in an experiment, from parameter settings to annotations, without leaving the Page.
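The two levels of detail described above, an overview first and full step details on demand, can be modeled with a small Python sketch. The `EmbeddedItem` class and its text rendering are hypothetical; real Pages are interactive web documents, not text output.

```python
from dataclasses import dataclass, field


@dataclass
class EmbeddedItem:
    """An item embedded in a Page (illustrative model only)."""
    name: str
    kind: str                 # "dataset", "history", or "workflow"
    annotation: str = ""
    steps: list = field(default_factory=list)

    def overview(self):
        """What a reader sees first: name, type, and annotation."""
        return f"[{self.kind}] {self.name}: {self.annotation}"

    def expand(self):
        """Overview plus one line per step with its parameter settings."""
        lines = [self.overview()]
        for number, step in enumerate(self.steps, start=1):
            settings = ", ".join(
                f"{key}={value}" for key, value in sorted(step["parameters"].items())
            ) or "no parameters"
            lines.append(f"  {number}. {step['tool']} ({settings})")
        return lines


item = EmbeddedItem(
    name="Filter reads",
    kind="workflow",
    annotation="Removes short reads",
    steps=[
        {"tool": "drop_short", "parameters": {"min_length": 4}},
        {"tool": "reverse_complement", "parameters": {}},
    ],
)
print(item.overview())
print("\n".join(item.expand()))
```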

Pages also enable readers to actively use and reuse embedded items. A reader can copy any embedded item into their analysis workspace and begin using that item immediately.

This functionality makes reproducing an analysis simple: a reader can import a history and rerun it, or they can import a workflow and input datasets and run the workflow.

Once a history or workflow is imported from a Page, a reader can modify or extend the analysis, or reuse the workflow in another analysis. Using Pages, readers can quickly become analysts by importing embedded items, and can do so without leaving their web browser or Galaxy.

Galaxy Application/Implementation --

The Galaxy Application is an application built using the Galaxy framework that provides access to tools through an interface (for example, a web-based interface) and provides features for performing reproducible computational research as described above. A Galaxy server, or Instance, is a deployment of this application with a specific set of tools.

Galaxy is implemented primarily in the Python programming language. It is distributed as a standalone package that includes an embedded web server and SQL (structured query language) database, but can be configured to use an external web server or database.

Regular updates are distributed through a version control system, and Galaxy automatically manages database and dependency updates. A Galaxy instance can use compute clusters for running jobs, and can be interfaced with Portable Batch System (PBS) or Sun Grid Engine (SGE) clusters.

System Requirements

Web-based; contact manufacturer for details.

Manufacturer

Manufacturer Web Site Galaxy

Price Contact manufacturer.

G6G Abstract Number 20782

G6G Manufacturer Number 104358