Abstract: The Gene Expression Collection allows you to process, analyze, visualize, annotate, and report on gene expression experiments, including the individual target genes. Core functionality is based on BioConductor, the open source and open development software project for the analysis and comprehension of genomic data.

By harnessing the graphical protocol building capabilities of SciTegic Pipeline Pilot, the Gene Expression Collection allows you to construct complex BioConductor-based workflows without writing code, while also making it easy for you to couple gene expression analysis with other SciTegic Pipeline Pilot-based processes, such as sequence analysis (see G6G Abstract Number 20069) and reporting. Product features/capabilities include:

With the Gene Expression Collection you can:

1) Analyze and annotate gene expression experiments.

2) Use BioConductor tools without programming in R, the well-known open source environment for statistical computing and graphics on which BioConductor is based.

3) Easily create protocols to compare different analysis methods.

4) Integrate R/BioConductor analyses with analyses conducted with other SciTegic Pipeline Pilot Collections, including the R Statistics and Text Analytics Collections (see G6G Abstract Number 20063).

5) Create comprehensive reports containing elements from both R graphics and the Reporting Collection.

Experiment Readers -- Affymetrix® and Agilent experiments can be read and processed using several BioConductor packages (affy, affyPLM, plier, limma). Processing includes background correction, normalization, and summarization.
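To make the normalization step more concrete, the following is a minimal sketch of quantile normalization, one common way to put several arrays on a shared intensity scale. It is an illustration only, written in plain Python, and is not the Collection's or BioConductor's implementation; the function name `quantile_normalize` and the sample intensities are hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length intensity lists (one per array)."""
    n_probes = len(columns[0])
    # Sort each array's intensities independently.
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: the mean across arrays at each rank.
    reference = [
        sum(col[rank] for col in sorted_cols) / len(sorted_cols)
        for rank in range(n_probes)
    ]
    normalized = []
    for col in columns:
        # Probe indices ordered by intensity (ties keep their original order).
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            # Replace each value with the reference value at the same rank.
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical raw intensities: three arrays, four probes each.
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(arrays)
```

After this step every array has the same distribution of values, so remaining differences between arrays reflect probe rank rather than overall array brightness.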

Excel experiment readers can import data directly from existing spreadsheets. GEO and SOFT readers enable processing of data from NCBI’s Gene Expression Omnibus (see G6G Abstract Number 20013).

Experiment Analysis -- Fold change and pairwise differential expression are quickly calculated for full experiments using Student’s t-test or Wilcoxon tests with optional control for multiple comparisons. Gene subsets can be defined based on calculations or annotations -- all without copying or cloning data.
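The calculations named above can be sketched in a few lines. The example below computes a log2 fold change and applies the Benjamini-Hochberg adjustment, which is one widely used control for multiple comparisons; the source does not state which correction the Collection applies, so that choice is an assumption. The p-values are hypothetical inputs standing in for the output of a t-test or Wilcoxon test, and both function names are illustrative.

```python
import math


def log2_fold_change(treated, control):
    """log2 ratio of the mean treated intensity to the mean control intensity."""
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return math.log2(mean_treated / mean_control)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original gene order."""
    m = len(p_values)
    # Gene indices ordered from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values, one per gene, from a pairwise test.
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
fold = log2_fold_change([8.0, 8.0], [2.0, 2.0])
```

Genes are then typically kept when both the absolute fold change and the adjusted p-value pass chosen thresholds.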

Subsets can also be defined using generic components like Outlier Filter or Top N Filter. Set operations (e.g., union, intersection, and subtraction) can be performed on existing subsets. Clustering components, including hierarchical and k-means, are also provided.
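As an illustration of how subsets can be combined without copying expression data, the sketch below represents each subset as a set of gene identifiers that point back into one shared table. The gene names, scores, and the `top_n` helper are hypothetical and only mimic the behavior of a Top N filter; they are not the Collection's components.

```python
# Hypothetical per-gene scores (e.g., log2 fold changes) in one shared table.
scores = {
    "GENE_A": 2.4,
    "GENE_B": -1.8,
    "GENE_C": 0.3,
    "GENE_D": 3.1,
    "GENE_E": -2.9,
}

# Hypothetical subset defined by an annotation rather than a calculation.
annotated_subset = {"GENE_A", "GENE_C", "GENE_E"}


def top_n(table, n):
    """Return the identifiers of the n genes with the largest absolute score."""
    ranked = sorted(table, key=lambda gene: abs(table[gene]), reverse=True)
    return set(ranked[:n])


top3 = top_n(scores, 3)

# Set operations work on identifiers only; the score table is never duplicated.
union = top3 | annotated_subset
intersection = top3 & annotated_subset
subtraction = top3 - annotated_subset
```

Because each subset holds only identifiers, many overlapping subsets can coexist over a large experiment at little memory cost.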

Data Manipulators -- Gene expression experiments can be quite large. Once a subset of interesting genes has been identified, the irrelevant portions of the experiment can be removed from the experiment record, greatly speeding up downstream processing. Additionally, individual genes can be extracted, leading to easy integration with the Sequence Analysis Collection.

Results Annotators -- Annotations [e.g., descriptions, pathway identifiers, and Gene Ontology (GO) terms] can be added using vendor-provided data via BioConductor’s annaffy package. In addition, annotations can be imported from existing flat files.
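Importing annotations from a flat file amounts to a keyed join between result rows and annotation rows. The sketch below shows that idea with a tab-delimited file held in memory; the column names, probe identifiers, gene symbols, and probe-to-GO pairings are all hypothetical, and the file layout is an assumption rather than a format the Collection requires.

```python
import csv
import io

# Hypothetical tab-delimited annotation file, held in memory for the example.
flat_file = io.StringIO(
    "probe_id\tsymbol\tgo_term\n"
    "1001_at\tGENE_A\tGO:0006915\n"
    "1002_at\tGENE_B\tGO:0008283\n"
)

# Index the annotation rows by probe identifier for constant-time lookup.
annotations = {
    row["probe_id"]: row for row in csv.DictReader(flat_file, delimiter="\t")
}

# Hypothetical analysis results to be annotated.
results = [
    {"probe_id": "1001_at", "log2_fc": 2.0},
    {"probe_id": "1003_at", "log2_fc": -1.2},
]

annotated = []
for result in results:
    extra = annotations.get(result["probe_id"], {})
    # Probes missing from the flat file get empty annotation fields.
    annotated.append(
        {
            **result,
            "symbol": extra.get("symbol", ""),
            "go_term": extra.get("go_term", ""),
        }
    )
```

Leaving unmatched probes in the output with empty fields keeps the result table aligned with the original experiment.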

Viewers and Reporting Tools -- Use BioConductor graphics or create your own graphics and reports using SciTegic Pipeline Pilot’s Reporting Collection. While BioConductor provides sophisticated plots and charts, they can be difficult to configure and use.

The Gene Expression Collection makes it easy to use these standard BioConductor reporting capabilities, while also giving you the option to implement additional data views using the Reporting Collection. This gives you more control over display features, including hyperlinks and tooltips. Supplied components include heat maps, annotated tables, and parallel coordinate plots.

Integration with Other SciTegic Pipeline Pilot Collections -- The components in the Gene Expression Collection are designed to work seamlessly with components from existing collections, including the Sequence Analysis, R Statistics, Data Modeling, and Chemistry (via pathways) Collections.

