Microarray-MD

Category Genomics>Gene Expression Analysis/Profiling/Tools

Abstract Microarray-MD (Microarray Medical Diagnosis) is a novel microarray data analysis software system which utilizes gene expression data for medical diagnosis.

It implements a combination scheme of multiple Support Vector Machines (SVMs), which integrates a variety of gene selection criteria and allows for the discrimination of multiple diseases or subtypes of a disease.

The system is trained on pathologically characterized gene expression data supplied as input, and it tunes its parameters automatically during training.

The major contribution of Microarray-MD is that it can provide physicians with substantial 'molecular-level information' by exploiting gene expressions.

The gene expression measurements are pre-processed and subsequently used to classify the corresponding samples into two or more categories depending on their pathology.

The system has a simple and practical GUI, and a Wizard interface guides novice users through its use.

The system is capable of performing automatic tuning of its parameters, thus simplifying the microarray analysis process for both novice and expert users. Moreover, expert users are offered the options to tune all the relevant parameters of the algorithms applied for decision making in medical research.

Microarray-MD is a system capable of “learning” to recognize the pathology of samples provided to its input through a supervised training procedure.

It includes two (2) processing units, a Pre-processing Unit and a Decision Unit. The Pre-processing Unit prepares the gene expression data for passing into the Decision Unit, which is the main processing unit of the system.

The user may switch between two (2) modes of operation: the training and the testing mode. The training mode of operation requires a gene expression matrix of pathologically characterized samples as input.

During training the system organizes its internal structure and tunes its pre-processing and classification parameters for a given medical problem.

These parameters are then stored for use during the testing mode of operation. Given a patient’s gene expression vector, the trained system classifies it based on the 'prior knowledge' encoded in the stored training parameters.

Pre-processing Unit -- The Pre-processing Unit handles the management of missing values as well as the normalization of the gene expression levels.

For the management of missing values, the manufacturer has incorporated (a) the row-average method, which is simple and effective, and (b) the k-nearest neighbors method (k-NN), which is more robust than the row-average method but requires more computation.
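The two imputation strategies can be illustrated with a minimal Python sketch. It is not the Microarray-MD implementation: the function names, the use of None for a missing value, the root-mean-square distance over shared columns, and the default k are all assumptions made for illustration. Rows are taken to be genes and columns to be samples.

```python
import math


def row_average_impute(matrix):
    """Replace each missing value (None) with the mean of the observed values in its row."""
    result = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        result.append([mean if v is None else v for v in row])
    return result


def knn_impute(matrix, k=2):
    """Replace each missing value with the mean of that column over the k most similar rows.

    Similarity is the root-mean-square difference over columns observed in both rows
    (an assumed distance; the source does not specify one).
    """
    result = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_index, other in enumerate(matrix):
                if other_index == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                result[i][j] = sum(nearest) / len(nearest)
            else:
                # No usable neighbor: fall back to the row average.
                observed = [v for v in row if v is not None]
                result[i][j] = sum(observed) / len(observed) if observed else 0.0
    return result
```

The sketch shows the trade-off described above: the row average uses only the gene's own values, whereas k-NN compares the gene against every other row for each missing entry, which is where the extra computation comes from.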

In addition to the estimation of missing values, the Pre-processing Unit incorporates data normalization methods that adjust the gene expression levels so that meaningful biological comparisons can be made between different DNA microarray experiments.

Two (2) normalization methods --

1) The first method normalizes the gene expression levels of each sample to conform to zero mean and unitary variance.

2) The second method normalizes the gene expression levels of each sample by subtracting the sample median and dividing the result by the interquartile range (the difference between the third and the first quartiles).
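Both normalization methods can be sketched in a few lines of Python using the standard statistics module. This is an illustrative sketch, not the product's code: the function names are hypothetical, the population standard deviation is an assumed choice, and the "inclusive" quartile method is one of several common quartile definitions.

```python
import statistics


def zscore_normalize(sample):
    """Shift and scale one sample's expression levels to zero mean and unit variance."""
    mean = statistics.fmean(sample)
    std = statistics.pstdev(sample)  # population standard deviation (assumed choice)
    if std == 0:
        return [0.0 for _ in sample]
    return [(v - mean) / std for v in sample]


def median_iqr_normalize(sample):
    """Subtract the sample median and divide by the interquartile range (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return [0.0 for _ in sample]
    return [(v - median) / iqr for v in sample]
```

The median and quartiles are less sensitive to a few extreme expression values than the mean and variance, which is the usual reason for offering the second method alongside the first.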

Decision Unit -- The Decision Unit handles medical problems as multi-class classification problems.

Gene selection modules - The gene selection modules of the Decision Unit integrate three ranking criteria for the selection of differentially expressed genes: 1) Prediction strength; 2) Welch’s t-test; and 3) the criterion of Sun et al.
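Of the three criteria, Welch's t-test is a standard statistic and is easy to sketch. The Python below ranks genes by the absolute value of Welch's t between two classes. The function names, the dictionary layout of the expression data, and the 0/1 class labels are assumptions for illustration; the source does not describe how Microarray-MD applies the statistic internally, and the other two criteria are not shown.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    denom = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    if denom == 0:
        # Both groups are constant: no evidence unless the means differ.
        return 0.0 if mean_a == mean_b else math.copysign(math.inf, mean_a - mean_b)
    return (mean_a - mean_b) / denom


def rank_genes(gem, labels, top_n):
    """Return the top_n gene names ordered by decreasing |t|.

    gem maps a gene name to its expression values, one per sample;
    labels holds the class (0 or 1) of each sample in the same order.
    """
    scores = {}
    for gene, values in gem.items():
        group_a = [v for v, y in zip(values, labels) if y == 0]
        group_b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(group_a, group_b))
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Each class needs at least two samples for the sample variance to be defined.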

Classification modules -- The classification module of the Decision Unit implements a binary SVM classifier.

SVM training involves a 'quadratic programming' optimization procedure which aims at the identification of a subset of important vectors from the training set, called support vectors. These vectors are utilized for the drawing of a separating hyper-surface between the two classes.
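Once training has identified the support vectors, a binary SVM classifies a new vector x by the sign of f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, where the sum runs over the support vectors. The Python sketch below evaluates that decision function for a linear kernel. The support vectors, multipliers and bias are hand-picked toy values, not the output of a quadratic programming solver, and the source does not state which kernel Microarray-MD uses.

```python
def linear_kernel(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(support_vectors, labels, alphas, bias, x, kernel=linear_kernel):
    """Evaluate f(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors."""
    return sum(alpha * y * kernel(sv, x)
               for sv, y, alpha in zip(support_vectors, labels, alphas)) + bias


def classify(support_vectors, labels, alphas, bias, x):
    """Return +1 or -1 according to the side of the separating surface x falls on."""
    return 1 if decision_value(support_vectors, labels, alphas, bias, x) >= 0 else -1


# Toy model: two support vectors placed symmetrically around the origin.
SUPPORT_VECTORS = [[1.0, 1.0], [-1.0, -1.0]]
LABELS = [1, -1]
ALPHAS = [0.25, 0.25]
BIAS = 0.0
```

With these toy values each support vector lies exactly on the margin (f = +1 or f = -1), and only the support vectors enter the sum, which is why the remaining training samples can be discarded after training.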

Graphical User Interface (GUI) -- The GUI of Microarray-MD has been designed mainly for scientists specialized in the field of medicine and biology. At the beginning of the program, the user is prompted to choose between the two (2) operating modes of the system.

On the user’s response, a window associated with the corresponding mode of operation is opened: the Training Window, for the training mode and the Testing Window for the testing mode.

The Training Window and the Wizard interface - The Training Window consists of three input panels (Panel-1, Panel-2 and Panel-3), each of which can be used to select certain options, and two output panels (Panel-4 and Panel-5), which present the training results and the current status of the application.

Panel-1 - is provided for the management of input/output operations. The user can designate the location of an input file containing the desired gene expression matrix (GEM) training data.

The GEM file format is compatible with the tab-delimited pre-clustering file format (pcl) supported by the Stanford Microarray Database. Once a GEM file is loaded, Panels-2 and 3 are activated.
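As an illustration of what loading such a file involves, the Python sketch below reads a tab-delimited file laid out in the way the pcl format is commonly described: a header row with an identifier column, a NAME column, a GWEIGHT column and then one column per sample; an optional EWEIGHT row; and one row per gene, with empty fields for missing values. That layout is an assumption here and should be checked against the Stanford Microarray Database documentation. The function name is hypothetical, and the source does not say where Microarray-MD expects the pathology labels of the samples to appear.

```python
import csv
import io


def read_pcl(handle):
    """Parse a pcl-style tab-delimited file into (sample names, gene ids, matrix).

    Missing values (empty fields) are returned as None.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[3:]  # columns after ID, NAME and GWEIGHT
    genes, matrix = [], []
    for row in reader:
        if not row or row[0] == "EWEIGHT":
            continue  # skip blank lines and the experiment-weight row
        genes.append(row[0])
        cells = row[3:3 + len(samples)]
        cells += [""] * (len(samples) - len(cells))  # pad short rows
        matrix.append([float(c) if c.strip() else None for c in cells])
    return samples, genes, matrix


# Usage with an in-memory example instead of a file on disk.
EXAMPLE = (
    "ID\tNAME\tGWEIGHT\tS1\tS2\n"
    "EWEIGHT\t\t\t1\t1\n"
    "G1\tgene one\t1\t0.5\t\n"
    "G2\tgene two\t1\t-1.2\t3.4\n"
)
SAMPLES, GENES, MATRIX = read_pcl(io.StringIO(EXAMPLE))
```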

Panel-2 - contains graphical controls for the specification of the pre-processing parameters. The user may choose between the row-average and the k-NN methods for the imputation of missing values, and between the mean/variance and median/quartile range-based normalization methods.

Panel-3 - contains graphical controls for the specification of the classification parameters. The classes involved in the medical problem the system is intended to solve, together with the distribution of the samples among them, are displayed in a list-box control.

Panel-4 - provides information on the status of the application (e.g. loading, training, etc.) and information related to the open GEM file, such as the total number of samples and the dimension of the gene expression vectors.

Panel-5 - after the training finishes, the results are printed in Panel-5 and can be saved for archiving purposes. The results include the classification performance of the system, presented by means of a 'confusion matrix' and the average accuracy, and the optimal system configuration details as determined by the almost unbiased 'leave-one-out' parameter tuning process.
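The leave-one-out procedure and the confusion matrix it produces can be sketched as follows. Each sample is held out in turn, a classifier is trained on the rest, and the held-out prediction is tallied by true class (row) and predicted class (column). To keep the sketch self-contained, a nearest-centroid classifier stands in for the SVM combination; that substitution, the function names, and the use of overall accuracy are assumptions, and the product's "average accuracy" may be defined differently.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier: predict the class whose mean vector is closest to the sample."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(sorted(centroids), key=lambda label: squared_distance(centroids[label], sample))


def leave_one_out(samples, labels, predict=nearest_centroid_predict):
    """Return (classes, confusion matrix, accuracy) from leave-one-out evaluation.

    confusion[i][j] counts samples of true class i that were predicted as class j.
    """
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    confusion = [[0] * len(classes) for _ in classes]
    for i, (x, y) in enumerate(zip(samples, labels)):
        train_x = samples[:i] + samples[i + 1:]  # every sample except the i-th
        train_y = labels[:i] + labels[i + 1:]
        predicted = predict(train_x, train_y, x)
        confusion[index[y]][index[predicted]] += 1
    correct = sum(confusion[i][i] for i in range(len(classes)))
    return classes, confusion, correct / len(samples)
```

Parameter tuning in this style would repeat the evaluation for each candidate configuration and keep the one with the best leave-one-out accuracy.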

Wizard interface - novice users are always provided with the option of training the Microarray-MD system by using the Wizard interface.

The Wizard provides a step-by-step interactive process accompanied by helpful information, and allows for the selection of key options.

The Testing Window -- Given a new patient’s gene expression data, a decision on the class the patient belongs to can be made through the Testing Window. The Testing Window requires only two (2) filenames as input.

1) The first filename corresponds to the file containing the system parameters produced as a result of the training process. This file also contains information that can be used to identify and retrieve the names of the genes selected by the gene selection module.

2) The second filename corresponds to the gene expression data of one or more patients, as quantified by DNA microarray image analysis software.

The user may proceed to the classification of the input gene expression data by just clicking on the “Decision” button. This action triggers the retrieval of the pre-processing and the classification parameters from the parameters file.

Then the internal structure of the 'Decision Unit' is automatically determined and the system is fed with the patient’s uncharacterized gene expressions as these are loaded from the gene expression measurements file.

The results are printed in the Results panel of the Testing Window. These include the decision made by the system as well as the probabilities that the input sample belongs to each of the classes the system was trained on.

These probabilities are based on the 'confusion matrix' obtained during the training process.
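The source does not give the formula used. One plausible reading, sketched below in Python as an assumption, is to take the column of the training confusion matrix that corresponds to the predicted class and normalize it, which estimates how often samples given that decision truly belonged to each class. The function name and the uniform fallback for an empty column are hypothetical.

```python
def class_probabilities(confusion, classes, predicted):
    """Estimate P(true class | predicted class) from a confusion matrix.

    confusion[i][j] counts training samples of true class i predicted as class j.
    """
    j = classes.index(predicted)
    column = [row[j] for row in confusion]
    total = sum(column)
    if total == 0:
        # The class was never predicted during training: no evidence, so return a uniform estimate.
        return {c: 1.0 / len(classes) for c in classes}
    return {c: count / total for c, count in zip(classes, column)}
```

For example, with a confusion matrix [[8, 2], [1, 9]] over classes A and B, a decision of A would be reported as 8/9 for A and 1/9 for B under this reading.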

System Requirements

Demonstration Version for Windows 98/ME/NT/XP.

Manufacturer

Manufacturer Web Site Microarray-MD

Price Free Demonstration Version.

G6G Abstract Number 20301

G6G Manufacturer Number 102835