Fujitsu GhostMiner

Category Intelligent Software>Data Mining Systems/Tools and Intelligent Software>Neural Network Systems/Tools

Abstract GhostMiner is a unique data mining (predictive modeling) software system from Fujitsu that not only supports common databases (or spreadsheets) and mature machine learning algorithms, but also assists with data preparation and selection, model validation, multi-models such as committees or k-classifiers, and visualization.

All of this and more is available in one package: a large range of data preparation techniques, a broad range of feature selection methods, and a choice of data mining algorithms and visualization techniques are integrated. Because only one data format (the project) is needed, trying out and comparing different approaches becomes easy. The package also comes with an intuitive interface, which should make it easier to use even for non-technical users.

GhostMiner can be employed in a number of business intelligence areas, as well as in the following areas:

1) Bioinformatics - genetics, proteomics, Quantitative Structure-Activity Relationship (QSAR) modeling.

2) Medicine - clinical and laboratory diagnostic systems, pharmacology, bioactivity of drugs, analysis of signals and images (with additional filters for image/signal processing).

3) Psychology and psychiatry - analytical support of psychometric tests and other questionnaire-based tests.

The GhostMiner system includes two (2) components: Developer and Analyzer.

GhostMiner Developer -- GhostMiner Developer is the tool for data-model designers and developers who, using databases, can train, test, and select useful models. Using this component requires a good knowledge of statistical analysis and some knowledge of computational intelligence methods.

Developer supports each step of the data mining process, which is represented by a single 'project'. Each project is based on the analysis of data unique to a specific problem. The analysis starts from raw data and moves through: Data selection; Data processing; Feature selection; Model learning; Model analysis; and Model selection. The project's output is a model of the knowledge inherent in the analyzed data. The model can then be used as a support tool in decision-making processes.

Data preprocessing (selection) -- In GhostMiner the information about the data is provided in two (2) ways.

First, purely statistical information about the data is given, such as:

1) The number of vectors;

2) The number of classes;

3) The number of features;

4) The number of vectors per class;

5) The minimum, average, and maximum values of each feature;

6) The variance of each feature; and

7) The number of missing values for each feature.
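The statistics listed above can be sketched in a few lines of Python. This is only an illustration, not GhostMiner's own code: the function name summarize, the use of None to mark a missing value, and the choice of population variance are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pvariance


def summarize(vectors, labels):
    """Compute the dataset statistics listed above.

    Assumptions (not from GhostMiner): each vector is a list of numbers,
    None marks a missing value, and variance is the population variance.
    """
    n_features = len(vectors[0]) if vectors else 0
    per_feature = []
    for j in range(n_features):
        # Ignore missing entries when computing per-feature statistics.
        present = [row[j] for row in vectors if row[j] is not None]
        per_feature.append({
            "min": min(present) if present else None,
            "mean": mean(present) if present else None,
            "max": max(present) if present else None,
            "variance": pvariance(present) if present else None,
            "missing": len(vectors) - len(present),
        })
    return {
        "n_vectors": len(vectors),
        "n_classes": len(set(labels)),
        "n_features": n_features,
        "vectors_per_class": dict(Counter(labels)),
        "features": per_feature,
    }


# Hypothetical toy data: four vectors, two features, one missing value.
data = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [4.0, 40.0]]
classes = ["a", "a", "b", "b"]
report = summarize(data, classes)
```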

The data may be viewed in its original or pre-processed (standardized or normalized) form, with the vectors ordered by the values of a chosen feature or filtered to show only those that belong to selected classes.
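As a rough illustration of the two pre-processed forms, the sketch below applies the common textbook definitions: standardization as z-scores and normalization as min-max scaling to [0, 1]. The listing does not say which exact formulas GhostMiner uses, so both definitions and the function names are assumptions.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a feature to zero mean and unit standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def normalize(column):
    """Rescale a feature linearly into the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical feature column.
values = [2.0, 4.0, 6.0, 8.0]
z = standardize(values)
unit = normalize(values)
```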

Second, a quick visual evaluation of the data is provided through charts and diagrams.

Feature selection -- The feature selection models facilitate manual and/or automatic selection of features from all those available in the dataset. They are especially important when the number of features is very large and only some of them have an actual impact on the constructed models.

The following methods are implemented:

1) Manual feature selection;

2) Correlation-coefficient-based feature selection;

3) Feature selection wrappers where each classification model listed below could be "wrapped"; and

4) Feature selection committee.

Automatic feature selection methods provide a ranking of the features according to their importance.
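To make the idea of a correlation-based ranking concrete, here is a minimal sketch that scores each feature by the absolute Pearson correlation with a numeric class label and sorts the features from strongest to weakest. The use of Pearson correlation, the numeric 0/1 encoding of classes, and the function names are assumptions for illustration; the listing does not specify which coefficient GhostMiner computes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_features(vectors, targets):
    """Return (feature_index, |correlation|) pairs, strongest first."""
    scores = []
    for j in range(len(vectors[0])):
        column = [row[j] for row in vectors]
        scores.append((abs(pearson(column, targets)), j))
    # Sort by descending score; break ties by feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(j, score) for score, j in scores]


# Hypothetical data: feature 0 tracks the class, feature 1 is constant,
# feature 2 is mostly noise.
X = [[1.0, 5.0, 0.3], [2.0, 5.0, 0.9], [3.0, 5.0, 0.1], [4.0, 5.0, 0.8]]
y = [0, 0, 1, 1]
ranking = rank_features(X, y)
```

A ranking like this can then be cut off at a chosen number of features or at a score threshold before model training.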

Model learning -- There is no single algorithm that will achieve the best results on all data. For that reason GhostMiner provides several different types of data mining algorithms:

For classification --

1) Incremental Neural Network (IncNet);

2) Feature Space Mapping (FSM) Neurofuzzy System;

3) Separability Split Value (SSV) Decision Tree;

4) Support Vector Machine (SVM); and

5) k-Nearest Neighbors (kNN) algorithm.
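Of the classifiers listed, k-Nearest Neighbors is the simplest to show in a few lines. The sketch below is a generic textbook version using Euclidean distance and a majority vote; the distance metric, the default k, and the function name are assumptions and do not describe GhostMiner's implementation.

```python
from collections import Counter
from math import dist


def knn_predict(train_vectors, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training vectors."""
    # Sort training cases by Euclidean distance to the query, keep the k closest.
    neighbours = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set with well-separated groups.
train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["a", "a", "a", "b", "b", "b"]

near_origin = knn_predict(train, labels, [0.5, 0.5])
near_far_group = knn_predict(train, labels, [5.5, 5.5])
```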

For clusterization --

1) Dendrograms method; and

2) Support vectors clusterization.
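A dendrogram is the tree produced by hierarchical (agglomerative) clustering: the two closest clusters are merged repeatedly, and each merge is recorded with the distance at which it happened. The sketch below uses single linkage and Euclidean distance; both choices, and the function name, are assumptions, since the listing does not state which linkage GhostMiner uses.

```python
from math import dist


def single_linkage(points):
    """Agglomerative clustering; returns the merge history of the dendrogram.

    Each entry is (cluster_a, cluster_b, distance), where clusters are
    tuples of point indices. Single linkage is an assumption of this sketch.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest minimum point distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical one-dimensional data with two obvious groups.
points = [[0.0], [1.0], [10.0], [12.0]]
history = single_linkage(points)
```

Cutting the merge history at a chosen distance yields a flat set of clusters; here, cutting anywhere between 2 and 9 separates the two groups.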

Model analysis -- Sometimes questionable cases are difficult to classify even with the best single model. To improve the accuracy of the results and support the detailed analysis of such difficult cases, GhostMiner provides several enhancements:

1) Committees of models;

2) K classifiers; and

3) Model Transform & Classify.
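One common way to build a committee of models is to let every member classify the case and take a majority vote, reporting the share of agreeing members as a rough confidence signal. The sketch below shows that scheme only; the voting rule, the toy threshold models, and the function name are assumptions, and GhostMiner's committees may combine models differently.

```python
from collections import Counter


def committee_predict(models, vector):
    """Return (winning_label, agreement) from a majority vote of the models."""
    votes = Counter(model(vector) for model in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)


# Three toy threshold classifiers stand in for trained models (hypothetical).
models = [
    lambda v: "high" if v[0] > 5 else "low",
    lambda v: "high" if v[1] > 5 else "low",
    lambda v: "high" if v[0] + v[1] > 12 else "low",
]

# A borderline case: the members disagree, so agreement is below 1.0.
label, agreement = committee_predict(models, [7.0, 4.0])
```

A low agreement value flags exactly the kind of questionable case that merits closer inspection.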

Model testing (selection) -- GhostMiner provides the following tools for evaluating models:

1) Cross-validation;

2) X-test; and

3) Confusion matrix.
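Cross-validation and the confusion matrix fit together naturally: each case is held out once, classified by a model trained on the remaining cases, and the (actual, predicted) pair is counted. The following is a generic sketch under stated assumptions: the strided fold assignment, the nearest-class-mean toy learner, and all function names are illustrative and are not taken from GhostMiner.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k folds by striding (no shuffling)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def cross_validate(train_fn, vectors, labels, k=3):
    """Return a confusion matrix as a Counter keyed by (actual, predicted)."""
    confusion = Counter()
    for fold in k_fold_indices(len(vectors), k):
        held_out = set(fold)
        train_x = [v for i, v in enumerate(vectors) if i not in held_out]
        train_y = [l for i, l in enumerate(labels) if i not in held_out]
        model = train_fn(train_x, train_y)
        # Evaluate only on the cases the model has not seen.
        for i in fold:
            confusion[(labels[i], model(vectors[i]))] += 1
    return confusion


def train_nearest_mean(train_x, train_y):
    """Toy learner for one-feature data: predict the class with the closest mean."""
    sums, counts = Counter(), Counter()
    for (value,), label in zip(train_x, train_y):
        sums[label] += value
        counts[label] += 1
    means = {label: sums[label] / counts[label] for label in counts}
    return lambda vector: min(means, key=lambda label: abs(vector[0] - means[label]))


# Hypothetical one-feature dataset with two classes.
X = [[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]]
y = ["a", "a", "a", "b", "b", "b"]
confusion = cross_validate(train_nearest_mean, X, y, k=3)
correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
accuracy = correct / sum(confusion.values())
```

Off-diagonal entries of the matrix, such as ("a", "b"), would show which classes the model confuses with one another.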

GhostMiner Analyzer -- GhostMiner Analyzer is aimed at end users who are not necessarily experts in computational techniques. The idea is to provide a simple tool for diagnosis, decision support, or data classification that a medical doctor, a manager, or a chemist would be able to use with ease.

Once models of the data have been created using GhostMiner Developer tools they are stored in project files. 'Analyzer' reads the project file and allows for a detailed evaluation of the results, including estimation of probabilities for different decisions, and visualization of new data in relation to the reference cases.

Data access -- GhostMiner reads and supports the following formats of data:

1) ASCII text files (including CSV files);

2) Excel spreadsheets;

3) Any database conforming to the ODBC standard, including MS Access, MS SQL Server and Oracle; and

4) Any database conforming to the OLE DB standard, including MS SQL Server, Informix, Oracle and many more.

System Requirements

GhostMiner is a desktop application that runs under Windows 2000 or Windows XP.

Minimum requirements:

Any Pentium or higher PC processor, 50 MB RAM, 65 MB disk space. The size of RAM needed is directly proportional to the size of the database. There is no explicit limit for the number of features or the number of cases in the database. The software is multi-threaded, thus multiple models can run in parallel even on a single processor.

Manufacturer

Manufacturer Web Site Fujitsu GhostMiner

Price Contact manufacturer.

G6G Abstract Number 20154

G6G Manufacturer Number 101052