LPA Data Mining Toolkit

Category Intelligent Software>Data Mining Systems/Tools

Abstract The LPA Data Mining Toolkit is a collection of routines, supplied as an Application Programming Interface (API), which support the discovery of rules and patterns within relational databases such as Microsoft Access, Oracle and SQL Server.

Product features/capabilities include:

Three (3) Phases of Discovery --

The key stages of data mining as supported by the LPA Data Mining toolkit are:

1) Selection of Data Source: The first stage is to select the appropriate data for analysis. The toolkit assumes that all joins, exclusions and views are handled outside the toolkit, and it works off a single table. You need to designate which columns are to be included in the search and discovery stage.

2) Constructing a Target: The target is a formula, or target expression, that represents what you are interested in. It can be simple, such as 'customers who renewed maintenance', or compound, such as 'people with salary greater than 25K and owner-occupier'. This target expression drives the data mining investigation.

3) Discovering Patterns: This is the 'main stage' of the data mining process, where the patterns in the data are sought. All other values in all designated columns are analyzed to see which values and value ranges contribute towards the target more than might normally be expected.
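The three phases can be illustrated with a minimal Python sketch. It is not the toolkit's Prolog API: the loan table, its column names and the functions `target` and `influential_conditions` are hypothetical, and "contributes more than might normally be expected" is assumed here to mean a target rate above the table's overall (base) rate. Only exact values are examined; value ranges for numeric columns are left out.

```python
from collections import defaultdict

# Phase 1 (hypothetical data): one flat table; joins, exclusions and views
# are assumed to have been applied before the data reaches this step.
COLUMNS = ("PurposeOfLoan", "StatusSex", "LoanApproved")
ROWS = [dict(zip(COLUMNS, values)) for values in [
    ("NewCar", "SingleFemale", 1), ("NewCar", "SingleFemale", 1),
    ("NewCar", "SingleMale", 0), ("NewCar", "SingleMale", 1),
    ("UsedCar", "SingleFemale", 1), ("UsedCar", "SingleFemale", 0),
    ("UsedCar", "SingleMale", 0), ("Furniture", "SingleMale", 0),
    ("Furniture", "SingleFemale", 0), ("UsedCar", "SingleMale", 0),
]]
SEARCH_COLUMNS = ["PurposeOfLoan", "StatusSex"]  # columns designated for the search


# Phase 2: the target expression, written as a predicate over one row.
# A compound target (hypothetical columns) would combine tests with `and`,
# e.g. row["Salary"] > 25000 and row["OwnerOccupier"].
def target(row):
    return row["LoanApproved"] == 1


# Phase 3: keep each (column, value) whose target rate beats the base rate,
# ordered by lift (the value's target rate divided by the base rate).
def influential_conditions(rows, columns, target):
    base_rate = sum(target(r) for r in rows) / len(rows)
    counts = defaultdict(lambda: [0, 0])  # (column, value) -> [rows, target rows]
    for row in rows:
        hit = target(row)
        for column in columns:
            entry = counts[(column, row[column])]
            entry[0] += 1
            entry[1] += hit
    found = []
    for (column, value), (n_rows, n_target) in counts.items():
        rate = n_target / n_rows
        if rate > base_rate:
            found.append((column, value, rate / base_rate))
    found.sort(key=lambda item: item[2], reverse=True)
    return base_rate, found


base, conditions = influential_conditions(ROWS, SEARCH_COLUMNS, target)
print(f"base rate {base:.2f}")
for column, value, lift in conditions:
    print(f"{column} = {value}: lift {lift:.2f}")
```

With this made-up data the base rate is 0.4, and the sketch ranks PurposeOfLoan = NewCar (lift 1.875) ahead of StatusSex = SingleFemale (lift 1.5); every other value falls at or below the base rate and is dropped.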

The LPA Data Mining toolkit contains:

1) API Routines: A collection of routines for handling the three (3) phases of discovery (data mining) described above. These routines are presented as Prolog predicates [Prolog is a logic programming language. It is a general purpose language often associated with artificial intelligence (AI) and computational linguistics] and can be combined with most other LPA products and features.

The routines often return lists and structures that the Prolog developer can easily manipulate. This makes the toolkit an ideal basis for Prolog application developers to build their own data mining applications or to include a data mining component within their existing applications.

2) Source Code Example: A fully documented source code example is supplied, showing how to build an interactive data mining application using the API routines and a set of dialogs designed with the Dialog Editor utility that comes with LPA Prolog (an additional advanced product from LPA).

3) Sample Data Mining Application: A stand-alone, point-and-click desktop application based on the source code example described above is supplied; you can use it 'out-of-the-box' to demonstrate and explore the data mining concepts described here.

Run-time Deployment and Application Deployment --

The LPA Data Mining toolkit can be integrated with most other LPA products and technology. By combining the LPA Data Mining toolkit with the Intelligence Server (an additional advanced product from LPA), it is possible to present the data mining toolkit as a Component Object Model (COM) object for embedding within, say, a Visual Basic (VB) application.

By combining with ProWeb (an additional advanced product from LPA), it is possible to develop a web-based data mining application.

How the LPA Data Mining toolkit works --

For any given target, the LPA Data Mining toolkit counts rows to determine how much influence each column, and each value within it, exerts on the target. The result is an ordered list of elementary conditions that are deemed to be influential. The toolkit then lets you explore how well these atomic conditions combine to produce 'candidate rules'.
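Combining elementary conditions into candidate rules can be sketched by trying small conjunctions and scoring each one. This is an assumption about the general approach, not LPA's algorithm: the data, the `ELEMENTARY` list and `candidate_rules` are hypothetical, and ranking by truth and then by number of rows covered is just one reasonable choice.

```python
from itertools import combinations

# Hypothetical single table: (PurposeOfLoan, StatusSex, LoanApproved).
COLUMNS = ("PurposeOfLoan", "StatusSex", "LoanApproved")
ROWS = [dict(zip(COLUMNS, values)) for values in [
    ("NewCar", "SingleFemale", 1), ("NewCar", "SingleFemale", 1),
    ("NewCar", "SingleMale", 0), ("NewCar", "SingleMale", 1),
    ("UsedCar", "SingleFemale", 1), ("UsedCar", "SingleFemale", 0),
    ("UsedCar", "SingleMale", 0), ("Furniture", "SingleMale", 0),
    ("Furniture", "SingleFemale", 0), ("UsedCar", "SingleMale", 0),
]]

# Hypothetical output of the single-condition pass: an ordered list of
# elementary (column, value) conditions judged to be influential.
ELEMENTARY = [("PurposeOfLoan", "NewCar"), ("StatusSex", "SingleFemale")]


def target(row):
    return row["LoanApproved"] == 1


def candidate_rules(rows, elementary, target, max_conditions=2, min_rows=2):
    """Try conjunctions of elementary conditions; rank by truth, then coverage."""
    rules = []
    for size in range(1, max_conditions + 1):
        for conds in combinations(elementary, size):
            columns = [column for column, _ in conds]
            if len(set(columns)) < len(columns):
                continue  # two values of one column can never hold together
            covered = [r for r in rows if all(r[c] == v for c, v in conds)]
            if len(covered) < min_rows:
                continue  # too few rows to say anything useful
            truth = sum(target(r) for r in covered) / len(covered)
            rules.append((conds, truth, len(covered)))
    rules.sort(key=lambda rule: (rule[1], rule[2]), reverse=True)
    return rules


for conds, truth, n_rows in candidate_rules(ROWS, ELEMENTARY, target):
    body = " AND ".join(f'"{c}" = "{v}"' for c, v in conds)
    print(f'IF {body} THEN "LoanApproved" = 1  (truth {truth:.0%}, {n_rows} rows)')
```

Here the two-condition rule covers only 2 rows but holds for both, so it ranks above either condition on its own; the `min_rows` threshold is what stops very narrow rules from winning on truth alone.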

What a Candidate Rule is --

Results are generated in the form of 'IF-THEN' rules, several of which might be formed about the same target statement.

For example:

IF "PurposeOfLoan" = "NewCar"
AND "StatusSex" = "SingleFemale"
THEN "LoanApproved" = 1

Associated with each 'candidate rule' are statistics describing its truth (sometimes referred to as accuracy), coverage and significance, for example:

Truth% = 33.33
Hit% = 15.33
Base% = 13.40
Significance% = 14.43
Entropy = -3.18
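The source does not give formulas for these statistics, so the sketch below computes only three simple quantities under stated assumptions: truth as the share of rows matching the IF part that also match the THEN part, coverage as the share of all rows matching the IF part, and a base rate as the share of all rows matching the THEN part. Whether these correspond exactly to the toolkit's Truth% and Base%, and how coverage relates to Hit%, is an assumption; Significance% and Entropy are not attempted. The data and `rule_stats` are hypothetical.

```python
# Hypothetical single table: (PurposeOfLoan, StatusSex, LoanApproved).
COLUMNS = ("PurposeOfLoan", "StatusSex", "LoanApproved")
ROWS = [dict(zip(COLUMNS, values)) for values in [
    ("NewCar", "SingleFemale", 1), ("NewCar", "SingleFemale", 1),
    ("NewCar", "SingleMale", 0), ("NewCar", "SingleMale", 1),
    ("UsedCar", "SingleFemale", 1), ("UsedCar", "SingleFemale", 0),
    ("UsedCar", "SingleMale", 0), ("Furniture", "SingleMale", 0),
    ("Furniture", "SingleFemale", 0), ("UsedCar", "SingleMale", 0),
]]

# The example rule: IF PurposeOfLoan = NewCar AND StatusSex = SingleFemale
#                   THEN LoanApproved = 1
RULE_IF = {"PurposeOfLoan": "NewCar", "StatusSex": "SingleFemale"}
RULE_THEN = ("LoanApproved", 1)


def rule_stats(rows, rule_if, rule_then):
    then_col, then_val = rule_then
    covered = [r for r in rows if all(r[c] == v for c, v in rule_if.items())]
    n_target = sum(1 for r in rows if r[then_col] == then_val)
    n_correct = sum(1 for r in covered if r[then_col] == then_val)
    return {
        # assumed: share of IF-matching rows that also match THEN
        "truth_pct": 100.0 * n_correct / len(covered) if covered else 0.0,
        # assumed: share of all rows that match the IF part
        "coverage_pct": 100.0 * len(covered) / len(rows),
        # assumed: share of all rows that match THEN (the prior rate)
        "base_pct": 100.0 * n_target / len(rows),
    }


print(rule_stats(ROWS, RULE_IF, RULE_THEN))
# → {'truth_pct': 100.0, 'coverage_pct': 20.0, 'base_pct': 40.0}
```

A rule is interesting when its truth is well above the base rate; on this toy table the rule is true for every row it covers, against a base rate of 40%.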

Performance --

The LPA Data Mining toolkit generates large volumes of Structured Query Language (SQL) queries to analyze the database. By utilizing the performance of the database engine, the LPA Data Mining toolkit offers a scalable and robust architecture.
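Letting the database engine do the counting can be sketched with Python's built-in `sqlite3` module standing in for an ODBC data source. The table name `loans` and the `rule_counts` helper are hypothetical, and the single-query form is an illustrative choice rather than the SQL the toolkit actually generates: one `COUNT(*)` gives the rows covered by the conditions, and a `SUM(CASE ...)` gives how many of those also meet the target.

```python
import sqlite3

# Hypothetical single table: (PurposeOfLoan, StatusSex, LoanApproved).
ROWS = [
    ("NewCar", "SingleFemale", 1), ("NewCar", "SingleFemale", 1),
    ("NewCar", "SingleMale", 0), ("NewCar", "SingleMale", 1),
    ("UsedCar", "SingleFemale", 1), ("UsedCar", "SingleFemale", 0),
    ("UsedCar", "SingleMale", 0), ("Furniture", "SingleMale", 0),
    ("Furniture", "SingleFemale", 0), ("UsedCar", "SingleMale", 0),
]


def rule_counts(conn, table, conditions, target_col, target_val):
    """Return (rows matching conditions, how many of them match the target)."""
    # Values are bound as parameters; identifiers cannot be, so they are
    # quoted here and must come from trusted code, never from user input.
    where = " AND ".join(f'"{column}" = ?' for column, _ in conditions) or "1 = 1"
    sql = (
        "SELECT COUNT(*), "
        f'COALESCE(SUM(CASE WHEN "{target_col}" = ? THEN 1 ELSE 0 END), 0) '
        f'FROM "{table}" WHERE {where}'
    )
    # Placeholders are positional: the target value comes first in the SQL text.
    params = [target_val] + [value for _, value in conditions]
    return conn.execute(sql, params).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE loans ("PurposeOfLoan" TEXT, "StatusSex" TEXT, "LoanApproved" INTEGER)')
conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", ROWS)

print(rule_counts(conn, "loans", [], "LoanApproved", 1))  # → (10, 4)
rule = [("PurposeOfLoan", "NewCar"), ("StatusSex", "SingleFemale")]
print(rule_counts(conn, "loans", rule, "LoanApproved", 1))  # → (2, 2)
```

Called with no conditions, the helper returns the table size and the number of target rows, which are the figures a base rate needs; each candidate condition or rule then costs one more aggregate query, which is why the number of generated queries grows quickly.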

What is required --

The LPA Data Mining toolkit uses Open Database Connectivity (ODBC) and SQL to query databases. You need to ensure that you have the correct ODBC drivers installed and that you have set up your databases as ODBC data sources.

Integration with WIN-PROLOG and its Toolkits --

WIN-PROLOG (an additional advanced product from LPA) is the central product in a series of programming tools that work across the Windows XP, 2000, NT, ME, 98 and 95 platforms; the series also includes Flex (see G6G Abstract Number 20185), Flint (see G6G Abstract Number 20186), the CBR toolkit (an additional advanced product from LPA) and the ProData Database Interface toolkit (an additional advanced product from LPA).

The Windows series uses incremental compilation of user programs to provide the execution speed of a compiler with the interactive behavior of an interpreter, which allows in-line debugging and editing of programs.

System Requirements

Manufacturer

Manufacturer Web Site LPA Data Mining Toolkit

Price Contact manufacturer.

G6G Abstract Number 20188

G6G Manufacturer Number 101711