PROXIMITY
Category Intelligent Software>Data Mining Systems/Tools
Abstract PROXIMITY is a system for relational knowledge discovery.
Knowledge discovery is the extraction of non-trivial, previously unknown, and useful information from data. Relational knowledge discovery is knowledge discovery in relational data.
A relational data structure in PROXIMITY is a labeled, directed graph in which objects from the domain of discourse are connected by links that represent relationships between pairs of objects.
Both objects and links can have an arbitrary number of attributes.
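The graph representation described above can be illustrated with a minimal Python sketch. This is not PROXIMITY's storage format or API; the class names (Obj, Link, Graph), the method names, and the toy person/paper data are all hypothetical and exist only to show objects and directed links that each carry their own attributes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Obj:
    """An object (node) with an arbitrary set of attributes."""
    oid: int
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Link:
    """A directed link between two objects, with its own attributes."""
    lid: int
    src: int  # oid of the origin object
    dst: int  # oid of the destination object
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Graph:
    """Hypothetical container for an attributed, directed graph."""
    objects: Dict[int, Obj] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)

    def add_object(self, oid, **attrs):
        self.objects[oid] = Obj(oid, dict(attrs))

    def add_link(self, lid, src, dst, **attrs):
        # A link always connects a pair of existing objects.
        if src not in self.objects or dst not in self.objects:
            raise KeyError("both endpoints must exist")
        self.links.append(Link(lid, src, dst, dict(attrs)))

    def out_links(self, oid):
        """Return the links whose origin is the given object."""
        return [link for link in self.links if link.src == oid]


# Toy data, invented for illustration.
g = Graph()
g.add_object(1, type="person", name="Ada")
g.add_object(2, type="paper", year=1843)
g.add_link(10, 1, 2, type="authored", position=1)
```

Because attributes are stored as open-ended dictionaries, neither objects nor links need to share a fixed schema, which is the flexibility the description refers to.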
PROXIMITY offers a more flexible data representation than many other approaches to knowledge discovery and machine learning.
PROXIMITY can help human analysts discover new knowledge by analyzing complex data sets about people, places, things, and events.
New developments in this area are vital because of the growing interest in analyzing the Web, social networks, telecommunications and computer networks, relational and object-oriented databases, multi-agent systems, and other sources of structured and semi-structured data.
PROXIMITY consists of novel algorithms that help manage, explore, sample, model, and visualize data.
PROXIMITY implements methods for learning statistical models that describe the probabilistic dependencies in relational data and can estimate probability distributions over unseen data.
PROXIMITY is an open-source application developed in Java. It makes substantial use of MonetDB, an open-source vertical (column-oriented) database system designed for high performance on semi-structured data.
PROXIMITY incorporates major research findings from the Knowledge Discovery Laboratory, including model corrections for statistical biases inherent in relational data such as autocorrelation and degree disparity, as well as the manufacturer's graphical query language.
PROXIMITY provides an open-source platform that can be used for both research into relational knowledge discovery and practical applications to real-world data.
PROXIMITY Features/capabilities:
1) High Performance --
PROXIMITY uses the MonetDB server described above, a fast, open-source vertical database.
MonetDB allows PROXIMITY to be orders of magnitude faster than systems hosted on SQL databases for the kinds of operations needed by relational knowledge discovery.
2) QGraph --
PROXIMITY’s graphical query language (QGraph) quickly finds matches to high-level descriptions of relational data patterns.
A graphical editor supports interactive creation of QGraph queries.
3) Automatic Construction of Statistical Models --
PROXIMITY allows a user to easily construct statistical relational models from either the Java API or Python scripts.
The models are trained using sets of labeled sub-graphs that can be created from QGraph queries.
Using either interface, models can also be applied to new (unlabeled) data.
Instead of only providing the most likely label, the manufacturer's models specify a probability distribution over the possible labels for each sub-graph.
PROXIMITY allows the user to evaluate the performance of the models using both accuracy and receiver operating characteristic (ROC) curves. These models can be saved and reloaded for later use.
- a) Relational Bayesian Classifiers -
- A relational Bayesian classifier (RBC) is a relational version of the simple Bayesian classifier.
- This classifier builds a probabilistic model of each attribute based on the attributes of surrounding objects and links. Although the RBC is a simple model, it performs quite well.
- b) Relational Probability Trees -
- A relational probability tree (RPT) selectively considers attributes of nearby objects and links as well as complex aggregates of these attributes to build a probabilistic model.
- c) Relational Dependency Networks -
- Relational dependency networks (RDNs) extend dependency networks to a relational setting.
- RDN models are a new form of probabilistic relational models that offer advantages over relational Bayesian networks (RBNs) and relational Markov networks (RMNs).
- Advantages of RDN models include an interpretable representation that facilitates knowledge discovery in relational data; the ability to represent arbitrary cyclic dependencies, including relational autocorrelation; and simple and efficient methods for learning both model structure and parameters.
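As an illustration of the relational Bayesian classifier idea in item a), the following Python sketch treats each labeled sub-graph as a set of bags of attribute values collected from surrounding objects and links. It assumes every value in a bag is an independent draw given the class label. This is one simple formulation, not PROXIMITY's implementation or API; the function names (train_rbc, predict_rbc), the add-one smoothing, and the toy movie data are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_rbc(examples):
    """examples: list of (label, {attr_name: [values from nearby items]})."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, attr) -> value counts
    vocab = defaultdict(set)             # attr -> values seen in training
    for label, bags in examples:
        class_counts[label] += 1
        for attr, values in bags.items():
            value_counts[(label, attr)].update(values)
            vocab[attr].update(values)
    return class_counts, value_counts, vocab


def predict_rbc(model, bags):
    """Return a probability distribution over labels for one sub-graph."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for attr, values in bags.items():
            if attr not in vocab:
                continue  # attribute never seen in training: ignore it
            counts = value_counts.get((label, attr), Counter())
            # Add-one smoothing with one extra bucket for unseen values.
            denom = sum(counts.values()) + len(vocab[attr]) + 1
            for v in values:
                score += math.log((counts[v] + 1) / denom)
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}


# Toy training data, invented for illustration: each movie sub-graph has
# a label plus bags of attribute values from linked actors and studios.
train = [
    ("hit", {"actor_award": ["yes", "yes", "no"], "studio": ["major"]}),
    ("hit", {"actor_award": ["yes"], "studio": ["major"]}),
    ("flop", {"actor_award": ["no", "no"], "studio": ["indie"]}),
]
model = train_rbc(train)
dist = predict_rbc(model, {"actor_award": ["yes", "yes"], "studio": ["major"]})
```

The result is a full distribution over labels rather than a single predicted label, matching the behavior described for the models above. Bags may differ in size from one sub-graph to the next, which is what distinguishes this setting from a flat attribute table.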
4) Browser-Style Interface --
PROXIMITY provides users with an intuitive browser-style user interface and advanced database visualization tools.
5) XML and Text Import --
PROXIMITY supports simple but flexible XML and text formats for importing data from earlier versions of PROXIMITY or from other databases and applications.
6) Python-Based Scripting --
All PROXIMITY operations that can be called directly from the manufacturer’s Java Application Programming Interface (API) can also be invoked by Python scripts or called from the graphical user interface (GUI) via the manufacturer's interactive interpreter.
7) Open Source --
All of PROXIMITY’s source code (written in Java) is included in the distribution.
8) Documentation --
The PROXIMITY distribution includes extensive written documentation including a tutorial and examples.
System Requirements
Contact manufacturer.
Manufacturer
- Knowledge Discovery Laboratory
- Department of Computer Science
- University of Massachusetts Amherst
- Amherst, MA 01003 USA
Manufacturer Web Site PROXIMITY
Price Contact manufacturer.
G6G Abstract Number 20701
G6G Manufacturer Number 104273