The DMIP Team

Data Mining, Machine Intelligence and Inductive Programming

Members:

Bella-Sanjuán, Antonio
Castillo-Andreu, Hèctor
Contreras Ochando, Lidia
Estruch-Gregori, Vicent
Fabra-Boluda, Raul
Ferri-Ramírez, Cèsar
Hernández-Orallo, José
Insa-Cabrera, Javier
Martínez-Usó, Adolfo
Martínez-Plumed, Fernando
Nieves-Cordones, David
Ramírez-Quintana, M.José
Silva Palacios, Daniel Andres

A subgroup of the Extensions of Logic Programming Group.

Presentation:

The DMIP team began in 1997 and consolidated as a group in 2000, with the initial goal of extending ILP (Inductive Logic Programming) to other declarative languages. Since then, the group has acquired a broader view of the field, exploring different techniques and applications and focussing on the evaluation of machine learning and machine intelligence systems. Its work now spans many areas of machine learning, data mining and artificial intelligence.

Currently, the research areas are:

Machine Learning and Data Mining
Knowledge Discovery
Multi-paradigm Inductive Programming
ROC analysis, cost-sensitive learning and model evaluation for decision support
Agreement Technologies
Agent Intelligence Evaluation
MML induction and Solomonoff prediction
Probabilistic (inductive) programming
Inductive Debugging
Universal Psychometrics

Nonetheless, the primary interest is still the learning of comprehensible or declarative models from data and the understanding of system performance.

For a more comprehensive account of the team's activities and projects, you can take a look at this presentation (as of 2009).

Learning Systems and other software:

We have developed three learning systems:

The FLIP system (1998-2001, click here for downloading the software and for more information): implements a framework for the Induction of Functional Logic Programs (IFLP) from facts. This can be seen as an extension of the now consolidated field of Inductive Logic Programming (ILP). Inspired by the inverse resolution operator of ILP, the system is based on the reversal of narrowing, the most usual operational mechanism for Functional Logic Programming. The main advantages of the FLIP system over the most widely used ILP systems are a natural handling of functions, without the use of mode or determinism declarations, and its power for inducing short recursive programs. Its applications are mainly program synthesis, program debugging and data mining of small, highly structured documents.

The SMILES system (2001-2002, click here for downloading the software and for more information): a machine learning system that integrates many features from other machine learning techniques and paradigms and, more importantly, introduces several innovations in almost all of them. In particular, SMILES extends classical decision tree learners in many ways (new splitting criteria, non-greedy search, new partitions, extraction of several different solutions), has anytime handling of resources, and has a sophisticated and quite effective handling of costs. In this way, SMILES combines and improves on recent work in hypothesis combination (e.g. boosting) and cost-sensitive learning (a priori and a posteriori class assignments, ROC analysis), outperforming previous systems in many situations. Its applications are basically data mining and any other machine learning task where decision trees could be useful.

The DBDT system (2004-2010, click here for downloading the software and for more information): a machine learning algorithm that integrates decision tree learning and center splitting. Roughly speaking, the inferred classifier can be viewed as a tree of attribute prototypes (the value distribution of an attribute is represented by a set of prototypes). An instance is linked to one prototype or another depending on its proximity.
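
The prototype idea behind DBDT can be illustrated with a minimal Python sketch. This is not the DBDT implementation: it assumes a single numeric attribute, one prototype per class (the class mean) and absolute difference as the distance, and the function names are hypothetical.

```python
from collections import defaultdict


def class_prototypes(values, labels):
    """Return one prototype per class: the mean attribute value of that class.

    Assumption for this sketch: the attribute is numeric, so the mean exists.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(values, labels):
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def nearest_prototype(value, prototypes):
    """Return the key of the prototype closest to value (absolute difference)."""
    return min(prototypes, key=lambda label: abs(value - prototypes[label]))


def center_split(values, labels):
    """Partition instance indices by their nearest prototype.

    Each block of the partition would become one child of a tree node.
    """
    prototypes = class_prototypes(values, labels)
    partition = defaultdict(list)
    for index, value in enumerate(values):
        partition[nearest_prototype(value, prototypes)].append(index)
    return prototypes, dict(partition)


# Toy data: one numeric attribute and two classes.
values = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
labels = ["a", "a", "a", "b", "b", "b"]
prototypes, partition = center_split(values, labels)
```

Applying the same split recursively to each block of the partition, choosing one attribute at each node, would give a tree of attribute prototypes in the sense described above. Any other distance could replace the absolute difference in `nearest_prototype`.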

A machine intelligence testing framework (2010-2012), click here for downloading the software. For more information, see the project Anytime Universal Intelligence.

An RLGGP system (2011-2012), a system that integrates reinforcement learning and General Game Playing, click here for more information.

An MML-LPP-Cost system (2011-2012), a system for coding logic programs with probabilities using MML, click here for more information.

Newton Trees, stochastic distance-based decision trees (2010-2012), click here for more information.

Different threshold-choice methods in cost space: Scripts in R for Brier Curves and Rate-driven Curves (2011-2012).

gErl, a general learning system (2012-), click here for more information.

RROC: Scripts in R for ROC Curves for Regression (2013).

Reg2Class: Scripts in R for Regression to classification tasks (2014).

SBA: Script in R with calibration techniques for multi-class problems. More information: "On the Effect of Calibration in Classifier Combination" (2014).

AirVLC: An application for producing real-time urban air pollution forecasts for the city of Valencia, Spain (2015-).

Multidimensional: With this software, multidimensional data is systematically analysed at multiple granularities by applying aggregation and disaggregation operators (e.g., using OLAP tools). We show how these strategies behave when the resolution context changes, using several machine learning techniques in four application domains (link) (2015).

Coverage Graphs: Knowledge acquisition with forgetting: an incremental and developmental rule-based setting (experiments and code on GitHub) (2014-2016).