Feature discovery for data mining

Valle, Manuel del; Sánchez, Beatric; Lago Fernández, Luis Fernando; Corbacho Abelaira, Fernando

UAM_Biblioteca

Author

Valle, Manuel del; Sánchez, Beatric; Lago Fernández, Luis Fernando; Corbacho Abelaira, Fernando

Entity

UAM. Departamento de Ingeniería Informática

Publisher

Roberto Ruiz; José C. Riquelme; Jesús S. Aguilar-Ruiz

Date

2005

Citation

TAMIDA2005: Actas del III Taller Nacional de Minería de Datos y Aprendizaje. Granada: Ed. Roberto Ruiz, José C. Riquelme, Jesús S. Aguilar-Ruiz, 2005. 107-113

ISBN

84-9732-449-8

Editor's Version

http://www.lsi.us.es/redmidas/CEDI/papers/646.pdf

Subjects

Computer science

URI

http://hdl.handle.net/10486/665620

Note

This is an electronic version of the paper presented at the III Taller de Minería de Datos y Aprendizaje, held in Granada in 2005

Abstract

In most Knowledge Discovery problems, the human analyst first constructs a new set of features, derived from the initial input attributes of the problem, based on a priori knowledge of the problem structure. These features are built from different transformations, which the analyst must select. This paper provides a first step towards a methodology that searches for near-optimal representations in classification problems by automatically selecting and composing feature transformations from an initial set of basis functions. In many cases the original representation of the problem data is not the most appropriate one, and finding a new representation space that is closer to the structure of the problem is critical to solving it. Once such a representation is found, most problems can be solved by a linear classification method. As a proof of concept we present two classification problems in which the class distributions overlap intricately in the space of the original attributes. For these problems, the proposed methodology constructs representations, based on compositions of functions from the trigonometric and polynomial bases, that provide a solution where some classical learning methods, e.g. multilayer perceptrons and decision trees, fail. The methodology consists of a discrete search within the space of compositions of the basis functions and a linear mapping performed by a Fisher discriminant. We place special emphasis on the first part. Finding the optimal composition of basis functions is difficult because the problem is non-gradient in nature and the number of possible combinations is large. We rely on the global search capabilities of a genetic algorithm to scan the space of function compositions.
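The two-stage scheme described in the abstract (a discrete search over compositions of basis functions, followed by a Fisher discriminant) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the toy disc-shaped problem, the four basis functions, the fixed composition depth, the crossover and mutation operators, and the use of the best threshold along the Fisher direction as the fitness value.

```python
import math
import random

# Hypothetical basis set (assumption): polynomial and trigonometric primitives.
BASIS = {"id": lambda v: v, "sq": lambda v: v * v, "sin": math.sin, "cos": math.cos}
NAMES = sorted(BASIS)
DEPTH = 2  # assumed: each attribute gets a chain of DEPTH composed functions
RAW = (("id",) * DEPTH,) * 2  # genome that leaves both attributes unchanged

def make_data(n, rng):
    """Toy problem (assumption): class 1 lies inside a disc of radius 0.8."""
    xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    ys = [1 if a * a + b * b < 0.64 else 0 for a, b in xs]
    return xs, ys

def transform(genome, point):
    """Apply one chain of basis functions to each input attribute."""
    out = []
    for chain, v in zip(genome, point):
        for name in chain:
            v = BASIS[name](v)
        out.append(v)
    return out

def fisher_direction(zs, ys):
    """Two-class Fisher discriminant in 2-D: w = Sw^-1 (m1 - m0)."""
    groups = [[z for z, y in zip(zs, ys) if y == c] for c in (0, 1)]
    means = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    a, b, d = 1e-9, 0.0, 1e-9  # within-class scatter [[a, b], [b, d]] plus a small ridge
    for g, m in zip(groups, means):
        for z in g:
            u, v = z[0] - m[0], z[1] - m[1]
            a, b, d = a + u * u, b + u * v, d + v * v
    det = a * d - b * b
    dm = (means[1][0] - means[0][0], means[1][1] - means[0][1])
    return ((d * dm[0] - b * dm[1]) / det, (a * dm[1] - b * dm[0]) / det)

def best_threshold_accuracy(proj, ys):
    """Best training accuracy of a single threshold on the projected data."""
    pairs = sorted(zip(proj, ys))
    n, total1 = len(pairs), sum(ys)
    best, ones_left = max(total1, n - total1), 0
    for i, (p, y) in enumerate(pairs, start=1):
        ones_left += y
        if i < n and pairs[i][0] == p:
            continue  # a threshold cannot split tied projections
        correct = ones_left + (n - total1) - (i - ones_left)  # class 1 on the left
        best = max(best, correct, n - correct)  # also try the opposite polarity
    return best / n

def fitness(genome, xs, ys):
    zs = [transform(genome, p) for p in xs]
    w = fisher_direction(zs, ys)
    return best_threshold_accuracy([w[0] * z[0] + w[1] * z[1] for z in zs], ys)

def search(xs, ys, rng, pop_size=12, generations=10):
    """Tiny elitist genetic algorithm over pairs of composition chains."""
    def random_genome():
        return tuple(tuple(rng.choice(NAMES) for _ in range(DEPTH)) for _ in range(2))
    def score(genome):
        return fitness(genome, xs, ys)
    pop = [RAW] + [random_genome() for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            pa, pb = rng.sample(parents, 2)
            child = [list(pa[0]), list(pb[1])]  # crossover: one chain from each parent
            if rng.random() < 0.5:  # mutation: replace one basis function
                child[rng.randrange(2)][rng.randrange(DEPTH)] = rng.choice(NAMES)
            children.append(tuple(tuple(c) for c in child))
        pop = parents + children
    return max(pop, key=score)

if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = make_data(400, rng)
    best = search(xs, ys, rng)
    print("raw accuracy:", fitness(RAW, xs, ys))
    print("best genome:", best, "accuracy:", fitness(best, xs, ys))
```

In this toy setting the classes cannot be separated by a line in the raw attributes, whereas squaring each attribute makes them linearly separable, so the search is expected to favour chains containing a quadratic or cosine term. Because the untransformed genome is seeded into the initial population and the best half always survives, the returned representation is never worse on the training data than the raw one.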

Files in this item

Name

feature_valle_TAMIDA_2005.pdf

Size

483.4Kb

Format

PDF

This item appears in the following Collection(s)

Producción científica en acceso abierto de la UAM