Hierarchical Clustering of Sub-Populations with a dissimilarity based on the likelihood ratio statis

Ciampi, A.; Lechevallier, Y.; Limas, M.C.; Marcos, A.G.

doi:10.1007/S10044-007-0088-4

Ciampi, A. ¹
Lechevallier, Y. ³
Limas, M.C. ²
Marcos, A.G. ²

1 McGill University

Montreal, Canada

ROR https://ror.org/01pxwe438
2 Universidad de León

León, Spain

ROR https://ror.org/02tzt0b78
3 French Institute for Research in Computer Science and Automation

Le Chesnay, France

ROR https://ror.org/02kvxyf05

Journal:

Pattern Analysis and Applications

ISSN: 1433-7541

Year of publication: 2008

Volume: 11

Issue: 2

Pages: 199-220

Type: Article

DOI: 10.1007/S10044-007-0088-4 SCOPUS: 2-s2.0-44449157741

Related projects

Aplicación de técnicas de Data Mining a la calificación inteligente de parámetros de calidad en bobinas de acero a la entrada de una linea de galvanizado en caliente

2007/00079/001

CONTROL DE CALIDAD GLOBAL EN TRENES DE FABRICACION DE ELASTOMEROS

2004/00086/001

Abstract

The problem of clustering subpopulations on the basis of samples is considered within a statistical framework. A distribution for the variables is assumed for each subpopulation, and the dissimilarity between any two subpopulations is defined as the likelihood ratio statistic that compares the hypothesis that the two subpopulations differ in the parameters of their distributions with the hypothesis that they do not. A general algorithm for constructing a hierarchical classification is described; it has the important property of producing no inversions in the dendrogram. The essential elements of the algorithm are specified for well-known distributions (normal, multinomial and Poisson), and the general parametric case is outlined. Several applications are discussed, the main one being a novel two-step approach to massive data. After clustering the data into a reasonable number of 'bins' with a fast algorithm such as k-means, a version of the algorithm is applied to the resulting bins. The means calculated on each bin are assumed to be multivariate normal: this is justified by the central limit theorem together with the assumption that each bin contains a large number of units, which generally holds for truly massive data. No assumption is made, however, about the data-generating distribution. © 2007 Springer-Verlag London Limited.
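To make the likelihood ratio dissimilarity concrete, the following is a minimal sketch for the Poisson case only, assuming each subpopulation is summarised by its number of observations and its total count. The function names (`lr_dissimilarity`, `agglomerate`) and the greedy merge rule that pools the two summaries are illustrative assumptions, not the algorithm of the paper, and the no-inversion property stated in the abstract is not claimed for this sketch.

```python
import math
from itertools import combinations


def xlogx_rate(total, n):
    """Return total * log(total / n), using the convention 0 * log(0) = 0."""
    return total * math.log(total / n) if total > 0 else 0.0


def lr_dissimilarity(a, b):
    """Likelihood ratio statistic comparing two Poisson samples.

    Each sample is a pair (n, total): number of observations and sum of counts.
    The statistic is twice the gap between the maximised log-likelihood with
    two separate rates and the maximised log-likelihood with one common rate.
    """
    (n_a, s_a), (n_b, s_b) = a, b
    separate = xlogx_rate(s_a, n_a) + xlogx_rate(s_b, n_b)
    pooled = xlogx_rate(s_a + s_b, n_a + n_b)
    # Clamp at zero to absorb floating-point rounding.
    return max(0.0, 2.0 * (separate - pooled))


def agglomerate(samples):
    """Hypothetical greedy agglomeration (an assumption, not the paper's method).

    Repeatedly merges the pair of clusters with the smallest likelihood ratio
    dissimilarity and pools their (n, total) summaries. Returns a list of
    (members_left, members_right, height) tuples in merge order.
    """
    clusters = [(frozenset([i]), stats) for i, stats in enumerate(samples)]
    merges = []
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: lr_dissimilarity(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (mem_i, (n_i, s_i)), (mem_j, (n_j, s_j)) = clusters[i], clusters[j]
        height = lr_dissimilarity((n_i, s_i), (n_j, s_j))
        merges.append((sorted(mem_i), sorted(mem_j), height))
        merged = (mem_i | mem_j, (n_i + n_j, s_i + s_j))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Four made-up subpopulations: two with rates near 2, two with rates near 9.
example = [(100, 200), (100, 210), (100, 900), (100, 880)]
history = agglomerate(example)
```

With these made-up inputs, the two low-rate samples and the two high-rate samples are each joined at a small height, and the two resulting groups are joined last at a much larger height. For the two-step approach described above, the same loop would run on per-bin summaries produced by a fast pre-clustering step, with a normal-theory statistic in place of the Poisson one.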
