DME - Artigos em Revistas Internacionais / Articles in International Journals
An article or editorial published in a scientific journal.
(Accepted; Published; Updated).
Browsing DME - Artigos em Revistas Internacionais / Articles in International Journals by Author "Bacelar-Nicolau, Helena"
Now showing 1 - 10 of 12
- Cluster Analysis of Business Data
  Publication. Sousa, Áurea; Bacelar-Nicolau, Helena; Silva, Osvaldo
  In this work, classical as well as probabilistic hierarchical clustering models are used to look for typologies of variables in classical data, typologies of groups of individuals in a classical three-way data table, and typologies of groups of individuals in a symbolic data table. The data come from a business-area questionnaire designed to evaluate the quality of, and customer satisfaction with, the services provided by an automobile company. The Ascendant Hierarchical Cluster Analysis (AHCA) is based, respectively, on the basic affinity coefficient and on extensions of this coefficient for the cases of a classical three-way data table and a symbolic data table, obtained from the weighted generalized affinity coefficient. The probabilistic aggregation criteria used, under the probabilistic approach named VL methodology (V for Validity, L for Linkage), resort essentially to probabilistic notions for the definition of the comparative functions. The validation of the obtained partitions is based on the global statistics of levels (STAT).
- Cluster analysis using affinity coefficient in order to identify religious beliefs profiles
  Publication. Sousa, Áurea; Nicolau, Fernando C.; Bacelar-Nicolau, Helena; Silva, Osvaldo
  We present an application of Ascendant Hierarchical Cluster Analysis (AHCA) to a dataset related to religion, in order to find a typology of religious belief profiles of individuals who live on São Miguel island (Azores) according to how often they attend Mass. AHCA was based on the weighted generalized affinity coefficient for symbolic or complex data, and on classical and probabilistic aggregation criteria; the probabilistic ones belong to a parametric family of methods in the scope of the VL methodology. Additionally, we applied some validation measures (based on the values of the proximity matrix and adapted for the case of similarity measures) to evaluate the obtained results (clusters and partitions).
- Clustering an interval data set : are the main partitions similar to a priori partition?
  Publication. Sousa, Áurea; Bacelar-Nicolau, Helena; Nicolau, Fernando C.; Silva, Osvaldo
  In this paper we compare the best partitions of data units (cities) obtained from different algorithms of Ascendant Hierarchical Cluster Analysis (AHCA) of a well-known data set from the literature on symbolic data analysis (“city temperature interval data set”) with an a priori partition of the cities given by a panel of human observers. The AHCA was based on the weighted generalised affinity coefficient with equal weights, and on the probabilistic coefficient associated with the asymptotic standardized weighted generalized affinity coefficient by the method of Wald and Wolfowitz. These similarity coefficients between elements were combined with three aggregation criteria: one classical, Single Linkage (SL), and two probabilistic, AV1 and AVB, the latter two within the scope of the VL methodology. The partitions were evaluated, in order to find the one that best fits the underlying data, using some validation measures based on the similarity matrices. Overall, our methods gave satisfactory results, with the best partitions quite close to (or even coinciding with) the a priori partition provided by the panel of human observers.
- Clustering of Symbolic Data based on Affinity Coefficient: Application to a Real Data Set
  Publication. Sousa, Áurea; Bacelar-Nicolau, Helena; Nicolau, Fernando C.; Silva, Osvaldo
  In this paper, we illustrate an application of Ascendant Hierarchical Cluster Analysis (AHCA) to complex data taken from the literature (interval data), based on the standardized weighted generalized affinity coefficient, by the method of Wald and Wolfowitz. The probabilistic aggregation criteria used belong to a parametric family of methods under the probabilistic approach of AHCA, named VL methodology. Finally, we compare the results achieved using our approach with those obtained by other authors.
- Clustering of variables with a three-way approach for health sciences
  Publication. Bacelar-Nicolau, Helena; Nicolau, Fernando C.; Sousa, Áurea; Bacelar-Nicolau, Leonor
  Cluster analysis or classification usually concerns a set of exploratory multivariate data analysis methods and techniques for grouping either a set of statistical data units or the associated set of descriptive variables into clusters of similar and, hopefully, well separated elements. In this work we refer to an extension of this paradigm to generalized three-way data representations and particularly to classification of interval variables. Such an approach appears to be especially useful in large databases, mostly in a data mining context. A health sciences case study is partially discussed.
- Distribution of the Affinity Coefficient between Variables based on the Monte Carlo Simulation Method
  Publication. Sousa, Áurea; Silva, Osvaldo; Bacelar-Nicolau, Helena; Nicolau, Fernando C.
  The affinity coefficient and its extensions have been used in both hierarchical and non-hierarchical Cluster Analysis. The purpose of the present empirical study on the distribution of the basic and the generalized affinity coefficients and on the distribution of the standardized affinity coefficient, by the method of Wald and Wolfowitz, under different assumptions, is to assess the effect of the statistical probability distributions of the variables (columns) of the initial data matrix, and of the respective parameters, on the distribution of the values of these coefficients. We present some results concerning the asymptotic distribution of these coefficients under the assumption that the variables (for which the values of these coefficients are calculated) are independent and have statistical probability distributions specified a priori. In this distributional study, based on the Monte Carlo simulation method, we considered ten well-known statistical probability distributions with different variations of the respective parameters. The simulation studies lead to the conclusion that the coefficients’ convergence to the normal distribution is quite fast and, in general, a good approximation is obtained for small sample sizes, that is, for sample sizes above 20 and in many cases for sample sizes above 10.
- Entrepreneurship Promotion in Higher Education Institutions
  Publication. Sousa, Áurea; Couto, Gualter; Branco, Nélia Cavaco; Silva, Osvaldo; Bacelar-Nicolau, Helena
  The importance of entrepreneurship promotion has increased significantly in today's society, especially during periods of crisis. This work is based on the responses obtained through a survey conducted on a sample of 305 undergraduates of the University of the Azores, enrolled in different science programs. The aim is to deepen the knowledge of the entrepreneurial propensity of higher education students in the Azores, so that the university can stimulate their interest in creating businesses. The main results obtained, using exploratory data analysis (from the univariate to the multivariate), are presented and discussed.
- A global Approach to the Comparison of Clustering Results
  Publication. Silva, Osvaldo; Bacelar-Nicolau, Helena; Nicolau, Fernando C.
  The discovery of knowledge in the case of Hierarchical Cluster Analysis (HCA) depends on many factors, such as the clustering algorithms applied and the strategies developed in the initial stage of Cluster Analysis. We present a global approach for evaluating the quality of clustering results and making a comparison among different clustering algorithms using the relevant information available (e.g. the stability, isolation and homogeneity of the clusters). In addition, we present a visual method to facilitate evaluation of the quality of the partitions, allowing identification of the similarities and differences between partitions, as well as the behaviour of the elements in the partitions. We illustrate our approach using a complex and heterogeneous dataset (real horse data) taken from the literature. We apply HCA based on the generalized affinity coefficient (similarity coefficient) to the case of complex data (symbolic data), combined with 26 (classic and probabilistic) clustering algorithms. Finally, we discuss the obtained results and the contribution of this approach to gaining better knowledge of the structure of data.
- Measuring similarity of complex and heterogeneous data in clustering of large data sets
  Publication. Bacelar-Nicolau, Helena; Nicolau, Fernando C.; Sousa, Áurea; Bacelar-Nicolau, Leonor
  Cluster analysis or classification usually concerns a set of exploratory multivariate data analysis methods and techniques for finding a clustering structure in a dataset. That may refer either to groups of statistical data units or to groups of variables. In this work we deal with a generalization of this paradigm concerning clustering of complex data described by three different types of variables, frequently present in a three-way context. We obtain compatible versions of the same affinity coefficient for measuring similarity between statistical data units described by those three types of variables. A global generalized similarity coefficient is analyzed for this kind of mixed data, which often arises in data mining or knowledge mining.
- On clustering interval data with different scales of measures : experimental results
  Publication. Sousa, Áurea; Bacelar-Nicolau, Helena; Nicolau, Fernando C.; Silva, Osvaldo
  Symbolic Data Analysis can be defined as the extension of standard data analysis to more complex data tables. We illustrate the application of the Ascendant Hierarchical Cluster Analysis (AHCA) to a symbolic data set (with a known structure) in the field of the automobile industry (car data set), in which objects are described by variables whose values are intervals of the real data set (interval variables). The AHCA of thirty-three car models, described by eight interval variables (with different scales of measure), was based on the standardized weighted generalized affinity coefficient, by the method of Wald and Wolfowitz. We applied three probabilistic aggregation criteria in the scope of the VL methodology (V for Validity, L for Linkage). Moreover, we compare the achieved results with those obtained by other authors, and with an a priori partition into four clusters defined by the category (Utilitarian, Berlina, Sporting and Luxury) to which each car belongs. We used the global statistics of levels (STAT) to evaluate the obtained partitions.
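
Several of the abstracts above rely on the basic affinity coefficient between variables, and one of them studies its distribution by Monte Carlo simulation. The listing itself gives no formula, so the sketch below assumes the commonly cited form for two non-negative columns, a(j, j') = Σ_i sqrt((x_ij / x_.j) · (x_ij' / x_.j')), where x_.j is the column total. The function names `affinity` and `simulate_affinity`, the sampler argument and the seed handling are hypothetical illustration choices and are not taken from the listed papers; the weighted generalized and standardized (Wald and Wolfowitz) versions are not covered.

```python
import math
import random


def affinity(col_a, col_b):
    """Basic affinity coefficient between two non-negative columns.

    Assumed form (not stated in the listing): the sum over rows of the
    square root of the product of the two column profiles.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    total_a, total_b = sum(col_a), sum(col_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each column needs a positive total")
    return sum(
        math.sqrt((a / total_a) * (b / total_b))
        for a, b in zip(col_a, col_b)
    )


def simulate_affinity(n, draws, sampler, seed=0):
    """Monte Carlo sample of the coefficient for two independent columns.

    `sampler(rng)` is a hypothetical callback that must return one
    non-negative value; each draw builds two fresh columns of length n.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        col_a = [sampler(rng) for _ in range(n)]
        col_b = [sampler(rng) for _ in range(n)]
        values.append(affinity(col_a, col_b))
    return values


if __name__ == "__main__":
    # Proportional columns have identical profiles, so the value is close to 1.
    print(affinity([1, 2, 3], [2, 4, 6]))
    # Empirical mean for independent uniform columns with 20 rows each.
    sample = simulate_affinity(20, 1000, lambda rng: rng.uniform(0.0, 1.0))
    print(sum(sample) / len(sample))
```

Under this assumed form the coefficient lies between 0 and 1: it is 0 when the two columns never take positive values in the same row and 1 when their profiles are proportional. Swapping the sampler (for example `lambda rng: rng.expovariate(1.0)`) gives a rough way to explore how the underlying distribution affects the simulated values.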