Grouping Of European Union Member States Economics Essay
The main idea presented in this paper concerns the visual exploration of the convergence of European Union member states in the light of the Structural Indicators defined by the European Commission for different economic and social domains. Currently the Structural Indicators list comprises seventy-nine indicators, which are covered by the following six domains: General Economic Background, Employment, Innovation and Research, Economic reform, Social cohesion and Environment. For the purpose of visualization and grouping of states, the Kohonen Self-Organizing Map (SOM algorithm) is applied. The concrete application is carried out in two stages. In the first stage the Kohonen SOM algorithm is applied to each Structural Indicator dataset separately. The resulting six maps provide visual images that illustrate the similarities and differences between European Union countries regarding specific structural areas. In the next stage, the general map of European Union states based on all structural indicators is presented. The general map suggests a "natural" grouping of the European Union countries into three clusters, but other cluster solutions are also presented to illustrate the clustering tendency of European Union states and their regional cohesion. Finally, for the four-cluster solution, the appropriate group profile analysis is generated.
Keywords: Kohonen Self-Organizing Map, visualization, clustering, structural indicators, group profile analysis
JEL classification: C45; O52; R10
1. Introduction
At the Lisbon meeting of the European Council in 2000 the so-called Lisbon Strategy was formulated, aimed at achieving the European Union strategic goal for the next decade "of becoming the most competitive and dynamic knowledge-based economy in the world capable of sustainable economic growth with more and better jobs and greater social cohesion" [1]. As a part of this strategy the initiative was put forth for the introduction of a specific set of social and economic indicators for European Union (EU) member states. The relevant indicators were defined as Structural Indicators (SI) and have been used as an objective measure of the progress that the EU has made towards the Lisbon Objective. Currently the list of Structural Indicators comprises seventy-nine indicators which cover the following six domains: General Economic Background, Employment, Innovation and Research, Economic reform, Social cohesion and Environment.
Considering this on-going historical process and facing the end of the decade when the Lisbon Objective was set, the main idea of the paper is to provide a comprehensive image of the current status and the relative positions of EU member states according to the Structural Indicators. For this purpose, Kohonen Self-Organizing Map (SOM algorithm) is applied. This algorithm belongs to the group of data mining tools (Bozdogan, H. ed. 2004, Keating, B. 2008) which are appropriate for visualization and clustering purposes.
The paper is organized as follows. After an introduction, the second part offers a literature review. The third part provides a description of data used for SOM application. The next section deals with exploratory data analysis where multiple box plots and coefficients of variation for particular datasets are presented. The fifth section describes the main characteristics of the applied algorithm. This section is followed by the sixth crucial part of the paper which discusses the empirical results of the visualization and grouping of EU states by SOM algorithm. Finally, the concluding remarks are presented.
2. Literature review
Generally the problem of studying the convergence and grouping structure in the EU area is a topic that attracts great attention of economists and other researchers. One strand of studies relates to the domain of economic growth theory and encompasses a large amount of literature. A great number of authors - to mention only a few - Chatterji, M. (1992), Quah, D.T. (1996), Corrado, L. et.al. (2005), Mora, T. (2005, 2008), Sassi, M. (2006), Dall’erba, S, Le Galo, J., (2008) focused on the idea concerning a phenomenon known as the “club convergence hypothesis” and the existence of “convergence clubs” among European countries and regions. In recent years this idea was empirically investigated by many researchers and a great deal of research effort was devoted to developing appropriate econometric tools for testing the hypothesis and exploring the phenomenon. The other approach, which brought not only a remarkable contribution to the idea of convergence and cluster structures in EU area, but also provided a contribution to the clustering methodology, is proposed by Gligor, M. and Ausloos, M. (2006, 2007, 2008a, 2008b, 2008c). These authors considered the European countries as interacting agents of a complex system. They studied the problem of cross-correlations between EU countries by putting it into a framework of classification trees and complex systems analysis. They recalled the properties of the MST (Minimal Spanning Tree) method, which had been used by Hill, R. (2001) as a methodology for linking countries in a way that international price and quantity indexes can be chained. The main problem of the MST method was the lack of robustness. Hill pointed out this problem and concluded that this method is not stable over time. To research the clustering structures in the EU area Gligor, M. and Ausloos, M. (2008 b) proposed the application of the MAMLP (Moving-Average-Minimal-Length-Path) method. 
The principal findings of these authors concerned the empirical verification of the decrease in the mean statistical distance between EU countries (reflected in the correlated fluctuations of the basic macro-economic indicators, such as GDP, GDP per capita, Consumption and Investments).
On the other hand, from the perspective of visualization, the Kohonen SOM algorithm has appeared as an appropriate method in many studies. A few studies that employed the Kohonen algorithm as a visualization and cluster-extracting method are noted here. One of the early socio-economic applications of the Kohonen map was made by Varfis and Versino (1992), who compared the results of the SOM algorithm to some well-known statistical techniques (principal component analysis and hierarchical clustering). They focused mainly on the properties of the techniques, made comparisons between them, and noted that the Kohonen algorithm is a worthy alternative to well-established statistical techniques. The authors provided the Kohonen map of European Statistical Territorial Units and discovered that units on the map cluster into distinct European geographic areas. Interesting results are also presented by Kaski, S. & Kohonen, T. (1996), who used the Kohonen algorithm to create the “welfare map” of the countries of the world. Among the remarkable findings of this article, concerning the identification of welfare and poverty structures among the countries of the world, one was of particular importance: although no geographical factors were applied in computing the map, the organization of the countries on the map reflects their geographical positions.
Similar conclusions regarding the influence of “geography” on economic factors can be found in the exploratory study by Haughton, D. et. al. (2003), where twenty-five Central and Eastern European and Central Asian states (“transition economies”) were investigated in order to group them according to their ability to attract foreign direct investment (FDI). The authors employed the Kohonen map to determine the extent to which the distribution of foreign direct investment in different regions reflects the geographic organization of the countries. The resulting map, based on a set of 21 macroeconomic indicators, showed that the distribution of the FDI determinants closely approximates the geographic grouping of the countries in the region.
The other study realized by Mattoscio, N. et. al. (2009) analyzed clusters and distances among the EU’s member states in terms of health standards and economic development. The Kohonen map in this analysis revealed some well-defined groups of countries. Also, the authors derived a conclusion that differences among countries in the identified groups are based on just a few variables of the whole set used in the analysis, which can be summarized in terms of current expenses and investments.
Interesting findings regarding the application of a modified Kohonen algorithm are provided by Aaron, C. et. al. (2003), who studied the convergence of EU countries within the Maastricht criteria framework for the 1980-2002 period. They used cross-country time-series data for four economic variables: deficit in GNP, debt in GNP, inflation rate and long-term nominal interest rate. Their approach quantifies the trajectory of each country, where each trajectory is considered as a multivariate function of time. They applied the SOM algorithm to obtain a classification of these functions. Further on, they showed that this approach allowed the tracking of the individual evolution of each country with respect to the chosen criteria. Their results “confirmed the convergence of the European countries to Maastricht criteria, since the number of classes has diminished since 1980” (Aaron, C. et. al., 2003, p.330).
In the context of the previous broad literature, the objective of this paper is to explore the relative positions and grouping of 27 EU countries in the light of the Lisbon Strategy and to provide empirical comparisons to some earlier studies.
3. Description of the data
The data considered in this study is provided by EU statistical office (EUROSTAT) and may be found at EUROSTAT web portal [2] . The complete SI list is divided in six domains: General Economic Background (GEB), Innovation and Research (I&R), Economic reform, Employment, Environment and Social cohesion. These indicators were defined as a part of broader set of EU Policy Indicators. In addition to the Structural Indicators, EU Policy Indicators include Principal European Economic Indicators (Euro Indicators PEEIs), Sustainable Development Indicators and Employment and Social Policy Indicators. In this paper only the data on Structural indicators for EU member states are analyzed. Therefore, the results of this study should be considered only in that respect.
In the following the list of Structural indicators that have been treated in the application is presented.
Economic Reform indicators are supposed to reflect the EU member states’ capacity for coherent development of economic environment with regards to: effective competition and trade, price convergence, government’s initiatives for supporting entrepreneurship in the EU and providing the healthy business environment for job creation and poverty reduction, market integration by different types of trade activity (eliminating barriers to the free movement of goods, services and people within the EU). This set of Structural Indicators includes the following [3] :
Business demography
Business investment
Comparative price levels
Electricity prices by type of user;
Gas prices by type of user;
Market integration by type of trade activities;
Market Integration - Foreign Direct Investment (FDI) intensity
Market share of the incumbent in fixed telecommunications by types of call
Market share of the leading operator in mobile telecommunication
Market share of the largest generator in the electricity market
Price of telecommunications by type of call;
Public procurement
State aid by type of aid.
In the area of Employment the general objective of the EU employment policy has been defined in the form of concrete figures - as the achievement of an average employment rate of 70% for the EU overall and at least 60% for women by 2010. The indicators in this domain address one of the vital issues of prosperous economic and social development of any country, i.e. the issue of adequate and balanced utilization of the human capital as an essential economic resource. The list of eleven indicators is defined as follows [4]:
Employment rate by gender;
Employment rate of older workers by gender;
Average exit age from the labor force by gender;
Gender pay gap in unadjusted form
Tax wedge on labor cost
Fatal accidents at work
Implicit tax rate on labor
Life-long learning by gender;
Tax rate on low wage earners by marginal effective tax rates on employment incomes
Unemployment rate by gender
Serious accidents at work by gender
In the context of EU economic and social development special attention has been dedicated to the environmental issues dealing with greenhouse gas emission, renewable energy sources, efficient usage of the disposable resources, waste reduction and recycling, etc. To monitor the progress in this sphere the following indicator list has been proposed [5] :
Car share of inland passenger transport
Combined heat and power generation
Electricity generated from renewable sources
Energy intensity of the economy
Farmland bird index
Greenhouse gas emissions
Healthy life years at birth by gender;
Municipal waste generated
Municipal waste by type of treatment;
Resource productivity
Road share of inland freight transport
Sufficiency of sites designated under the EU habitats directive
Implicit tax rate on energy
Urban population exposure to air pollution by ozone
Urban population exposure to air pollution by particulate matter
Volume of freight transport relative to GDP
Volume of passenger transport relative to GDP.
Structural Indicators on General Economic Conditions provide the basis for structural reform and monitor a country’s economic prosperity. They include the following indicators [6] :
GDP per capita in PPS (Purchasing Power Standards)
Employment growth by gender
General government debt
Inflation rate
Labor productivity per hour worked
Labor productivity per person employed
Public balance
Real GDP growth rate
Real unit labor cost growth.
The area of Innovation and Research is considered to be the key driver for developing European information or knowledge-based society. The following list of sixteen indicators is defined to follow the progress in this domain [7] :
Broadband penetration rate
E-Commerce via Internet
E-government on-line availability
E-government usage by enterprises
E-government usage by individuals by gender; All Individuals
Gross domestic expenditure on R&D (GERD) by source of funds;
Gross domestic expenditure on R&D (GERD)
High-tech exports
ICT expenditure by type of product;
Level of Internet access – households
Patent applications to the European Patent Office (EPO)
Patents granted by the United States Patent and Trademark Office (USPTO)
Science and technology graduates by gender;
Spending on Human Resources
Venture capital investments by type of investment stage;
Youth education attainment level by gender.
In the field of Social cohesion, the main EU objective is "to significantly reduce the number of persons at risk of poverty and social exclusion by 2010". The achievement of this objective should be evaluated by monitoring a list of ten indicators [8]:
Early school leavers by gender;
Formal child care by duration and age group;
Inequality of income distribution
Jobless households – children
Jobless households by gender;
Long-term unemployment rate by gender;
At-risk-of-poverty rate before social transfers by gender;
At-risk-of-poverty rate after social transfers by gender
Dispersion of regional employment rates by gender
Persistent-at-risk-of-poverty rate by gender
The complete SI list includes seventy-nine indicators. However, since the data on three indicators are not available for single states [9], the application is based on the seventy-six indicators cited above. The analyzed datasets are yearly data. The majority of them refer to 2008 and 2007, with a few exceptions: Economic Reform indicators number 1, 8 and 9, Employment indicators number 6 and 11, Environment indicator number 5, and I&R indicators number 8, 9 and 11 refer to 2006; Environment indicator number 10 and I&R indicator number 14 date back to 2005; finally, I&R indicator number 12 refers to 2003.
One of the problems that frequently appears in practical data mining analysis, and which also appeared in this study, is the problem of missing data (Grzymala-Busse, J.W. et. al. 2005). This problem also raises severe concerns when traditional methods are used (Tabachnick, G. B. et. al. 2007). However, because the SOM is a very robust algorithm, this problem was overcome in a relatively simple way. The crucial part of the SOM algorithm is the choice of the best-matching node on the map for each input vector. When some attribute values of an input vector are missing, the search for the best-matching node is limited to the available values, i.e. the lookup is conducted over a reduced set of attributes. After the training process is completed and the map is generated, the missing value(s) for particular attribute(s) can be replaced by the corresponding SOM nodes’ value(s) [10].
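The missing-value treatment described above can be illustrated with a minimal sketch. It assumes that missing attributes are encoded as `None` and that the distance is squared Euclidean over the observed attributes only; the function names and the three model vectors are hypothetical and are not taken from the software used in the study.

```python
def masked_sq_distance(x, m):
    """Squared Euclidean distance over the attributes of x that are observed."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m) if xi is not None)

def best_matching_node(x, nodes):
    """Index of the node closest to x, ignoring the missing attributes of x."""
    return min(range(len(nodes)), key=lambda i: masked_sq_distance(x, nodes[i]))

def impute(x, nodes):
    """Replace missing attributes of x by the values of its best-matching node."""
    c = best_matching_node(x, nodes)
    return [nodes[c][j] if xi is None else xi for j, xi in enumerate(x)]

# Hypothetical trained model vectors (three nodes, three attributes).
nodes = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
x = [0.9, None, 1.2]          # the second attribute is missing
filled = impute(x, nodes)     # the gap is filled from the best-matching node
```

Here the lookup uses only the first and third attributes, and the missing second attribute is then read off the selected node.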
4. Exploratory data analysis
Exploratory data analysis usually appears as an initial phase of many data analysis assignments (Giudici, P. et. al. 2009). Very often it sets the stage for further analysis and is fundamental for understanding what might be discovered through the application of some sophisticated data analysis technique, such as data mining techniques (Hand, D. et. al. 2001). In this study, the basic objective of the initial (explorative) data analysis is to describe the main features of the six SI datasets regarding, in particular, their variability. For this purpose the multiple box plots are used. They are particularly useful for making efficient comparisons between distributions of different univariate datasets (Myatt, J.G. 2007, Myatt, J.G. et.al. 2009). Having in mind that particular structural indicators are measured on different scales and that great discrepancy of the original SI data values is dominant, data are standardized prior to the graphical presentation. Figure 1 shows the relative image of SI datasets in each of the six groups.
Figure 1: Box plots of six SI datasets (standardized values)
a) Economic reform indicators b) Employment indicators
c) Environment indicators d) GEB indicators
e) I&R indicators f) Social cohesion indicators
The box plots presented in Figure 1 reveal the general central tendency and variability of the particular structural indicators datasets, as well as the shape of the distributions. It is evident that some distributions are skewed positively; others negatively, and for some indicators the outliers and/or extreme values are recorded.
In the area of Economic reform structural indicators, three of them, namely indicator number 7 (Market Integration - Foreign Direct Investment intensity), number 9 (Market share of the leading operator in mobile telecommunication) and number 12 (Public procurement), show strong positive skewness with outliers and extreme values to the right of the distributions.
In the group of Employment structural indicators several indicators, such as: number 5 (Tax wedge on labor cost), number 8 (Life-long learning by gender), number 9 (Tax rate on low wage earners by marginal effective tax rates on employment incomes), number 10 (Unemployment rate by gender) and number 11 (Serious accidents at work by gender) also show moderate or substantial departure from normal distribution.
The Environment structural indicators show the most versatile data distribution shapes with a lot of outliers and/or extreme values. In particular, this is the case with the following indicators: number 1 (Car share of inland passenger transport), number 2 (Combined heat and power generation), number 3 (Electricity generated from renewable sources), number 4 (Energy intensity of the economy), number 12 (Sufficiency of sites designated under the EU habitats directive), number 15 (Urban population exposure to air pollution by particulate matter) and number 17 (Volume of passenger transport relative to GDP).
In the domain of GEB structural indicators, the presence of outliers can be observed in relation to indicator number 1 (GDP per capita in PPS), number 4 (Inflation rate) and number 6 (Labor productivity per hour worked), as well as the skewness of distributions.
Among the sixteen structural indicators on Innovation and Research a few of them – number 5 (E-government usage by individuals by gender), number 11 (Patent applications to the European Patent Office), number 12 (Patents granted by the United States Patent and Trademark Office), number 14 (Spending on Human Resources) and number 16 (Youth education attainment level by gender) - have outliers and substantial departures from normal distribution.
Regarding the group on social cohesion indicators, the most remarkable is the behavior of the first indicator (Early school leavers by gender) with a substantial positive skewness and outliers. Also, indicator number 3 (Inequality of income distribution), number 4 (Jobless households – children) and number 5 (Jobless households by gender) show positive skewness.
The previous results are very indicative and helpful in making decisions on the potential data transformations which are usually considered in the early stage of the Kohonen map processing.
In addition to the box plots, the coefficients of variation for each SI dataset are examined graphically, and the most interesting findings are presented in Figure 2.
Coefficients of variation for Structural indicators concerning Employment, Environment and Social cohesion are generally in the range of 0.0 – 1.0, while the coefficients of variation for structural indicators on Innovation and Research, General Economic Background and Economic reform are spread over wider ranges: 0.131 – 1.159, 0.324 - 2.036 and 0.169 – 3.445 respectively.
Figure 2: Coefficients of variation for each SI dataset
It is interesting to note that in the group of Economic reform structural indicators, most coefficients of variation take values in the 0.168 – 0.602 interval. However, there is a peak on the variability of indicator number seven, i.e. on Market Integration - Foreign Direct Investment (FDI) intensity (Average value of inward and outward FDI flows divided by GDP - in percent). The coefficient of variation for this indicator has an extreme value of 3.445 due to the high value of this indicator for Luxembourg (234%), while the other indicators from this group have moderate variability.
The values of coefficient of variation for Employment indicators are spread from 0.025 to 0.804, having the peak on indicator number 8 (Life-long learning by gender) – 0.804. When this value is excluded the upper limit in this SI group is 0.411.
In the group of Environment structural indicators the minimal value of the coefficient of variation is found for indicator number seven - Healthy life years at birth by gender (0.067) and the maximal value for the third indicator - Electricity generated from renewable sources (0.986). Also, there is high variability for the second and the fourth indicator: Combined heat and power generation and Energy intensity of the economy, with coefficients of variation equal to 0.830 and 0.731 respectively.
Concerning the General Economic Background structural indicators, a few of them present extremely high variability (indicator number 2, 7, 8 and 9). These are indicators on Annual percentage change in total employed population, Public balance, Real GDP growth rate and Real unit labor cost growth. The coefficients of variation for these indicators are: 1.196, 1.635, 2.036 and 1.325 respectively.
In the I&R group, two indicators (number 11 and 12: Patent applications to the European Patent Office - EPO and Patents granted by the United States Patent and Trademark Office – USPTO) have the coefficient of variation value above 1.0, i.e. 1.101 and 1.159. The coefficients of variation for the other I&R indicators take the values from the 0.131 to 0.886 interval.
Finally, the group of Social cohesion indicators demonstrates the lowest variability across the EU states. The respective coefficient of variation values fall within the 0.146 - 0.592 interval.
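The effect of a single extreme value on the coefficient of variation, as observed for the FDI intensity indicator, can be reproduced with a short sketch. The numbers below are hypothetical and are not EUROSTAT data; the use of the sample standard deviation is an assumption, since the paper does not state which estimator was applied.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values)

def standardize(values):
    """z-scores: subtract the mean, then divide by the sample standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

# Hypothetical indicator values for a handful of countries.
regular = [2.0, 3.0, 4.0, 3.0, 3.0]
with_outlier = regular + [60.0]      # one country with an extreme value

cv_regular = coefficient_of_variation(regular)
cv_outlier = coefficient_of_variation(with_outlier)
z = standardize(with_outlier)        # values as they would enter a box plot
```

In this illustration one extreme observation is enough to move the coefficient of variation from a moderate value to a value well above 1, which is the pattern reported for indicator number seven.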
5. General characteristics of the Kohonen Self-Organizing Map algorithm
The Kohonen SOM algorithm is a well-known learning algorithm that belongs to the group of unsupervised neural network models. Historically, the SOM method was originally proposed as an explanation for biological phenomena. The fundamental idea of the SOM map was introduced by Marlsburg (1973) and Grossberg (1976), but later on Kohonen (1981a, 1981b) proposed the model which was successfully applied to a number of pattern recognition and engineering applications (Cherkassky, V. et. al. 1998). The main feature of the SOM algorithm, which makes it helpful for this study, is that the relations between data items from the original data space become explicit on the SOM map due to a nonlinear projection from a high-dimensional space onto a two-dimensional display [11]. Different aspects of the Kohonen algorithm have been extensively analyzed by many authors, and numerous examples of SOM applications may be found in the literature [12].
To avoid turning this paper into a technical-oriented one, here only the essentials of the SOM technique are presented (Deboeck, G. et.al. 1998). The SOM map consists of units (nodes or neurons) which reflect the general form of the input data space. After net training, each unit represents a group of individuals with similar features, i.e. individuals with similar features correspond to the same unit or to neighboring units on the map. Assuming that the observation space is n-dimensional, the real sample vectors x(t), t = 1,2,… are
$$x(t) = [x_1(t), x_2(t), \ldots, x_n(t)] \qquad (1)$$
where $t = 1, 2, \ldots$ refers to the index of the sample.
Similar to the sample vectors, there are n-dimensional model vectors
$$m_i(t) = [m_{i1}(t), m_{i2}(t), \ldots, m_{in}(t)] \qquad (2)$$
which are associated with each node on the map. The values of the model vectors are adjusted in response to the samples $x(t)$, $t = 1, 2, \ldots$. At each step of the algorithm, the index of the “winner node” is first identified in the following manner:
$$c = \arg\min_i \lVert x(t) - m_i(t) \rVert \qquad (3)$$
where the index c refers to the “winner neuron”. This is followed by the adjustment of the model vectors $m_i(t)$:
$$m_i(t+1) = m_i(t) + \alpha(t)\, h_{ci}(t)\, [x(t) - m_i(t)] . \qquad (4)$$
Two important parameters appear in the previous relation: the learning rate $\alpha(t)$ and the neighborhood function $h_{ci}(t)$. The learning rate takes values from the 0 – 1 interval; it should be close to unity at the beginning of the learning process and then gradually decrease with t. The neighborhood function has a specific role in the process of local relaxation, or smoothing, of the weight vectors of neurons in the neighborhood of the “winning neuron”. For the purpose of convergence it is necessary that $h_{ci}(t) \to 0$ when $t \to \infty$. It is a distinctive feature of the SOM algorithm that the neighborhood width gradually decreases as iterations progress (Cherkassky V. et. al. 1998). More details on a set of practical issues relevant for the application of the SOM algorithm (scaling of the input variables, initial values of the SOM algorithm, optimization of the learning rate, selection of the neighborhood function, number of training steps, coloring of the clusters, etc.) are discussed, among others, by Kaski, S., et. al. (2000) and Kohonen T. (2001).
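The winner search and the update rule of relations (3) and (4) can be sketched in a few lines. This is only an illustration under stated assumptions: a Gaussian neighborhood function, a linearly decreasing learning rate and neighborhood width, random initialization, and hypothetical function names (`winner`, `som_step`, `train_som`). None of these choices is claimed to match the program used in the study.

```python
import math
import random

def winner(x, models):
    """Relation (3): index c of the model vector nearest to x (Euclidean)."""
    return min(range(len(models)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, models[i])))

def som_step(models, grid, x, alpha, sigma):
    """Relation (4): move every model vector toward x, weighted by alpha * h_ci."""
    c = winner(x, models)
    for i, (r, q) in enumerate(grid):
        d2 = (r - grid[c][0]) ** 2 + (q - grid[c][1]) ** 2   # grid distance to winner
        h = math.exp(-d2 / (2.0 * sigma ** 2))               # assumed Gaussian h_ci
        models[i] = [m + alpha * h * (xj - m) for m, xj in zip(models[i], x)]
    return c

def train_som(samples, rows, cols, steps, alpha0=0.9, seed=0):
    """Train a rows x cols map; returns (grid positions, model vectors)."""
    rng = random.Random(seed)
    grid = [(r, q) for r in range(rows) for q in range(cols)]
    models = [[rng.random() for _ in samples[0]] for _ in grid]
    sigma0 = max(rows, cols) / 2.0
    for t in range(steps):
        frac = 1.0 - t / steps                 # decreases from 1 toward 0
        x = samples[rng.randrange(len(samples))]
        som_step(models, grid, x, alpha0 * frac, max(sigma0 * frac, 0.01))
    return grid, models

# One explicit step on a hypothetical 1 x 2 map.
demo_grid = [(0, 0), (0, 1)]
demo_models = [[0.0, 0.0], [1.0, 1.0]]
demo_c = som_step(demo_models, demo_grid, [0.8, 0.8], alpha=0.5, sigma=1.0)

# A full training run on hypothetical two-dimensional samples in [0, 1].
samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
grid, models = train_som(samples, rows=3, cols=3, steps=400)
```

In the explicit step, the winner moves halfway toward the sample while its grid neighbor moves by a smaller fraction, which is the local smoothing role of the neighborhood function described above.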
6. Major results: the visualization and grouping of EU member states
In this section the major results of the SOM application are presented. The section consists of two parts. In the first part the particular solutions for each SI dataset are presented [13]. These particular mapping solutions are followed by the general SOM map of EU states based on all structural indicators. The respective results are described in the second part of the section – 6.2.
6.1. Kohonen maps of EU states – particular solutions for each SI dataset
As explained in the previous section, during the SOM training process the nodes of the map gradually adapt to the intrinsic shape of the data distribution. However, prior to training, each dataset forwarded to the SOM algorithm passes through a data pre-processing procedure (Deboeck, G. et.al. 1998). As the SI datasets show substantial differences in measurement level and variability, the indicators are scaled by the appropriate standard deviation (Berry M.J.A., et. al. 1997). The other issue relevant to the data pre-processing phase is the transformation of the particular SI datasets. Data transformations are based on the results of the exploratory data analysis (presented in section 4). For substantially positively skewed distributions with outliers and/or extreme values, an appropriate transformation (square root or log transformation) is considered and applied, while for substantially negatively skewed data with outliers, the relevant transformations for positive skewness are conducted on the reflected original datasets (Tabachnick, B.G., et. al. 2007). Finally, regarding the priority of the particular indicators, all indicators are treated equally, i.e. the same priority factor, equal to 1, is assigned to each of them [14].
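The pre-processing steps described above (transformation of skewed indicators, reflection of negatively skewed ones, and scaling by the standard deviation) might look as follows. This is a sketch under assumptions: the skewness threshold of 1.0, the choice of the log rather than the square-root transformation, and all function names are illustrative, and the data are hypothetical.

```python
import math
import statistics

def skewness(values):
    """Moment-based skewness (third standardized moment, population form)."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mu) / sd) ** 3 for v in values) / len(values)

def reflect(values):
    """Reflect about (max + 1) so that negative skew becomes positive skew."""
    top = max(values)
    return [top + 1 - v for v in values]

def log_transform(values):
    """Natural logarithm; requires strictly positive values."""
    return [math.log(v) for v in values]

def scale_by_sd(values):
    """Divide by the sample standard deviation."""
    sd = statistics.stdev(values)
    return [v / sd for v in values]

def preprocess(values, threshold=1.0):
    """Transform strongly skewed data, then scale (threshold is an assumption)."""
    s = skewness(values)
    if s > threshold:
        values = log_transform(values)
    elif s < -threshold:
        values = log_transform(reflect(values))
    return scale_by_sd(values)

# Hypothetical positively skewed indicator and its mirror image.
raw = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 50.0]
neg = [50.0 - v for v in raw]
prepared = preprocess(raw)
prepared_neg = preprocess(neg)
```

Because the reflected negatively skewed series coincides with the positively skewed one in this example, both series end up with the same prepared values.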
The respective maps in Figure 3 show visual images of EU states and their intrinsic grouping into clusters. Owing to the topology-preserving property of the SOM, the closer the positions of two countries on a map, the more similar their profiles regarding the respective structural indicators.
Figure 3: Initial Kohonen maps of EU states based on the particular SI datasets
a) Economic Reform map b) Employment map
c) Environment map d) GEB map
e) I&R map f) Social cohesion map
Along with the projection of the EU states on the map, Figure 3 provides the appropriate clustering solutions. Distinct clusters are painted in different colors, which are automatically generated by the concrete program solution [15]. The specific regions are determined by the number of clusters, as well as by the chosen granularity of the map, i.e. the number of map nodes [16]. While the number of SOM nodes, as an input program value, may be arbitrarily chosen and entered by the user at the beginning of the SOM processing, the number of map clusters is not predetermined. Namely, the choice of the concrete cluster solution is based on an objective criterion used in the clustering procedure. Here it is the cluster indicator, which is explained in the next part and presented in Figure 4. The clustering solutions presented above should be considered as initial options for the particular SI structure, and are usually expanded with a more in-depth analysis of the concrete map.
With regard to the clustering procedure, the SOM-Ward clustering method is applied. This method combines the well-known Ward’s algorithm (a standard hierarchical cluster procedure) with the local order information of the map. The algorithm starts as Ward’s method (Hair, J.F.Jr, et.al. 2006), where each node forms a cluster, and continues by combining two clusters at each step. The main principle of Ward’s method is to minimize the total within-cluster sum of squares, which is used as the measure of within-cluster homogeneity. Thus, at each step of the algorithm, clusters are combined in such a way that the resulting cluster solution has the minimal within-cluster sum of squares across all variables in all clusters. However, in addition to this classical Ward’s procedure, the SOM-Ward clustering method takes into account the positions of the clusters on the map, allowing only neighboring clusters to be combined.
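The principle of Ward merging restricted to neighboring map regions can be sketched as follows. The sketch is an approximation and not the implementation of the program used in the study: every node is given weight 1, neighbors are defined by direct (4-connected) adjacency on a rectangular grid, and all names are hypothetical.

```python
def ward_cost(a, b):
    """Increase in the within-cluster sum of squares if clusters a and b merge."""
    d2 = sum((p - q) ** 2 for p, q in zip(a["centroid"], b["centroid"]))
    return a["n"] * b["n"] / (a["n"] + b["n"]) * d2

def adjacent(a, b):
    """True if some node of a is a direct grid neighbor of some node of b."""
    return any(abs(r1 - r2) + abs(c1 - c2) == 1
               for (r1, c1) in a["nodes"] for (r2, c2) in b["nodes"])

def merge(a, b):
    """Union of two clusters with the size-weighted mean as the new centroid."""
    n = a["n"] + b["n"]
    centroid = [(a["n"] * p + b["n"] * q) / n
                for p, q in zip(a["centroid"], b["centroid"])]
    return {"nodes": a["nodes"] + b["nodes"], "n": n, "centroid": centroid}

def som_ward(grid, models, k):
    """Merge neighboring clusters at minimal Ward cost until k clusters remain."""
    clusters = [{"nodes": [pos], "n": 1, "centroid": list(m)}
                for pos, m in zip(grid, models)]
    costs = []
    while len(clusters) > k:
        pairs = [(ward_cost(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))
                 if adjacent(clusters[i], clusters[j])]
        cost, i, j = min(pairs)
        merged = merge(clusters[i], clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        costs.append(cost)
    return clusters, costs

# Hypothetical 1 x 3 map: the two outer nodes are similar but not neighbors.
grid = [(0, 0), (0, 1), (0, 2)]
models = [[0.0], [5.0], [0.1]]
clusters, costs = som_ward(grid, models, k=2)
partition = sorted(sorted(c["nodes"]) for c in clusters)
```

In this example unconstrained Ward clustering would first join the two outer nodes, whose model vectors are almost identical. The neighborhood restriction forbids that merge, so the middle node is joined with one of its grid neighbors instead.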
Given the exploratory nature of the SOM algorithm, the number of clusters is not predetermined. Instead, different cluster numbers were tried, with fine-tuning of the map parameters, in order to obtain a partition of the data items that is as close to optimal as possible. In addition, a specific operative measure is used for determining the cluster number: the cluster indicator, which is calculated at each step of the clustering procedure on the basis of the difference between two neighboring SOM-Ward sums of squares [17] . This indicator is used as a heuristic measure of a possibly good clustering: a high value for a particular cluster count is a marker of good "natural" clustering and, vice versa, a low value suggests a predominantly "artificial" clustering. Therefore, by following the values of this indicator on the cluster indicator graph, some interesting clustering solutions may be observed. The cluster indicator histograms are given in Figure 4.
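The idea behind such an indicator can be sketched as follows. The exact formula is given in the footnote and is not reproduced here, so the definition below is an assumption: the indicator for k clusters is taken to be the increase in the total within-cluster sum of squares when the k clusters are merged into k - 1. The function name `cluster_indicator` and the input values are hypothetical.

```python
def cluster_indicator(within_ss):
    """within_ss: {k: total within-cluster sum of squares of the k-cluster solution}.

    Returns {k: W(k-1) - W(k)}, i.e. the cost of merging k clusters into k - 1.
    This is an assumed, simplified form of the indicator.
    """
    return {k: within_ss[k - 1] - within_ss[k]
            for k in sorted(within_ss) if k - 1 in within_ss}


# Hypothetical sums of squares for 1 to 5 clusters (not taken from the paper).
within_ss = {1: 100.0, 2: 80.0, 3: 40.0, 4: 35.0, 5: 31.0}
indicator = cluster_indicator(within_ss)
best_k = max(indicator, key=indicator.get)
```

With these invented numbers the indicator peaks at three clusters: merging three clusters into two would raise the sum of squares by 40, far more than any other step, which is the pattern the text describes as a "natural" clustering.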
The cluster indicator histograms reveal different clustering tendencies in the domains of six SI datasets. Thus, it may be observed that:
the clustering tendencies of EU states regarding Economic reform and General Economic Background (GEB) are more or less alike. Considering the structural indicators on Economic reform, the highest quality of EU states clustering is obtained for three clusters and then for twenty three and twenty six clusters. Regarding the General Economic Background (GEB), the interesting cluster numbers may be four, eight, twenty two and twenty six. Bearing in mind that the EU consists of 27 states, solutions with more than twenty clusters have no practical relevance. However, this also reveals that EU states are highly differentiated in the domains of Economic reform and General Economic Background;
considering Employment indicators, the optimal cluster number may be thirteen or eight. Again, the option with twenty six clusters is not considered a relevant one;
in the domains of Environment and Social cohesion, the cluster indicator histograms have a similar shape: high cluster indicator values are recorded for three and four clusters and very low values for the other cluster solutions;
considering the I&R structural indicators, a possible clustering solution for EU member states may be the partition into three or four clusters, then eight or ten clusters.
Figure 4: Cluster indicator graphs for six SOM maps
a) Economic Reform b) Employment c) Environment
d) GEB e) I&R f) Social cohesion
In order to understand the clustering tendencies of EU states, different cluster number solutions were tried in each SI field. Only the case of GEB Structural Indicators is discussed here. According to the cluster indicator histogram (Figure 4.d) the optimal cluster number is four, but if a more detailed analysis is needed, the second best solution is eight clusters (partitions into 22 and 26 clusters were excluded for practical reasons).
Figure 5: GEB maps: four, eight and eleven-cluster solution
In addition to the visualization and grouping procedure, the final evaluation of the Kohonen map involves the inspection of the appropriate component planes [18] which usually accompany the map. These planes provide information on the relative distribution of a particular input variable (indicator) and its influences on the final map. Furthermore, they provide information on dependencies among the indicators themselves. Here, only one set of component planes is presented, i.e. component planes for GEB indicators, which illustrate the contribution to the respective map (Figure 6).
Figure 6: SOM on GEB Structural Indicators and component planes
The component planes positioned on the right-hand side of Figure 6 show the contribution of each GEB Structural Indicator to the respective Kohonen map (left-hand side of the figure). The scale at the bottom of each plane runs from the lowest to the highest value of each indicator [19] . Comparing the component planes of the nine contributing economic indicators, it can be observed that some of them show a similar structure and contribution to the final map. The component planes X5 (Labor productivity per hour worked) and X6 (Labor productivity per person employed) are very similarly colored, indicating similar influences of those two components on the final map. The first plane (indicator X1 - GDP per capita in PPS - purchasing power standards) also shows an obvious similarity with planes X5 and X6. These three component planes are very much alike, showing the largest value in the lower-left part and generally larger values on the left-hand side than on the right-hand side of the plane. The similar behavior of the respective indicators is confirmed by the high values of the Pearson correlation coefficient (Table 1).
Although the Pearson correlation coefficient measures only the linear correlation between variables, it provides valuable information on the dependences among GEB indicators. High values of the coefficient are observed for several pairs of GEB indicators: between indicators X1 (GDP per capita in PPS - purchasing power standards) and X5 (Labor productivity per hour worked), and between indicators X1 and X6 (Labor productivity per person employed), with correlation coefficients of 0.909 and 0.918 respectively. There is also a high correlation between indicators X5 (Labor productivity per hour worked) and X6 (Labor productivity per person employed), equal to 0.969, which agrees with the visual image of the relevant component planes in Figure 6. These indicators evidently have similar distributions. In contrast, the following pairs of component planes present images of more or less inverse distributions: X1 (GDP per capita in PPS - purchasing power standards) and X4 (Inflation rate), X3 (General government debt) and X4 (Inflation rate), X4 (Inflation rate) and X5 (Labor productivity per hour worked), as well as X4 (Inflation rate) and X6 (Labor productivity per person employed). The visual impression of their distributions on the respective component planes is confirmed by the negative values of the corresponding correlation coefficients in Table 1.
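For readers who wish to reproduce this kind of check, the Pearson coefficient for a pair of indicators can be computed as in the minimal sketch below. The function name `pearson` and the data are illustrative assumptions; the numbers are invented and are not the values behind Table 1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented values for five hypothetical states (not the paper's data).
gdp_per_capita = [1.0, 2.0, 3.0, 4.0, 5.0]
productivity = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises with GDP per capita
inflation = [5.0, 4.0, 3.0, 2.0, 1.0]       # falls as GDP per capita rises

r_similar = pearson(gdp_per_capita, productivity)  # close to +1
r_inverse = pearson(gdp_per_capita, inflation)     # close to -1
```

A coefficient near +1 corresponds to component planes with similar coloring, and a negative coefficient to planes with roughly inverse coloring.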
6.2. An overall Kohonen map – an atlas of EU states based on all SI datasets
Whilst the previous section describes the visualization and grouping of EU states based on each of the six SI datasets separately, this section deals with the visualization of EU states based on all seventy six structural indicators. Applying the same mapping procedure as described previously, including the data pre-processing phase (see Section 6.1), gives the following results (Figures 7 and 8).
Figure 7: Cluster indicator histogram for a general SOM map
The cluster indicator graph in Figure 7 demonstrates that the most relevant grouping of EU states is in three clusters. Apart from this most striking cluster solution, some additional partitions are applied to illustrate the clustering tendency of EU states. The respective results are given in Figure 8.
Figure 8: General map of EU states based on all SI datasets: different cluster solutions
a) Three-cluster solution b) Four-cluster solution c) Five-cluster solution
d) Six-cluster solution e) Nine-cluster solution f) Eleven-cluster solution
The map partitions presented in Figure 8 provide a general view of the clustering tendency of EU states (from three to eleven clusters) based on all Structural indicators.
The proposed groupings of EU member states may be the subject of further analysis, primarily of a group profile analysis. The results of the profile analysis for the four-cluster solution are presented here. The profile analysis for other cluster solutions may be carried out in a similar fashion.
The second map in Figure 8 shows the grouping of EU states in four clusters as follows:
cluster 1 – Belgium, Ireland, Greece, Spain, France, Italy, Portugal, United Kingdom and Luxembourg;
cluster 2 – Bulgaria, Estonia, Latvia, Lithuania, Poland, Romania, Slovenia, Czech Republic, Hungary and Slovakia;
cluster 3 – Denmark, Netherlands, Austria, Germany, Finland and Sweden;
cluster 4 – Cyprus and Malta.
The graph in Figure 9 shows the relative image of the four clusters, with each cluster described by the appropriate SI mean values. The height of a bar in the bar charts is given in relative terms: it shows the deviation of a particular cluster SI mean value from the SI mean value of the entire dataset.
Figure 9: Relative profiles of the four clusters – based on all SI values
As all deviations are measured in the same unit, i.e. the standard deviation of the SI values for the entire dataset, the previous bar charts provide a good basis for the comparison of cluster profiles. Thus, in comparing the four clusters, the following points emerge:
going from the first to the last cluster there is a strong tendency of increasing deviations of the particular cluster’s SI means from the entire dataset mean values;
the profile of the first cluster, consisting of Belgium, Ireland, Greece, Spain, France, Italy, Portugal, United Kingdom and Luxembourg can be considered to be an „average“ cluster profile, since deviations of its SI mean values from the general average values are the ones closest to zero;
most of the SI means for the second cluster (Bulgaria, Estonia, Latvia, Lithuania, Poland, Romania, Slovenia, Czech Republic, Hungary and Slovakia) show negative deviations from the general SI mean values for all EU states. They fall between -1.0 and 1.0 standard deviations around the zero line;
unlike in the second cluster, the SI mean values for EU states belonging to the third cluster (Denmark, Netherlands, Austria, Germany, Finland and Sweden) mainly show positive deviations from the general average values for all EU states;
the fourth cluster, comprising just two states, Malta and Cyprus, shows substantial differences on both sides (positive and negative) from the general EU average of all Structural indicators. The deviations are spread over the interval of -2.5 to 2.5 standard deviations around the general mean values.
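The relative profiles discussed above rest on a simple calculation, sketched below for a single indicator. The function name `relative_profile`, the use of the population standard deviation and the example values are assumptions made for illustration; they are not taken from the paper's data or software.

```python
from statistics import mean, pstdev


def relative_profile(values, labels):
    """Deviation of each cluster mean from the overall mean, in standard deviations.

    values: one indicator's value for every state
    labels: the cluster number of every state, in the same order
    Assumption: the population standard deviation of the whole dataset is used.
    """
    overall_mean = mean(values)
    overall_std = pstdev(values)
    profile = {}
    for cluster in sorted(set(labels)):
        members = [v for v, lab in zip(values, labels) if lab == cluster]
        profile[cluster] = (mean(members) - overall_mean) / overall_std
    return profile


# Invented indicator values for eight hypothetical states in two clusters.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
labels = [1, 1, 1, 1, 2, 2, 2, 2]
profile = relative_profile(values, labels)
```

Here the overall mean is 5 and the standard deviation is 2, so the first cluster (mean 3.5) lies 0.75 standard deviations below the overall mean and the second (mean 6.5) lies 0.75 above it. Repeating the calculation for every indicator gives one bar per indicator and cluster, as in Figure 9.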
In addition to the previous profile analysis, where all seventy six Structural Indicators are applied and the respective comparisons between clusters are made, a more in-depth profile analysis of each cluster is proposed in the next part. Specifically, the four-cluster solution is profiled in each group of Structural Indicators, where the standardized SI values serve as clustering variables. Figure 10 provides the respective graphical results.
Figure 10: Profile analysis of standardized SI datasets for the four-cluster solution
Generally, these graphs confirm some of the previous conclusions. For almost all SI groups, except for GEB indicators and Economic reform, the first cluster takes the role of the “average” cluster with the lowest variability of SI values. On the other hand, the fourth cluster shows the largest variability of average standardized SI values around the zero line, again with the exception of GEB indicators. Cluster number three has clearly distinctive standardized values for the groups of Social cohesion, I&R and Employment indicators, which is not the case for the other three SI groups (General Economic Background, Environment and Economic reform), where many intersections with other clusters’ SI values are observed. The second cluster, consisting of Eastern European states (Bulgaria, Estonia, Latvia, Lithuania, Poland, Romania, Slovenia, Czech Republic, Hungary and Slovakia) which have gone through a process of economic and social transformation, reveals specific behavior patterns regarding the GEB and Economic reform indicators, but follows the first cluster’s values for Employment, Social cohesion and I&R indicators.
In addition to the above observations, Figure 10 indicates different levels of separation between the four clusters for particular Structural indicators. On the one hand, in the domains of Social cohesion, Employment and I&R indicators, the clusters are clearly separated, while on the other hand, with regard to GEB, Environment and Economic reform indicators, many intersections between clusters dominate.
7. Concluding remarks
This paper demonstrates how the Kohonen SOM algorithm can be used to provide a visual presentation of the relative position of EU member states according to the specific set of economic and social indicators defined by the European Commission as Structural Indicators (SI). Apart from the general map of EU states (based on all SI datasets), separate maps are presented for each particular SI domain: General Economic Background, Employment, Innovation and Research, Economic reform, Social cohesion and Environment. Since the SOM algorithm combines the goals of both projection and clustering, the paper also discusses a clustering tendency of the EU states based on all Structural Indicators.
Prior to the SOM application, an exploratory analysis is performed using multiple box plots and graphical presentations of the coefficients of variation for each particular SI dataset. It reveals a higher variability for the Economic reform, General Economic Background and I&R indicators compared to the Environment, Employment and Social cohesion indicators. The principal findings of the SOM application can be summarized as follows:
the resulting maps provide intuitive and helpful graphical images of complex SI datasets. Regarding the clustering procedure, it is shown how an optimal cluster number is determined by the data itself. For this purpose, a specific operative measure – the cluster indicator - is used. Based on cluster indicator values, the three-cluster solution appears as the most relevant for the general map of EU countries (cluster 1 - Belgium, France, United Kingdom, Luxembourg, Italy, Spain, Greece, Portugal, Ireland, Cyprus and Malta; cluster 2 - Bulgaria, Estonia, Latvia, Lithuania, Romania, Poland, Hungary, Slovenia, Slovakia and Czech Republic; cluster 3 - Denmark, Netherlands, Austria, Germany, Finland and Sweden);
considering the clustering solutions for the general map (based on all seventy six structural indicators), it is remarkable that the geographic location of many countries is reflected in the final map organization, with some exceptions, notably Ireland and Luxembourg. This observation agrees with the results of some earlier studies that are discussed in the second part of the paper;
the previous conclusion concerning the relevance of the “geography factor” does not apply to the same extent to all Kohonen maps in the six SI domains. Going through the clustering solutions in these areas, “geography” matters more for General Economic Background and Innovation and Research, less in the domains of Economic Reform, Environment and Social cohesion, and least in the Employment area;
also, it is observed that two country groups, the Eastern European countries and the Scandinavian countries (Sweden, Finland) together with Denmark and Netherlands, appear as more homogeneous and stable groups across all six SI domains compared to other EU countries;
finally, the profile analysis (for the four-cluster solution) reveals the main clusters’ features and also presents different divisions among clusters regarding particular SI datasets. Whilst in the sphere of Employment, I&R and Social cohesion, a clear separation between clusters of EU member states is recorded, on the other side, more intertwined trends between clusters dominate in the case of Environment, GEB and Economic Reform Indicators.
This paper presents only a static picture of EU countries within the scope of predefined SI datasets. As the Lisbon strategy may be considered as an ongoing historical process which assumes a dynamic dimension, the great challenge for future work is to expand this picture through a time-horizon, and explore the change of individual country positions in that respect.