Procedure:
Step1: Open the data file in Weka Explorer. This example uses the
labor-neg-data relation (57 instances, 17 attributes). SimpleKMeans accepts
numeric and nominal attributes as they are, so no field has to be discretized
first.
Step2: Clicking on the Cluster tab will bring up the interface for the
clustering algorithms.
Step3: We will use the K-means algorithm. Click the Choose button and select
SimpleKMeans (weka.clusterers.SimpleKMeans) if it is not already selected.
Step4: In order to change the parameters for the run (for example the number
of clusters, the random seed or the maximum number of iterations), click on
the text box immediately to the right of the Choose button.
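The text box holds an option string of the same form as the Scheme line at the top of each run's output. As a rough illustration of how that string breaks down, here is a small Python sketch that splits it into flag/value pairs. It is not part of Weka, and the helper name parse_scheme is made up for this post. In SimpleKMeans, -N is the number of clusters, -S the random seed, -I the maximum number of iterations and -A the distance function.

```python
import shlex

# Option string copied from the Scheme line of the first run.
SCHEME = (
    'weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 '
    '-periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 '
    '-N 3 -A "weka.core.EuclideanDistance -R first-last" -I 500 '
    '-num-slots 1 -S 10'
)


def parse_scheme(scheme):
    """Split a scheme string into (class name, {flag: value}).

    Assumes every flag is followed by exactly one value, which holds
    for this particular string but not for every Weka scheme.
    """
    tokens = shlex.split(scheme)  # keeps the quoted -A argument as one token
    class_name, rest = tokens[0], tokens[1:]
    options = dict(zip(rest[0::2], rest[1::2]))
    return class_name, options


name, opts = parse_scheme(SCHEME)
print(name)
print("number of clusters (-N):", opts["-N"])
print("random seed (-S):", opts["-S"])
print("max iterations (-I):", opts["-I"])
print("distance function (-A):", opts["-A"])
```

Changing -N from 3 to 2 in the text box is what separates the two runs shown in this post.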
Output of the first run (three clusters, -N 3):
Scheme:       weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 3 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation:     labor-neg-data
Instances:    57
Attributes:   17
              duration
              wage-increase-first-year
              wage-increase-second-year
              wage-increase-third-year
              cost-of-living-adjustment
              working-hours
              pension
              standby-pay
              shift-differential
              education-allowance
              statutory-holidays
              vacation
              longterm-disability-assistance
              contribution-to-dental-plan
              bereavement-assistance
              contribution-to-health-plan
              class
Test mode:    evaluate on training data
=== Clustering model (full training set) ===
kMeans
======
Number of iterations: 3
Within cluster sum of squared errors: 119.5224194214812
Initial starting points (random):
Cluster 0: 1,5.7,3.971739,3.913333,none,40,empl_contr,7.444444,4,no,11,generous,yes,full,yes,full,good
Cluster 1: 1,2,3.971739,3.913333,tc,40,ret_allw,4,0,no,11,generous,no,none,no,none,bad
Cluster 2: 2,2.5,3,3.913333,tcf,40,none,7.444444,4.870968,no,11,below_average,yes,half,yes,full,bad
Missing values globally replaced with mean/mode
Final cluster centroids:
                                                   Cluster#
Attribute                            Full Data             0             1             2
                                        (57.0)        (36.0)         (5.0)        (16.0)
========================================================================================
duration                                2.1607        2.2267           1.4          2.25
wage-increase-first-year                3.8036        4.4695           3.2        2.4938
wage-increase-second-year               3.9717        4.4175         4.183        2.9027
wage-increase-third-year                3.9133        4.1093        3.9133        3.4725
cost-of-living-adjustment                 none          none          none          none
working-hours                          38.0392       37.4766       39.2078         38.94
pension                             empl_contr    empl_contr          none    empl_contr
standby-pay                             7.4444        7.9938        6.7556        6.4236
shift-differential                       4.871        5.4776        3.1484        4.0444
education-allowance                         no            no            no            no
statutory-holidays                     11.0943       11.4801          10.6       10.3809
vacation                         below_average      generous below_average below_average
longterm-disability-assistance             yes           yes            no           yes
contribution-to-dental-plan               half          half          none          half
bereavement-assistance                     yes           yes            no           yes
contribution-to-health-plan               full          full          none          full
class                                     good          good           bad           bad
Time taken to build model (full training data) : 0.01 seconds
=== Model and evaluation on training set ===
Clustered Instances
0      36 ( 63%)
1       5 (  9%)
2      16 ( 28%)
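The numbers in the report above (iterations, within cluster sum of squared errors, centroids, cluster sizes) all come out of the basic K-means loop. Below is a minimal sketch of that loop in plain Python on made-up 2-D points. It is not Weka's implementation: SimpleKMeans also handles nominal attributes and replaces missing values with the mean/mode, which this sketch leaves out. The function name kmeans and its init parameter are assumptions for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=10, max_iter=500, init=None):
    """Return (centroids, assignment, sse, iterations) for numeric points."""
    rng = random.Random(seed)
    # Start from k randomly picked points unless explicit starts are given.
    centroids = list(init) if init is not None else rng.sample(points, k)
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    # Within cluster sum of squared errors.
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))
    return centroids, assignment, sse, iteration


# Made-up data: two square groups of four points each.
points = [(1, 1), (1, 3), (3, 1), (3, 3),
          (10, 10), (10, 12), (12, 10), (12, 12)]

centroids, assignment, sse, iterations = kmeans(points, k=2)
sizes = [assignment.count(c) for c in range(2)]
print("Number of iterations:", iterations)
print("Within cluster sum of squared errors:", sse)
print("Clustered Instances:", sizes)
```

As in Weka, the result depends on the random starting points, so a different seed can give a different number of iterations and a different error.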
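Output of the second run (two clusters, -N 2, with contribution-to-health-plan ignored during clustering and used for the classes to clusters evaluation):
Scheme:       weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10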
Relation:     labor-neg-data
Instances:    57
Attributes:   17
              duration
              wage-increase-first-year
              wage-increase-second-year
              wage-increase-third-year
              cost-of-living-adjustment
              working-hours
              pension
              standby-pay
              shift-differential
              education-allowance
              statutory-holidays
              vacation
              longterm-disability-assistance
              contribution-to-dental-plan
              bereavement-assistance
              class
Ignored:
              contribution-to-health-plan
Test mode:    Classes to clusters evaluation on training data
=== Clustering model (full training set) ===
kMeans
======
Number of iterations: 5
Within cluster sum of squared errors: 122.05464734126849
Initial starting points (random):
Cluster 0: 1,5.7,3.971739,3.913333,none,40,empl_contr,7.444444,4,no,11,generous,yes,full,yes,good
Cluster 1: 1,2,3.971739,3.913333,tc,40,ret_allw,4,0,no,11,generous,no,none,no,bad
Missing values globally replaced with mean/mode
Final cluster centroids:
                                                    Cluster#
Attribute                            Full Data             0             1
                                        (57.0)        (43.0)        (14.0)
==========================================================================
duration                                2.1607         2.213             2
wage-increase-first-year                3.8036        4.2024        2.5786
wage-increase-second-year               3.9717         4.221        3.2062
wage-increase-third-year                3.9133        4.0329        3.5462
cost-of-living-adjustment                 none          none          none
working-hours                          38.0392       37.6557       39.2171
pension                             empl_contr    empl_contr          none
standby-pay                             7.4444        7.7778        6.4206
shift-differential                       4.871        5.2018        3.8548
education-allowance                         no            no            no
statutory-holidays                     11.0943       11.2878          10.5
vacation                         below_average below_average below_average
longterm-disability-assistance             yes           yes           yes
contribution-to-dental-plan               half          half          none
bereavement-assistance                     yes           yes           yes
class                                     good          good           bad
Time taken to build model (full training data) : 0 seconds
=== Model and evaluation on training set ===
Clustered Instances
0      43 ( 75%)
1      14 ( 25%)
Class attribute: contribution-to-health-plan
Classes to Clusters:
  0  1  <-- assigned to cluster
 20  8 | none
  9  0 | half
 14  6 | full
Cluster 0 <-- none
Cluster 1 <-- full
Incorrectly clustered instances :           31.0        54.386  %
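The error figure can be checked by hand from the Classes to Clusters matrix. The sketch below assumes that each cluster is mapped to a different class and that the mapping with the most matching instances wins. This assumption reproduces the reported mapping and error, but it is a reconstruction from the output, not Weka's source code. The helper name best_mapping is hypothetical.

```python
from itertools import permutations

# Rows of the Classes to Clusters matrix: class -> [cluster 0, cluster 1].
counts = {
    "none": [20, 8],
    "half": [9, 0],
    "full": [14, 6],
}


def best_mapping(counts, n_clusters):
    """Try every assignment of distinct classes to clusters and keep
    the one with the most correctly matched instances."""
    best_labels, best_correct = None, -1
    for labels in permutations(counts, n_clusters):
        correct = sum(counts[label][c] for c, label in enumerate(labels))
        if correct > best_correct:
            best_labels, best_correct = labels, correct
    return best_labels, best_correct


labels, correct = best_mapping(counts, 2)
total = sum(sum(row) for row in counts.values())
incorrect = total - correct

for cluster, label in enumerate(labels):
    print("Cluster", cluster, "<--", label)
print("Incorrectly clustered instances:", incorrect,
      round(100 * incorrect / total, 3), "%")
# Incorrectly clustered instances: 31 54.386 %
```

With a plain majority vote both clusters would map to none, giving 29 errors instead of 31, so the one-class-per-cluster rule matters for this result.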