GR SOLUTION: K-means clustering implementation in the Weka tool

Procedure:

Step1: Open the data file in Weka Explorer. This example uses the labor-neg-data relation (57 instances, 17 attributes). No discretization is needed, because SimpleKMeans handles numeric and nominal attributes directly.

Step2: Clicking on the Cluster tab will bring up the interface for the clustering algorithms.

Step3: We will use the K-means algorithm. Click the Choose button and select SimpleKMeans if it is not already shown.

Step4: In order to change the parameters for the run (for example the number of clusters, the random seed, or the maximum number of iterations), click on the text box immediately to the right of the Choose button.
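The settings chosen in that dialog appear as an option string on the Scheme line of the run output. As a rough illustration, the sketch below splits the first run's Scheme string into flag/value pairs. The helper name parse_scheme is hypothetical and is not part of Weka. It assumes every flag is followed by exactly one value, which holds for this string but not for Weka option strings in general.

```python
import shlex

# Option string copied from the Scheme line of the first run.
scheme = (
    'weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 '
    '-periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 3 '
    '-A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10'
)


def parse_scheme(text):
    """Hypothetical helper: split a scheme string into (class name, {flag: value})."""
    tokens = shlex.split(text)  # keeps the quoted -A argument as a single token
    class_name, rest = tokens[0], tokens[1:]
    # Assumption: flags and values strictly alternate in this string.
    options = {rest[i]: rest[i + 1] for i in range(0, len(rest), 2)}
    return class_name, options


class_name, options = parse_scheme(scheme)
print(class_name)
print("number of clusters (-N):", options["-N"])
print("maximum iterations (-I):", options["-I"])
print("random seed (-S):", options["-S"])
print("distance function (-A):", options["-A"])
```

The second run in this post differs only in -N 2 on the Scheme line, plus the ignored attribute and the test mode chosen in the Cluster panel.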

Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 3 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10

Relation: labor-neg-data

Instances: 57

Attributes: 17

duration

wage-increase-first-year

wage-increase-second-year

wage-increase-third-year

cost-of-living-adjustment

working-hours

pension

standby-pay

shift-differential

education-allowance

statutory-holidays

vacation

longterm-disability-assistance

contribution-to-dental-plan

bereavement-assistance

contribution-to-health-plan

class

Test mode: evaluate on training data

=== Clustering model (full training set) ===

kMeans

======

Number of iterations: 3

Within cluster sum of squared errors: 119.5224194214812

Initial starting points (random):

Cluster 0: 1,5.7,3.971739,3.913333,none,40,empl_contr,7.444444,4,no,11,generous,yes,full,yes,full,good

Cluster 1: 1,2,3.971739,3.913333,tc,40,ret_allw,4,0,no,11,generous,no,none,no,none,bad

Cluster 2: 2,2.5,3,3.913333,tcf,40,none,7.444444,4.870968,no,11,below_average,yes,half,yes,full,bad

Missing values globally replaced with mean/mode
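This replacement explains why the starting points above contain values such as 3.971739 and 3.913333: they match the Full Data means reported for wage-increase-second-year and wage-increase-third-year. A minimal sketch of mean/mode replacement follows. It uses hypothetical toy columns rather than labor-neg-data, and it is not Weka's own code.

```python
from collections import Counter
from statistics import mean


def fill_missing(column):
    """Replace None with the mean (numeric column) or the mode (nominal column)."""
    present = [v for v in column if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)                           # numeric: column mean
    else:
        fill = Counter(present).most_common(1)[0][0]   # nominal: most frequent value
    return [fill if v is None else v for v in column]


# Hypothetical toy columns, NOT taken from labor-neg-data.
wage = [2.0, None, 4.0, 6.0]
pension = ["none", "empl_contr", None, "empl_contr"]

print(fill_missing(wage))     # [2.0, 4.0, 4.0, 6.0]
print(fill_missing(pension))  # ['none', 'empl_contr', 'empl_contr', 'empl_contr']
```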

Final cluster centroids:

                                               Cluster#
Attribute                       Full Data      0              1              2
                                (57.0)         (36.0)         (5.0)          (16.0)
========================================================================================
duration                        2.1607         2.2267         1.4            2.25
wage-increase-first-year        3.8036         4.4695         3.2            2.4938
wage-increase-second-year       3.9717         4.4175         4.183          2.9027
wage-increase-third-year        3.9133         4.1093         3.9133         3.4725
cost-of-living-adjustment       none           none           none           none
working-hours                   38.0392        37.4766        39.2078        38.94
pension                         empl_contr     empl_contr     none           empl_contr
standby-pay                     7.4444         7.9938         6.7556         6.4236
shift-differential              4.871          5.4776         3.1484         4.0444
education-allowance             no             no             no             no
statutory-holidays              11.0943        11.4801        10.6           10.3809
vacation                        below_average  generous       below_average  below_average
longterm-disability-assistance  yes            yes            no             yes
contribution-to-dental-plan     half           half           none           half
bereavement-assistance          yes            yes            no             yes
contribution-to-health-plan     full           full           none           full
class                           good           good           bad            bad

Time taken to build model (full training data) : 0.01 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 36 ( 63%)

1 5 ( 9%)

2 16 ( 28%)
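The run above reports a number of iterations and a within-cluster sum of squared errors. The sketch below shows, on hypothetical two-dimensional toy points, how a basic K-means loop (Lloyd's algorithm) arrives at both figures. It is only an illustration under simplifying assumptions: fixed starting centroids instead of a seeded random choice, numeric attributes only, and no normalization or missing-value handling. It is not Weka's SimpleKMeans, so its iteration count and error are not comparable with the output above.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, start_centroids, max_iter=500):
    """Basic Lloyd's algorithm; returns (centroids, assignments, iterations, sse)."""
    centroids = list(start_centroids)
    assignments = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, iteration, sse


# Hypothetical toy data, NOT taken from labor-neg-data.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
start = [(1.0, 1.0), (9.0, 8.0)]

final_centroids, assignments, iterations, sse = kmeans(points, start)
print("assignments:", assignments)
print("iterations:", iterations)
print("within-cluster SSE:", round(sse, 4))
```

Changing the starting centroids can change the result, which is why the seed (-S) and the number of clusters (-N) are worth recording alongside the output.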

Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10

Relation: labor-neg-data

Instances: 57

Attributes: 17

duration

wage-increase-first-year

wage-increase-second-year

wage-increase-third-year

cost-of-living-adjustment

working-hours

pension

standby-pay

shift-differential

education-allowance

statutory-holidays

vacation

longterm-disability-assistance

contribution-to-dental-plan

bereavement-assistance

class

Ignored:

contribution-to-health-plan

Test mode: Classes to clusters evaluation on training data

=== Clustering model (full training set) ===

kMeans

======

Number of iterations: 5

Within cluster sum of squared errors: 122.05464734126849

Initial starting points (random):

Cluster 0: 1,5.7,3.971739,3.913333,none,40,empl_contr,7.444444,4,no,11,generous,yes,full,yes,good

Cluster 1: 1,2,3.971739,3.913333,tc,40,ret_allw,4,0,no,11,generous,no,none,no,bad

Missing values globally replaced with mean/mode

Final cluster centroids:

                                               Cluster#
Attribute                       Full Data      0              1
                                (57.0)         (43.0)         (14.0)
==========================================================================
duration                        2.1607         2.213          2
wage-increase-first-year        3.8036         4.2024         2.5786
wage-increase-second-year       3.9717         4.221          3.2062
wage-increase-third-year        3.9133         4.0329         3.5462
cost-of-living-adjustment       none           none           none
working-hours                   38.0392        37.6557        39.2171
pension                         empl_contr     empl_contr     none
standby-pay                     7.4444         7.7778         6.4206
shift-differential              4.871          5.2018         3.8548
education-allowance             no             no             no
statutory-holidays              11.0943        11.2878        10.5
vacation                        below_average  below_average  below_average
longterm-disability-assistance  yes            yes            yes
contribution-to-dental-plan     half           half           none
bereavement-assistance          yes            yes            yes
class                           good           good           bad

Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 43 ( 75%)

1 14 ( 25%)

Class attribute: contribution-to-health-plan

Classes to Clusters:

  0  1  <-- assigned to cluster
 20  8 | none
  9  0 | half
 14  6 | full

Cluster 0 <-- none

Cluster 1 <-- full

Incorrectly clustered instances : 31.0 54.386 %
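The 31 incorrectly clustered instances can be reproduced from the classes-to-clusters matrix above. The sketch below tries every way of assigning a distinct class value to each cluster and keeps the assignment with the most matching instances. The one-class-per-cluster rule is an assumption that is consistent with this output, not a statement of how Weka implements the evaluation, and the function name best_mapping is hypothetical.

```python
from itertools import permutations

# Rows are class values, columns are clusters 0 and 1 (copied from the matrix above).
counts = {"none": [20, 8], "half": [9, 0], "full": [14, 6]}


def best_mapping(counts):
    """Return (class per cluster, incorrectly clustered count, total instances)."""
    classes = list(counts)
    n_clusters = len(next(iter(counts.values())))
    total = sum(sum(row) for row in counts.values())

    def matched(mapping):
        # Instances whose class equals the class assigned to their cluster.
        return sum(counts[cls][k] for k, cls in enumerate(mapping))

    # Assumes there are at least as many class values as clusters.
    best = max(permutations(classes, n_clusters), key=matched)
    return best, total - matched(best), total


mapping, wrong, total = best_mapping(counts)
print(mapping)                                # ('none', 'full')
print(wrong, round(100 * wrong / total, 3))   # 31 54.386
```

Here cluster 0 is labelled none (20 matches) and cluster 1 is labelled full (6 matches), so 26 of 57 instances match and 31 do not.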
