Contents



Breast cancer input data



Overview of data classifiers



Exponential Radial Basis Function SVM (ERBFSVM)



Radial Basis Function Neural Network (RBFNN)



Learning Vector Quantization Neural Network (LVQNN)



Two Layer Perceptron (2LP)



Back Propagation Network (BPNN)



Decision supporting parameters



Cross Validation error



Diagnostic accuracy measures



Comparative analysis



Conclusions

Introduction

The goal:



Comparative analysis of ANNs and SVM in the
classification of medical data: breast cancer.



The input space is represented by the population of
683 women who suffered from breast cancer with 10
attributes associated with each patient.



A two-class classification problem, since each
instance belongs to one of two classes: benign or
malignant.



In order to find which of the models predicts unknown
data more accurately, a 10-fold Cross Validation (CV)
algorithm is used.



Additionally, the diagnostic accuracy of the classifiers
is measured.

Introduction – cont.



Breast cancer input data (Wolberg, Mangasarian):

www.ics.uci.edu/~mlearn/databases/breast-cancer-wisconsin/



Cardinality of learning data set: l = 683

Total number of attributes: n

clump thickness (1,2,…,10)

uniformity of cell size (1,2,…,10)

uniformity of cell shape (1,2,…,10)

marginal adhesion (1,2,…,10)

single epithelial cell size (1,2,…,10)

bare nuclei (1,2,…,10)

bland chromatin (1,2,…,10)

normal nucleoli (1,2,…,10)

mitoses (1,2,…,10)
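
As a rough illustration of how records with the attributes above might be read, the sketch below parses comma-separated rows and skips incomplete ones. The record layout (sample id, nine attribute values, class code 2 for benign and 4 for malignant), the `?` marker for missing values, the sample rows, and the names `SAMPLE_ROWS` and `parse_record` are assumptions made for illustration, not details given on the slide.

```python
from typing import List, Optional, Tuple

# Illustrative rows in the assumed layout: id, nine attributes (1..10), class code.
SAMPLE_ROWS = """1000025,5,1,1,1,2,1,3,1,1,2
1057013,8,4,5,1,2,?,7,3,1,4
1017122,8,10,10,8,7,10,9,7,1,4"""


def parse_record(line: str) -> Optional[Tuple[List[int], int]]:
    """Return (attributes, label), or None when any field is missing."""
    fields = line.strip().split(",")
    if "?" in fields:
        return None  # assumed missing-value marker: skip incomplete records
    attributes = [int(value) for value in fields[1:10]]  # drop the id column
    label = 1 if fields[10] == "4" else -1  # assumed coding: malignant = +1, benign = -1
    return attributes, label


records = [r for r in map(parse_record, SAMPLE_ROWS.splitlines()) if r is not None]
```

Mapping the two classes to +1 and -1 is a design choice that suits classifiers with a sign output; other codings would work as well.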

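ERBFSVM

ERBF kernel:

$$K(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|}{2\sigma^2}\right)$$

Solution: optimal classifier

$$f(\mathbf{x}) = \operatorname{sign}\left(\sum_{i \in SV} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right)$$

where

$$b = -\frac{1}{2} \sum_{i \in SV} \alpha_i y_i \left[K(\mathbf{x}_i, \mathbf{x}_r) + K(\mathbf{x}_i, \mathbf{x}_s)\right], \quad \mathbf{x}_r, \mathbf{x}_s \in SV, \quad y_r = -1, \; y_s = 1$$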
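
A minimal sketch of how an ERBF-kernel classifier of this kind could be evaluated, assuming the kernel has the form exp(-||x - x'|| / (2σ²)) and the decision is the sign of a kernel-weighted sum over support vectors plus a bias. The support vectors, multipliers, bias and the function names are hypothetical toy values, not a trained model from the study.

```python
import math
from typing import Sequence


def erbf_kernel(x: Sequence[float], z: Sequence[float], sigma: float) -> float:
    """Exponential RBF kernel (assumed form): exp(-||x - z|| / (2 * sigma^2))."""
    return math.exp(-math.dist(x, z) / (2.0 * sigma ** 2))


def svm_classify(x, support_vectors, alphas, labels, bias, sigma) -> int:
    """Sign of the kernel expansion over the support vectors plus the bias."""
    total = sum(
        alpha * y * erbf_kernel(sv, x, sigma)
        for sv, alpha, y in zip(support_vectors, alphas, labels)
    )
    return 1 if total + bias >= 0 else -1  # a zero sum is mapped to +1 here


# Hypothetical toy model: one support vector per class.
SUPPORT_VECTORS = [[1.0, 1.0], [8.0, 8.0]]
ALPHAS = [1.0, 1.0]
LABELS = [-1, 1]
BIAS = 0.0
SIGMA = 3.9  # one of the sigma values reported in the ERBFSVM results

prediction = svm_classify([7.0, 9.0], SUPPORT_VECTORS, ALPHAS, LABELS, BIAS, SIGMA)
```

Note that the exponent uses the plain Euclidean distance rather than its square, which is what distinguishes the exponential RBF kernel from the Gaussian one.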

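LVQNN

The output (1 neuron):

$$y = \sum_{j=1}^{S_1} w_j^{(2)} y_j^{(1)}$$

The outputs of the competitive hidden layer neurons:

$$y_j^{(1)} = \operatorname{compet}\left(-\|\mathbf{x} - \mathbf{w}_j^{(1)}\|\right), \quad j = 1, \ldots, S_1, \quad \mathbf{x}, \mathbf{w}_j^{(1)} \in \mathbb{R}^n$$

compet returns 1 only for the neuron whose weight vector
forms the closest match with the input x.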
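
The competitive layer described on this slide can be sketched as follows, assuming that the "closest match" is measured by Euclidean distance and that the single output neuron is a weighted sum of the hidden outputs. The weight values and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def compet(scores: Sequence[float]) -> List[int]:
    """Return 1 for the neuron with the largest score and 0 for all others."""
    winner = max(range(len(scores)), key=lambda j: scores[j])  # first index wins ties
    return [1 if j == winner else 0 for j in range(len(scores))]


def lvq_output(x, hidden_weights, output_weights) -> float:
    """Negative distances go through compet; the winner selects an output weight."""
    scores = [-math.dist(x, w) for w in hidden_weights]
    hidden = compet(scores)
    return sum(w2 * h for w2, h in zip(output_weights, hidden))


# Hypothetical prototypes: two assigned to class -1 and one to class +1.
HIDDEN_WEIGHTS = [[1.0, 1.0], [9.0, 9.0], [2.0, 3.0]]
OUTPUT_WEIGHTS = [-1.0, 1.0, -1.0]

result = lvq_output([8.0, 10.0], HIDDEN_WEIGHTS, OUTPUT_WEIGHTS)
```

Because exactly one hidden neuron fires, the output equals the output weight attached to the winning prototype.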









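RBF network

The output (1 neuron):

$$y = \sum_{j=1}^{S_1} w_j^{(2)} y_j^{(1)}$$

The outputs of hidden layer neurons:

$$y_j^{(1)} = \exp\left(-\frac{\|\mathbf{x} - \mathbf{w}_j^{(1)}\|^2}{2\sigma^2}\right), \quad j = 1, \ldots, S_1$$

σ - spread constant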
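
A small sketch of a forward pass through an RBF network, assuming Gaussian hidden units of the form exp(-||x - w||² / (2σ²)) and a linear output neuron without a bias term. The centres, output weights, spread value and function names are hypothetical.

```python
import math
from typing import List, Sequence


def rbf_hidden(x: Sequence[float], centres, sigma: float) -> List[float]:
    """Gaussian response of each hidden neuron to the input x (assumed form)."""
    return [
        math.exp(-math.dist(x, c) ** 2 / (2.0 * sigma ** 2))
        for c in centres
    ]


def rbf_output(x, centres, output_weights, sigma: float) -> float:
    """Linear combination of the hidden responses."""
    hidden = rbf_hidden(x, centres, sigma)
    return sum(w2 * h for w2, h in zip(output_weights, hidden))


# Hypothetical network with two hidden neurons, one centre per class.
CENTRES = [[1.0, 1.0], [9.0, 9.0]]
OUTPUT_WEIGHTS = [-1.0, 1.0]
SIGMA = 2.0  # spread constant

score = rbf_output([9.0, 9.0], CENTRES, OUTPUT_WEIGHTS, SIGMA)
```

A larger spread constant makes each hidden neuron respond to a wider region of the input space, so neighbouring centres overlap more.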





































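2LP

The output (1 neuron):

$$y = \operatorname{sign}\left(\sum_{j=1}^{S_1} w_j^{(2)} y_j^{(1)}\right)$$

where

The outputs of hidden layer neurons:

$$y_j^{(1)} = \operatorname{sign}\left(\langle \mathbf{w}_j^{(1)}, \mathbf{x} \rangle\right), \quad j = 1, \ldots, S_1$$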
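
The two-layer perceptron on this slide can be illustrated with the sketch below, assuming a sign activation in both layers and, for brevity, no bias terms. The weights and function names are hypothetical; a trained model would supply its own values.

```python
from typing import Sequence


def sign(z: float) -> int:
    """Hard-limit activation; zero is mapped to +1 in this sketch."""
    return 1 if z >= 0 else -1


def two_layer_perceptron(x: Sequence[float], hidden_weights, output_weights) -> int:
    """Sign-activated hidden layer followed by a single sign-activated output."""
    hidden = [
        sign(sum(w * xi for w, xi in zip(row, x)))
        for row in hidden_weights
    ]
    return sign(sum(w2 * h for w2, h in zip(output_weights, hidden)))


# Hypothetical weights: each hidden neuron compares the two inputs.
HIDDEN_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]
OUTPUT_WEIGHTS = [1.0, -1.0]

decision = two_layer_perceptron([3.0, 1.0], HIDDEN_WEIGHTS, OUTPUT_WEIGHTS)
```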





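BPNN

The output (1 neuron):

$$y = \operatorname{logsig}\left(\sum_{j=1}^{S_1} w_j^{(2)} y_j^{(1)}\right)$$

where

$$\operatorname{logsig}(z) = \frac{1}{1 + \exp(-z)}$$

The outputs of hidden layer neurons:

$$y_j^{(1)} = \operatorname{logsig}\left(\langle \mathbf{w}_j^{(1)}, \mathbf{x} \rangle\right), \quad j = 1, \ldots, S_1$$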
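
The following sketch shows only the forward pass of a network with logsig activations, assuming logsig(z) = 1 / (1 + exp(-z)) in both layers and no bias terms. It does not implement back-propagation training, and the weights and function names are hypothetical.

```python
import math
from typing import Sequence


def logsig(z: float) -> float:
    """Logistic sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def bpnn_forward(x: Sequence[float], hidden_weights, output_weights) -> float:
    """Logsig hidden layer followed by a single logsig output in (0, 1)."""
    hidden = [
        logsig(sum(w * xi for w, xi in zip(row, x)))
        for row in hidden_weights
    ]
    return logsig(sum(w2 * h for w2, h in zip(output_weights, hidden)))


# Hypothetical weights, chosen only to make the two toy inputs separable.
HIDDEN_WEIGHTS = [[0.5, -0.5], [-0.5, 0.5]]
OUTPUT_WEIGHTS = [2.0, -2.0]

out_a = bpnn_forward([3.0, 1.0], HIDDEN_WEIGHTS, OUTPUT_WEIGHTS)
out_b = bpnn_forward([1.0, 3.0], HIDDEN_WEIGHTS, OUTPUT_WEIGHTS)
```

Since the output is continuous, a class decision needs a threshold; taking 0.5 as the cut-off is a common choice and an assumption here.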















Cross Validation procedure

Division of all input patterns (e.g. l = 199) into K
subsets (e.g. K = 10).

Model training on all subsets except one.

Model testing on the subset left out.

Cross Validation error (should be as low as
possible):

$$E = \frac{100}{L} \sum_{m=1}^{K} E_m \; [\%]$$

E_m - the number of misclassified examples
within the m-th separation.

L - the number of validating data.
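
The procedure above can be sketched as a short routine, assuming the error is the total number of misclassified held-out examples divided by the number of validated examples, times 100. The interleaved fold assignment, the toy threshold classifier and all function names are assumptions for illustration; the slides do not say how the subsets were drawn.

```python
from typing import Any, Callable, List, Sequence


def cv_error(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
    train: Callable[[List[Sequence[float]], List[int]], Any],
    predict: Callable[[Any, Sequence[float]], int],
) -> float:
    """K-fold cross-validation error in percent."""
    n = len(samples)
    misclassified = 0
    for m in range(k):
        held_out = set(range(m, n, k))  # every k-th pattern forms subset m
        train_x = [samples[i] for i in range(n) if i not in held_out]
        train_y = [labels[i] for i in range(n) if i not in held_out]
        model = train(train_x, train_y)  # fit on the other k - 1 subsets
        misclassified += sum(
            1 for i in held_out if predict(model, samples[i]) != labels[i]
        )
    return 100.0 * misclassified / n


def train_threshold(xs, ys) -> float:
    """Toy model: midpoint between the two class means of a 1-D feature."""
    neg = [x[0] for x, y in zip(xs, ys) if y == -1]
    pos = [x[0] for x, y in zip(xs, ys) if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2.0


def predict_threshold(threshold: float, x) -> int:
    return 1 if x[0] >= threshold else -1


SAMPLES = [[float(v)] for v in range(1, 11)]
LABELS = [-1] * 5 + [1] * 5
error_percent = cv_error(SAMPLES, LABELS, 5, train_threshold, predict_threshold)
```

Passing `train` and `predict` as functions keeps the routine independent of the classifier, so the same loop can score each of the compared models.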









ERBFSVM results

The lowest percentage of misclassified examples:
E = 2.78 [%] for C = 1 and σ ∈ {3.9, 4.7}

[Figure: percentage of misclassified examples [%] for ERBFSVM, one curve for each of C = 1, 10, 100, 1 000, 10 000, 100 000, 1 000 000]

RBFNN results

The lowest percentage of misclassified examples:
E = 2.78 [%] for S1 ∈ {9, 11, 13, ..., 29}

[Figure: percentage of misclassified examples [%] for RBFNN]

LVQNN vs. remaining networks

LVQNN achieves the same lowest prediction error as
ERBFSVM and RBFNN: E = 2.78 [%] for S1 = 10

[Figure: percentage of misclassified examples [%]; curves: 2TBPNN, 2LBPNN, 2LP, LVQNN]

3LBPNN

The 3-layer BPNN is completely unpredictable as a
classifier ...

[Figure: percentage of misclassified examples [%] for 3LBPNN]

Generalization results – summary

Classifier   E [%]   sensitivity   specificity
ERBFSVM      2.78    0.95          0.98
RBFNN        2.78    0.95          0.98
LVQNN        2.78    0.95          0.98
2LP          4.11    0.93          0.94
3LBPNN       3.53    0.93          0.95
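
For reference, sensitivity and specificity can be computed from true and predicted labels as in the sketch below. It assumes the coding +1 for malignant (sick) and -1 for benign (healthy); the label lists are hypothetical and do not reproduce the study's predictions.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp), with +1 as the positive (sick) class."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == 1 and guess == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif guess == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of sick patients identified as sick."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of healthy patients identified as healthy."""
    return tn / (tn + fp)


# Hypothetical labels: four sick and five healthy patients.
Y_TRUE = [1, 1, 1, 1, -1, -1, -1, -1, -1]
Y_PRED = [1, 1, 1, -1, -1, -1, -1, -1, 1]

tp, fn, tn, fp = confusion_counts(Y_TRUE, Y_PRED)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```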

Diagnostic accuracy and generalization
ability of breast cancer data classifiers

Conclusions



The sensitivity of 0.95 for ERBFSVM, LVQNN and
RBFNN means that 95% of sick patients are identified
as sick.

The specificity of 0.98 means that 98% of healthy
patients are diagnosed as healthy.

The high diagnostic accuracy is supported by the very good
generalization ability of the models: E = 2.78 [%].
This suggests that these classifiers are reliable and
precise in the medical diagnosis of new breast cancer
cases.

The above models can serve as feedback for
physicians during the process of treatment.

Conclusions – cont.



Medical Diagnostic System

Module I: Data Preparation

Module II: Specification and Initialization of Models

Module III: Main Routine - Training and Testing of Models

Module IV: Selection of the Optimal Model. Rule Generation

Acknowledgements
The author is grateful to PhD student Maciej Kusy for his valuable help with data
preparation and calculations.
