6 Support Vector Machines in R the linear kernel implementing the simplest of all kernel functions k(x,x)=(x,x (15) the Gaussian Radial Basis Function (RBF)kernel k(x,x')=exp(-allx-xl2) (16) the polynomial kernel (x,x)=(scale.(x)+offset)degree (17) the hyperbolic tangent kernel (x,x)=tanh (scale.(x,x+offset) (18) the Bessel function of the first kind kernel k(x,x')= Bessel+1)(allx-x'll) (n-t'l)-n(v+1) (19) the Laplace Radial Basis Function(RBF)kenrel k(x,x)=exp(-allx-x'll) (20) the ANOVA radial basis kernel k(x,x)) ep-oe-ry】 (21) . the linear splines kernel in one dimension ,)=1+min(c,)-专'(min(e,y+ams,r 3 (22) and for the multidimensional case k(x,x')=II=1k(,). The Gaussian and Laplace RBF and Bessel kernels are general-purpose kernels used when there is no prior knowledge about the data.The linear kernel is useful when dealing with large sparse data vectors as is usually the case in text categorization.The polynomial kernel is popular in image processing and the sigmoid kernel is mainly used as a proxy for neural networks.The splines and ANOVA RBF kernels typically perform well in regression problems. 2.5.Software Support vector machines are currently used in a wide range of fields,from bioinformatics to astrophysics.Thus,the existence of many SVM software packages comes as little surprise. Most existing software is written in C or C++,such as the award winning libsvm(Chang and Lin 2001),which provides a robust and fast SVM implementation and produces state of the
6 Support Vector Machines in R • the linear kernel implementing the simplest of all kernel functions k(x, x 0 ) = hx, x 0 i (15) • the Gaussian Radial Basis Function (RBF) kernel k(x, x 0 ) = exp(−σkx − x 0 k 2 ) (16) • the polynomial kernel k(x, x 0 ) = scale · hx, x 0 i + offsetdegree (17) • the hyperbolic tangent kernel k(x, x 0 ) = tanh scale · hx, x 0 i + offset (18) • the Bessel function of the first kind kernel k(x, x 0 ) = Besseln (ν+1)(σkx − x 0k) (kx − x 0k)−n(ν+1) (19) • the Laplace Radial Basis Function (RBF) kenrel k(x, x 0 ) = exp(−σkx − x 0 k) (20) • the ANOVA radial basis kernel k(x, x 0 ) = Xn k=1 exp(−σ(x k − x 0k ) 2 ) !d (21) • the linear splines kernel in one dimension k(x, x0 ) = 1 + xx0 min(x, x0 ) − x + x 0 2 (min(x, x0 ) 2 + (min(x, x0 ) 3 ) 3 (22) and for the multidimensional case k(x, x 0 ) = Qn k=1 k(x k , x0k ). The Gaussian and Laplace RBF and Bessel kernels are general-purpose kernels used when there is no prior knowledge about the data. The linear kernel is useful when dealing with large sparse data vectors as is usually the case in text categorization. The polynomial kernel is popular in image processing and the sigmoid kernel is mainly used as a proxy for neural networks. The splines and ANOVA RBF kernels typically perform well in regression problems. 2.5. Software Support vector machines are currently used in a wide range of fields, from bioinformatics to astrophysics. Thus, the existence of many SVM software packages comes as little surprise. Most existing software is written in C or C++, such as the award winning libsvm (Chang and Lin 2001), which provides a robust and fast SVM implementation and produces state of the
Journal of Statistical Software 7 art results on most classification and regression problems(Meyer,Leisch,and Hornik 2003), SVMlight (Joachims 1999),SVMTorch(Collobert,Bengio,and Mariethoz 2002),Royal Hol- loway Support Vector Machines,(Gammerman,Bozanic,Scholkopf,Vovk,Vapnik,Bottou, Smola,Watkins,LeCun,Saunders,Stitson,and Weston 2001),mySVM(Ruiping 2004),and M-SVM(Guermeur 2004).Many packages provide interfaces to MATLAB (The Math Works 2005)(such as libsvm),and there are some native MATLAB toolboxes as well such as the SVM and Kernel Methods Matlab Toolbox (Canu,Grandvalet,and Rakotomamonjy 2003) or the MATLAB Support Vector Machine Toolbox (Gunn 1998)and the SVM toolbox for Matlab(Schwaighofer 2005) 2.6.R software overview The first implementation of SVM in R(R Development Core Team 2005)was introduced in the e1071(Dimitriadou,Hornik,Leisch,Meyer,and Weingessel 2005)package.The svm() function in e1071 provides a rigid interface to libsvm along with visualization and parameter tuning methods. Package kernlab features a variety of kernel-based methods and includes a SVM method based on the optimizers used in libsvm and bsvm(Hsu and Lin 2002c).It aims to provide a flexible and extensible SVM implementation. Package klaR (Roever,Raabe,Luebke,and Ligges 2005)includes an interface to SVMlight,a popular SVM implementation that additionally offers classification tools such as Regularized Discriminant Analysis. Finally,package svmpath(Hastie 2004)provides an algorithm that fits the entire path of the SVM solution (i.e.,for any value of the cost parameter). In the remainder of the paper we will extensively review and compare these four SVM imple- mentations. 3.Data Throughout the paper,we will use the following data sets accessible through R(see Table 1), most of them originating from the UCI machine learning database (Blake and Merz 1998): iris This famous (Fisher's or Anderson's)iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width,respectively,for 50 flowers from each of 3 species of iris.The species are Iris setosa,versicolor,and virginica.The data set is provided by base R. spam A data set collected at Hewlett-Packard Labs which classifies 4601 e-mails as spam or non-spam.In addition to this class label there are 57 variables indicating the frequency of certain words and characters in the e-mail.The data set is provided by the kernlab package. musk This dataset in package kernlab describes a set of 476 molecules of which 207 are judged by human experts to be musks and the remaining 269 molecules are judged to be non-musks.The data has 167 variables which describe the geometry of the molecules
Journal of Statistical Software 7 art results on most classification and regression problems (Meyer, Leisch, and Hornik 2003), SVMlight (Joachims 1999), SVMTorch (Collobert, Bengio, and Mari´ethoz 2002), Royal Holloway Support Vector Machines, (Gammerman, Bozanic, Sch¨olkopf, Vovk, Vapnik, Bottou, Smola, Watkins, LeCun, Saunders, Stitson, and Weston 2001), mySVM (Ruping ¨ 2004), and M-SVM (Guermeur 2004). Many packages provide interfaces to MATLAB (The MathWorks 2005) (such as libsvm), and there are some native MATLAB toolboxes as well such as the SVM and Kernel Methods Matlab Toolbox (Canu, Grandvalet, and Rakotomamonjy 2003) or the MATLAB Support Vector Machine Toolbox (Gunn 1998) and the SVM toolbox for Matlab (Schwaighofer 2005) 2.6. R software overview The first implementation of SVM in R (R Development Core Team 2005) was introduced in the e1071 (Dimitriadou, Hornik, Leisch, Meyer, and Weingessel 2005) package. The svm() function in e1071 provides a rigid interface to libsvm along with visualization and parameter tuning methods. Package kernlab features a variety of kernel-based methods and includes a SVM method based on the optimizers used in libsvm and bsvm (Hsu and Lin 2002c). It aims to provide a flexible and extensible SVM implementation. Package klaR (Roever, Raabe, Luebke, and Ligges 2005) includes an interface to SVMlight, a popular SVM implementation that additionally offers classification tools such as Regularized Discriminant Analysis. Finally, package svmpath (Hastie 2004) provides an algorithm that fits the entire path of the SVM solution (i.e., for any value of the cost parameter). In the remainder of the paper we will extensively review and compare these four SVM implementations. 3. Data Throughout the paper, we will use the following data sets accessible through R (see Table 1), most of them originating from the UCI machine learning database (Blake and Merz 1998): iris This famous (Fisher’s or Anderson’s) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica. The data set is provided by base R. spam A data set collected at Hewlett-Packard Labs which classifies 4601 e-mails as spam or non-spam. In addition to this class label there are 57 variables indicating the frequency of certain words and characters in the e-mail. The data set is provided by the kernlab package. musk This dataset in package kernlab describes a set of 476 molecules of which 207 are judged by human experts to be musks and the remaining 269 molecules are judged to be non-musks. The data has 167 variables which describe the geometry of the molecules
8 Support Vector Machines in R promotergene Promoters have a region where a protein (RNA polymerase)must make contact and the helical DNA sequence must have a valid conformation so that the two pieces of the contact region spatially align.The dataset in package kernlab contains DNA sequences of promoters and non-promoters in a data frame with 106 observations and 58 variables.The DNA bases are coded as follows:'a'adenine,'c'cytosine,'g' guanine,and 't'thymine. Vowel Speaker independent recognition of the eleven steady state vowels of British English using a specified training set of LPC derived log area ratios.The vowels are indexed by integers 0 to 10.This dataset in package mlbench (Leisch and Dimitriadou 2001)has 990 observations on 10 independent variables. DNA in package mlbench consists of 3,186 data points (splice junctions).The data points are described by 180 indicator binary variables and the problem is to recognize the 3 classes ('ei','ie',neither),i.e.,the boundaries between exons (the parts of the DNA sequence retained after splicing)and introns(the parts of the DNA sequence that are spliced out). Breast Cancer in package mlbench is a data frame with 699 observations on 11 variables, one being a character variable,9 being ordered or nominal,and 1 target class.The objective is to identify each of a number of benign or malignant classes. BostonHousing Housing data in package mlbench for 506 census tracts of Boston from the 1970 census.There are 506 observations on 14 variables. B3 German Bussiness Cycles from 1955 to 1994 in package klaR.A data frame with 157 observations on the following 14 variables. Dataset #Examples #Attributes Class Distribution(%) b c m cl iris 150 5 2 33.3/33.3/33.3 spam 4601 572 39.40/60.59 musk 476 166 2 42.99/57.00 promotergene 106 57 2 50.00/50.00 Vowel 990 1 910 10.0/10.0/.. DNA 3186 180 2 24.07/24.07/51.91 BreastCancer 699 9 2 34.48/65.52 BostonHousing 506 12 (regression) B3 506 13 4 37.57/15.28/29.93/17.19 Table 1:The data sets used throughout the paper. Legend: b=binary,c=categorical, m=metric.cl number of classes. 4.ksvm in kernlab Package kernlab(Karatzoglou,Smola,Hornik,and Zeileis 2004)aims to provide the R user with basic kernel functionality (e.g.,like computing a kernel matrix using a particular kernel)
8 Support Vector Machines in R promotergene Promoters have a region where a protein (RNA polymerase) must make contact and the helical DNA sequence must have a valid conformation so that the two pieces of the contact region spatially align. The dataset in package kernlab contains DNA sequences of promoters and non-promoters in a data frame with 106 observations and 58 variables. The DNA bases are coded as follows: ‘a’ adenine, ‘c’ cytosine, ‘g’ guanine, and ‘t’ thymine. Vowel Speaker independent recognition of the eleven steady state vowels of British English using a specified training set of LPC derived log area ratios. The vowels are indexed by integers 0 to 10. This dataset in package mlbench (Leisch and Dimitriadou 2001) has 990 observations on 10 independent variables. DNA in package mlbench consists of 3,186 data points (splice junctions). The data points are described by 180 indicator binary variables and the problem is to recognize the 3 classes (‘ei’, ‘ie’, neither), i.e., the boundaries between exons (the parts of the DNA sequence retained after splicing) and introns (the parts of the DNA sequence that are spliced out). BreastCancer in package mlbench is a data frame with 699 observations on 11 variables, one being a character variable, 9 being ordered or nominal, and 1 target class. The objective is to identify each of a number of benign or malignant classes. BostonHousing Housing data in package mlbench for 506 census tracts of Boston from the 1970 census. There are 506 observations on 14 variables. B3 German Bussiness Cycles from 1955 to 1994 in package klaR. A data frame with 157 observations on the following 14 variables. #Attributes Dataset #Examples b c m cl Class Distribution (%) iris 150 5 3 33.3/33.3/33.3 spam 4601 57 2 39.40/60.59 musk 476 166 2 42.99 / 57.00 promotergene 106 57 2 50.00 / 50.00 Vowel 990 1 9 10 10.0/10.0/... DNA 3186 180 3 24.07/24.07/51.91 BreastCancer 699 9 2 34.48 / 65.52 BostonHousing 506 1 12 (regression) B3 506 13 4 37.57/15.28/29.93/17.19 Table 1: The data sets used throughout the paper. Legend: b=binary, c=categorical, m=metric, cl = number of classes. 4. ksvm in kernlab Package kernlab (Karatzoglou, Smola, Hornik, and Zeileis 2004) aims to provide the R user with basic kernel functionality (e.g., like computing a kernel matrix using a particular kernel)
Journal of Statistical Software 9 along with some utility functions commonly used in kernel-based methods like a quadratic programming solver,and modern kernel-based algorithms based on the functionality that the package provides.It also takes advantage of the inherent modularity of kernel-based methods, aiming to allow the user to switch between kernels on an existing algorithm and even create and use own kernel functions for the various kernel methods provided in the package. kernlab uses R's new object model described in "Programming with Data"(Chambers 1998) which is known as the S4 class system and is implemented in package methods.In contrast to the older S3 model for objects in R,classes,slots,and methods relationships must be declared explicitly when using the S4 system.The number and types of slots in an instance of a class have to be established at the time the class is defined.The objects from the class are validated against this definition and have to comply to it at any time.S4 also requires formal declarations of methods,unlike the informal system of using function names to identify a certain method in S3.Package kernlab is available from CRAN (http://CRAN.R-project. org/)under the GPL license. The ksvm()function,kernlab's implementation of SVMs,provides a standard formula inter- face along with a matrix interface.ksvm()is mostly programmed in R but uses,through the Call interface,the optimizers found in bsvm and libsvm (Chang and Lin 2001)which provide a very efficient C++version of the Sequential Minimization Optimization (SMO). The SMO algorithm solves the SVM quadratic problem (QP)without using any numerical QP optimization steps.Instead,it chooses to solve the smallest possible optimization prob- lem involving two elements of ai because the must obey one linear equality constraint.At every step,SMO chooses two oi to jointly optimize and finds the optimal values for these ai analytically,thus avoiding numerical QP optimization,and updates the SVM to reflect the new optimal values. The SVM implementations available in ksvm()include the C-SVM classification algorithm along with the v-SVM classification.Also included is a bound constraint version of C classi- fication(C-BSVM)which solves a slightly different QP problem(Mangasarian and Musicant 1999,including the offset B in the objective function)using a modified version of the TRON (Lin and More 1999)optimization software.For regression,ksvm()includes the e-SVM regres- sion algorithm along with the v-SVM regression formulation.In addition,a bound constraint version (e-BSVM)is provided,and novelty detection (one-class classification)is supported. For classification problems which include more then two classes (multi-class case)two options are available:a one-against-one (pairwise)classification method or the native multi-class formulation of the SVM(spoc-svc)described in Section 2.The optimization problem of the native multi-class SVM implementation is solved by a decomposition method proposed in Hsu and Lin (2002c)where optimal working sets are found (that is,sets of ai values which have a high probability of being non-zero).The QP sub-problems are then solved by a modified version of the TRON optimization software. The ksvm()implementation can also compute class-probability output by using Platt's prob- ability methods (Equation 8)along with the multi-class extension of the method in Wu et al. (2003).The prediction method can also return the raw decision values of the support vector model: library("kernlab") data("iris") irismodel <-ksvm(Species ~.data iris
Journal of Statistical Software 9 along with some utility functions commonly used in kernel-based methods like a quadratic programming solver, and modern kernel-based algorithms based on the functionality that the package provides. It also takes advantage of the inherent modularity of kernel-based methods, aiming to allow the user to switch between kernels on an existing algorithm and even create and use own kernel functions for the various kernel methods provided in the package. kernlab uses R’s new object model described in “Programming with Data” (Chambers 1998) which is known as the S4 class system and is implemented in package methods. In contrast to the older S3 model for objects in R, classes, slots, and methods relationships must be declared explicitly when using the S4 system. The number and types of slots in an instance of a class have to be established at the time the class is defined. The objects from the class are validated against this definition and have to comply to it at any time. S4 also requires formal declarations of methods, unlike the informal system of using function names to identify a certain method in S3. Package kernlab is available from CRAN (http://CRAN.R-project. org/) under the GPL license. The ksvm() function, kernlab’s implementation of SVMs, provides a standard formula interface along with a matrix interface. ksvm() is mostly programmed in R but uses, through the .Call interface, the optimizers found in bsvm and libsvm (Chang and Lin 2001) which provide a very efficient C++ version of the Sequential Minimization Optimization (SMO). The SMO algorithm solves the SVM quadratic problem (QP) without using any numerical QP optimization steps. Instead, it chooses to solve the smallest possible optimization problem involving two elements of αi because the must obey one linear equality constraint. At every step, SMO chooses two αi to jointly optimize and finds the optimal values for these αi analytically, thus avoiding numerical QP optimization, and updates the SVM to reflect the new optimal values. The SVM implementations available in ksvm() include the C-SVM classification algorithm along with the ν-SVM classification. Also included is a bound constraint version of C classi- fication (C-BSVM) which solves a slightly different QP problem (Mangasarian and Musicant 1999, including the offset β in the objective function) using a modified version of the TRON (Lin and More 1999) optimization software. For regression, ksvm() includes the -SVM regression algorithm along with the ν-SVM regression formulation. In addition, a bound constraint version (-BSVM) is provided, and novelty detection (one-class classification) is supported. For classification problems which include more then two classes (multi-class case) two options are available: a one-against-one (pairwise) classification method or the native multi-class formulation of the SVM (spoc-svc) described in Section 2. The optimization problem of the native multi-class SVM implementation is solved by a decomposition method proposed in Hsu and Lin (2002c) where optimal working sets are found (that is, sets of αi values which have a high probability of being non-zero). The QP sub-problems are then solved by a modified version of the TRON optimization software. The ksvm() implementation can also compute class-probability output by using Platt’s probability methods (Equation 8) along with the multi-class extension of the method in Wu et al. (2003). The prediction method can also return the raw decision values of the support vector model: > library("kernlab") > data("iris") > irismodel <- ksvm(Species ~ ., data = iris
10 Support Vector Machines in R type ="C-bsvc",kernel ="rbfdot", kpar list(sigma =0.1),C=10, prob.model TRUE) irismodel Support Vector Machine object of class "ksvm" SV type:C-bsvc (classification) parameter cost C =10 Gaussian Radial Basis kernel function. Hyperparameter sigma =0.1 Number of Support Vectors 32 Training error:0.02 Probability model included. predict(irismodel,iris[c(3,10,56,68, 107,120),-5],type "probabilities") setosa versicolor virginica [1,]0.9864328200.0073594070.006207773 [2,]0.9833238130.0101189920.006557195 [3,]0.0048525280.9675551260.027592346 [4,]0.0095468230.9884967240.001956452 [5,]0.0127673400.0694960290.917736631 [6,]0.0115481760.1500353840.838416441 predict(irismodel,iris[c(3,10,56,68, 107,120),-5],type "decision") [,1] [,2] [,3] [1,]-1.460398-1.1910251-3.8868836 [2,]-1.357355-1.1749491-4.2107843 [3,]1.647272 0.7655001-1.3205306 [4,]1.412721 0.4736201-2.7521640 [5,]1.844763 1.00000001.0000019 [6,]1.848985 1.00690100.6742889 ksvm allows for the use of any valid user defined kernel function by just defining a function which takes two vector arguments and returns its Hilbert Space dot product in scalar form. >k <-function(x,y) (sum(x*y)+1)*exp(0.001*sum((x- y)2) +}
10 Support Vector Machines in R + type = "C-bsvc", kernel = "rbfdot", + kpar = list(sigma = 0.1), C = 10, + prob.model = TRUE) > irismodel Support Vector Machine object of class "ksvm" SV type: C-bsvc (classification) parameter : cost C = 10 Gaussian Radial Basis kernel function. Hyperparameter : sigma = 0.1 Number of Support Vectors : 32 Training error : 0.02 Probability model included. > predict(irismodel, iris[c(3, 10, 56, 68, + 107, 120), -5], type = "probabilities") setosa versicolor virginica [1,] 0.986432820 0.007359407 0.006207773 [2,] 0.983323813 0.010118992 0.006557195 [3,] 0.004852528 0.967555126 0.027592346 [4,] 0.009546823 0.988496724 0.001956452 [5,] 0.012767340 0.069496029 0.917736631 [6,] 0.011548176 0.150035384 0.838416441 > predict(irismodel, iris[c(3, 10, 56, 68, + 107, 120), -5], type = "decision") [,1] [,2] [,3] [1,] -1.460398 -1.1910251 -3.8868836 [2,] -1.357355 -1.1749491 -4.2107843 [3,] 1.647272 0.7655001 -1.3205306 [4,] 1.412721 0.4736201 -2.7521640 [5,] 1.844763 1.0000000 1.0000019 [6,] 1.848985 1.0069010 0.6742889 ksvm allows for the use of any valid user defined kernel function by just defining a function which takes two vector arguments and returns its Hilbert Space dot product in scalar form. > k <- function(x, y) { + (sum(x * y) + 1) * exp(0.001 * sum((x - + y)^2)) + }