1 Introduction
To classify the vast amount of information on the Web correctly, so that users can quickly retrieve what they need, this post presents a new webpage content classification algorithm that combines particle swarm optimization (PSO) with the least squares support vector machine (LSSVM), using PSO's strong search capability to optimize the LSSVM's classification performance. Simulation experiments on a data set of news webpage texts show that the combined PSO-LSSVM algorithm effectively improves the LSSVM's classification performance, with a classification accuracy significantly higher than that of the basic LSSVM. This demonstrates that the proposed PSO-based improvement of LSSVM is effective and applicable to classifying large volumes of Web information.
Particle swarm optimization (PSO) is an intelligent optimization algorithm that imitates the behavior of bird flocks, jointly proposed by Kennedy and Eberhart in 1995. The algorithm is conceptually simple, easy to implement, converges quickly, and has few parameters to set, making it an efficient search method; it is now widely applied to function optimization, neural network training, and other fields. PSO drives the swarm toward an optimum through collective cooperation among the particles. Each individual is called a "particle" and represents a potential solution. During flight, a particle remembers the best position it has found itself, called the personal best pbest, as well as the best position found by any particle in the swarm, called the global best gbest, and it adjusts its flight direction and speed according to these two optima. That is, each particle updates its velocity and position according to Eqs. (1) and (2):

v_i(t+1) = w*v_i(t) + c1*r1*(pbest_i - x_i(t)) + c2*r2*(gbest - x_i(t))    (1)
x_i(t+1) = x_i(t) + v_i(t+1)    (2)

where w is the inertia weight, c1 and c2 are acceleration coefficients, and r1, r2 are uniform random numbers in [0, 1].
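The two update rules can be sketched in a few lines of Python. This is a minimal illustration; the default values of w, c1, and c2 are common textbook choices, not values prescribed by this post:

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO iteration over a 1-D swarm.

    x, v, pbest are lists of positions, velocities, and personal bests;
    gbest is the swarm-wide best position.
    """
    new_x, new_v = [], []
    for xi, vi, pi in zip(x, v, pbest):
        r1, r2 = random.random(), random.random()
        # Eq. (1): inertia + cognitive pull toward pbest + social pull toward gbest
        vn = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gbest - xi)
        # Eq. (2): move the particle by its new velocity
        new_v.append(vn)
        new_x.append(xi + vn)
    return new_x, new_v
```

In the PSO-LSSVM algorithm described in this post, each particle's position would encode the LSSVM hyperparameters (the regularization constant gam and the kernel parameter sig2), with the cross-validation error serving as the fitness that drives pbest and gbest.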
2 Code excerpt
function cost = crossvalidatelssvm(model,Y, L, omega, estfct,combinefct)
% Estimate the model performance of a model with l-fold crossvalidation
%
%%%%%%%%%%%%%%%%%%%%%
% INTERNAL FUNCTION %
%%%%%%%%%%%%%%%%%%%%%
% Estimate the model performance of a model with fast l-fold crossvalidation.
% Implementation based on "De Brabanter et al., Computational Statistics & Data Analysis, 2010"
% Copyright (c) 2011, KULeuven-ESAT-SCD, License & help @% www.esat.kuleuven.be/sista/lssvm…
%
% See also:
% leaveoneoutlssvm, crossvalidatelssvm, trainlssvm
% Copyright (c) 2002, KULeuven-ESAT-SCD, License & help @ www.esat.kuleuven.ac.be/sista/lssvm…
% initialisation and defaults
%
%if size(X,1)~=size(Y,1), error('X and Y have different number of datapoints'); end
nb_data = size(Y,1);
d = size(model.xtrain,2);
% LS-SVMlab
eval('model = initlssvm(model{:});',' ');
model.status = 'changed';
eval('L;','L=min(round(sqrt(size(model.xfull,1))),10);');
eval('estfct;','estfct=''mse'';');
eval('combinefct;','combinefct=''mean'';');
% Y is raw data, not preprocessed
py = Y;
[~,Y] = postlssvm(model,[],Y);
gams = model.gamcsa; try sig2s = model.kernel_parscsa; catch, sig2s = [];end
%initialize: no incremental memory allocation
costs = zeros(L,length(gams));
block_size = floor(nb_data/L);
% check whether there are more than one gamma or sigma
for j = 1:numel(gams)
    % set the current gamma (and kernel parameters) on the model
    if strcmp(model.kernel_type,'RBF_kernel') || strcmp(model.kernel_type,'RBF4_kernel')
        model = changelssvm(changelssvm(model,'gam',gams(j)),'kernel_pars',sig2s(j));
    elseif strcmp(model.kernel_type,'lin_kernel')
        model = changelssvm(model,'gam',gams(j));
    elseif strcmp(model.kernel_type,'poly_kernel')
        model = changelssvm(changelssvm(model,'gam',gams(j)),'kernel_pars',[sig2s(1,j);sig2s(2,j)]);
    else
        model = changelssvm(changelssvm(model,'gam',gams(j)),'kernel_pars',[sig2s(1,j);sig2s(2,j);sig2s(3,j)]);
    end
    % calculate matrix for LS-SVM once for the entire data
    S = ones(nb_data,1);
    Inb = eye(nb_data);
    K = kernel_matrix2(omega,model.kernel_type,model.kernel_pars,d);
    Atot = K+Inb./model.gam;
    % Cholesky factor
    try
        R = chol(Atot);
        % Solve full system
        q = R\(R'\[py S]);
        p = q(:,2); q = q(:,1);
        s = 1/sum(p);
        bias = s*sum(q);
        alpha = q - p*bias;
        % Two expensive steps, yet more efficient than using LINSOLVE on each fold
        Ri = R\Inb;
        C = Ri*Ri' - s*(p)*p';
    catch %R = cholinc(sparse(Atot),1e-5);
        A = [K+Inb./model.gam S; S' 0];
        C = pinv(A);
        alpha = C*[py;0];
        %bias = alpha(nb_data+1);
        alpha = alpha(1:nb_data);
    end
    % start loop over the L validation folds
    for l = 1:L
        % divide data into a validation set and a training set
        if l==L
            %%train = 1:block_size*(l-1); % not used
            validation = block_size*(l-1)+1:nb_data;
        else
            %%train = [1:block_size*(l-1) block_size*l+1:nb_data]; % not used
            validation = block_size*(l-1)+1:block_size*l;
        end
        % Submatrix of C to compute residuals for the l-th fold left out
        Ckk = C(validation,validation);
        % Solution of small linear system (block_size x block_size)
        try % faster
            Rkk = chol(Ckk+eps);
            betak = Rkk\(Rkk'\alpha(validation));
        catch
            betak = Ckk\alpha(validation);
        end
        % latent outputs for validation
        yh = py(validation) - betak;
        [~,yh] = postlssvm(model,[],yh);
        if ~(model.type(1)=='c')
            costs(l,j) = feval(estfct,yh - Y(validation,:));
        else
            costs(l,j) = feval(estfct,Y(validation,:),sign(yh));
        end
    end
end
cost = feval(combinefct, costs);
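The core trick in the MATLAB function above, following De Brabanter et al. (2010), is to train the LS-SVM once on all the data and then recover each fold's held-out residuals from a submatrix of the inverse system matrix: the residuals of fold k equal Ckk^-1 * alpha_k, where C is the inverse of the full bordered LS-SVM system and alpha is the full dual solution. The sketch below reproduces that idea in Python/NumPy for the regression case; the function names and the RBF parameterization are my own assumptions for the sketch, not part of LS-SVMlab:

```python
import numpy as np

def rbf_kernel(X1, X2, sig2):
    # RBF kernel exp(-||x - z||^2 / sig2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sig2)

def lssvm_solve(K, y, gam):
    # Solve the bordered LS-SVM system [[K + I/gam, 1], [1', 0]] [alpha; b] = [y; 0]
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = K + np.eye(n) / gam
    A[:n, n] = A[n, :n] = 1.0
    C = np.linalg.inv(A)
    sol = C @ np.append(y, 0.0)
    return sol[:n], sol[n], C  # alpha, bias, C = A^{-1}

def fast_cv_mse(X, y, gam, sig2, L=5):
    # Train once; each fold's held-out residuals come from a small solve with C_kk
    n = len(y)
    alpha, _, C = lssvm_solve(rbf_kernel(X, X, sig2), y, gam)
    costs = []
    for idx in np.array_split(np.arange(n), L):
        Ckk = C[np.ix_(idx, idx)]
        beta = np.linalg.solve(Ckk, alpha[idx])  # residuals y_k - yhat_k
        costs.append(np.mean(beta ** 2))
    return float(np.mean(costs))

def naive_cv_mse(X, y, gam, sig2, L=5):
    # Reference implementation: retrain from scratch for every fold
    n = len(y)
    costs = []
    for idx in np.array_split(np.arange(n), L):
        rest = np.setdiff1d(np.arange(n), idx)
        a, b, _ = lssvm_solve(rbf_kernel(X[rest], X[rest], sig2), y[rest], gam)
        pred = rbf_kernel(X[idx], X[rest], sig2) @ a + b
        costs.append(np.mean((y[idx] - pred) ** 2))
    return float(np.mean(costs))
```

Both functions return the same cross-validation cost, but the fast version factorizes the full system only once per hyperparameter pair, which is what makes PSO's repeated fitness evaluations affordable.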
3 Simulation results
4 Complete MATLAB code and data download
See the pinned post on the blog homepage