Weighted optimization for CP tensor decomposition with incomplete data
We explain how to use cp_wopt with the Poblano toolbox to fit a CP model when some of the data is missing.
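Both the Tensor Toolbox and Poblano must be on the MATLAB path before any of the commands below will run. A minimal setup sketch (the directory names are placeholders for your local installation):

```matlab
% Placeholder paths -- substitute the locations of your own installations.
addpath('~/matlab/tensor_toolbox');   % provides cp_wopt, create_problem, etc.
addpath('~/matlab/poblano_toolbox');  % provides the ncg optimizer
```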
Contents
- Create an example problem with missing data.
- Create initial guess using 'nvecs'
- Set up the optimization parameters
- Call the cp_wopt method
- Check the output
- Evaluate the output
- Create a SPARSE example problem with missing data.
- Create initial guess using 'nvecs'
- Set up the optimization parameters
- Call the cp_wopt method
- Check the output
- Evaluate the output
Create an example problem with missing data.
Here we have 25% missing data and 10% noise.
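For reference, the weighted objective that cp_wopt minimizes (the standard formulation for CP with missing data, though not written out in this demo) is

```latex
f(\mathcal{M}) = \frac{1}{2} \bigl\| \mathcal{W} \ast (\mathcal{X} - \mathcal{M}) \bigr\|^2
```

where W is the binary observation pattern (here P, with 1 for observed and 0 for missing entries), * denotes elementwise multiplication, and M is the rank-R ktensor model.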
R = 2;
info = create_problem('Size', [15 10 5], 'Num_Factors', R, ...
    'M', 0.25, 'Noise', 0.10);
X = info.Data;
P = info.Pattern;
M_true = info.Soln;
Create initial guess using 'nvecs'
M_init = create_guess('Data', X, 'Num_Factors', R, ...
    'Factor_Generator', 'nvecs');
Set up the optimization parameters
It's generally a good idea to consider the parameters of the optimization method. The default options may be either too stringent or not stringent enough. The most important options to consider are detailed here.
% Get the defaults
ncg_opts = ncg('defaults');
% Tighten the stop tolerance (norm of gradient). This is often too large.
ncg_opts.StopTol = 1.0e-6;
% Tighten the relative change in function value tolerance. This is often too large.
ncg_opts.RelFuncTol = 1.0e-20;
% Increase the number of iterations.
ncg_opts.MaxIters = 10^4;
% Only display every 10th iteration.
ncg_opts.DisplayIters = 10;
% Display the final set of options.
ncg_opts
ncg_opts =
Display: 'iter'
DisplayIters: 10
LineSearch_ftol: 1.0000e-004
LineSearch_gtol: 0.0100
LineSearch_initialstep: 1
LineSearch_maxfev: 20
LineSearch_method: 'more-thuente'
LineSearch_stpmax: 1.0000e+015
LineSearch_stpmin: 1.0000e-015
LineSearch_xtol: 1.0000e-015
MaxFuncEvals: 10000
MaxIters: 10000
RelFuncTol: 1.0000e-020
RestartIters: 20
RestartNW: 0
RestartNWTol: 0.1000
StopTol: 1.0000e-006
TraceFunc: 0
TraceFuncEvals: 0
TraceGrad: 0
TraceGradNorm: 0
TraceRelFunc: 0
TraceX: 0
Update: 'PR'
Call the cp_wopt method
Here is an example call to the cp_wopt method. By default, each iteration prints the least squares fit function value (being minimized) and the norm of the gradient. The meaning of any line search warnings can be checked via doc cvsrch.
[M,~,output] = cp_wopt(X, P, R, 'init', M_init, ...
    'alg', 'ncg', 'alg_options', ncg_opts);
Iter FuncEvals F(X) ||G(X)||/N
------ --------- ---------------- ----------------
0 1 38.81030020 0.21627555
10 43 0.48416976 0.01540206
20 68 0.41476299 0.00074832
30 88 0.41463604 0.00005718
40 108 0.41463533 0.00000281
46 120 0.41463533 0.00000084
Check the output
It's important to check the output of the optimization method. In particular, it's worthwhile to check the exit flag. A zero (0) indicates successful termination with the gradient smaller than the specified StopTol, and a three (3) indicates a successful termination where the change in function value is less than RelFuncTol. The meaning of any other flags can be checked via doc poblano_params.
exitflag = output.ExitFlag
exitflag =
0
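In a script it can be convenient to guard on the flag rather than inspect it by eye; a sketch (the warning text is ours, not part of the demo):

```matlab
% Accept either termination condition described above: 0 (StopTol on the
% gradient norm) or 3 (RelFuncTol on the change in function value).
if exitflag ~= 0 && exitflag ~= 3
    warning('cp_wopt may not have converged (ExitFlag = %d); see doc poblano_params.', ...
        exitflag);
end
```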
Evaluate the output
We can "score" the similarity of the model computed by cp_wopt against the true solution. The score function for ktensor objects returns a value in [0,1], with 1 indicating a perfect match. Because the data is noisy, we do not expect a perfect fit. See doc score for more details.
scr = score(M,M_true)
scr =
0.9946
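The score compares factors against the known truth, which is unavailable in practice. A complementary check that needs only the data is the relative fit over the observed entries; a sketch, assuming (as create_problem arranges) that X is zero at unobserved entries:

```matlab
Mfull = full(M);               % expand the ktensor model to a dense tensor
resid = norm(P .* Mfull - X);  % residual restricted to the observed entries
fit = 1 - resid / norm(X)      % near 1 for a good fit; below 1 due to noise
```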
Create a SPARSE example problem with missing data.
Here we have 95% missing data and 10% noise.
R = 2;
info = create_problem('Size', [150 100 50], 'Num_Factors', R, ...
    'M', 0.95, 'Sparse_M', true, 'Noise', 0.10);
X = info.Data;
P = info.Pattern;
M_true = info.Soln;
Create initial guess using 'nvecs'
M_init = create_guess('Data', X, 'Num_Factors', R, ...
    'Factor_Generator', 'nvecs');
Set up the optimization parameters
It's generally a good idea to consider the parameters of the optimization method. The default options may be either too stringent or not stringent enough. The most important options to consider are detailed here.
% Get the defaults
ncg_opts = ncg('defaults');
% Tighten the stop tolerance (norm of gradient). This is often too large.
ncg_opts.StopTol = 1.0e-6;
% Tighten the relative change in function value tolerance. This is often too large.
ncg_opts.RelFuncTol = 1.0e-20;
% Increase the number of iterations.
ncg_opts.MaxIters = 10^4;
% Only display every 10th iteration.
ncg_opts.DisplayIters = 10;
% Display the final set of options.
ncg_opts
ncg_opts =
Display: 'iter'
DisplayIters: 10
LineSearch_ftol: 1.0000e-004
LineSearch_gtol: 0.0100
LineSearch_initialstep: 1
LineSearch_maxfev: 20
LineSearch_method: 'more-thuente'
LineSearch_stpmax: 1.0000e+015
LineSearch_stpmin: 1.0000e-015
LineSearch_xtol: 1.0000e-015
MaxFuncEvals: 10000
MaxIters: 10000
RelFuncTol: 1.0000e-020
RestartIters: 20
RestartNW: 0
RestartNWTol: 0.1000
StopTol: 1.0000e-006
TraceFunc: 0
TraceFuncEvals: 0
TraceGrad: 0
TraceGradNorm: 0
TraceRelFunc: 0
TraceX: 0
Update: 'PR'
Call the cp_wopt method
Here is an example call to the cp_wopt method. By default, each iteration prints the least squares fit function value (being minimized) and the norm of the gradient. The meaning of any line search warnings can be checked via doc cvsrch.
[M,~,output] = cp_wopt(X, P, R, 'init', M_init, ...
    'alg', 'ncg', 'alg_options', ncg_opts);
Iter FuncEvals F(X) ||G(X)||/N
------ --------- ---------------- ----------------
0 1 1379.93609993 0.02752852
10 58 132.46275289 0.01902211
20 90 128.70825353 0.00582068
30 129 126.41868561 0.01373756
40 170 119.98598848 0.00928409
50 207 117.10117490 0.01090024
60 245 114.67006514 0.00842761
70 278 113.22261378 0.00612126
80 318 111.97918998 0.00674140
90 352 110.85741565 0.00471266
100 390 110.07596824 0.00504442
110 424 109.22542801 0.00857797
120 474 96.41782444 0.02745573
130 521 64.42518274 0.03921470
140 574 29.96112250 0.04534468
150 617 14.85584990 0.01242338
160 654 13.54323681 0.00849761
170 683 13.42992225 0.00257693
180 708 13.38722877 0.00084105
190 729 13.38200130 0.00078035
200 749 13.38058904 0.00038853
210 769 13.38020522 0.00021774
220 789 13.38001729 0.00008396
230 809 13.37997848 0.00007880
240 829 13.37996288 0.00003219
250 849 13.37995884 0.00002386
260 869 13.37995720 0.00001138
270 889 13.37995698 0.00000356
280 909 13.37995692 0.00000097
Check the output
It's important to check the output of the optimization method. In particular, it's worthwhile to check the exit flag. A zero (0) indicates successful termination with the gradient smaller than the specified StopTol, and a three (3) indicates a successful termination where the change in function value is less than RelFuncTol. The meaning of any other flags can be checked via doc poblano_params.
exitflag = output.ExitFlag
exitflag =
0
Evaluate the output
We can "score" the similarity of the model computed by cp_wopt against the true solution. The score function for ktensor objects returns a value in [0,1], with 1 indicating a perfect match. Because the data is noisy, we do not expect a perfect fit. See doc score for more details.
scr = score(M,M_true)
scr =
0.9989