## Convex Optimization
I had the pleasure to be invited to attend the German OR conference last week in Aachen. There were many highlights at this conference beside the great venue and excellent organization led by Marco Lübbecke, see for instance Mike Trick on Laur Convex optimization is a generalization of linear programming to other problem classes including second order cone programming and semi definite programming, with nice duality theorems. It has applications to operations research, machine learning, control, signal processing, etc. I won't elaborate more on convex optimization in general, except for few remarks triggered by Stephen's presentation. Readers can read Stephen's textbook to learn more. First of all, I really liked it as it was reinforcing the view that machine learning and optimization are tighltly linked. Second, Stephen made it clear that differentiability of functions appearing in constraints or objective isn't required anymore. This is an important change compared to the early days where gradient methods (eg Newton method) were used to solve convex problems. Gradient methods are not applicable to non differentiable functions like absolute value, of L1 norm. Nowadays, interior point algorithms are the method of choice for solving convex optimization problems. These algorithms do not require functions to be differentiable. The use of non differentiable functions enables new statistical regression techniques, such as the Lasso regression, which is quite powerful as it favors regressions with fewer non zero variables.
Third, Stephen has shown that for specific problem classes it is possible to generate code for specialized solvers that outperform a general solver like CVX by several orders of magnitude. This made me think of CPLEX. Could we get similar speedup by code generating specialized solvers for specific problem classes? I don't know actually, but it could be the case for one good reason: we do not tune CPLEX for easy problems (read, problems solvable in less than a second). We don't do it because we don't have much requests for it, probably because slower and cheaper solvers are good enough in that case. Another reason for the lack of such demand is analyzed in my Opti Last, Stephen discussed the fact that decomposition methods work well for convex problems. It is possible to distribute very large problems on several machines by cutting them in pieces with each machine receiving only a subset of the original problem. Variables appearing in more than one sub problem are replaced by local copies. Then an iterative method is guaranteed to converge under very general hypothesis. At each step each machine receives partial information (the average value over all machines taken by each local copy of global variables). Each machine then solves a modification of the sub problems it was assigned to. The modification adds terms in the objective that penalize deviations of the value of the local variable copies from the average value across all machines. Convexitiy of the original problem ensures that this method will converge. The above were the point that stroke me during the talk. There is much more to it, and I encourage readers to learn more via Stephen's course. |