Marginalization in optimization problems
A general optimization problem can be described in a general linear form as
\min_{x}{||y- Ax||^2_W},where x is the optimization variable to be estimated, y is the observation, and W is the weight matrix.
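As a small numerical illustration (a hypothetical sketch with made-up values, not from the original text), this weighted least-squares problem has the closed-form solution of the weighted normal equations A^T W A x = A^T W y:

```python
import numpy as np

# Hypothetical toy problem: estimate x from noisy observations y ≈ A x.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.2])
W = np.diag([1.0, 1.0, 4.0])  # weight matrix: trust the third observation more

# Weighted normal equations: (A^T W A) x = A^T W y
x = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
print(x)
```

At the minimizer, the weighted residual is orthogonal to the column space of A, i.e. A^T W (y - A x) = 0, which is a quick way to check the solution.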
Consider the special case W=I. One of the most popular approaches to this problem is to construct a residual for each observation or constraint:
\min_{x} \frac{1}{2} \sum_{i=1}^N ||y_i-f_i(x_i)||^2,where N is the number of constraints. When a large number of measurements are observed, N becomes very large, and the optimization becomes a large-scale problem that is difficult to solve in real time. A strategy to address this is to apply a sliding window that restricts the number of residual blocks. A general way to implement a sliding window while preserving all previous information is marginalization. Marginalization can be described with the Schur complement. Consider the variable x= \left[x_m, x_r \right]^T, where x_m is the part to be marginalized out and x_r is the remaining part. The problem can be described as
\min_{x}{||y- Ax||^2}= \min_{x} \left[\left(y- Ax \right)^T \left(y- Ax \right) \right], \qquad \begin{bmatrix} A_m & A_c \\ A_c^T & A_r \end{bmatrix} \begin{bmatrix} x_m \\ x_r \end{bmatrix} = \begin{bmatrix} y_m \\ y_r \end{bmatrix},where
A = \begin{bmatrix} A_m & A_c \\ A_c^T & A_r \end{bmatrix}, \qquad y = \begin{bmatrix} y_m \\ y_r \end{bmatrix},and then
\begin{bmatrix} I & 0 \\ -A_c^T A_m^{-1} & I \end{bmatrix} \begin{bmatrix} A_m & A_c \\ A_c^T & A_r \end{bmatrix} \begin{bmatrix} x_m \\ x_r \end{bmatrix} = \begin{bmatrix} I & 0 \\ -A_c^T A_m^{-1} & I \end{bmatrix} \begin{bmatrix} y_m \\ y_r \end{bmatrix}, \qquad \begin{bmatrix} A_m & A_c \\ 0 & A_r - A_c^T A_m^{-1} A_c \end{bmatrix} \begin{bmatrix} x_m \\ x_r \end{bmatrix} = \begin{bmatrix} y_m \\ y_r - A_c^T A_m^{-1} y_m \end{bmatrix},and finally, we have
\left(A_r- A_c^T A_m^{-1} A_c \right) x_r= y_r-A_c^TA_m^{-1} y_m.The matrix A_r- A_c^T A_m^{-1} A_c is the Schur complement of A_m. Using this equation, which involves only x_r, we can marginalize x_m out of the optimization problem. The terms A_c^T A_m^{-1} A_c and A_c^TA_m^{-1} y_m retain the information contributed by x_m as a prior when solving for x_r.
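As a numerical sanity check (a hypothetical sketch with made-up block values), the Schur-complement system above yields the same x_r as solving the full block system directly:

```python
import numpy as np

# Hypothetical symmetric positive-definite block system A x = y,
# with x = [x_m, x_r] and A partitioned as [[A_m, A_c], [A_c^T, A_r]].
A_m = np.array([[4.0, 1.0],
                [1.0, 3.0]])
A_c = np.array([[1.0],
                [0.5]])
A_r = np.array([[2.0]])
y_m = np.array([1.0, 2.0])
y_r = np.array([3.0])

# Solve the full system directly for reference.
A = np.block([[A_m, A_c],
              [A_c.T, A_r]])
y = np.concatenate([y_m, y_r])
x_full = np.linalg.solve(A, y)

# Marginalize x_m via the Schur complement of A_m.
A_m_inv = np.linalg.inv(A_m)
S = A_r - A_c.T @ A_m_inv @ A_c   # Schur complement: prior information on x_r
b = y_r - A_c.T @ A_m_inv @ y_m   # prior "measurement" on x_r
x_r = np.linalg.solve(S, b)

print(x_r, x_full[2:])
```

Note that the reduced system is only 1x1 here, yet it produces exactly the x_r of the full solve: the influence of x_m has been folded into S and b rather than discarded.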