There are many different filter types available, but which is the right type to use on a particular problem? There's no authoritative flow diagram for this, but we can discuss the advantages of different architectures and their assumptions.
Since they're the workhorse of many different estimation problems across a wide variety of disciplines, we'll begin with Kalman filters (including extended Kalman filters). These are an excellent choice when:

- the propagation and observation models are linear or only mildly nonlinear (and, for an extended Kalman filter, differentiable),
- the process and measurement noise are reasonably well modeled as zero-mean and Gaussian, and
- the state uncertainty is well described by a single mean and covariance.
Kalman filters are extremely fast, are well studied, and support many optional features, such as scalar (sequential) measurement updates, which require no matrix inversion, and consider covariance.
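For example, here's a minimal NumPy sketch (not *kf's interface) of a sequential scalar measurement update. It assumes the measurement noise covariance is diagonal, so each element of the measurement can be processed with scalar arithmetic and no matrix inversion; the names are illustrative only.

```python
import numpy as np

def sequential_update(x, P, z, H, R_diag):
    """Process each scalar element of the measurement z in turn.
    Assumes the measurement noise covariance is diagonal (R_diag holds
    its diagonal), so no matrix inversion is ever required."""
    for i in range(len(z)):
        Hi = H[i, :]                      # i-th row of the observation matrix
        s  = Hi @ P @ Hi + R_diag[i]      # scalar innovation variance
        K  = (P @ Hi) / s                 # gain for this scalar measurement
        x  = x + K * (z[i] - Hi @ x)      # correct the state
        P  = P - np.outer(K, Hi @ P)      # correct the covariance
    return x, P
```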
If it's not reasonable to construct an analytical Jacobian for an extended Kalman filter, then it's generally better to use an unscented Kalman filter. It's rarely useful to use finite-differencing to produce the Jacobians needed for an extended Kalman filter.
Though less common, information filters work very similarly to Kalman filters and require essentially the same assumptions. Whereas a generic Kalman filter requires the inversion of an nz-by-nz matrix, where nz is the dimension of the measurement vector, information filters require the inversion of an nx-by-nx matrix, where nx is the dimension of the state. Therefore, when the dimension of the measurement is greater than the dimension of the state, information filters can be faster.
Further, because information filters work with the information matrix (the inverse of the covariance matrix) instead of the Kalman filter's covariance matrix, they can represent “no knowledge” of the state (an information matrix of all zeros), whereas a covariance matrix can never actually be infinite. This can be useful when initializing a filter.
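To illustrate the information form, here is a brief NumPy sketch (again, generic rather than *kf's interface) of the information filter's measurement update, where the information matrix Y is the inverse of the covariance and y = Y x. Note how the update is purely additive and how “no knowledge” is simply zero information.

```python
import numpy as np

def information_update(y, Y, z, H, R):
    """Measurement update in information form. Y is the information
    matrix (inverse covariance) and y = Y @ x is the information vector."""
    R_inv = np.linalg.inv(R)
    Y_new = Y + H.T @ R_inv @ H   # information added by the measurement
    y_new = y + H.T @ R_inv @ z   # contribution of the measurement itself
    return y_new, Y_new

# "No knowledge" of a 3-element state is simply zero information:
y0 = np.zeros(3)
Y0 = np.zeros((3, 3))
```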
Unscented filters are similar to Kalman filters, but have a few advantages. They:

- require no Jacobians (or any derivatives) of the propagation or observation functions,
- capture the effects of nonlinearities more accurately than a first-order linearization, and
- generally need less problem-specific derivation and setup.
However, unscented filters are slower than extended Kalman filters for a similar problem, often by a factor of 1.5 to 5. Compared to Kalman filters that employ sequential updates, unscented filters are far, far slower.
Overall, because it works better than an EKF, requires less setup, and is not too much slower, an unscented Kalman filter is often a great architecture to try early on.
Like extended Kalman filters, unscented filters have features that can reduce runtime, such as specifying that the process and measurement noise are additive, and they too can take advantage of consider covariance properly.
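To make the unscented approach concrete, here is a short NumPy sketch of the standard sigma-point generation used by unscented filters, following the common scaled formulation with tuning parameters alpha, beta, and kappa; this is a generic illustration, not *kf's implementation.

```python
import numpy as np

def sigma_points(x, P, alpha=1e-3, beta=2.0, kappa=0.0):
    """Generate the 2n+1 sigma points and their weights for the scaled
    unscented transform. Each sigma point is later propagated through the
    full nonlinear function; no Jacobians are needed."""
    n   = len(x)
    lam = alpha**2 * (n + kappa) - n
    S   = np.linalg.cholesky((n + lam) * P)     # matrix square root of the scaled covariance
    X   = np.column_stack([x] +
                          [x + S[:, i] for i in range(n)] +
                          [x - S[:, i] for i in range(n)])
    Wm  = np.full(2 * n + 1, 0.5 / (n + lam))   # weights for the mean
    Wc  = Wm.copy()                             # weights for the covariance
    Wm[0] = lam / (n + lam)
    Wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    return X, Wm, Wc
```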
Particle filters are overwhelmingly more flexible than linear or unscented filters of any kind. They handle bizarre probability distributions well, can work essentially anywhere a linear or unscented filter can, and can work in many places where those other filters cannot work at all.
Their primary drawback is runtime; particle filters require thousands of propagations per time step, whereas a similar unscented filter might require only a dozen. Further, because particle filters represent uncertainty with a discrete set of particles, they are not as smooth or accurate as a well-tuned linear or unscented filter when that filter's assumptions apply.
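For comparison, here is a minimal NumPy sketch of a single bootstrap particle filter step, where `propagate` and `likelihood` stand in for problem-specific functions (hypothetical names); the loops over particles are where the thousands of propagations and evaluations per time step come from.

```python
import numpy as np

def particle_filter_step(particles, weights, z, propagate, likelihood, rng):
    """One bootstrap particle filter step: propagate every particle through
    the (possibly very nonlinear) dynamics, weight each by the measurement
    likelihood, and resample to combat weight degeneracy."""
    n = len(weights)
    particles = np.array([propagate(p, rng) for p in particles])
    weights   = weights * np.array([likelihood(z, p) for p in particles])
    weights  /= np.sum(weights)
    idx       = rng.choice(n, size=n, p=weights)   # simple multinomial resampling
    return particles[idx], np.full(n, 1.0 / n)
```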
There are several filtering options common to multiple architectures.
Linear filters assume that the Jacobian of the observation function and the measurement noise covariance matrix closely model reality. When those two matrices are calculated using the predicted state from the prior estimate (the default), they will certainly have some error. However, after the Kalman correction takes place, the estimate should have lower error. This corrected estimate can therefore be used to re-calculate the Jacobian and measurement noise covariance matrix, and the prior prediction can then be corrected more accurately. This process can repeat any number of times and is called iteration of the filter. Iteration is useful when the Jacobian of the observation function or the measurement noise covariance matrix changes meaningfully with changes in the state; it can dramatically increase accuracy in these cases. However, the correction is the most time-consuming part of a Kalman filter, so the runtime grows in proportion to the number of iterations.
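A sketch of such an iterated correction, in the style of an iterated extended Kalman filter, might look like the following, where `h` and `H_jac` are the problem-specific observation function and its Jacobian (hypothetical names). With one iteration it reduces to the ordinary EKF correction, and re-computing a state-dependent measurement noise covariance would follow the same pattern.

```python
import numpy as np

def iterated_update(x_pred, P_pred, z, h, H_jac, R, iterations=3):
    """Iterated measurement update: re-linearize the observation function
    about each successively better estimate before correcting again."""
    x = x_pred.copy()
    for _ in range(iterations):
        H = H_jac(x)                                    # Jacobian at the latest estimate
        S = H @ P_pred @ H.T + R                        # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)             # gain from the re-linearization
        x = x_pred + K @ (z - h(x) - H @ (x_pred - x))  # corrected estimate
    P = (np.eye(len(x)) - K @ H) @ P_pred               # covariance from the final pass
    return x, P
```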
“Square-root” filters allow greater numerical stability than their counterparts by representing the covariance not as the straightforward covariance matrix (or information matrix), but as some type of decomposition of it. This started with actual “matrix square roots”, such that \(S S^T = P\), where \(S\) is the square root and \(P\) is the covariance. By representing the covariance this way, a greater range of values could be maintained in any given word length in a computer (essentially doubling the effective floating-point word length). Now, many different types of decomposition are used, including UDU forms and Cholesky factors. Though the widespread use of double-precision floating-point for the covariance has reduced the need for square-root filters, they are nonetheless more stable and often incur only a trivial cost in runtime. Their only disadvantage is that the decomposed covariance or information matrix is simply harder to understand when debugging and won't look as familiar to most.
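As a small example of working with a factored covariance, here is a NumPy sketch of a square-root time update that propagates a Cholesky factor \(S\) (with \(S S^T = P\)) directly via a QR decomposition, never forming \(P\) itself; the names and the particular factorization are illustrative, not *kf's internals.

```python
import numpy as np

def sqrt_time_update(S, F, Sq):
    """Square-root time update. S is a Cholesky factor of the covariance
    (P = S @ S.T), F is the state transition matrix, and Sq is a square
    root of the process noise covariance Q. The QR decomposition yields a
    triangular factor R such that R.T @ R = F P F.T + Q."""
    A = np.vstack([(F @ S).T, Sq.T])
    _, R = np.linalg.qr(A)
    return R.T    # a valid square root of the predicted covariance
```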
Not every source of uncertainty in a problem can be realistically estimated, either because it's unobservable or because estimating it would make the state vector so large that the filter couldn't run in a reasonable amount of time. The use of consider parameters allows one to “consider” the effects of additional uncertainty, such as the uncertainty in some constant used in the propagation function, without actually estimating it. It's extremely useful in ensuring that a filter is consistent with its assumptions (that its covariance matrix maintains a good representation of the error). When *kf uses consider covariance in a filter, it does so as a Schmidt-type filter.
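A hedged sketch of the Schmidt-type idea, on an augmented state whose first n_est elements are estimated and whose remaining elements are consider parameters, is shown below (generic NumPy, not *kf's interface): the gain rows for the consider parameters are zeroed so they are never corrected, while the Joseph-form covariance update still accounts for their uncertainty.

```python
import numpy as np

def schmidt_update(x, P, z, H, R, n_est):
    """Schmidt-type (consider) measurement update on an augmented state.
    Elements x[:n_est] are estimated; x[n_est:] are consider parameters."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    K[n_est:, :] = 0.0                 # never correct the consider parameters
    x = x + K @ (z - H @ x)
    IKH = np.eye(len(x)) - K @ H
    P = IKH @ P @ IKH.T + K @ R @ K.T  # Joseph form, valid for any (suboptimal) gain
    return x, P
```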
Aside from picking the right type and options for a filter, there are many other important aspects in designing a good filter. See Workflow for more.