MARSS package
1. Observation equation:
includes $n$ measurements, $q$-dimensional inputs, and an $n \times h$ input-noise transformation matrix (the full equation is assembled after the list below), where
- $\mathbf{y}$ is an $n \times T$ matrix of the observations. Some observations may be missing.
- $\mathbf{Z}_t$ is an $n \times m$ matrix of parameters.
- $\mathbf{a}_t$ is an $n \times 1$ column vector of parameters.
- $\mathbf{D}_t$ is an $n \times q$ matrix of parameters.
- $\mathbf{d}_t$ is an input and a $q \times 1$ column vector.
- $\mathbf{H}_t$ is an input and an $n \times h$ matrix.
- $\mathbf{v}_t$ is an $n \times 1$ column vector of the non-process (observation) errors. The observation errors at time $t$ are multivariate normal with mean $0$ and covariance matrix $\mathbf{R}_t$.
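Assembled from the definitions above, in MARSS notation:

$$\mathbf{y}_t = \mathbf{Z}_t \mathbf{x}_t + \mathbf{a}_t + \mathbf{D}_t \mathbf{d}_t + \mathbf{H}_t \mathbf{v}_t, \qquad \mathbf{v}_t \sim \text{MVN}(0, \mathbf{R}_t)$$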
2. State equation:
includes $m$ states, $p$-dimensional inputs, and an $m \times g$ input-noise transformation matrix (the full equation is assembled after the list below), where
- $\mathbf{x}$ is an $m \times T$ matrix of states. Each $\mathbf{x}_t$ is a realization of the random variable $\mathbf{X}_t$ at time $t$.
- $\mathbf{B}_t$ is an $m \times m$ matrix of parameters.
- $\mathbf{u}_t$ is an $m \times 1$ column vector of parameters.
- $\mathbf{C}_t$ is an $m \times p$ matrix of parameters.
- $\mathbf{c}_t$ is an input and a $p \times 1$ column vector.
- $\mathbf{G}_t$ is an input and an $m \times g$ matrix.
- $\mathbf{w}_t$ is an $m \times 1$ column vector of the process errors. The process errors at time $t$ are multivariate normal with mean $0$ and covariance matrix $\mathbf{Q}_t$.
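Likewise, the state equation assembled from the definitions above:

$$\mathbf{x}_t = \mathbf{B}_t \mathbf{x}_{t-1} + \mathbf{u}_t + \mathbf{C}_t \mathbf{c}_t + \mathbf{G}_t \mathbf{w}_t, \qquad \mathbf{w}_t \sim \text{MVN}(0, \mathbf{Q}_t)$$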
3. Initial values
- $\pi$ is either a parameter or a fixed prior. It is an $m \times 1$ column vector.
- $\Lambda$ is either a parameter or a fixed prior. It is an $m \times m$ variance-covariance matrix.
$\mathbf{Z}_t, \mathbf{a}_t, \mathbf{D}_t, \mathbf{H}_t, \mathbf{R}_t$ in the observation equation and $\mathbf{B}_t, \mathbf{u}_t, \mathbf{C}_t, \mathbf{G}_t, \mathbf{Q}_t$ in the state equation are system matrices that are usually assumed to be non-random. Although they may be time-varying, their values are deterministic (predictable).
System matrices (except $\mathbf{a}_t$ and $\mathbf{u}_t$) are called hyperparameters when they depend on unknown parameters, and the objective of state space modeling is to estimate these hyperparameters, which determine the statistical features of the model.
Usually $\mathbf{a}_t$ and $\mathbf{u}_t$ affect only the expectations of $\mathbf{x}_t$ and $\mathbf{y}_t$. If they are linear functions of unknown variables or of lagged $\mathbf{y}_t$, they can become part of the state variables or hyperparameters.
When a model is written in state space form, it can be solved by several algorithms, and the Kalman filter is the core of these algorithms. The Kalman filter uses all the information available up to time $t$ to make the best possible predictions. It assumes that the noises and the initial state are normally distributed, computes the likelihood via the prediction error decomposition, and updates the state estimate whenever a new observation arrives.
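For reference, the prediction error decomposition writes the log-likelihood in terms of the innovations $v_t$ and their covariances $F_t$ produced by the filter (a standard result; notation as in Durbin's book below):

$$\log L = -\frac{np}{2}\log 2\pi - \frac{1}{2}\sum_{t=1}^{n}\left( \log |F_t| + v_t' F_t^{-1} v_t \right)$$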
Durbin’s book
linear Gaussian state space model
- $y_t$ is a $p\times 1$ observation vector.
- $\alpha_t$ is an unobserved $m\times 1$ state vector.
- $Z_t$, $T_t$, $R_t$, $H_t$, and $Q_t$ are initially assumed to be known. $Z_t$ and $T_{t-1}$ may depend on $y_1, \dots, y_{t-1}$.
- Error terms $\varepsilon_t$ and $\eta_t$ are assumed to be serially and mutually independent.
- Initial state vector $\alpha_1$ is assumed to be $\text{N}(a_1, P_1)$ independently of $\varepsilon_1, \dots, \varepsilon_n$ and $\eta_1, \dots, \eta_n$.
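In Durbin's notation, the linear Gaussian state space model is:

$$y_t = Z_t \alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim \text{N}(0, H_t)$$
$$\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \qquad \eta_t \sim \text{N}(0, Q_t)$$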
The general linear state space model is the same as the linear Gaussian model above, except that the normality assumption is dropped.
Define $Y_T$ as the information set available at $t=T$, i.e. $Y_T = (y_1', y_2', \dots, y_T')'$. The estimation of states can be divided into three cases depending on how much information we have:
- If $t>T$, it is an estimate of future states, called prediction;
- If $t=T$, it is an estimate of the current state, called filtering;
- If $t<T$, it is an estimate of past states, called smoothing.
A note on Durbin's notation:
- $a_t$ and $P_t$ written without conditioning on $Y_t$ are prior (predicted) quantities; the innovation $v_t$ is also a prior quantity.
Wikipedia
Predict
- Predicted (a priori) state estimate
- Predicted (a priori) estimate covariance
I. Predicted state = state equation applied to the last updated state
Update
- Innovation or measurement pre-fit residual
- Innovation (or pre-fit residual) covariance
II. Pre-fit residual = measurement − observation equation applied to the predicted state
- Optimal Kalman gain
- Updated (a posteriori) state estimate
- Updated (a posteriori) estimate covariance
- Measurement post-fit residual
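For reference, the equations behind each item above, in Wikipedia's notation ($F_k$ is the state transition matrix, $B_k$ the control matrix, $H_k$ the observation matrix, $Q_k$ and $R_k$ the process and measurement noise covariances):

$$\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1} + B_k u_k, \qquad P_{k|k-1} = F_k P_{k-1|k-1} F_k^{T} + Q_k$$
$$\tilde{y}_k = z_k - H_k \hat{x}_{k|k-1}, \qquad S_k = H_k P_{k|k-1} H_k^{T} + R_k$$
$$K_k = P_{k|k-1} H_k^{T} S_k^{-1}, \qquad \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \tilde{y}_k$$
$$P_{k|k} = (I - K_k H_k) P_{k|k-1}, \qquad \tilde{y}_{k|k} = z_k - H_k \hat{x}_{k|k}$$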
Tutorial by Alex Becker
The $\alpha-\beta-\gamma$ Filter
EXAMPLE 1 WEIGHING THE GOLD
The overall logic of the state update equation:
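$$\hat{x}_{n,n} = \hat{x}_{n,n-1} + \alpha_n \left( z_{n} - \hat{x}_{n,n-1} \right)$$

where $\alpha_n$ is the gain factor (the Kalman Gain $K_n$ in later sections).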
The term $\left( z_{n} - \hat{x}_{n,n-1} \right)$ is the measurement residual, also called the innovation. The innovation contains the newest information.
Before we make the first measurement, we can guess (or roughly estimate) the gold bar's weight simply by reading the stamp on the gold bar. This is called the Initial Guess, and it will be our first estimate.
As we will see later, the Kalman Filter requires the initial guess as a preset, and it can be very rough.
The estimation algorithm:
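A minimal Python sketch of this algorithm, assuming (as in the example) a static true value and gain $\alpha_n = 1/n$, which makes the estimate the running average of the measurements:

```python
def estimate_static_state(measurements, initial_guess):
    """Alpha filter for a static system (Example 1): gain = 1/n."""
    x = initial_guess                           # initial guess; may be very rough
    for n, z in enumerate(measurements, start=1):
        x_pred = x                              # static dynamics: prediction = last estimate
        x = x_pred + (1.0 / n) * (z - x_pred)   # state update with gain 1/n
    return x

# usage: five noisy weighings of a nominally 1000 g gold bar
print(estimate_static_state([996.0, 994.0, 1021.0, 1000.0, 1002.0], 1000.0))
```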
EXAMPLE 2 TRACKING THE CONSTANT VELOCITY AIRCRAFT IN ONE DIMENSION
Equation 1: the State Extrapolation Equation (also called the Transition Equation, Prediction Equation, Dynamic Model, or State Space Model): the state equation in the standard Kalman Filter form, used to predict the next state.
Equation 2: the State Update Equation (also called the Filtering Equation): the equation, involving the Kalman Gain, that updates the prediction into the final estimate based on the new observation. The constant-velocity forms used in this example are sketched below.
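A sketch of the constant-velocity forms used in this example ($\alpha$ and $\beta$ are the fixed gains of the $\alpha$-$\beta$ filter):

$$\hat{x}_{n+1,n} = \hat{x}_{n,n} + \Delta t\, \hat{\dot{x}}_{n,n}, \qquad \hat{\dot{x}}_{n+1,n} = \hat{\dot{x}}_{n,n}$$
$$\hat{x}_{n,n} = \hat{x}_{n,n-1} + \alpha \left( z_n - \hat{x}_{n,n-1} \right), \qquad \hat{\dot{x}}_{n,n} = \hat{\dot{x}}_{n,n-1} + \beta \left( \frac{z_n - \hat{x}_{n,n-1}}{\Delta t} \right)$$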
Kalman Filter in one dimension
The differences between the measurements (blue samples) and the true value (green line) are measurement errors. Since the measurement errors are random, we can describe them by a variance ($\sigma^2$). The variance of the measurement errors could be provided by the scale vendor or derived by a calibration procedure. The variance of the measurement errors is actually the measurement uncertainty.
Note: In some literature, the measurement uncertainty is also called the measurement error.
We will denote the measurement uncertainty by $r$ .
The difference between the estimate (the red line) and the true value (green line) is the estimate error. As you can see, the estimate error becomes smaller and smaller as we make more measurements, converging towards zero while the estimated value converges towards the true value. We don't know the actual estimate error, but we can estimate the uncertainty of the estimate.
Intuitive derivation of Equation 3, the Kalman Gain Equation:
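$$K_n = \frac{p_{n,n-1}}{p_{n,n-1} + r_n}$$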
where $p_{n,n-1}$ is the extrapolated estimate uncertainty, and $r_n$ is the measurement uncertainty.
The derivation minimizes $p_{n,n} = K_{n}^{2} r_{n} + (1 - K_{n})^{2} p_{n,n-1}$ with respect to the gain (see the step below).
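Filling in that step: setting the derivative of $p_{n,n}$ with respect to $K_n$ to zero gives

$$\frac{dp_{n,n}}{dK_n} = 2K_n r_n - 2(1 - K_n)p_{n,n-1} = 0 \quad \Rightarrow \quad K_n = \frac{p_{n,n-1}}{p_{n,n-1} + r_n}$$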
Let’s rewrite the state update equation:
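$$\hat{x}_{n,n} = (1 - K_n)\, \hat{x}_{n,n-1} + K_n z_n$$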
As you can see, the Kalman Gain ($K_n$) is the weight that we give to the measurement, and $(1-K_n)$ is the weight that we give to the prior estimate.
The Kalman gain tells us how much to change our estimate given a new measurement.
THE ESTIMATE UNCERTAINTY UPDATE IN 1D
The following equation defines the estimate uncertainty update:
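$$p_{n,n} = (1 - K_n)\, p_{n,n-1}$$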
where $p_{n,n-1}$ is the estimate uncertainty calculated during the previous filter iteration, and $p_{n,n}$ is the estimate uncertainty of the current state.
This equation updates the estimate uncertainty of the current state. It is called Equation 4: the Covariance Update Equation. Why covariance? We will see in the following chapters.
It is quite clear from the equation that the estimate uncertainty is always getting smaller with each filter iteration, since $(1-K_n)\le 1$ . When the measurement uncertainty is large, then the Kalman gain will be low, therefore, the convergence of the estimate uncertainty would be slow. However, when the measurement uncertainty is small, then the Kalman gain will be high and the estimate uncertainty would quickly converge towards zero.
THE ESTIMATE UNCERTAINTY EXTRAPOLATION IN 1D
Example: for the constant-velocity aircraft, the state predict equations extrapolate the position and velocity estimates, and the estimate uncertainties must be extrapolated along with them.
The estimate uncertainty extrapolation equation is called Equation 5: the Covariance Extrapolation Equation:
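For the constant-velocity case, applying the variance rules to the state predict equations gives (assuming the position and velocity estimation errors are uncorrelated):

$$p^{x}_{n+1,n} = p^{x}_{n,n} + \Delta t^{2} \cdot p^{v}_{n,n}, \qquad p^{v}_{n+1,n} = p^{v}_{n,n}$$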
PUTTING ALL TOGETHER
The filter inputs are:
Initialization
The initialization is performed only once, and it provides two parameters:
- Initial System State ( $\hat x_{1,0}$ )
- Initial State Uncertainty ( $p_{1,0}$ )
The initialization parameters can be provided by another system, another process (for instance, a search process in radar), or an educated guess based on experience or theoretical knowledge. Even if the initialization parameters are not precise, the Kalman filter will be able to converge close to the real value.
Measurement
The measurement is performed for every filter cycle, and it provides two parameters:
- Measured System State ( $z_n$ )
- Measurement Uncertainty ( $r_n$ )
In addition to the measured value, the Kalman filter requires the measurement uncertainty parameter. Usually, this parameter is provided by the equipment vendor, or it can be derived by calibrating the measurement equipment. The radar measurement uncertainty depends on several parameters such as SNR (Signal to Noise Ratio), beam width, bandwidth, time on target, clock stability, and more. Every radar measurement has a different SNR, beam width, and time on target. Therefore, the radar calculates the measurement uncertainty for each measurement and reports it to the tracker.
The filter outputs are:
- System State Estimate ( $\hat x_{n,n}$ )
- Estimate Uncertainty ( $p_{n,n}$ )
In addition to the System State Estimate, the Kalman filter also provides the Estimate Uncertainty! This gives us a measure of the estimate's precision. As I've already mentioned, the estimate uncertainty is given by:
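$$p_{n,n} = (1 - K_n)\, p_{n,n-1}$$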
and $p_{n,n}$ is always getting smaller with each filter iteration, since $\left( 1-K_{n} \right) \leq 1$.
So it is up to us to decide how many measurements to take. If we are measuring the building height and we are interested in a precision of 3 centimeters ($\sigma$), we make measurements until the Estimate Uncertainty ($\sigma^2$) is less than $9\ \text{cm}^2$.
- Process noise is added to the covariance extrapolation equation. [This makes sense: it cannot be added to the state extrapolation equation, since we should not manually inject an error into the prediction itself; but we do know that the prediction carries additional variance. A sketch combining all five equations follows.]
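A compact Python sketch putting the five equations together in one dimension, for a constant dynamic model (as in the temperature examples), with an assumed constant measurement variance `r` and process noise variance `q`:

```python
def kalman_1d(measurements, x0, p0, r, q):
    """One-dimensional Kalman filter for a constant dynamic model.

    x0, p0 : initial guess and its uncertainty (variance)
    r      : measurement variance (assumed constant here)
    q      : process noise variance, added at each extrapolation
    """
    x, p = x0, p0 + q          # initialization, then first prediction
    estimates = []
    for z in measurements:
        K = p / (p + r)        # Equation 3: Kalman Gain
        x = x + K * (z - x)    # Equation 2: state update
        p = (1 - K) * p        # Equation 4: covariance update
        estimates.append((x, p))
        # predict (constant dynamics: state unchanged, uncertainty grows by q)
        p = p + q              # Equation 5: covariance extrapolation
    return estimates

# usage: noisy temperature readings around 50 degrees, rough initial guess
ests = kalman_1d([49.95, 49.97, 50.1, 50.06, 49.99], x0=60.0, p0=100.0, r=0.01, q=0.0001)
print(ests[-1])
```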
EXAMPLE 7 – ESTIMATING THE TEMPERATURE OF THE HEATING LIQUID Ⅰ
There are two causes of the lag error in our Kalman Filter example:
- The dynamic model doesn't fit the case. [The true value has a linear trend while the dynamic model is constant, which shows that Kalman Filters perform well only when the dynamic model is reasonable.] If we know that the liquid temperature can change linearly, we can define a new model that takes a possible linear change in the temperature into account, as we did in Example 4. This is the preferred method. However, if the temperature change can't be modeled, this method won't improve the Kalman Filter's performance.
- The process model reliability. We have chosen a very low process noise $(q=0.0001)$ while the real temperature fluctuations are much bigger. Since our model is not well defined, we can compensate by adjusting the process model reliability, i.e., increasing the process noise $(q)$.
We can get rid of the lag error by setting a high process uncertainty. However, since our model is not well defined, we then get noisy estimates that are almost equal to the measurements, and we miss the goal of the Kalman Filter.
The best Kalman Filter implementation uses a model that is very close to reality, leaving only a small role for the process noise. However, a precise model is not always available: for example, an airplane pilot may decide to perform a sudden maneuver that changes the predicted trajectory. In that case, the process noise should be increased.
Multidimensional Kalman Filter
Variance Rules
- $V(X) = E(X^{2}) - \mu_{X}^{2}$
- $COV(X,Y) = E(XY) - \mu_{X}\mu_{Y}$; if $X$ and $Y$ are independent, then $COV(X,Y) = 0$
- $V(X \pm Y) = V(X) + V(Y) \pm 2COV(X,Y)$
- $V(XY) \neq V(X)V(Y)$
- $COV(\boldsymbol{x}) = E \left( \left( \boldsymbol{x - \mu_{x}} \right) \left( \boldsymbol{x - \mu_{x}} \right)^{T} \right)$
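One more rule worth adding here, since it is what turns the scalar uncertainty updates into the matrix equations below: for a linear transformation $\boldsymbol{A}$,

$$COV(\boldsymbol{Ax}) = \boldsymbol{A}\, COV(\boldsymbol{x})\, \boldsymbol{A}^{T}$$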
State Extrapolation Equation
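In matrix notation:

$$\boldsymbol{\hat{x}_{n+1,n}} = \boldsymbol{F \hat{x}_{n,n}} + \boldsymbol{G \hat{u}_{n,n}} + \boldsymbol{w_{n}}$$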
where,
- $\boldsymbol{\hat{x}_{n+1,n}}$ is a predicted system state vector at time step $n+1$
- $\boldsymbol{\hat{x}_{n,n}}$ is an estimated system state vector at time step $n$
- $\boldsymbol{\hat{u}_{n,n}}$ is a control variable or input variable - a measurable (deterministic) input to the system [so it does not contribute to the covariance]
- $\boldsymbol{w_{n}}$ is a process noise or disturbance - an unmeasurable input that affects the state
- $\boldsymbol{F}$ is a state transition matrix
- $\boldsymbol{G}$ is a control matrix or input transition matrix (mapping control to state variables)
Note: The process noise $w_n$ does not typically appear directly in the State Extrapolation Equation. Instead, it is used to model the uncertainty in the Covariance Extrapolation Equation.
Modeling linear dynamic systems
TBA
Covariance Extrapolation Equation
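In matrix notation:

$$\boldsymbol{P_{n+1,n}} = \boldsymbol{F P_{n,n} F^{T}} + \boldsymbol{Q}$$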
where,
- $\boldsymbol{P_{n,n}}$ is an estimate uncertainty (covariance) matrix of the current state
- $\boldsymbol{P_{n+1,n}}$ is a predicted estimate uncertainty (covariance) matrix for the next state
- $\boldsymbol{F}$ is a state transition matrix that we’ve derived in “Modeling linear dynamic systems” section
- $\boldsymbol{Q}$ is a process noise matrix
Constructing The Process Noise Matrix $Q$
We've seen that the process noise variance has a critical influence on the Kalman Filter's performance. Too small a $q$ causes a lag error (see Example 7). If $q$ is too large, the Kalman Filter follows the measurements (see Example 8) and produces noisy estimates.
[Function of $\boldsymbol{Q}$: add noise to the state covariance, preventing the covariance from becoming too small and the Kalman Gain from approaching 0.]
The process noise can be independent between different state variables. In this case, the process noise covariance matrix $\boldsymbol{Q}$ is diagonal:
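For instance, with three mutually independent state variables (a hypothetical sizing):

$$\boldsymbol{Q} = \begin{bmatrix} q_{11} & 0 & 0 \\ 0 & q_{22} & 0 \\ 0 & 0 & q_{33} \end{bmatrix}$$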
The process noise can also be dependent; in that case, the process noise is correlated between the state variables.
Discrete noise model
TBA
Continuous noise model
TBA
Measurement Equation
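In matrix notation:

$$\boldsymbol{z_{n}} = \boldsymbol{H x_{n}} + \boldsymbol{v_{n}}$$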
where,
- $\boldsymbol{z_{n}}$ is a measurement vector
- $\boldsymbol{x_{n}}$ is a true system state (hidden state)
- $\boldsymbol{v_{n}}$ is a random noise vector
- $\boldsymbol{H}$ is an observation matrix
Usage:
- Scaling: $\boldsymbol{H}$ is a constant
- State selection: only a part of the states can be measured (see the sketch after this list)
- Combination of states: several states combined into fewer measured outputs
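A sketch of state selection, assuming a position-and-velocity state of which only position is measured:

$$\boldsymbol{H} = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad z_n = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} x_n \\ \dot{x}_n \end{bmatrix} + v_n = x_n + v_n$$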
Interim Summary - Uncertainty
The terms $w$ and $v$ which correspond to the process and measurement noise vectors are interesting in that they do not typically appear directly in the equations of interest, since they are unknown.
Instead, these terms are used to model the uncertainty (or noise) in the equations themselves.
P.S. I should say these equations are more useful in the derivation of the Kalman Gain than in practical use. I think one of the reasons is that we never know the true $w$, $v$, and $e$.
The process noise uncertainty is given by:
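$$\boldsymbol{Q_{n}} = E\left( \boldsymbol{w_{n} w_{n}^{T}} \right)$$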
where,
- $\boldsymbol{Q_{n}}$ is a covariance matrix of the process noise
- $\boldsymbol{w_{n}}$ is a process noise
The measurement uncertainty is given by:
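$$\boldsymbol{R_{n}} = E\left( \boldsymbol{v_{n} v_{n}^{T}} \right)$$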
where,
- $\boldsymbol{R_{n}}$ is a covariance matrix of the measurement noise
- $\boldsymbol{v_{n}}$ is a measurement error
The estimation uncertainty is given by:
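$$\boldsymbol{P_{n,n}} = E\left( \boldsymbol{e_{n} e_{n}^{T}} \right) = E\left( \left( \boldsymbol{x_{n} - \hat{x}_{n,n}} \right) \left( \boldsymbol{x_{n} - \hat{x}_{n,n}} \right)^{T} \right)$$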
where,
- $\boldsymbol{P_{n,n}}$ is a covariance matrix of the estimation error
- $\boldsymbol{e_{n}}$ is an estimation error
- $\boldsymbol{x_{n}}$ is a true system state (hidden state)
- $\boldsymbol{\hat{x}_{n,n}}$ is an estimated system state vector at time step $n$
State Update Equation
The State Update Equation in the matrix form is given by:
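$$\boldsymbol{\hat{x}_{n,n}} = \boldsymbol{\hat{x}_{n,n-1}} + \boldsymbol{K_{n}} \left( \boldsymbol{z_{n}} - \boldsymbol{H \hat{x}_{n,n-1}} \right)$$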
where,
- $\boldsymbol{\hat{x}_{n,n}}$ is an estimated system state vector at time step $n$
- $\boldsymbol{\hat{x}_{n,n-1}}$ is the predicted system state vector at time step $n$, made at step $n-1$
- $\boldsymbol{K_{n}}$ is a Kalman Gain
- $\boldsymbol{z_{n}}$ is a measurement
- $\boldsymbol{H}$ is an observation matrix
Covariance Update Equation
The Covariance Update Equation is given by:
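$$\boldsymbol{P_{n,n}} = \left( \boldsymbol{I} - \boldsymbol{K_{n} H} \right) \boldsymbol{P_{n,n-1}} \left( \boldsymbol{I} - \boldsymbol{K_{n} H} \right)^{T} + \boldsymbol{K_{n} R_{n} K_{n}^{T}}$$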
where,
- $\boldsymbol{P_{n,n}}$ is an estimate uncertainty (covariance) matrix of the current state
- $\boldsymbol{P_{n,n-1}}$ is the prior estimate uncertainty (covariance) matrix of the current state (predicted at the previous step)
- $\boldsymbol{K_{n}}$ is a Kalman Gain
- $\boldsymbol{H}$ is an observation matrix
- $\boldsymbol{R_{n}}$ is a Measurement Uncertainty (measurement noise covariance matrix)
In many textbooks, you will find a simplified form of the Covariance Update Equation:
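$$\boldsymbol{P_{n,n}} = \left( \boldsymbol{I} - \boldsymbol{K_{n} H} \right) \boldsymbol{P_{n,n-1}}$$

This simplified form is valid only when the optimal Kalman Gain is used; the longer form above holds for any gain and is numerically more stable.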
The Kalman Gain
The Kalman Gain in matrix notation is given by:
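$$\boldsymbol{K_{n}} = \boldsymbol{P_{n,n-1} H^{T}} \left( \boldsymbol{H P_{n,n-1} H^{T}} + \boldsymbol{R_{n}} \right)^{-1}$$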
where,
- $\boldsymbol{K_{n} }$ is the Kalman Gain
- $\boldsymbol{P_{n,n-1}}$ is the prior estimate uncertainty (covariance) matrix of the current state (predicted at the previous step)
- $\boldsymbol{H}$ is an observation matrix
- $\boldsymbol{R_{n}}$ is a Measurement Uncertainty (measurement noise covariance matrix)
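Finally, a minimal NumPy sketch wiring the matrix equations above into one predict-update cycle ($F$, $H$, $Q$, $R$ are assumed to be supplied by the caller; the full Joseph-form covariance update is used):

```python
import numpy as np

def kf_step(x, P, z, F, H, Q, R, G=None, u=None):
    """One predict + update cycle of the multidimensional Kalman filter.

    x, P : prior state estimate and its covariance
    z    : new measurement vector
    F, H : state transition and observation matrices
    Q, R : process and measurement noise covariances
    G, u : optional control matrix and control input
    """
    # predict: State Extrapolation and Covariance Extrapolation
    x_pred = F @ x if G is None else F @ x + G @ u
    P_pred = F @ P @ F.T + Q
    # update: Kalman Gain, State Update, Covariance Update (Joseph form)
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman Gain
    x_new = x_pred + K @ (z - H @ x_pred)    # State Update Equation
    I_KH = np.eye(len(x)) - K @ H
    P_new = I_KH @ P_pred @ I_KH.T + K @ R @ K.T
    return x_new, P_new

# usage: track position and velocity from noisy position measurements
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])    # constant-velocity dynamics
H = np.array([[1.0, 0.0]])               # measure position only
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[9.0]])                    # measurement noise covariance
x, P = np.zeros(2), 100.0 * np.eye(2)    # rough initial guess
for z in [2.1, 4.2, 5.9, 8.1]:
    x, P = kf_step(x, P, np.array([z]), F, H, Q, R)
print(x)                                 # estimated [position, velocity]
```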