We often take the arithmetic mean, or average, to be a quite straightforward concept in mathematics. For example, the average of the two numbers 1 and 3 is (1 + 3)/2 = 2. But what about taking the average of two rotations? The answer is not so straightforward, but I'll explain it in this article. However, before diving into averaging rotations, let's first discuss what a rotation is in the context of this article.
Consider the blue arrow above, which is pointing in a top-right direction. There are many ways to represent the direction of this arrow. For example, it can be represented using the angle $\pi/4$ rotated in a counter-clockwise direction from the horizontal, or using another angle, such as $\pi/4 - 2\pi = -7\pi/4$, rotated in a clockwise direction from the horizontal. In fact, there are infinitely many angles representing the rotation of this arrow. For the purpose of this article, when I refer to a rotation or a heading, I mean the "arrow direction" (i.e., think "top-right"), and when I refer to an angle, I mean the number parametrizing the rotation or heading (e.g., $\pi/4$).
Reiterating the point above, the same arrow's rotation or heading can be represented using infinitely many angles (e.g., $\pi/4 + 2\pi k$, where $k$ is an integer). However, the heading is often represented using an angle in a Euclidean subspace of length $2\pi$. For example, $[0, 2\pi)$ or $(-\pi, \pi]$.
Let's go back to the first question posed at the beginning of the article: what is the average of two rotations? To get some insight into the problem, we'll start by looking at an example.
Consider the blue arrow pointing to the left in the figure above and assume there are two of them (i.e., one lying on top of the other). Then, the average of the two rotations (i.e., the two blue arrows) should be the same rotation (i.e., an arrow pointing to the left). But what about the average of the two parametrizations? Let's make this interesting by assuming the two rotations are parametrized using the angles $\pi$ and $-\pi$. The arithmetic mean of these two angles is $(\pi + (-\pi))/2 = 0$, which is a wrong answer; it's an arrow pointing to the right, shown as the yellow arrow in the figure above.
To further demonstrate the point, consider averaging the same rotations but now using different angles. Specifically, consider replacing $-\pi$ with $\pi$, which is still the same rotation represented using a different angle. The arithmetic mean is then $(\pi + \pi)/2 = \pi$, which is a correct answer. Hmmm... what's going on here?
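These two cases are easy to check numerically. Here's a minimal Python sketch of the computation above (my own illustration of the paradox, not code from any library):

```python
import math

# Two parametrizations of the same pair of left-pointing arrows.
angles_a = [math.pi, -math.pi]  # naive mean: 0 (points right -- wrong)
angles_b = [math.pi, math.pi]   # naive mean: pi (points left -- correct)

for angles in (angles_a, angles_b):
    mean = sum(angles) / len(angles)
    print(f"angles = {angles} -> arithmetic mean = {mean:.4f}")
```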
The examples above demonstrate how the heading representation (i.e., its parametrization) affects the "arithmetic angle average". In this article, we'll dig into the reason behind this behaviour and then present a solution to the "rotation averaging" problem. The solution uses the mathematics of Lie groups, which are special manifolds that are often used in robotics [1], especially when describing rotations. The subject of Lie groups is somewhat abstract and was difficult for me to learn when I was first introduced to it. My attempt in this article is to motivate the usage of Lie groups through the example of rotation averaging. I hope you enjoy the journey.
The first challenge in addressing the rotation averaging problem is to address how a rotation is parametrized; that is, how a rotation can be represented. One of the parametrizations was discussed earlier in this post, which uses real numbers (i.e., $\theta \in \mathbb{R}$). An issue with this parametrization is that it is not unique. That is, the same rotation can be represented using different angles. Why is the non-uniqueness an issue in this case?
The reason that the non-uniqueness of a parametrization may cause issues is related to the notion of distance. For example, given the two angles $\theta_1 = \pi$ and $\theta_2 = -\pi$, the distance from $\theta_1$ to $\theta_2$ is $|\theta_1 - \theta_2| = 2\pi$. However, the two angles represent the same rotation, so the distance should actually be $0$.
The notion of distance is important, as many areas of mathematics rely on it. For example, the notion of distance appears throughout calculus, which in turn is used all over smooth optimization theory, which is used in many engineering applications. As such, having the correct notion of distance is important when using such mathematical tools.
The problem of non-unique parametrization may be dealt with by defining the headings to belong to a continuous subset of length $2\pi$: for example, the subset $(-\pi, \pi]$. Such parametrizations do solve the non-uniqueness problem and are often used in practice.
However, such a parametrization is not a vector space, which is a small setback. Specifically, the set $(-\pi, \pi]$ is not closed under addition or scaling. That is, given two headings $\theta_1, \theta_2 \in (-\pi, \pi]$, then $\theta_1 + \theta_2 \notin (-\pi, \pi]$, in general (e.g., $\pi/2 + 3\pi/4 = 5\pi/4 \notin (-\pi, \pi]$).
The reason that this is a setback is that many algorithms rely on linear algebra, which assumes that the variables belong to a vector space. For example, many numerical optimization algorithms (e.g., Newton's method) rely on this assumption.
This setback would not be an issue if there were a way to keep using the mathematical tools developed for linear algebra while still using the bounded number line $(-\pi, \pi]$. One way to do this is to exploit a surjective mapping from the real number line $\mathbb{R}$ to the bounded set $(-\pi, \pi]$. This option is explored next using complex numbers.
A different way to represent a heading is as a point on the unit circle, which is a more natural representation. This way, if two points lie on the same location on the unit circle, then they have the same heading and the same parametrization.
The complex unit circle

$$ S^1 \triangleq \{ z \in \mathbb{C} : |z| = 1 \} \tag{1} $$

is an elegant way to represent headings. It addresses the parametrization-uniqueness issue previously discussed: a heading is uniquely represented using a single complex number $z \in S^1$. The notation $S^1$ is used to denote that it's a one-dimensional sphere [1].
The unit circle is not a vector space, which is the same issue presented when using the set $(-\pi, \pi]$. However, complex-number tools can be used to remedy this issue. A point on the unit circle can be parametrized by

$$ z = \exp(j\theta), \tag{2} $$

where $\theta \in \mathbb{R}$ is not a unique number. This allows us to use the linear algebra tools on $\mathbb{R}$ and then map the results to the unit circle using the mapping (2). For instance, two headings can be added in $\mathbb{R}$, since they live in a vector space, and then mapped to the unit circle using the exponential map (2), which is a smooth map. For example, given two headings parametrized by $\theta_1$ and $\theta_2$, respectively, their added heading is

$$ z_3 = \exp(j\theta_1)\exp(j\theta_2) = \exp(j(\theta_1 + \theta_2)). \tag{3} $$
The parameter $\theta$ such that $z = \exp(j\theta)$ is computed using the inverse map, which is the logarithm map $\log(\cdot)$. The logarithm map does not have a unique solution because the exponential map (2) is a surjective map, which means that every $\theta \in \mathbb{R}$ maps to a unique $z \in S^1$, but not necessarily the other way around. That is, there may exist two different parameters $\theta_1 \neq \theta_2$ that have the same heading $\exp(j\theta_1) = \exp(j\theta_2)$. Thus, the logarithm map is instead defined as

$$ \log(z) \triangleq j\theta, \quad \text{where } \theta \in (-\pi, \pi] \text{ and } \exp(j\theta) = z. \tag{4} $$

For example, $\log(\exp(j(\pi + 2\pi))) = j\pi$, where $\pi + 2\pi = 3\pi$ represents the same heading as $\pi$.
The mappings $\theta \mapsto \exp(j\theta)$ and $z \mapsto \log(z)$ will be used often, so explicitly defining functions representing these mappings will simplify the notation in this article.
Define the $\operatorname{Exp}$ map to be

$$ \operatorname{Exp}(\theta) \triangleq \exp(j\theta), \tag{5} $$

and the $\operatorname{Log}$ map to be

$$ \operatorname{Log}(z) \triangleq \arg(z), \tag{6} $$

where $\arg(\cdot)$ is defined to return the angle in the range $(-\pi, \pi]$.
Another useful operation is the angle-wrapping operator $\operatorname{wrap} : \mathbb{R} \to (-\pi, \pi]$, which is a mapping that takes any valid angle and wraps it to the range $(-\pi, \pi]$. For example, $\operatorname{wrap}(\pi + 2\pi) = \pi$. Mathematically, the angle-wrap function can be defined as

$$ \operatorname{wrap}(\theta) \triangleq \operatorname{Log}(\operatorname{Exp}(\theta)). \tag{7} $$
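To make these mappings concrete, here's a minimal Python sketch of $\operatorname{Exp}$, $\operatorname{Log}$, and $\operatorname{wrap}$ (the function names mirror the notation above and are my own; an illustration, not code from a library):

```python
import cmath
import math

def Exp(theta: float) -> complex:
    # Map an angle in R to a point on the unit circle S^1, eq. (5).
    return cmath.exp(1j * theta)

def Log(z: complex) -> float:
    # Map a point on S^1 back to an angle, eq. (6).
    # cmath.phase returns the principal argument, i.e., arg(z).
    return cmath.phase(z)

def wrap(theta: float) -> float:
    # Wrap any angle to (-pi, pi], eq. (7).
    return Log(Exp(theta))

print(wrap(math.pi + 2 * math.pi))  # ~= pi
print(wrap(4.0))                    # ~= 4 - 2*pi ~= -2.2832
```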
Now that the rotation parametrization using complex numbers has been introduced, we can continue the rotation-averaging discussion. But before deriving the rotation averaging algorithm, we will first discuss and derive the arithmetic mean, and then extend the arithmetic mean to apply it to rotations.
In order to derive the rotation averaging algorithm, it helps to dig deeper into the arithmetic mean equation

$$ \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, \tag{8} $$

which is valid for elements in a vector space (e.g., $x_i \in \mathbb{R}$). Deriving the arithmetic mean will provide the basis for deriving the rotation averaging algorithm.
One way to think about the mean of a set of elements $x_1, \ldots, x_N$, where $x_i \in \mathbb{R}$, is that it's the number closest, on average, to all the numbers in the set. But what does this statement mean mathematically? There are two points in the above statement that will help in formulating it mathematically: first, the notion "closest" implies a minimum distance, so we need to define the distance on the given set; second, the notion "on average" implies reducing the distance in aggregate (i.e., over all elements), and not for a specific element.
Let's go back to the arithmetic mean to make the example more concrete. The notion of distance for the Euclidean space $\mathbb{R}$ is defined as

$$ d(x, y) \triangleq | x - y |. \tag{9} $$

Thus, the distance between the mean $\bar{x}$ and the $i$th element $x_i$ can be defined as

$$ d(\bar{x}, x_i) = | \bar{x} - x_i | = | e_i(\bar{x}) |, \tag{10} $$

where

$$ e_i(\bar{x}) \triangleq \bar{x} - x_i \tag{11} $$

is referred to as the error function, because it measures the difference between the mean $\bar{x}$ (i.e., the target variable) and the $i$th element $x_i$.
The mean is then the element that minimizes the total distance, given by

$$ J(\bar{x}) \triangleq \sum_{i=1}^{N} d(\bar{x}, x_i), \tag{12} $$

which can be written mathematically as

$$ \bar{x} = \operatorname*{argmin}_{x \in \mathbb{R}} J(x), \tag{13} $$

where $\operatorname{argmin}$ is read as the "argument of the minimum".
Since (11) is an error function, (13) is a (linear) least squares problem. Without going into details, squaring the summands in (13) simplifies the problem. As such, the objective function (12) is modified to be

$$ \begin{align} J(\bar{x}) &\triangleq \frac{1}{2} \sum_{i=1}^{N} d(\bar{x}, x_i)^2 \tag{14} \\ &= \frac{1}{2} \sum_{i=1}^{N} e_i(\bar{x})^2, \tag{15} \end{align} $$

which in turn changes the optimization problem (13) to be

$$ \bar{x} = \operatorname*{argmin}_{x \in \mathbb{R}} \frac{1}{2} \sum_{i=1}^{N} e_i(x)^2. \tag{16} $$
Equation (16) can be written in lifted form (i.e., using matrices) as

$$ \bar{x} = \operatorname*{argmin}_{x \in \mathbb{R}} \frac{1}{2} \mathbf{e}(x)^{\mathsf{T}} \mathbf{e}(x), \tag{17} $$

where

$$ \mathbf{e}(x) \triangleq \mathbf{A} x - \mathbf{b}, \qquad \mathbf{A} \triangleq \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}^{\mathsf{T}}, \qquad \mathbf{b} \triangleq \begin{bmatrix} x_1 & \cdots & x_N \end{bmatrix}^{\mathsf{T}}. \tag{18} $$
The solution to the least squares problem (16) is [2]

$$ \bar{x} = \left( \mathbf{A}^{\mathsf{T}} \mathbf{A} \right)^{-1} \mathbf{A}^{\mathsf{T}} \mathbf{b} = \frac{1}{N} \sum_{i=1}^{N} x_i, \tag{19} $$

which matches (8).
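As a quick sanity check, here's a short NumPy snippet (my own illustration) verifying that the lifted least-squares solution (19) reproduces the arithmetic mean (8):

```python
import numpy as np

x = np.array([1.0, 3.0, 5.0])  # sample data
A = np.ones((x.size, 1))       # column of ones, as in (18)
b = x.reshape(-1, 1)

# (A^T A)^{-1} A^T b, computed by solving the normal equations.
x_bar = np.linalg.solve(A.T @ A, A.T @ b).item()
print(x_bar, x.mean())         # both print 3.0
```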
For further reading on least squares and optimization, [3] is a classic optimization book.

The generalization of the linear least squares problem is the nonlinear least squares problem, which is used in deriving the rotation-averaging equation in the next section.
The rotation averaging algorithm is an on-manifold nonlinear least squares algorithm, where the manifold in this case is the unit circle $S^1$. The algorithm is very similar to the arithmetic mean derived in the previous section, so we'll follow a similar path to derive it.
As previously discussed, the rotations will be parametrized using complex numbers (2). And as was done with the arithmetic mean, we need a distance metric to measure the distance between two rotations, which in this case is given by

$$ d(z_1, z_2) \triangleq \left| \operatorname{Log}\!\left( z_1 z_2^{*} \right) \right|, \tag{20} $$

where $(\cdot)^{*}$ is the complex conjugate, and $\operatorname{Log}$ is defined in (6).
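This distance fixes the problem from the beginning of the article: the angles $\pi$ and $-\pi$ are now zero distance apart. A self-contained Python check (my own illustration):

```python
import cmath
import math

def dist(z1: complex, z2: complex) -> float:
    # Geodesic distance on S^1: |Log(z1 * conj(z2))|, eq. (20).
    return abs(cmath.phase(z1 * z2.conjugate()))

z_a = cmath.exp(1j * math.pi)   # heading parametrized by pi
z_b = cmath.exp(-1j * math.pi)  # same heading, parametrized by -pi
print(dist(z_a, z_b))           # ~= 0.0: same rotation
print(dist(1 + 0j, 1j))         # ~= pi/2
```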
Similar to the error function introduced in (11), the distance between the mean rotation $\bar{z}$ and the $i$th rotation $z_i$ is given by

$$ d(\bar{z}, z_i) = \left| e_i(\bar{z}) \right|, $$

where

$$ e_i(\bar{z}) \triangleq \operatorname{Log}\!\left( \bar{z} z_i^{*} \right) \tag{21} $$

is the error function between the mean $\bar{z}$ and the $i$th rotation $z_i$.
It is important to note that the error function (21) is smooth, near the minimum, in both arguments. Without the smoothness of the error function, it would not be possible to use the nonlinear least squares machinery employed in the following steps.
The rotation average is then the solution to the optimization problem

$$ \bar{z} = \operatorname*{argmin}_{z \in S^1} \frac{1}{2} \sum_{i=1}^{N} \left| \operatorname{Log}\!\left( z z_i^{*} \right) \right|^2. \tag{22} $$
The optimization problem (22) looks nice in theory, but unfortunately, it's not very straightforward to solve using complex numbers. As such, we turn to parametrizing the rotations using angles, as introduced in (2). Specifically, I will use the notation

$$ \bar{z} = \operatorname{Exp}(\bar{\theta}), \qquad z_i = \operatorname{Exp}(\theta_i). \tag{23} $$
Then, the error function (21) becomes

$$ e_i(\bar{\theta}) = \operatorname{Log}\!\left( \operatorname{Exp}(\bar{\theta}) \operatorname{Exp}(\theta_i)^{*} \right) = \operatorname{wrap}(\bar{\theta} - \theta_i). \tag{24} $$
The optimization problem (22) becomes

$$ \bar{\theta} = \operatorname*{argmin}_{\theta \in \mathbb{R}} \frac{1}{2} \sum_{i=1}^{N} \operatorname{wrap}(\theta - \theta_i)^2. \tag{25} $$
The optimization problem (25) is a non-convex nonlinear least squares problem, which can be solved using various methods, such as the Gauss-Newton algorithm. The Gauss-Newton method is a gradient-based optimization method, which means that it requires and uses the error-function Jacobian to iterate on the current solution until convergence. As such, the Jacobian of the error function (24) is needed.
The Jacobian of the error function (24) can be computed by expanding the error function using its Taylor series approximation. Let $\bar{\theta}_{\mathrm{op}}$ be the operating point to linearize about, and let $\bar{\theta} = \bar{\theta}_{\mathrm{op}} + \delta\bar{\theta}$, where $\delta\bar{\theta}$ is the perturbation. Then, the error function can be expressed as

$$ e_i(\bar{\theta}_{\mathrm{op}} + \delta\bar{\theta}) \approx e_i(\bar{\theta}_{\mathrm{op}}) + \mathbf{J}_i \, \delta\bar{\theta}. \tag{26} $$

As such, the Jacobian is

$$ \mathbf{J}_i = \frac{\partial e_i(\bar{\theta})}{\partial \bar{\theta}} \bigg|_{\bar{\theta}_{\mathrm{op}}} = 1, \tag{27} $$

away from the discontinuity of the wrap function.
The error functions can be stacked into a single column vector

$$ \mathbf{e}(\bar{\theta}) \triangleq \begin{bmatrix} e_1(\bar{\theta}) & \cdots & e_N(\bar{\theta}) \end{bmatrix}^{\mathsf{T}}, \tag{28} $$

and its Jacobian is given by

$$ \mathbf{J} = \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}^{\mathsf{T}}. \tag{29} $$
The Gauss-Newton algorithm is an iterative algorithm given by

$$ \bar{\theta}_{k+1} = \operatorname{wrap}\!\left( \bar{\theta}_k + \delta\bar{\theta}_k \right), \tag{30} $$

where $\delta\bar{\theta}_k$ is the search direction (also known as the update step), and is given by solving the system of equations

$$ \mathbf{J}^{\mathsf{T}} \mathbf{J} \, \delta\bar{\theta}_k = -\mathbf{J}^{\mathsf{T}} \mathbf{e}(\bar{\theta}_k). \tag{31} $$
Inserting the stacked error function (28) and the stacked Jacobian (29) into the system of equations (31) and solving for the search direction gives the solution

$$ \delta\bar{\theta}_k = -\frac{1}{N} \sum_{i=1}^{N} \operatorname{wrap}\!\left( \bar{\theta}_k - \theta_i \right). \tag{32} $$
The rotation averaging equation is then

$$ \bar{\theta}_{k+1} = \operatorname{wrap}\!\left( \bar{\theta}_k - \frac{1}{N} \sum_{i=1}^{N} \operatorname{wrap}\!\left( \bar{\theta}_k - \theta_i \right) \right). \tag{33} $$
Given that the rotation averaging equation (33) is an iterative equation, it needs to be initialized using a starting guess $\bar{\theta}_0$. Due to the non-convexity of the optimization problem, there may be local minima that are different from the global minimum. As such, the initial value plays a huge role in determining whether the algorithm converges to a local or a global minimum. A sketch of the full iteration is given below.
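Here's a minimal Python sketch of the iteration (33); the function name `rot_avg`, the tolerance, and the iteration cap are my own choices, not from the original post:

```python
import math

def wrap(theta: float) -> float:
    # Self-contained wrap to (-pi, pi], eq. (7).
    return math.atan2(math.sin(theta), math.cos(theta))

def rot_avg(thetas, theta_0=0.0, tol=1e-10, max_iters=100):
    # Gauss-Newton iteration (33) for rotation averaging on S^1.
    theta_bar = wrap(theta_0)
    for _ in range(max_iters):
        # Search direction (32): negative mean of the wrapped errors.
        delta = -sum(wrap(theta_bar - t) for t in thetas) / len(thetas)
        theta_bar = wrap(theta_bar + delta)
        if abs(delta) < tol:
            break
    return theta_bar

# The motivating example: two left-pointing arrows.
print(rot_avg([math.pi, -math.pi], theta_0=2.0))  # ~= pi
```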
Some examples will be introduced in the next section that shed some light on this issue.
To show the effect of the initial angle on the final result, three examples will be presented.
First, we'll revisit the example presented at the beginning of the article, shown below, where two angles are being averaged: $\pi$ and $-\pi$. The angles to be averaged are colored in orange, the starting angle is colored in blue, and the computed mean is shown in yellow. As seen in the plot above, the rotation averaging algorithm returns the true average, $\pi$.
To further understand how the algorithm converges, the objective function is plotted below against all possible angles.
Notice that in the above plot, there's a single minimum, which is at $\pm\pi$. Don't let the right- and left-hand sides of the plot confuse you; they refer to the same heading. Since there's a single minimum for this problem, the algorithm converges to the correct solution regardless of where it starts. This is because the Gauss-Newton algorithm will try to reach the bottom of the valley in the plot.
The next example demonstrates that this is not always the case. Consider two angles $\theta_1$ and $\theta_2$, shown in orange below. Let's explore the first minimum by starting the algorithm near it; the algorithm then converges to that minimum, as shown in the plot below. This may be surprising, since we're expecting the answer to be the midpoint along the shorter arc between the two angles. So, what happened?

Taking a look at the objective function plot reveals that there are actually two minima, and the algorithm converged to the local (but not global) minimum. Thinking more about the problem reveals that it's actually not that surprising that we get this answer; it's an angle that's the same distance from both $\theta_1$ and $\theta_2$, but it's not the angle with the shortest distance (i.e., it's not the global minimum).

Well, if we start the same optimization problem at an angle that is closer to the global optimum, then the algorithm converges to the global minimum, as seen below and in the numerical sketch that follows.
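To reproduce this behaviour numerically with the `rot_avg` sketch above, here's a hypothetical instance with the illustrative angles $\pm 3\pi/4$ (my own choice for this demo; the figures in the post may use different values):

```python
import math

thetas = [3 * math.pi / 4, -3 * math.pi / 4]  # symmetric about pi

# Starting near the local minimum at 0 converges to it...
print(rot_avg(thetas, theta_0=0.1))  # ~= 0.0 (local minimum)

# ...while starting closer to the global minimum finds pi.
print(rot_avg(thetas, theta_0=2.0))  # ~= pi (global minimum)
```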
Finally, one last example reveals that in fact there may not be a unique solution. Consider the three angles $\theta_1$, $\theta_2$, and $\theta_3$ shown below.
Our intuition tells us that there's no "right" answer for the average of these angles. That is, there's no single angle that is equally close to all the angles in the plot. This is better explained by looking at the objective function plot, which reveals that there are multiple global minima. The rotation averaging algorithm simply converges to the minimum closest to the starting condition. But it would have converged to a different angle had the algorithm been initialized closer to another minimum.
This final example demonstrates that sometimes the solution may depend on the problem structure itself, which gives some insight into the algorithm: the fact that the algorithm gives an answer doesn't mean that it's the only, or even the right, answer.
For the reader interested in a numeric example, check the plotting script, which uses the algorithm to compute the mean.
In this post, I used an innocent-looking problem to motivate an important field of applied mathematics: optimization on manifolds. This field is used in multiple robotics disciplines, such as state estimation, control, and computer vision.
Don't be scared by the word "manifold", because you've just worked with one: the unit circle. The unit circle belongs to a special class of manifolds known as Lie groups, which are manifolds that are also groups. Lie groups appear quite frequently in robotics applications, mainly because they represent rotations elegantly, both in 2D and 3D.
For readers interested in Lie groups for robotic applications, [1] is a great introduction to the topic; for a more rigorous text, refer to [2].
If you made it this far, then you are a champion! I hope you enjoyed this post and found something useful out of it. Don't hesitate to reach out.
[1] J. Solà, J. Deray, and D. Atchuthan, "A micro Lie theory for state estimation in robotics," arXiv:1812.01537 [cs], Dec. 2021, accessed: Mar. 20, 2022.

[2] T. D. Barfoot, State Estimation for Robotics. Cambridge, U.K.: Cambridge Univ. Press, 2017.

[3] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed., Springer Series in Operations Research. New York: Springer, 2006.