Averaging rotations

Introduction

We often take the arithmetic mean, or average, to be a straightforward concept in mathematics. For example, the average of the two numbers 1 and 3 is (1 + 3)/2 = 2. But what about taking the average of two rotations? The answer is not so straightforward, and I'll explain why in this article. Before diving into averaging rotations, however, let's first discuss what a rotation is in the context of this article.

Multiple angles can represent the same rotation.

Consider the blue arrow above, which is pointing in a top-right direction. There are many ways to represent the direction of this arrow. For example, it can be represented using the angle $\pi/4$ rotated in a counter-clockwise direction from the horizontal, or using another angle such as $-7\pi/4$ in a clockwise direction from the horizontal. In fact, there are infinitely many angles representing the rotation of this arrow. For the purpose of this article, when I refer to a rotation or a heading, I mean the "arrow direction" (i.e., think "top-right"), and when I refer to an angle, I mean the number parametrizing the rotation or heading (e.g., $\pi/4$).

Reiterating the point above, the same arrow's rotation or heading can be represented using infinitely many angles (e.g., $\pi/4 + 2\pi k$, where $k$ is an integer). However, the heading is often represented using an angle $\theta$ in a Euclidean subspace of length $2\pi$. For example, $\theta\in[0,2\pi)\subset\mathbb{R}$ or $\theta\in[-\pi, \pi)\subset\mathbb{R}$.

Averaging rotations

Let's go back to the first question posed at the beginning of the article: what is the average of two rotations? To get some insight into the problem, we'll start by looking at an example.

The arithmetic mean of two angles is not necessarily the true mean of the two rotations.

Consider the blue arrow pointing to the left in the figure above and assume there are two of them (i.e., one lying on top of the other). Then, the average of the two rotations (i.e., the two blue arrows) should be the same (i.e., an arrow pointing to the left). But what about the average of the two parametrizations? Let's make this interesting by assuming the two rotations are parametrized using the angles $\pi$ and $-\pi$. The arithmetic mean of these two angles is $(\pi - \pi)/2 = 0$, which is a wrong answer; it's an arrow pointing to the right, shown as the yellow arrow in the figure above.

To further demonstrate the point, consider averaging the same rotations but now using different angles. Specifically, consider replacing $\pi$ with $3\pi$, which still represents the same rotation as the angle $\pi$. The arithmetic mean is then $(3\pi - \pi)/2 = \pi$, which is the correct answer. Hmmm... what's going on here?

The examples above demonstrate how the heading representation (i.e., its parametrization) affects the "arithmetic angle average". In this article, we'll dig into the reason behind this behaviour and then present a solution to the "rotation averaging" problem. The answer uses the mathematics of Lie groups, which are special manifolds that are often used in robotics [1], especially when describing rotations. The subject of Lie groups is somewhat abstract and was difficult for me to learn when I was first introduced to it. My attempt in this article is to motivate the use of Lie groups through the example of rotation averaging. I hope you enjoy the journey.

Parametrizing headings

The first challenge in addressing the rotation averaging problem is deciding how a rotation is parametrized; that is, how a rotation can be represented. One parametrization was discussed earlier in this post, which uses real numbers (i.e., $\theta\in\mathbb{R}$). An issue with this parametrization is that it is not unique. That is, the same rotation can be represented using different angles. Why is this non-uniqueness an issue?

The problem of non-unique parametrization

The reason that the non-uniqueness of a parametrization may cause issues is related to the notion of distance. For example, given two angles $\theta_{1} = -\pi$ and $\theta_{2} = \pi$, the distance from $\theta_{1}$ to $\theta_{2}$ is $\vert \theta_{2} - \theta_{1}\vert = 2\pi$. However, the two angles represent the same rotation, so the distance should actually be $0$.
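
To make the issue concrete, here is a minimal Python sketch (my own, using NumPy; the wrapping helper anticipates the $\operatorname{Wrap}$ operator defined later in this article) comparing the naive real-number distance with a distance that first wraps the angle difference:

```python
import numpy as np

def naive_distance(theta_1, theta_2):
    """Distance that treats the angles as plain real numbers."""
    return abs(theta_2 - theta_1)

def wrapped_distance(theta_1, theta_2):
    """Distance between the headings that the two angles represent."""
    # Wrap the difference to [-pi, pi) before taking its magnitude.
    diff = np.mod(theta_2 - theta_1 + np.pi, 2 * np.pi) - np.pi
    return abs(diff)

print(naive_distance(-np.pi, np.pi))    # 6.283... (2*pi), misleading
print(wrapped_distance(-np.pi, np.pi))  # 0.0: both angles are the same heading
```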

The notion of distance is important because many areas of mathematics rely on it. For example, distance appears throughout calculus, which in turn underpins smooth optimization theory, which is used in many engineering applications. As such, having the correct notion of distance is important when using such mathematical tools.

Bounding the number line

The problem of non-unique parametrization may be dealt with by defining the headings to belong to a continuous subset of length $2\pi$, for example, the subset $\theta\in[-\pi,\pi)\subset\mathbb{R}$. Such parametrizations do solve the non-uniqueness problem and are often used in practice.

However, such a parametrization is not a vector space, which is a small setback. Specifically, the set is not closed under addition or scaling. That is, given two headings $\theta_{1}\in[-\pi,\pi)$ and $\theta_{2}\in[-\pi,\pi)$, then $\theta_{1} + \theta_{2}\not\in[-\pi,\pi)$ in general (e.g., $\theta_{1} = \theta_{2} = 3\pi/4$ gives $\theta_{1} + \theta_{2} = 3\pi/2\not\in[-\pi,\pi)$).

This is a setback because many algorithms rely on linear algebra, which assumes that the variables belong to a vector space. For example, many numerical optimization algorithms (e.g., Newton's method) rely on this assumption.

This setback will not be an issue if there is a way to keep using the mathematical tools developed using linear algebra while still using the bounded number line $[-\pi,\pi)$. One way to do this is to exploit a surjective mapping from the real number line $\mathbb{R}$ to the bounded set $[-\pi,\pi)$. This option is explored using complex numbers.

Complex numbers

A different way to represent a heading is as a point on the unit circle, which is a more natural way to represent headings. This way, if two points lie on the same location on the unit circle, then they have the same heading and the same parametrization.

The complex unit circle

\begin{aligned} S^{1} \coloneqq \left\{z : \vert z \vert = 1\right\} \end{aligned}

is an elegant way to represent headings. It addresses the parametrization uniqueness issue previously discussed, where a heading is uniquely represented using a single complex number $z\in S^{1}$. The notation $S^{1}$ is used to denote that it's a one-dimensional sphere [1].

The unit circle $S^{1}$ is not a vector space, which is the same issue presented when using the set $[-\pi,\pi)$. However, the tools of complex numbers can be used to remedy this issue. A point $z\in S^{1}$ on the unit circle can be parametrized by

\begin{aligned} z = e^{\jmath \theta}, \end{aligned}

where $\theta\in\mathbb{R}$ is not unique. This allows us to use linear algebra tools on $\theta\in\mathbb{R}$ and then map the result to the unit circle $S^{1}$ using the mapping (2), which is a smooth map. For example, given two headings $z_{1}, z_{2}\in S^{1}$ parametrized by $\theta_{1},\theta_{2}\in\mathbb{R}$, respectively, the angles can be added in the vector space $\mathbb{R}$, and the combined heading is

\begin{aligned} z_{3} = z_{1} z_{2} = e^{\jmath\theta_{1}}e^{\jmath\theta_{2}} = e^{\jmath(\theta_{1} + \theta_{2})}. \end{aligned}

The parameter $\theta_{3}$ such that $z_{3} = e^{\jmath\theta_{3}}$ is computed using the inverse map, which is the logarithm map $\log$. The logarithm does not have a unique solution because the exponential map (2) is surjective but not injective: every heading in $S^{1}$ is reached from $\mathbb{R}$, but not by a unique angle. That is, there may exist two different parameters $\theta_{1}\neq\theta_{2}$, $\theta_{1}, \theta_{2}\in\mathbb{R}$, that map to the same heading $e^{\jmath\theta_{1}}=e^{\jmath\theta_{2}}$. Thus, the logarithm map is instead defined as the set

\begin{aligned} \log(z) = \{\jmath\theta : e^{\jmath\theta} = z\}. \end{aligned}

For example, $\log(1 + 0\jmath) = \jmath 2\pi k$, where $k\in\mathbb{Z}$.
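
As a quick numerical illustration of this many-to-one behaviour (a sketch of my own using NumPy, where np.angle plays the role of a principal-branch logarithm):

```python
import numpy as np

# The exponential map is many-to-one: angles differing by 2*pi give the same heading.
z_a = np.exp(1j * np.pi / 4)
z_b = np.exp(1j * (np.pi / 4 + 2 * np.pi))
print(np.allclose(z_a, z_b))  # True

# A numerical logarithm can only return one representative of the set log(z);
# np.angle returns the principal value in (-pi, pi].
print(np.angle(z_b))          # ~0.7854, i.e. pi/4, not pi/4 + 2*pi
```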

Simplified mappings

The mappings $\mathbb{R}\to S^{1}$ and $S^{1}\to\mathbb{R}$ will be used often, so explicitly defining functions representing these mappings will simplify the notation in this article.

Define $\operatorname{Exp}:\mathbb{R}\to S^{1}$ to be

\begin{aligned} \operatorname{Exp}(\theta) \coloneqq e^{\jmath\theta}, \end{aligned}

and $\operatorname{Log}: S^{1}\to[-\pi,\pi)\subset\mathbb{R}$ to be

\begin{aligned} \operatorname{Log}(z) \coloneqq \log(z)/\jmath, \end{aligned}

where $\operatorname{Log}$ is defined to return the angle in the range $[-\pi,\pi)$.

Another useful operation is the angle-wrapping operator $\operatorname{Wrap}: \mathbb{R}\to[-\pi,\pi)$, which takes any angle and wraps it to the range $[-\pi, \pi)$. For example, $\operatorname{Wrap}(3\pi) = -\pi$. Mathematically, the angle-wrap function can be defined as

\begin{aligned} \operatorname{Wrap}(\theta) \coloneqq \operatorname{Log}(\operatorname{Exp}(\theta)). \end{aligned}
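
Here is a minimal Python sketch of these three operators (my own implementation using NumPy; the article's plotting script may define them differently):

```python
import numpy as np

def Exp(theta):
    """Map an angle in R to a point on the unit circle S^1 (a unit complex number)."""
    return np.exp(1j * theta)

def Log(z):
    """Map a unit complex number to its angle in the range [-pi, pi)."""
    theta = np.angle(z)            # principal value in (-pi, pi]
    if np.isclose(theta, np.pi):   # move the boundary case so the range is [-pi, pi)
        theta = -np.pi
    return theta

def Wrap(theta):
    """Wrap any angle to the range [-pi, pi)."""
    return Log(Exp(theta))

print(Log(Exp(np.pi / 4) * Exp(7 * np.pi / 8)))  # ~-2.749, i.e. -7*pi/8
print(Wrap(3 * np.pi))                           # ~-3.1416, i.e. -pi
```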

Now that the rotation parametrization using complex numbers has been introduced, we can continue the rotation-averaging discussion. Before deriving the rotation-averaging algorithm, however, we will first discuss and derive the arithmetic mean, and then extend it to rotations.

Arithmetic mean derivation

In order to derive the rotation averaging algorithm, it helps to dig deeper into the arithmetic mean equation

\begin{aligned} \bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_{i}, \end{aligned}

which is valid for elements of a vector space $x_{i}\in\mathcal{V}$ (e.g., $\mathcal{V} = \mathbb{R}^{n}$). Deriving the arithmetic mean will provide the basis for deriving the rotation averaging algorithm.

One way to think about the mean of a set of elements $\mathcal{X}=\{x_{1}, \ldots, x_{m}\}$, where $x_{i}\in \mathcal{V}$, is that it's the element $\bar{x}\in\mathcal{V}$ closest, on average, to all the elements in the set. But what does this statement mean mathematically? Two points in the above statement help in formulating it mathematically: first, the notion "closest" implies minimizing a distance, so we need to define the distance on the given set; second, the notion "on average" implies reducing the distance in aggregate (i.e., over all elements), and not for a specific element.

Let's go back to the arithmetic mean to make the example more concrete. The notion of distance for the Euclidean space $\mathbb{R}^{n}$ is defined as

\begin{aligned} d(x_{1}, x_{2}) = \Vert x_{1} - x_{2}\Vert_{2}. \end{aligned}

Thus, the distance between the mean $\bar{x}$ and the element $x_{i}$ can be defined as

\begin{aligned} d_{i}(\bar{x}) \coloneqq d(\bar{x}, x_{i}) = \Vert \bar{x} - x_{i} \Vert_{2} \eqqcolon \Vert e_{i}(\bar{x}) \Vert_{2}, \end{aligned}

where

\begin{aligned} e_{i}(\bar{x}) \coloneqq \bar{x} - x_{i} \end{aligned}

is referred to as the error function because it measures the difference between the mean $\bar{x}$ (i.e., the target variable) and the $i$th element $x_{i}$.

The mean $\bar{x}\in\mathcal{V}$ is then the element that minimizes the total distance given by

\begin{aligned} \tilde{J}(\bar{x}) \coloneqq \sum_{i=1}^{m} d_{i}(\bar{x}) = \sum_{i=1}^{m} \Vert e_{i}(\bar{x})\Vert_{2}, \end{aligned}

which can be written mathematically as

\begin{aligned} \bar{x} = \operatorname{arg.\,min}_{x\in\mathcal{V}} \sum_{i=1}^{m} \Vert e_{i}(x) \Vert_{2}, \end{aligned}

where $\operatorname{arg.\,min}$ is read as the "argument of the minimum".

Since (11) is an error function, (13) is a (linear) least squares problem. Without going into details, squaring the summands in (13) simplifies the problem. As such, the objective function (12) is modified to be

\begin{aligned} J(\bar{x}) \coloneqq \sum_{i=1}^{m} d_{i}(\bar{x})^{2} = \sum_{i=1}^{m} e_{i}(\bar{x})^{\mathsf{T}}e_{i}(\bar{x}), \end{aligned}

which in turn changes the optimization problem (13) to be

\begin{aligned} \bar{x} = \operatorname{arg.\,min}_{x\in\mathcal{V}} \sum_{i=1}^{m} e_{i}(x)^{\mathsf{T}}e_{i}(x). \end{aligned}

Equation (16) can be written in lifted form (i.e., using matrices)

\begin{aligned} \bar{x} = \operatorname{arg.\,min}_{x\in\mathcal{V}} \Vert A{x} - b \Vert_{2}^{2}, \end{aligned}

where

\begin{aligned} A = \begin{bmatrix} \mathbf{1} & \cdots & \mathbf{1} \end{bmatrix}^{\mathsf{T}} \in \mathbb{R}^{mn \times n}, \; \text{and}\; b = \begin{bmatrix} x_{1}^{\mathsf{T}} & \cdots & x_{m}^{\mathsf{T}} \end{bmatrix}^{\mathsf{T}} \in \mathbb{R}^{mn}, \end{aligned}

in which $\mathbf{1}$ denotes the $n\times n$ identity matrix.

The solution to the least squares problem (16) is [2]

\begin{aligned} \bar{x} = (A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}b = \left(m\mathbf{1}\right)^{-1}\left(\sum_{i=1}^{m}x_{i}\right) = \frac{1}{m} \sum_{i=1}^{m} x_{i}, \end{aligned}

which matches (8).
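
As a quick sanity check, here is a small NumPy sketch (mine; the dimensions and data are arbitrary choices) verifying that the lifted least-squares solution reproduces the arithmetic mean:

```python
import numpy as np

# Quick check that the lifted least-squares solution equals the arithmetic mean.
rng = np.random.default_rng(0)
m, n = 5, 3
xs = rng.normal(size=(m, n))          # the m elements x_i in R^n

A = np.vstack([np.eye(n)] * m)        # m stacked identity blocks, shape (m*n, n)
b = xs.reshape(-1)                    # the stacked x_i, length m*n

x_bar_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_bar_ls, xs.mean(axis=0)))  # True
```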

For further reading into least squares and optimization, [3] is a classic optimization book.

The generalization of the linear least squares problem is the nonlinear least squares problem, which is used in deriving the rotation-averaging equation in the next section.

Averaging rotations using complex numbers

The rotation averaging algorithm is an on-manifold nonlinear least squares algorithm, where the manifold in this case is the unit circle $S^{1}$. The algorithm is very similar to the arithmetic mean derived in the previous section, so we'll follow a similar path to derive it.

As previously discussed, the rotations will be parametrized using complex numbers (2). And as was done with the arithmetic mean, we need a distance metric $\tilde{d}:S^{1}\times S^{1}\to\mathbb{R}_{\geq0}$ to measure the distance between two rotations, which in this case is given by

\begin{aligned} \tilde{d}(z_{1}, z_{2}) = \Vert \operatorname{Log}(z_{1}z_{2}^{\ast}) \Vert_{2}, \end{aligned}

where $z^{\ast}$ is the complex conjugate, and $\operatorname{Log}$ is defined in (6).

Similar to the error function introduced in (10), the distance between the mean $\bar{z}$ and the $i$th rotation $z_{i}$ is given by

\begin{aligned} \tilde{d}_{i}(\bar{z}) \coloneqq \tilde{d}(\bar{z}, z_{i}) \eqqcolon \Vert \tilde{e}_{i}(\bar{z}) \Vert_{2}, \end{aligned}

where

\begin{aligned} \tilde{e}_{i}(\bar{z}) = \operatorname{Log}(\bar{z}z_{i}^{\ast}) \end{aligned}

is the error function between the mean $\bar{z}$ and the $i$th rotation $z_{i}$.

It is important to note that the error function (21) is smooth, near the minimum, in both arguments. Without the smoothness of the error function, it would not be possible to use the nonlinear least squares machinery employed in the following steps.

The rotation average is then the solution to the optimization problem

\begin{aligned} \bar{z} = \operatorname{arg.\,min}_{z\in S^{1}} \sum_{i=1}^{m} \tilde{e}_{i}(z)^{2}. \end{aligned}

The optimization problem (22) looks nice in theory, but unfortunately, it's not very straightforward to solve using complex numbers. As such, we turn to parametrizing the rotations using angles, as was introduced in (2). Specifically, I will use the notation

\begin{aligned} z(\theta) \coloneqq \operatorname{Exp}(\theta). \end{aligned}

Then, the error function (21) becomes

\begin{aligned} e_{i}(\bar{\theta}) = \tilde{e}_{i}(z(\bar{\theta})) = \operatorname{Log}(z(\bar{\theta})z(\theta_{i})^{\ast}) = \operatorname{Log}(\operatorname{Exp}(\bar{\theta} - \theta_{i})) = \operatorname{Wrap}(\bar{\theta} - \theta_{i}). \end{aligned}

The optimization problem (22) becomes

\begin{aligned} \bar{\theta} = \operatorname{arg.\,min}_{\theta\in \mathbb{R}} \sum_{i=1}^{m} e_{i}(\theta)^{2}. \end{aligned}

The optimization problem (25) is a non-convex nonlinear least squares problem, which can be solved using various methods such as the Gauss-Newton algorithm. Gauss-Newton is a gradient-based method: it uses the Jacobian of the error function to iteratively update the current solution until convergence. As such, the Jacobian of the error function is needed.

The Jacobian of the error function can be computed by expanding it using a Taylor series approximation. Let $\bar{\theta}$ be the operating point to linearize about, and let $\theta = \bar{\theta} + \delta\theta$, where $\delta\theta$ is the perturbation. Then, the error function can be expressed as

\begin{aligned} e_{i}(\theta) = e_{i}(\bar{\theta} + \delta\theta) = \operatorname{Log}(\operatorname{Exp}(\bar{\theta} + \delta\theta - \theta_{i})) \approx \operatorname{Log}(\operatorname{Exp}(\bar{\theta}-\theta_{i})) + \operatorname{Log}(\operatorname{Exp}(\delta\theta)) = e_{i}(\bar{\theta}) + \delta\theta. \end{aligned}

As such, the Jacobian is

\begin{aligned} \mathbf{J}_{i} \coloneqq \frac{\mathrm{d}e_{i}(\theta)}{\mathrm{d}\theta} = 1. \end{aligned}

The error functions can be stacked into a single column vector

\begin{aligned} \mathbf{e}(\theta) = \begin{bmatrix} e_{1}(\theta) & \cdots & e_{m}(\theta)\end{bmatrix}^{\mathsf{T}} \in \mathbb{R}^{m}, \end{aligned}

and its Jacobian is given by

\begin{aligned} \mathbf{J} \coloneqq \frac{\mathrm{d}\mathbf{e}(\theta)}{\mathrm{d}\theta} = \begin{bmatrix}1 & \cdots & 1\end{bmatrix}^{\mathsf{T}}. \end{aligned}

The Gauss-Newton algorithm is an iterative algorithm given by

\begin{aligned} \theta^{k+1} = \theta^{k} + \delta\theta^{k}, \end{aligned}

where $\delta\theta^{k}$ is the search direction (also known as the update step), and is given by solving the system of equations

\begin{aligned} \mathbf{J}(\theta^{k})^{\mathsf{T}}\mathbf{J}(\theta^{k})\,\delta\theta^{k} = -\mathbf{J}(\theta^{k})^{\mathsf{T}}\mathbf{e}(\theta^{k}). \end{aligned}

Inserting the error function (28) and the block Jacobian (29) into the system of equations (31) and solving for the search direction $\delta\theta^{k}$ gives the solution

\begin{aligned} \delta\theta^{k} = -\frac{1}{m}\sum_{i=1}^{m} \operatorname{Log}(\operatorname{Exp}(\theta^{k} - \theta_{i})) = -\frac{1}{m}\sum_{i=1}^{m} \operatorname{Wrap}(\theta^{k} - \theta_{i}). \end{aligned}

The rotation averaging equation is then

\begin{aligned} \theta^{k+1} = \theta^{k} - \frac{1}{m}\sum_{i=1}^{m} \operatorname{Wrap}(\theta^{k} - \theta_{i}). \end{aligned}
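
Below is a minimal NumPy sketch of this iteration (my own code; the function name, convergence tolerance, and iteration cap are arbitrary choices, and the article's own script may differ):

```python
import numpy as np

def wrap(theta):
    """Wrap an angle (or array of angles) to the range [-pi, pi)."""
    return np.mod(theta + np.pi, 2 * np.pi) - np.pi

def average_rotations(thetas, theta_0, max_iters=100, tol=1e-12):
    """Rotation averaging via the Gauss-Newton iteration (33).

    thetas  -- the angles to average (any real values)
    theta_0 -- initial guess; it determines which minimum the iteration reaches
    """
    thetas = np.asarray(thetas, dtype=float)
    theta = float(theta_0)
    for _ in range(max_iters):
        # Search direction (32): minus the mean of the wrapped errors.
        delta = -np.mean(wrap(theta - thetas))
        theta += delta
        if abs(delta) < tol:
            break
    return theta
```

Note that the wrap helper here implements $\operatorname{Wrap}$ directly with modular arithmetic rather than going through $\operatorname{Exp}$ and $\operatorname{Log}$; the result is the same.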

Given that the rotation averaging equation (33) is iterative, it needs to be initialized with a starting guess $\theta^{0}$. Due to the non-convexity of the optimization problem, there may be local minima that differ from the global minimum. As such, the initial value $\theta^{0}$ plays a huge role in determining whether the algorithm converges to a local or a global minimum.

Some examples will be introduced in the next section that shed some light on this issue.

Examples

To show the effect of the initial angle $\theta^{0}$ on the final result, three examples will be presented.

First, we'll revisit the example presented at the beginning of the article, shown below, where two angles are being averaged: $\pi$ and $-\pi$.

Averaging two identical angles.
The angles to be averaged are colored in orange, the starting angle is colored in blue, and the computed mean is shown in yellow. As seen in the plot above, the rotation averaging algorithm returns the true average, $\pi$.

To further understand how the algorithm converges, the objective function is plotted below against all possible angles.

Objective function plot for global minima.

Notice that in the above plot, there's a single minimum, which is at $\pi$. Don't let the right and left sides of the plot confuse you; they refer to the same heading. Since there's a single minimum for this problem, the algorithm converges to the correct solution regardless of where it starts, because the Gauss-Newton algorithm will always head toward the bottom of the valley in the plot.

The next example demonstrates that this is not always the case. Consider the two angles $\pi/4$ and $3\pi/4$. Let's explore one of the minima by starting the algorithm at $-\pi/3$: the algorithm converges to $-\pi/2$, as shown in the plot below.

Some problems may have multiple minima.
This may be surprising, since we're expecting the answer to be $\pi/2$. So, what happened?

Taking a look at the objective function plot reveals that there are actually two minima, and that the algorithm converged to the local (but not global) minimum.

Converging to a local minimum.
Thinking more about the problem reveals that it's actually not that surprising that we get the $-\pi/2$ answer; it's an angle that is the same distance from both $\pi/4$ and $3\pi/4$, but it's not the angle with the shortest distance (i.e., it's not the global minimum).

Well, if we start the same optimization problem at an angle that is closer to the global optimum, say $\pi/3$, then the algorithm converges to the global minimum, as seen below.

Converging to a global minimum.

Finally, one last example reveals that there may in fact not be a unique solution. Consider the angles $0$, $2\pi/3$, and $-2\pi/3$, as shown below.

Some problems may not have a unique solution.

Our intuition tells us that there's no "right" answer for the average of these angles. That is, there's no single angle that is equally close to all the angles in the plot. This is better explained by looking at the objective function plot, which reveals that there are multiple global minima.

Converging to a global minimum.
The rotation averaging algorithm simply converges to the minimum closest to the starting condition. But it would have converged to a different angle had the algorithm been initialized closer to one of the other angles.

This final example demonstrates that sometimes the solution may depend on the structure of the problem itself, which gives some insight into the algorithm: the fact that it returns an answer doesn't mean that it's the only, or even the right, answer.

For the reader interested in a numeric example, check the plotting script, which uses the algorithm to compute the mean.
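
For illustration, here is a small self-contained sketch (mine, not the linked plotting script) that reruns the iteration (33) on the three examples above; the initial guesses are my own choices and need not match those used in the figures:

```python
import numpy as np

def wrap(theta):
    """Wrap an angle (or array of angles) to the range [-pi, pi)."""
    return np.mod(np.asarray(theta, dtype=float) + np.pi, 2 * np.pi) - np.pi

def average_rotations(thetas, theta_0, iters=50):
    """Iterate the rotation-averaging update (33) from the initial guess theta_0."""
    theta = float(theta_0)
    for _ in range(iters):
        theta -= np.mean(wrap(theta - np.asarray(thetas, dtype=float)))
    return theta

print(average_rotations([np.pi, -np.pi], 0.0))               # ~3.1416 (pi)
print(average_rotations([np.pi/4, 3*np.pi/4], -np.pi/3))     # ~-1.5708 (-pi/2, local minimum)
print(average_rotations([np.pi/4, 3*np.pi/4],  np.pi/3))     # ~1.5708 (pi/2, global minimum)
print(average_rotations([0.0, 2*np.pi/3, -2*np.pi/3], 0.1))  # ~0.0 (one of several minima)
```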

Concluding remarks

In this post, I used an innocent-looking problem to motivate an important field of applied mathematics: optimization on manifolds. This field is used in multiple robotics disciplines such as state estimation, control, and computer vision.

Don't be scared by the word "manifold", because you've just worked with one: the unit circle. The unit circle belongs to a special class of manifolds known as Lie groups, which are special manifolds that are also groups. Lie groups appear quite frequently in robotic applications, mainly because they represent rotations elegantly, both in 2D and 3D.

For readers interested in Lie groups for robotic applications, [1] is a great introduction to the topic; for a more rigorous text, refer to [2].

If you made it this far, then you are a champion! I hope you enjoyed this post and found something useful out of it. Don't hesitate to reach out.

References

[1] J. Solà, J. Deray, and D. Atchuthan, “A micro Lie theory for state estimation in robotics,” arXiv:1812.01537 [cs], Dec. 2021, Accessed: Mar. 20, 2022.
[2] T. D. Barfoot, State Estimation for Robotics. Cambridge, U.K.: Cambridge Univ. Press, 2017.
[3] J. Nocedal and S. J. Wright, Numerical optimization, 2nd ed. in Springer series in operations research. New York: Springer, 2006.