Let us assume we can locate a robot in some `world' coordinate frame $\textbf{X}_\textbf{w},\textbf{Y}_\textbf{w}$ at the position $x_c,y_c$ and angled at $\theta_c$ (see Figure t.b.d. (fig:HT)). If the robot can see something at \text{X} in its own frame of reference $\textbf{X}_\textbf{r},\textbf{Y}_\textbf{r}$ at the position $x_{\text{X}r},y_{\text{X}r}$, the question is: where is this X in the `world' coordinate frame? That is, what is $x_{\text{X}w},y_{\text{X}w}$ given $x_{\text{X}r},y_{\text{X}r}$? This problem arises in a wide range of areas, from computer graphics to fractal geometry and quantum physics. One popular way to solve it is to use the homogeneous transform matrix; translating points from the robot's point of view to the world then becomes simply a matter of matrix multiplication.

To see how this happens we can draw a horizontal line (yellow~{\color{myyellow}\rule[1mm]{5mm}{2pt}} in the world coordinate frame) and a vertical line (orange~{\color{myorange}\rule[-1.5mm]{2pt}{5mm}}) through $x_{\text{X}r}$.

Observe that $y_{\text{X}r}$ is inclined at an angle $\theta_c$ with respect to a vertical line (purple~{\color{mypurple}\rule[-1.5mm]{2pt}{5mm}}). From this it is possible to write out the solutions as

\[ x_{\text{X}w}=x_{\text{X}r}\cos\theta_c - y_{\text{X}r}\sin\theta_c +x_c \]that is, $X=$ red $-$ yellow $+\,x_c$, and

\[ y_{\text{X}w}=x_{\text{X}r}\sin\theta_c + y_{\text{X}r}\cos\theta_c +y_c \]

However, these equations do not separate information about the location of the robot from the location of the cross (X). We can rewrite the equations as the homogeneous transform equation

\[ \left[\begin{array}{c} x_{\text{X}w}\\ y_{\text{X}w}\\ 1 \end{array}\right] = \left[\begin{array}{ccc}\cos\theta_c&-\sin\theta_c&x_c\\ \sin\theta_c&\cos\theta_c&y_c\\ 0&0&1\\ \end{array}\right] \left[\begin{array}{c} x_{\text{X}r}\\ y_{\text{X}r}\\ 1 \end{array}\right] \]The matrix is known as the `homogeneous transform' and is sometimes written

\[ {}^w_rT=\left[\begin{array}{ccc}\cos\theta_c&-\sin\theta_c&x_c\\ \sin\theta_c&\cos\theta_c&y_c\\ 0&0&1\\ \end{array}\right] \]with the leading superscript $w$ and subscript $r$ indicating that the transform operates on robot coordinates to produce world coordinates. It contains only information about the position and orientation of the robot, but anything that the robot knows about can be multiplied by this matrix to transform its location into the world coordinate frame.
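
As a quick numerical check of this matrix, consider a hypothetical robot pose that is not taken from the figure: $x_c=2$, $y_c=1$, $\theta_c=90^\circ$, with the robot seeing X at $x_{\text{X}r}=3$, $y_{\text{X}r}=0$. With $\cos 90^\circ=0$ and $\sin 90^\circ=1$ the transform would give

\[ \left[\begin{array}{c} x_{\text{X}w}\\ y_{\text{X}w}\\ 1 \end{array}\right] = \left[\begin{array}{ccc} 0&-1&2\\ 1&0&1\\ 0&0&1 \end{array}\right] \left[\begin{array}{c} 3\\ 0\\ 1 \end{array}\right] = \left[\begin{array}{c} 2\\ 4\\ 1 \end{array}\right] \]

so X lies at $(2,4)$ in the world frame. This agrees with intuition: a robot at $(2,1)$ whose $\textbf{X}_\textbf{r}$ axis points along $\textbf{Y}_\textbf{w}$, and which sees something 3 units along $\textbf{X}_\textbf{r}$, should place it 3 units further along $\textbf{Y}_\textbf{w}$.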

So we can write the homogeneous transform as

\[ \left[\begin{array}{c} x_{\text{X}w}\\ y_{\text{X}w}\\ 1 \end{array}\right] = {}^w_rT \left[\begin{array}{c} x_{\text{X}r}\\ y_{\text{X}r}\\ 1 \end{array}\right] \]and if we want to go from the world coordinate frame to the robot's coordinate frame we simply use the matrix inverse, that is

\[ \left[\begin{array}{c} x_{\text{X}r}\\ y_{\text{X}r}\\ 1 \end{array}\right] = {}^r_wT \left[\begin{array}{c} x_{\text{X}w}\\ y_{\text{X}w}\\ 1 \end{array}\right] = {}^w_rT^{-1} \left[\begin{array}{c} x_{\text{X}w}\\ y_{\text{X}w}\\ 1 \end{array}\right] \]One of the key uses of homogeneous transforms is to combine movements into a single entity.

Suppose that the robot performs a series of translations and rotations. What is the homogeneous transform of the result?

As an example, assume the robot translates 9.5 units along the x axis, rotates by 90 degrees, and translates 4.5 units along the y axis.

The individual matrices are

\[ {}^w_aT= \left[\begin{array}{ccc} 1&0&9.5\\ 0&1&0\\ 0&0&1 \end{array}\right] \]\[ {}^a_bR= \left[\begin{array}{ccc} 0&-1&0\\ 1&0&0\\ 0&0&1 \end{array}\right] \]\[ {}^b_rT= \left[\begin{array}{ccc} 1&0&0\\ 0&1&4.5\\ 0&0&1 \end{array}\right] \]These matrices can then be multiplied together to get the resulting homogeneous transform of the robot and hence translate anything from the world to the robot coordinate frame and vice versa.

\[ {}^w_rT={}^w_aT\,\,\,{}^a_bR\,\,\,{}^b_rT \]so to shift points from the robot's coordinate frame to the world frame requires ${}_w\mathbf{p}={}^w_rT\,\,{}_r\mathbf{p}$.

Likewise, to move points from the world space into the robot's coordinate frame requires ${}_r\mathbf{p}={}^w_rT^{-1}\,\,{}_w\mathbf{p}$.
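
For the example above the product can be worked out by hand. The following is a sketch of that calculation, reading the 90 degree rotation as anticlockwise (as in the matrix ${}^a_bR$); the test point ${}_r\mathbf{p}$ is an assumed value chosen only for illustration.

\[ {}^w_rT= \left[\begin{array}{ccc} 1&0&9.5\\ 0&1&0\\ 0&0&1 \end{array}\right] \left[\begin{array}{ccc} 0&-1&0\\ 1&0&0\\ 0&0&1 \end{array}\right] \left[\begin{array}{ccc} 1&0&0\\ 0&1&4.5\\ 0&0&1 \end{array}\right] = \left[\begin{array}{ccc} 0&-1&5\\ 1&0&0\\ 0&0&1 \end{array}\right] \]

so the robot ends up at $(5,0)$ in the world frame, rotated by 90 degrees. Because the upper-left $2\times 2$ block $R$ is a rotation, the inverse does not need a general matrix inversion: the rotation block is transposed and the translation column $\mathbf{t}$ becomes $-R^{T}\mathbf{t}$, which here gives

\[ {}^w_rT^{-1}= \left[\begin{array}{ccc} 0&1&0\\ -1&0&5\\ 0&0&1 \end{array}\right] \]

Taking ${}_r\mathbf{p}=(1,0,1)^{T}$ as the test point, ${}_w\mathbf{p}={}^w_rT\,\,{}_r\mathbf{p}=(5,1,1)^{T}$, and applying ${}^w_rT^{-1}$ to $(5,1,1)^{T}$ returns $(1,0,1)^{T}$ as expected.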

If for some reason we lose the ability to rotate the robot in its own coordinate frame, we can still perform the rotation by translating it to the world origin, performing the rotation, and translating it back again.
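
One way to read this in matrix form, assuming the rotation is by an angle $\phi$ about the robot's position $(x_c,y_c)$ expressed in world coordinates (the symbol $\phi$ and the three factor matrices are introduced here for illustration only), is

\[ \left[\begin{array}{ccc} 1&0&x_c\\ 0&1&y_c\\ 0&0&1 \end{array}\right] \left[\begin{array}{ccc} \cos\phi&-\sin\phi&0\\ \sin\phi&\cos\phi&0\\ 0&0&1 \end{array}\right] \left[\begin{array}{ccc} 1&0&-x_c\\ 0&1&-y_c\\ 0&0&1 \end{array}\right] = \left[\begin{array}{ccc} \cos\phi&-\sin\phi&x_c-x_c\cos\phi+y_c\sin\phi\\ \sin\phi&\cos\phi&y_c-x_c\sin\phi-y_c\cos\phi\\ 0&0&1 \end{array}\right] \]

Reading the product from right to left, the rightmost matrix moves the robot to the world origin, the middle matrix rotates about the origin, and the leftmost matrix moves it back. Applying the result to $(x_c,y_c,1)^{T}$ returns $(x_c,y_c,1)^{T}$, as it should for a rotation about that point.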

The perspective transform is the full camera transform and uses the concept of a lens with a focal length $f$.

\includegraphics[width=0.95\textwidth]{image_orig/perspectivetf}

From the picture it can be seen that for the ray of light passing through the right focus that

\[ \frac{-y_2}{f}=\frac{y_1}{x_1-f} \]Thus

\begin{equation} y_2=\frac{y_1}{1-\frac{x_1}{f}} \label{eq:persp1} \end{equation}Also from the principal ray passing through the origin

\[ \frac{-x_2}{-y_2}=\frac{x_1}{y_1} \]so

\begin{equation} x_2=y_2\frac{x_1}{y_1}=\frac{y_1}{1-\frac{x_1}{f}}\frac{x_1}{y_1}=\frac{x_1}{1-\frac{x_1}{f}} \label{eq:persp2} \end{equation}Thus the following perspective transform can be used followed by a normalization step.

\[ \begin{bmatrix}x_1\\y_1\\1-\frac{x_1}{f}\end{bmatrix}= \begin{bmatrix}1& 0& 0\\0& 1& 0\\-\frac{1}{f}& 0& 1\end{bmatrix} \begin{bmatrix}x_1\\y_1\\1\end{bmatrix} \]which, on normalisation (dividing every component by the third), gives $(x_2,y_2,1)^{T}$ and so results in the transforms shown in equations \ref{eq:persp1} and \ref{eq:persp2}.
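
As a numerical check, with assumed values that do not come from the figure, take $f=2$ and an object point $x_1=6$, $y_1=3$. The matrix product and the normalisation step would then be

\[ \begin{bmatrix}1& 0& 0\\0& 1& 0\\-\frac{1}{2}& 0& 1\end{bmatrix} \begin{bmatrix}6\\3\\1\end{bmatrix} = \begin{bmatrix}6\\3\\-2\end{bmatrix} \quad\longrightarrow\quad \frac{1}{-2}\begin{bmatrix}6\\3\\-2\end{bmatrix} = \begin{bmatrix}-3\\-1.5\\1\end{bmatrix} \]

so $x_2=-3$ and $y_2=-1.5$. These agree with equations \ref{eq:persp2} and \ref{eq:persp1}, which give $6/(1-3)=-3$ and $3/(1-3)=-1.5$ respectively. The negative signs are consistent with an inverted image formed on the opposite side of the lens from the object.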

updated Mon 30-11-2020