Given the list of corresponding points between the two images, it
remains to find the transformation which registers the overlapping
portions of the images. Assuming a pinhole camera model, the
transformation between pixels (u,v) in image 1 and pixels
in image 2 is described by a
plane-to-plane projectivity [8]:
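A plane-to-plane projectivity with eight parameters is commonly written in the form below. The primed coordinates (u', v') for pixels in image 2 and the parameter names h_ij are assumed notation and may differ from that used in (1).

```latex
\[
u' = \frac{h_{11} u + h_{12} v + h_{13}}{h_{31} u + h_{32} v + 1},
\qquad
v' = \frac{h_{21} u + h_{22} v + h_{23}}{h_{31} u + h_{32} v + 1}
\]
```

Multiplying through by the denominator, each matched pair gives two equations that are linear in the eight h_ij, which is why four pairs in general position suffice.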
The 8 parameters of the projectivity can be found from four pairs of matching points. Since we typically have many more than four matches, we use RANSAC regression [3] to reject outlying matches and estimate the projectivity from the remaining, good matches. The projectivity is then fine-tuned using correlation at the corners of the overlap region to obtain four correspondences to sub-pixel accuracy. Image 1 is transformed into image 2's coordinate system using (1), and the two images are displayed together in image 2's coordinate system: Figure 5(a) shows a typical result. For comparison, we show in Figure 5(b) the result obtained using the optimal affine transformation in place of the projectivity. Since there is little depth variation in the image, one might expect an affine model to suffice; however, even though some parts of the overlap region are well registered, other parts are clearly blurred. The projectivity is necessary to achieve the high accuracy required for document mosaicing, where even single-pixel registration errors are noticeable.
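The estimation step can be sketched as follows. This is a generic RANSAC loop, not necessarily the variant of [3]. The function names, the inlier tolerance `tol`, the iteration count, and the choice to fix the bottom-right matrix entry to 1 are all assumptions made for illustration. The linear fit is unnormalised for brevity.

```python
import numpy as np

def fit_projectivity(src, dst):
    """Least-squares projectivity from >= 4 point pairs (bottom-right entry fixed to 1)."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(src, dst):
        # Two linear equations per match, obtained by clearing the denominator.
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    h, *_ = np.linalg.lstsq(np.asarray(rows, float), np.asarray(rhs, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_projectivity(H, pts):
    """Map an (N, 2) array of pixel coordinates through the 3x3 matrix H."""
    pts = np.asarray(pts, float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def ransac_projectivity(src, dst, n_iters=500, tol=2.0, seed=0):
    """Return (H, inlier_mask); tol is a reprojection error in pixels (assumed value)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_projectivity(src[sample], dst[sample])
        # Degenerate samples may yield inf/nan errors; these never count as inliers.
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(apply_projectivity(H, src) - dst, axis=1)
            inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("too few consistent matches to estimate a projectivity")
    # Final estimate uses every match that agrees with the best hypothesis.
    return fit_projectivity(src[best], dst[best]), best
```

Each hypothesis is fitted to a minimal sample of four matches, and the final estimate is refitted to all matches that agree with the best hypothesis. The correlation-based sub-pixel refinement described above is not included in this sketch.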
Figure 5: Mosaicing of two document images. In the overlap region
the pixels are blended, using an unweighted mean at the centre of the
region and increasingly weighted means towards the edges. This
blending operation eliminates any abrupt seams in the mosaic, but will
result in a blurred composite if the registration is not
accurate. Blurring is evident in the affine mosaic (b), but not in the
mosaic constructed using a plane-to-plane projectivity (a). Close-ups
of typical seams from (a) and (b) are shown in (c) and (d)
respectively. Note the system's ability to cope with mixtures of fonts
and unaligned columns.
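The blending described in the caption can be illustrated with a minimal sketch. It assumes that the two images have already been registered, that the overlap is a vertical strip with image 1 to the left, and that the weights vary linearly across the strip. The caption does not state the weighting profile, so the linear ramp and the name `blend_overlap` are illustrative choices.

```python
import numpy as np

def blend_overlap(strip1, strip2):
    """Blend two registered strips that cover the same overlap columns.

    strip1 comes from the image on the left, strip2 from the image on the right.
    """
    strip1 = np.asarray(strip1, float)
    strip2 = np.asarray(strip2, float)
    width = strip1.shape[1]
    # Weight of image 1 per column: 1 at its own side, 0.5 at the centre, 0 at the far edge.
    w1 = np.linspace(1.0, 0.0, width)
    return w1 * strip1 + (1.0 - w1) * strip2
```

At the centre column the two weights are equal, giving the unweighted mean. Towards either edge the result tends to the image that continues beyond the overlap, so no seam appears.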
Overlapping images are registered pair-wise in the order that they are
acquired. A final composite of the whole page is then constructed by
mapping all the images into the coordinate system of an ``anchor''
image, usually chosen to be the one nearest the page centre. The
transformations to the anchor frame are calculated by concatenating
the pair-wise transformations found earlier. Care must be taken to
ensure that images acquired late in the sequence do not overlap images
acquired much earlier on. Without such precautions error accumulation
could be a problem, though such errors could be eliminated using
hierarchical sub-mosaics [7]. A typical whole-page mosaic (shown
rotated) appears in Figure 6. The mosaic has a resolution
of about 150 dpi.
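Concatenating pair-wise transformations into the anchor frame can be sketched as below. The sketch assumes the images form a simple chain in acquisition order, with `pairwise[i]` a 3x3 matrix mapping image i into the coordinate system of image i+1. This indexing convention and the function name are assumptions, not taken from the text.

```python
import numpy as np

def transforms_to_anchor(pairwise, anchor):
    """Return one 3x3 matrix per image, mapping that image into the anchor frame."""
    n = len(pairwise) + 1
    to_anchor = [None] * n
    to_anchor[anchor] = np.eye(3)
    # Images acquired before the anchor: step forwards along the chain.
    for i in range(anchor - 1, -1, -1):
        to_anchor[i] = to_anchor[i + 1] @ pairwise[i]
    # Images acquired after the anchor: step backwards using inverse transformations.
    for i in range(anchor + 1, n):
        to_anchor[i] = to_anchor[i - 1] @ np.linalg.inv(pairwise[i - 1])
    # A projectivity is defined only up to scale, so normalise the last entry to 1.
    return [T / T[2, 2] for T in to_anchor]
```

Every extra link in the chain adds its own registration error. Choosing an anchor near the page centre keeps the chains short.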
Figure 6: A whole page mosaic. To construct the mosaic all images
were transformed to the anchor frame by concatenating the pair-wise
projectivities. Where images overlap a weighted blending operation was
performed, as described in the caption to Figure 5.
Blending was strictly pair-wise: at any pixel where more than two
images overlap, only the two intensities with the largest weights were
blended. Note how the system copes with multiple fonts, including
mathematics.
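The strictly pair-wise rule can be sketched as follows. The sketch assumes every image has already been warped into the anchor frame and stacked into arrays of shape (n_images, height, width), with n_images at least 2 and a weight of zero wherever an image does not cover a pixel. The array layout and the name `blend_top_two` are hypothetical.

```python
import numpy as np

def blend_top_two(values, weights):
    """Per pixel, blend only the two intensities that carry the largest weights."""
    values = np.asarray(values, float)
    weights = np.asarray(weights, float)
    # Indices of the two largest weights at each pixel (axis 0 runs over images).
    top = np.argsort(weights, axis=0)[-2:]
    w = np.take_along_axis(weights, top, axis=0)
    v = np.take_along_axis(values, top, axis=0)
    total = w.sum(axis=0)
    safe_total = np.where(total > 0, total, 1.0)  # avoid dividing by zero
    # Pixels covered by no image are set to 0.
    return np.where(total > 0, (w * v).sum(axis=0) / safe_total, 0.0)
```

Where only one image covers a pixel, the second weight is zero and the pixel is copied unchanged.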
Note that, typically, the camera is not perfectly parallel to the tabletop. In the example in Figure 6, the bottom of each image is slightly more distant than the top. The effect is barely noticeable in the individual images, but more evident in the mosaic. While it is straightforward to rectify the mosaic using a single plane-to-plane projectivity, we chose to display the raw mosaic to illustrate this point. Likewise, shading variations could be removed by histogram equalisation.
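For reference, global histogram equalisation of an 8-bit greyscale image can be sketched as below. The text does not say which variant of equalisation was intended. The function name and the 8-bit assumption are illustrative.

```python
import numpy as np

def equalise_histogram(img):
    """Global histogram equalisation for an 8-bit greyscale image."""
    img = np.asarray(img, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]      # cumulative count at the darkest level present
    if cdf[-1] == cdf_min:         # constant image: nothing to equalise
        return img.copy()
    # Stretch the cumulative distribution over the full 0..255 range.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]
```

The mapping is monotonic, so the ordering of grey levels is preserved while the output spans the full intensity range.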