TP1 --- Ming Hou 910 011 248

Project Description

In this small program, we aim to produce a single color image from three glass plate images in the B, G, and R channels. Specifically, we first divide the original image vertically into three parts, which serve as the three template images. Then the second and third parts (G and R) are aligned to the first part (B).
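The splitting step can be sketched as follows. This is a minimal illustration, not the program's actual code: it assumes the scan is loaded as a 2-D NumPy array with the B, G, and R parts stacked from top to bottom, and the function names split_plate and stack_rgb are hypothetical.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked scan into its B, G, R parts (top to bottom)."""
    h = plate.shape[0] // 3           # height of one part; leftover rows are dropped
    b = plate[0:h]
    g = plate[h:2 * h]
    r = plate[2 * h:3 * h]
    return b, g, r

def stack_rgb(b, g, r):
    """Combine three (already aligned) parts into one H x W x 3 image in R, G, B order."""
    return np.dstack([r, g, b])

# Tiny synthetic "scan": 9 rows, 4 columns, so each part has 3 rows.
plate = np.arange(9 * 4, dtype=float).reshape(9, 4)
b, g, r = split_plate(plate)
rgb = stack_rgb(b, g, r)
print(rgb.shape)
```

In the full pipeline, G and R would be shifted by their estimated displacements before being stacked.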

Alignment methods

For low-resolution images, the intuitive approach is a brute-force search over a window of possible displacements: for each displacement, we compute a score that measures the difference between the target plate image B and the displaced image G (or R), and we choose the displacement with the best score. We tested two image matching metrics, Sum of Squared Differences (SSD) and normalized cross-correlation (NCC). The results are roughly the same for low-resolution images, so we use SSD in this implementation. The default search window ranges from -10 to 10. Before aligning the images, we trim the template images to a smaller size to get rid of the meaningless border and improve the accuracy of the alignment. For each displacement, the score is computed over the intersection of the two images and then normalized by the size of this common area.
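The brute-force search with an SSD score normalized by the overlap size can be sketched as below. This is an illustration under assumptions, not the original code: both images are float arrays of the same shape, the window is smaller than the image, a displacement (dy, dx) means that pixel (y, x) of the moving image lands on pixel (y + dy, x + dx) of the reference, and all function names are hypothetical.

```python
import numpy as np

def overlap(ref, mov, dy, dx):
    """Crop ref and mov to the region they share when mov is shifted by (dy, dx)."""
    h, w = ref.shape
    ref_part = ref[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
    mov_part = mov[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
    return ref_part, mov_part

def ssd_score(ref, mov, dy, dx):
    """SSD over the common area, divided by its size so that overlaps
    of different sizes can be compared."""
    a, b = overlap(ref, mov, dy, dx)
    return np.sum((a - b) ** 2) / a.size

def align_brute_force(ref, mov, window=10):
    """Try every (dy, dx) in [-window, window] x [-window, window]
    and keep the one with the lowest score."""
    best, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            score = ssd_score(ref, mov, dy, dx)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

# Synthetic check: cut two windows out of one random image, offset by a known amount.
rng = np.random.default_rng(0)
base = rng.random((60, 60))
ref = base[10:50, 10:50]
mov = base[7:47, 12:52]    # mov[y, x] shows the same pixel as ref[y - 3, x + 2]
print(align_brute_force(ref, mov, window=5))   # (-3, 2)
```

Images stored as unsigned integers should be converted to float first, since the subtraction would otherwise wrap around.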

For high-resolution images, the brute-force search is computationally expensive. To deal with this problem, we use a multi-scale method: an image pyramid represents the image at multiple scales, and the processing is done sequentially, starting from the smallest image. After the displacement at the previous (coarser) level has been computed, it is multiplied by 2 to match the scale of the current level. The search at the current level then starts from this rescaled displacement, which can be regarded as an offset for the current level. The image is rescaled for the current level as well. We repeat this procedure until the largest image has been processed. The number of pyramid levels is a user-specified parameter, and the default value we use is 3.
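The coarse-to-fine procedure can be sketched as follows. This is a minimal illustration under assumptions rather than the original implementation: downscaling is done by 2x2 block averaging (the report does not say which resizing method was used), images are float arrays of the same shape, and the names downscale2, search, and align_pyramid are hypothetical.

```python
import numpy as np

def downscale2(img):
    """Halve the resolution by averaging 2x2 blocks (an assumed resizing method)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def search(ref, mov, window, center):
    """Brute-force SSD search within +/- window around a starting offset."""
    h, w = ref.shape
    best, best_score = center, np.inf
    for dy in range(center[0] - window, center[0] + window + 1):
        for dx in range(center[1] - window, center[1] + window + 1):
            if abs(dy) >= h or abs(dx) >= w:
                continue                      # no overlap at this displacement
            a = ref[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            b = mov[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
            score = np.sum((a - b) ** 2) / a.size
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def align_pyramid(ref, mov, window=5, levels=3):
    """Estimate the displacement at the coarsest level first, then double it
    and refine it at each finer level."""
    if levels <= 1:
        return search(ref, mov, window, (0, 0))
    cy, cx = align_pyramid(downscale2(ref), downscale2(mov), window, levels - 1)
    return search(ref, mov, window, (2 * cy, 2 * cx))

# Synthetic check with a displacement larger than a single-level window of 3.
rng = np.random.default_rng(0)
base = rng.random((96, 96))
ref = base[16:80, 16:80]
mov = base[28:92, 8:72]    # mov[y, x] shows the same pixel as ref[y + 12, x - 8]
print(align_pyramid(ref, mov, window=3, levels=3))   # (12, -8)
```

With a window of 3, a single-level search could not reach a displacement of 12, while three levels recover it with only 7 x 7 candidates per level.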

Alignment results

For JPEG images, with a border trim of 20 pixels and a search window of [-10, 10], the displacements are as follows:

image name            displacement (y, x) in G    displacement (y, x) in R
00106v                (4, 0)                      (8, -4)
00757v                (2, 4)                      (4, 4)
00888v                (6, 1)                      (10, 0)
00889v                (2, 2)                      (4, 3)
00907v                (3, 1)                      (6, 0)
00911v                (1, -1)                     (10, -1)
01031v                (1, 1)                      (4, 2)
01657v                (6, 1)                      (10, 1)
01880v                (6, 2)                      (10, 1)
00001v (selected)     (4, 0)                      (10, -1)
00002v (selected)     (3, 2)                      (10, 2)
00003v (selected)     (2, 1)                      (6, 3)

Result images: 00106v, 00757v, 00888v, 00889v, 00907v, 00911v, 01031v, 01657v, 01880v, 00001v, 00002v, 00003v.

For TIFF images, we use the multi-scale algorithm to align the template images. The border trim is set to 200 pixels. We test two window sizes, 5 and 15, and two numbers of levels, 2 and 3. Three parameter combinations are tested in this section: "size=5, level=2", "size=5, level=3", and "size=15, level=3". The results are illustrated in the following table and pictures.

image name            window size=5, level=2                  window size=5, level=3                  window size=15, level=3
                      displacement in G   displacement in R   displacement in G   displacement in R   displacement in G   displacement in R
00029u                (15, 15)            (15, 15)            (35, 16)            (35, 35)            (40, 16)            (90, 34)
00087u                (15, 6)             (10, -8)            (35, 35)            (20, -8)            (48, 40)            (105, 0)
00128u                (15, 15)            (15, 15)            (35, 25)            (35, 35)            (36, 24)            (52, 40)
00458u                (15, 9)             (15, 15)            (35, 7)             (35, 35)            (44, 4)             (88, 32)
00737u                (15, 7)             (15, 13)            (16, 8)             (20, 12)            (16, 8)             (48, 16)
00822u                (15, 15)            (15, 15)            (35, 26)            (35, 31)            (56, 24)            (105, 34)
00892u                (15, 2)             (15, 2)             (16, 4)             (35, 4)             (16, 2)             (44, 4)
01043u                (-15, 10)           (11, 15)            (-16, 8)            (12, 16)            (-16, 8)            (12, 16)
01047u                (15, 15)            (15, 15)            (24, 20)            (35, 34)            (24, 20)            (72, 34)
00001u (selected)     (10, 4)             (15, -6)            (35, 4)             (16, -4)            (36, 4)             (97, -7)
00002u (selected)     (15, 15)            (15, 15)            (30, 22)            (35, 23)            (36, 24)            (96, 22)
00003u (selected)     (15, 6)             (15, 15)            (26, 6)             (35, 30)            (24, 4)             (60, 32)

Result images: 00029u, 00087u, 00128u, 00458u, 00737u, 00822u, 00892u, 01043u, 01047u, 00001u, 00002u, 00003u. For each image: left : size=5 level=2, center : size=5 level=3, right : size=15 level=3.

As a result, we can see from the 2nd and 3rd columns that, for each pair of images, a larger search window yields higher accuracy. The 1st and 2nd columns also show that fewer levels give worse performance, since the search range is restricted by the number of levels. For instance, this is obvious for image '00029u', whose displacements drop from (35, 16) (G) and (35, 35) (R) with 3 levels to (15, 15) (G) and (15, 15) (R) with 2 levels. However, it is less obvious for image '00892u', because its displacement in the 3-level case, (16, -8) (G) and (12, 16) (R), is relatively small and can be reached by a 2-level pyramid. Note that the running time increases with the number of levels.
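The largest values in the table (15, 35, and 105) are consistent with a search range that grows with the number of levels. A minimal sketch of that arithmetic, assuming each level doubles the estimate from the coarser level and then searches within +/- window around it (the helper name max_reach is hypothetical):

```python
def max_reach(window, levels):
    """Largest displacement (in full-resolution pixels) that the pyramid can reach."""
    reach = 0
    for _ in range(levels):
        reach = 2 * reach + window   # double the coarser estimate, then add one window
    return reach

print(max_reach(5, 2))    # 15
print(max_reach(5, 3))    # 35
print(max_reach(15, 3))   # 105
```

Under this assumption the reach equals window * (2^levels - 1), so a displacement at exactly this value may mean that the true displacement lies outside the reachable range.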

Bells and whistles

The borders of the images are meaningless and noisy, so they can severely affect the accuracy of the SSD score. To illustrate, the border of the right image of '00106v' is trimmed by 20 pixels, while the image on the left is not trimmed. The alignment on the right is clearly much better than that on the left. Therefore, we trim the border before aligning the images, and we add the borders back after alignment.
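One way to trim before alignment and keep the borders in the final result is sketched below. It rests on assumptions not stated in the report: the same number of pixels is trimmed from all four sides, so a displacement estimated on the trimmed images can be applied directly to the untrimmed ones, and uncovered pixels are filled with zeros. The names trim and shift are hypothetical.

```python
import numpy as np

def trim(img, border):
    """Remove `border` pixels from every side of the image."""
    h, w = img.shape
    return img[border:h - border, border:w - border]

def shift(img, dy, dx):
    """Shift the image by (dy, dx), filling the uncovered area with zeros."""
    h, w = img.shape
    out = np.zeros_like(img)
    out[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)] = \
        img[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
    return out

full = np.arange(36, dtype=float).reshape(6, 6)
inner = trim(full, 1)           # estimate the displacement on this trimmed image
moved = shift(full, 1, -2)      # then apply it to the full, untrimmed image
print(inner.shape, moved.shape)
```

Estimating on the trimmed image and shifting the full image means the border pixels never enter the score but are still present in the output.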