TP1: Colorizing the Russian Empire

Due Date: 23h55 on February 2nd, 2014

Background

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a man well ahead of his time. Convinced, as early as 1907, that color photography was the wave of the future, he won the Tzar's special permission to travel across the vast Russian Empire and take color photographs of everything he saw. And he really photographed everything: people, buildings, landscapes, railroads, bridges... thousands of color pictures! His idea was simple: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter. Never mind that there was no way to print color photographs until much later -- he envisioned special projectors to be installed in "multimedia" classrooms all across Russia where the children would be able to learn about their vast country. Alas, his plans never materialized: he left Russia in 1918, right after the revolution, never to return. Luckily, his RGB glass plate negatives, capturing the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress. The LoC has recently digitized the negatives and made them available online.

Overview

The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. In order to do this, you will need to extract the three color channel images, place them on top of each other, and align them so that they form a single RGB color image. We will assume that a simple x,y translation model is sufficient for proper alignment. However, the full-size glass plate images are very large, so your alignment procedure will need to be relatively fast and efficient.

Details

A few of the digitized glass plate images (both hi-res and low-res versions) are available at the following link (note that the filter order from top to bottom is BGR, not RGB!). Your program will take a glass plate image as input and produce a single color image as output. The program should divide the image into three equal parts and align the second and the third parts (G and R) to the first (B). For each image, you will need to print the (x,y) displacement vector that was used to align the parts.
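
As a rough illustration of the "three equal parts" step, here is a minimal Matlab sketch. The variable names are made up for this example, and a random array stands in for the grayscale plate you would normally load with imread. Dropping the leftover rows at the bottom when the height is not a multiple of 3 is an assumption of this sketch, not a requirement of the assignment.

```matlab
% Stand-in for a grayscale plate scan; 31 rows so that one row is left over.
plate = rand(31, 8);

h = floor(size(plate, 1) / 3);   % height of each third; extra rows at the bottom are dropped

B = plate(1:h, :);               % top third:    blue filter
G = plate(h+1:2*h, :);           % middle third: green filter
R = plate(2*h+1:3*h, :);         % bottom third: red filter

% Unaligned color image: note the channel order is R, G, B here,
% the reverse of the top-to-bottom order on the plate.
rgb = cat(3, R, G, B);
```

Stacking the channels without alignment like this is a handy sanity check: the result should look like a blurry, color-fringed version of the scene.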

The easiest way to align the parts is to exhaustively search over a window of possible displacements (say [-15,15] pixels), score each one using some image matching metric, and take the displacement with the best score. There are a number of possible metrics that one could use to score how well the images match. The simplest one is the Sum of Squared Differences (SSD) distance, i.e. the squared L2 norm of the difference between the two images, which is simply:

sum((image1(:)-image2(:)).^2)

Another is normalized cross-correlation (NCC), which is simply a dot product between two normalized vectors:

image1(:) / norm(image1(:)) and image2(:) / norm(image2(:))
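
To make the two scores concrete, here is a small sketch that wraps them as anonymous functions and evaluates them on a toy pair of images. The names ssd and ncc are illustrative only. Note that the images are flattened with (:) before calling norm, since norm applied directly to a matrix returns the matrix 2-norm rather than the vector length.

```matlab
% Illustrative helpers (names are not part of any provided code).
ssd = @(a, b) sum((a(:) - b(:)).^2);                       % lower is better
ncc = @(a, b) dot(a(:) / norm(a(:)), b(:) / norm(b(:)));   % higher is better

a = [1 2; 3 4];
b = 2 * a;             % same pattern, twice as bright

ssd_ab = ssd(a, b);    % 1 + 4 + 9 + 16 = 30
ncc_ab = ncc(a, b);    % 1 up to rounding: a global brightness scale cancels out
```

On this toy pair SSD reports a large mismatch even though the two images show the same pattern, while NCC treats them as a perfect match.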

Note that in this particular case, the images to be matched do not actually have the same brightness values (they are different color channels), so a cleverer metric might work better.
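
The exhaustive search described above could be sketched as follows. This is one possible structure under stated assumptions, not the provided skeleton: it uses circshift to apply each candidate displacement, scores with SSD, and ignores a border as wide as the search window so that pixels wrapped around by circshift do not affect the score. A random array with a known offset stands in for real channels.

```matlab
ref    = rand(40, 40);                 % stand-in for the B channel
moving = circshift(ref, [-3 5]);       % stand-in for G, displaced by a known amount
win    = 15;                           % search window [-win, win] in each direction
m      = win + 1;                      % first row/column used for scoring

best_score = Inf;
best_shift = [0 0];
for dy = -win:win
    for dx = -win:win
        shifted = circshift(moving, [dy dx]);
        % Compare interiors only; the outer 'win' pixels may contain wrapped data.
        d = shifted(m:end-m+1, m:end-m+1) - ref(m:end-m+1, m:end-m+1);
        score = sum(d(:).^2);          % SSD
        if score < best_score
            best_score = score;
            best_shift = [dy dx];      % [row shift, column shift], i.e. (y, x)
        end
    end
end

aligned = circshift(moving, best_shift);
fprintf('displacement (x, y) = (%d, %d)\n', best_shift(2), best_shift(1));
```

Here the search recovers [3 -5], the shift that undoes the displacement applied to moving. Swapping in NCC only requires changing the score line and keeping the largest score instead of the smallest.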

Exhaustive search will become prohibitively expensive if the pixel displacement is too large (which will be the case for high-resolution glass plate scans). In this case, you will need to implement a faster search procedure such as an image pyramid. An image pyramid represents the image at multiple scales (usually scaled by a factor of 2) and the processing is done sequentially starting from the coarsest scale (smallest image) and going down the pyramid, updating your estimate as you go. It is very easy to implement by adding recursive calls to your original single-scale implementation.
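
One way the recursive coarse-to-fine idea might look is sketched below, as a function file with two local helpers. All names (align_pyramid, halve, search) and parameters (win, min_size) are hypothetical. The sketch assumes both images have the same size, downsamples by 2x2 block averaging to avoid any toolbox dependency, scores with SSD on the central part of the image, and refines each level by only +/- 1 pixel around the doubled coarse estimate. It also assumes the coarsest image is comfortably larger than 2*win pixels on each side.

```matlab
function shift = align_pyramid(moving, ref, win, min_size)
% Returns [dy dx] such that circshift(moving, shift) best matches ref.
    if min(size(ref)) > min_size
        % Solve the problem at half resolution first, then double the estimate.
        coarse = align_pyramid(halve(moving), halve(ref), win, min_size);
        start  = 2 * coarse;
        % Refine around the doubled estimate with a tiny window.
        shift  = start + search(circshift(moving, start), ref, 1);
    else
        % Coarsest level: full exhaustive search.
        shift = search(moving, ref, win);
    end
end

function small = halve(im)
% Downsample by 2 with 2x2 block averaging; an odd last row/column is dropped.
    r = 2 * floor(size(im, 1) / 2);
    c = 2 * floor(size(im, 2) / 2);
    small = (im(1:2:r, 1:2:c) + im(2:2:r, 1:2:c) + ...
             im(1:2:r, 2:2:c) + im(2:2:r, 2:2:c)) / 4;
end

function best = search(moving, ref, win)
% Exhaustive SSD search over [-win, win] in both directions.
% Scores only the interior (assumed margin: 20% of the size, at least win).
    mr = max(win, round(0.2 * size(ref, 1)));
    mc = max(win, round(0.2 * size(ref, 2)));
    rows = mr+1 : size(ref, 1)-mr;
    cols = mc+1 : size(ref, 2)-mc;
    r = ref(rows, cols);
    best = [0 0];
    best_score = Inf;
    for dy = -win:win
        for dx = -win:win
            s = circshift(moving, [dy dx]);
            d = s(rows, cols) - r;
            score = sum(d(:).^2);
            if score < best_score
                best_score = score;
                best = [dy dx];
            end
        end
    end
end
```

As a quick check, with ref = rand(64, 64) and moving = circshift(ref, [8 -4]), the call align_pyramid(moving, ref, 4, 16) returns [-8 4]. That test offset is a multiple of 4 so that it survives two halvings exactly; real scans will not be this clean, and the window, margin and refinement range will likely need tuning.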

Your job will be to implement an algorithm that, given a 3-channel image, produces a color image as output. Implement a simple single-scale version first, using for loops, searching over a user-specified window of displacements. Next, add a coarse-to-fine pyramid speedup to handle large images. To help you start your assignment, we provide some skeleton Matlab code.

Bells & Whistles (Extra Credit)

Although the color images resulting from this automatic procedure will often look strikingly real, they are still a far cry from the manually restored versions available on the LoC website and from other professional photographers. Can we make some of these adjustments automatically, without the human in the loop? Feel free to come up with your own approaches or talk to me about your ideas. There is no right answer here -- just try out things and see what works. For example, the borders of the photograph will have strange colors since the three channels won't exactly align. See if you can devise an automatic way of cropping the border to get rid of the bad stuff. One possible idea is that the information in the good parts of the image generally agrees across the color channels, whereas at borders it does not.
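
To illustrate the channel-agreement idea for cropping, here is one very simple heuristic, offered only as a starting point. It measures, for every row and column, how much the three channels disagree on average, and keeps the span between the first and last row (and column) whose disagreement stays under a threshold. The synthetic image, the disagreement measure and the threshold value of 0.3 are all assumptions of this sketch; real scans would need something more robust.

```matlab
% Synthetic "aligned" image: channels agree in the interior,
% and disagree in a 2-pixel border.
base = repmat(linspace(0.4, 0.6, 20), 20, 1);
rgb  = cat(3, base, base, base);
rgb([1:2 19:20], :, 1) = 1;      % red-only junk in the top/bottom rows
rgb(:, [1:2 19:20], 3) = 0;      % blue missing in the left/right columns

% Per-pixel disagreement: spread between the largest and smallest channel.
disagree  = max(rgb, [], 3) - min(rgb, [], 3);
row_score = mean(disagree, 2);   % one value per row
col_score = mean(disagree, 1);   % one value per column

thresh = 0.3;                    % assumed threshold, picked for this toy image
good_rows = find(row_score <= thresh);
good_cols = find(col_score <= thresh);

% Keep everything between the first and last "good" row and column.
cropped = rgb(good_rows(1):good_rows(end), good_cols(1):good_cols(end), :);
```

On this toy image the crop keeps rows and columns 3 to 18. A natural scene with strongly saturated colors would also produce large channel differences, which is one reason this simple measure may fail and a reason to experiment with alternatives.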

You do not need to use color filters to test your algorithms on your own images. You only need to take three pictures, one after the other, and then extract the 'R' channel from the first, the 'G' channel from the second, and the 'B' channel from the third. This way, you can simulate with today's technology what Prokudin-Gorskii did more than 100 years ago. Is the alignment as good with the three pictures from your own camera? What happens if some elements move in the scene? Experiment with different scenes and comment on your results.
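
The channel extraction just described could be sketched like this. Random arrays stand in for the three photos, which in practice would each be loaded with imread from your own files. Stacking the channels top to bottom in BGR order is a choice made here so that the result mimics the scanned plates and can be fed to the same program.

```matlab
% Stand-ins for three consecutive color photos of the same scene.
shot1 = rand(6, 8, 3);
shot2 = rand(6, 8, 3);
shot3 = rand(6, 8, 3);

R = shot1(:, :, 1);     % red channel of the first shot
G = shot2(:, :, 2);     % green channel of the second shot
B = shot3(:, :, 3);     % blue channel of the third shot

% Simulated glass plate: B on top, then G, then R.
plate = [B; G; R];
```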

Deliverables

For this project you must turn in both your code and a project webpage presenting your results along with a short discussion of them. To help you start, here's a webpage template (optional). The aesthetic appearance of the website will not be evaluated, but it is important for the information to be clearly presented.

More precisely, the webpage should contain:

A short description of the project and your approach;
If you encounter problems on some images, describe them and tell us how you tried to solve them;
The result of your algorithm on all of our example images. List the offsets you calculated. For the large tif images, display a jpeg image with a link to your full size tif;
The result of your algorithm on a few examples of your own choosing, downloaded from the Prokudin-Gorskii collection. Here is a list of all the images of the library in JPG (low resolution) and in TIF (high resolution).
If your algorithm failed to align any image, provide a brief explanation of why;
Describe any bells and whistles you implemented. For maximum credit, show before and after images.

For this practical work, you must create a tp1.zip file. In this file you'll put:

Your report in an HTML format inside a folder named tp1/web. Your images for this web page should be inside a folder named tp1/web/images.
Your Matlab code inside a folder named tp1/code. Please do not include the images you have used to generate your results inside this folder.

Finally, you should upload this file (tp1.zip) to pixel (http://pixel.fsg.ulaval.ca) before the deadline. The late submission policy described in the course plan will be applied. For any questions regarding the submission process or the project itself, write to the course's email address.

Evaluation

This assignment is graded out of 100 points, as follows:

60 points (45 for those in the grad version of the class) for a single-scale implementation with successful results on low-res images;
40 points for a multiscale pyramid version that works on the large images;
Up to 10 points for bells & whistles explicitly mentioned above;
Up to 5 points for bells & whistles you come up with on your own (and OK with course staff);
Up to 5 points if you have tested your algorithms on your own pictures.

Thanks

Many thanks to Alyosha Efros for the original version of this assignment!
