Description
Machine perception
Create a folder assignment2 that you will use during this assignment. Unpack the content of the assignment2.zip that you can download from the course webpage to the folder. Save the solutions for the assignments as Python scripts to the assignment2 folder. In order to complete the exercise you have to present these files to the teaching assistant. Some assignments contain questions that require sketching, writing or manual calculation. Write these answers down and bring them to the presentation as well. The tasks that are marked with ★ are optional. Without completing them you can get at most 75 points for the exercise. The maximum total amount of points for each assignment is 100. Each optional task has the amount of additional points written next to it.
Introduction
This assignment contains three exercises. In the first one you will familiarize yourself with several methods of histogram comparison in the domain of image retrieval. In the second one you will learn about convolution, which is a basic operation in image processing. In the third one you will use convolution for image filtering.
Exercise 1: Global approach to image description
In the previous assignment we used histograms created from a single channel images. Now we will extend this to multi-dimensional histograms and use histograms as global image descriptors in order to compare images.
(a) Firstly, you will implement the function myhist3 that computes a 3-D histogram from a three channel image. The images you will use are RGB but the function should also work on other color spaces. The resulting histogram is stored in a 3-D matrix. The size of the resulting histogram is determined by the parameter n_bins. The bin range calculation is exactly the same as in the previous assignment, except now you will get one index for each image channel. Iterate through the image pixels and increment the appropriate histogram cells. You can create an empty 3-D numpy array with H = np.zeros((n_bins,n_bins,n_bins)). Take care that you normalize the resulting histogram.
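A possible sketch of myhist3, assuming the input is a float image with values in [0, 1]; the per-pixel loop follows the instructions literally (it can be vectorized later):

```python
import numpy as np

def myhist3(image, n_bins):
    # image: float array with values in [0, 1], shape (H, W, 3)
    H = np.zeros((n_bins, n_bins, n_bins))
    # map each channel value to a bin index in [0, n_bins - 1];
    # the clamp handles the value 1.0, which would otherwise map to n_bins
    idx = np.minimum((image * n_bins).astype(int), n_bins - 1)
    for r, g, b in idx.reshape(-1, 3):
        H[r, g, b] += 1
    return H / H.sum()  # normalize so the histogram sums to 1
```
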
(b) In order to perform image comparison using histograms we need to implement some distance measures. These are defined for two input histograms and return a single scalar value that represents the similarity (or distance) between the two histograms. Implement a function compare_histograms that accepts two histograms and a string that identifies the distance measure you wish to calculate. You can start with the L2 metric.
The L2 metric (commonly known as Euclidean distance) treats histograms with N bins as points in N-dimensional space. For histograms h1 and h2 (that of course must have the same number of bins) the L2 distance is defined as:
L2(h1, h2) = [ Σ_i (h1(i) − h2(i))² ]^(1/2).
Also implement the following measures that are more suitable for histogram comparison:
• Chi-square distance: χ²(h1, h2) = (1/2) Σ_i (h1(i) − h2(i))² / (h1(i) + h2(i) + ε₀), where ε₀ is a very small constant that prevents division by zero.
• Intersection: I(h1, h2) = 1 − Σ_i min(h1(i), h2(i)).
Try to avoid looping over histogram values and instead use vector operations on entire matrices at once.
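A minimal sketch of compare_histograms using vector operations, assuming the measures are selected by the strings 'l2', 'chi' and 'inter' (these names are a choice made here, not prescribed by the assignment):

```python
import numpy as np

def compare_histograms(h1, h2, measure):
    # h1, h2: normalized histograms, flattened to 1-D vectors
    h1, h2 = h1.ravel(), h2.ravel()
    eps = 1e-10  # the small constant eps_0 from the chi-square definition
    if measure == 'l2':
        return np.sqrt(np.sum((h1 - h2) ** 2))
    if measure == 'chi':
        return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
    if measure == 'inter':
        return 1 - np.sum(np.minimum(h1, h2))
    raise ValueError('unknown measure: ' + measure)
```
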
(c) Test your function with the following images:
• dataset/object_01_1.png,
• dataset/object_02_1.png,
• dataset/object_03_1.png.
Question: Which image (object_02_1.png or object_03_1.png) is more similar to image object_01_1.png considering the L2 distance? How about the other three distances? We can see that all three histograms contain a strongly expressed component (one bin has a much higher value than the others). Which color does this bin represent?
(d) You will now implement a simple image retrieval system that will use histograms. Write a function that will accept the path to the image directory and the parameter n_bins and then calculate RGB histograms for all images in the directory as well as transform them to 1-D arrays. Store the histograms in an appropriate data structure. Select some image from the directory dataset/ and compute the distance between its histogram and all the other histograms you calculated before. Sort the list according to the calculated similarity and display the reference image and the first five most similar images to it. Also display the corresponding histograms. Do this for all four distance measures that you implemented earlier.
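The ranking step above can be sketched as follows; image loading and display are omitted, and the function assumes the histograms have already been computed and flattened (the L2 helper stands in for any of the measures from part (b)):

```python
import numpy as np

def l2(h1, h2):
    # L2 distance between flattened histograms (see part (b))
    return np.sqrt(np.sum((h1 - h2) ** 2))

def rank_by_similarity(hists, reference, dist=l2):
    # hists: dict mapping image name -> flattened histogram (1-D array)
    # returns (distance, name) pairs sorted from most to least similar;
    # the first entry is the reference image itself (distance 0)
    ref = hists[reference]
    return sorted((dist(ref, h), name) for name, h in hists.items())
```
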
Question: Which distance is in your opinion best suited for image retrieval? How does the retrieved sequence change if you use a different number of bins? Is the execution time affected by the number of bins?
(e) You can get a better sense of the differences in the distance values if you plot all of them at the same time. Use the function plt.plot() to display image indices on the x axis and distances to the reference image on the y axis. Display both the unsorted and the sorted image sequence and mark the most similar values using a circle (see pyplot documentation).
(f) ★ (10 points) This simple retrieval system is strongly influenced by dominant colors that carry no discriminative information. Analyze the presence of different colors by summing up all image histograms bin-wise and displaying the resulting histogram. Which bins dominate this histogram?
To address this issue, you will implement a simple frequency-based weighting technique, similar to the ones that are employed in document retrieval systems. Use the combined frequency histogram you calculated to determine the weight for each bin. The weights should be lower for bins that are strongly represented in the frequency histogram and vice versa. One way of computing this weight is to use the exponential function w_i = exp(−λ F(i)), where F(i) represents the frequency of the i-th bin and λ is a scaling constant that you have to set. There are also other ways of computing weights that you can experiment with. Before calculating histogram similarity, you should multiply each histogram with the weights vector bin-wise (and normalize the result). Finally, you can compare the retrieval process for the weighted and the unweighted histograms. Report your observations. Did the weighting help with retrieving relevant results?
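A sketch of the weighting step, assuming the histograms are already flattened and normalized; the scaling constant λ has to be tuned by hand:

```python
import numpy as np

def weighted_hists(hists, lam=10.0):
    # hists: list of flattened, normalized histograms
    F = np.sum(hists, axis=0)   # combined bin-frequency histogram
    w = np.exp(-lam * F)        # w_i = exp(-lambda * F(i)): frequent bins get low weight
    out = []
    for h in hists:
        hw = h * w              # re-weight bins
        out.append(hw / hw.sum())  # re-normalize
    return out, w
```
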
Exercise 2: Convolution
A basic operation that is used in image processing is called convolution. For easier understanding, you will first implement the 1-D version of the operation. Convolution of a kernel g(x) with a signal I(x) is defined by the following equation:
Ig(x) = g(x) ⋆ I(x) = ∫_{−∞}^{∞} g(u) I(x − u) du,
or in the case of discrete signals (i.e. images):
Ig(i) = g(i) ⋆ I(i) = Σ_{j=−∞}^{∞} g(j) I(i − j).
A nice visualization of convolution can be seen on Wikipedia1. In practice, the kernel is of finite size, thus the sum only runs from the leftmost element to the rightmost element of the kernel. Intuitively, the kernel is overlaid on the signal and the overlapping region is multiplied element-wise with the signal, then summed. This sum is the result for position i.
1http://en.wikipedia.org/wiki/Convolution
(a) Compute the convolution between the signal and kernel below (k ⋆ f) by hand.
f = [0, 0, 0, 0, 0, …]   (the full values of the signal f and the kernel k are given in the accompanying figure)
Question: Can you recognize the shape of the kernel? What is the sum of the elements in the kernel? How does the kernel affect the signal?
(c) ★ (5 points) Improve the function simple_convolution by also addressing the edges of the signal. To do this, implement one of the common methods for padding the signal so the operation can be performed on all the signal elements. Take care that the input and the output signals are the same length.
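One way to sketch simple_convolution with edge handling, using replicate padding (np.pad with mode='edge') as the padding method; an odd kernel length is assumed so the output keeps the input length:

```python
import numpy as np

def simple_convolution(I, k):
    # 1-D convolution with replicate padding so that the output
    # has the same length as the input signal I
    N = len(k) // 2                  # assumes an odd-length kernel
    Ip = np.pad(I, N, mode='edge')   # repeat the border values
    kf = k[::-1]                     # flip the kernel (convolution, not correlation)
    return np.array([np.sum(Ip[i:i + len(kf)] * kf) for i in range(len(I))])
```
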
(d) Write a function that calculates a Gaussian kernel. Use the definition:
g(x) = (1 / (√(2π) σ)) exp(−x² / (2σ²)).
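A sketch of such a kernel function, assuming we sample the Gaussian at integer positions within ±3σ (a common choice) and normalize the kernel to sum to 1:

```python
import numpy as np

def gauss_kernel(sigma):
    # sample the Gaussian on integer positions in [-3*sigma, 3*sigma];
    # normalize so the kernel sums to 1 (keeps signal energy unchanged)
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    return g / g.sum()
```
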
Question: The figure below shows two kernels (a) and (b) as well as signal (c). Sketch (do not focus on exact proportions of your drawing but rather on the understanding of what you are doing) the resulting convolved signal of the given input signal and each kernel. You can optionally also implement a convolution demo based on the signals and your convolution code, but the important part here is your understanding of the general idea of convolution.
Exercise 3: Image filtering
This exercise will teach you how to use convolution on 2-D signals to achieve image filtering. Filtering can be used to smooth or sharpen images, or to remove certain types of noise from the image. You will also experiment with some filters that are not based on convolution.
(a) An important property of the Gaussian kernel is its separability in multiple dimensions. A Gaussian kernel in 2-D space can be written as:
G(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))
        = [ (1 / (√(2π) σ)) exp(−x² / (2σ²)) ] [ (1 / (√(2π) σ)) exp(−y² / (2σ²)) ]
        = g(x) g(y),
which is a product of two 1D Gaussian kernels, each one in its own dimension. If we again write a continuous convolution
G(x, y) ∗ I(x, y) = ∫_v ∫_u g(u) g(v) I(x − u, y − v) du dv
                  = ∫_v g(v) ( ∫_u g(u) I(x − u, y − v) du ) dv
                  = g(x) ∗ [ g(y) ∗ I(x, y) ],
we can see that we get the same result if we filter a 2-D signal using a single 2-D Gaussian kernel or if we do it using two 1D Gaussian kernels that represent the separated components of the former 2-D Gaussian kernel. This technique can be used to translate a slow n-D filtering operation to a fast sequence of 1-D filtering operations.
Write a function gaussfilter that generates a Gaussian filter and applies it to a 2-D image. You can use the function cv2.filter2D() to perform the convolution using the desired kernel. Generate a 1-D Gaussian kernel and first use it to filter the image along the first dimension, then convolve the result using the same kernel, but transposed.
Hint: Numpy arrays have an attribute named T, which is used to access the transpose of the array, e.g. k_transposed = k.T.
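A sketch of the separable filtering; np.convolve is used here instead of cv2.filter2D so the snippet stays self-contained, and the two passes correspond to filtering with the kernel and its transpose:

```python
import numpy as np

def gauss_kernel(sigma):
    # 1-D Gaussian kernel, normalized to sum to 1 (as in part (d))
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    return g / g.sum()

def gaussfilter(image, sigma):
    # separable filtering: first along every row, then along every
    # column with the same kernel (the "transposed" pass)
    g = gauss_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, g, mode='same'), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode='same'), 0, tmp)
```
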
Test the function by loading the image lena.png and converting it to grayscale. Then, corrupt the image with Gaussian noise (every pixel value is offset by a random number sampled from the Gaussian distribution) and separately with salt-and-pepper noise. You can use the functions gauss_noise and sp_noise that are included with the instructions (a2_utils.py). Use the function gaussfilter to try and remove noise from both images.
Question: Which noise is better removed using the Gaussian filter?
(b) Convolution can also be used for image sharpening. Look at its definition in the
lecture slides and implement it. Test it on the image from file museum.jpg.
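The exact sharpening definition is in the lecture slides; as an illustration only, one common variant convolves the image with a kernel that combines the identity with a Laplacian-like highpass (a sketch, not necessarily the formula from the slides). The helper convolve2d is a hypothetical minimal stand-in for cv2.filter2D:

```python
import numpy as np

def convolve2d(image, kernel):
    # minimal 2-D convolution with zero padding (illustration only)
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    Ip = np.pad(image, ((ph, ph), (pw, pw)))
    k = kernel[::-1, ::-1]               # flip for true convolution
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(Ip[i:i + kh, j:j + kw] * k)
    return out

# one common sharpening kernel: identity plus a Laplacian-like highpass;
# its elements sum to 1, so flat regions keep their brightness
sharpen_kernel = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=float)
```
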
(c) Implement a nonlinear median filter. While a Gaussian filter locally computes a weighted average, the median filter sorts the signal values in the given filter window and uses the median value of the sorted sequence as the result. Implement a simple median filter as a function simple_median that takes the input signal I and the filter width w and returns the filtered signal.
Create a 1-D signal corrupted with salt and pepper noise and filter it using simple_median. Display the result using different window sizes. Also try to remove the noise using the Gaussian filter. What does the result look like and why?
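A sketch of simple_median; at the borders a truncated window is used, which is one of several reasonable choices for edge handling:

```python
import numpy as np

def simple_median(I, w):
    # slide a window of width w over the signal and take the median;
    # near the borders the window is truncated to the valid range
    r = w // 2
    return np.array([np.median(I[max(0, i - r):i + r + 1])
                     for i in range(len(I))])
```
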
Question: Which filter performs better at this specific task? In comparison to the Gaussian filter, which can be applied multiple times in any order, does the order matter in the case of the median filter? What is the name of filters like this?
(d) ★ (10 points) Implement a 2-D version of the median filter. Test it on an image that was corrupted by Gaussian noise and on an image that was corrupted by salt and pepper noise. Compare the results with the Gaussian filter for multiple noise intensities and filter sizes.
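A sketch of the 2-D version, using edge padding so the output size matches the input; the brute-force double loop sorts a w × w window per pixel, which is relevant to the complexity question below:

```python
import numpy as np

def median_filter_2d(image, w):
    # for every pixel take the median of its w x w neighborhood;
    # replicate padding keeps the output the same size as the input
    r = w // 2
    Ip = np.pad(image, r, mode='edge')
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.median(Ip[i:i + w, j:j + w])
    return out
```
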
Question: What is the computational complexity of the Gaussian filter operation? How about the median filter? What does it depend on? Describe the computational complexity using the O(·) notation (you can assume n log n complexity for sorting).
(e) ★ (5 points) Implement the hybrid image approach that was presented in the lectures. To do this you also have to implement the Laplacian filter, filter two images (one with the Gaussian and one with the Laplacian filter) and combine them together. You can use the images cat1.jpg and cat2.jpg as a reference since they are both of the same size. Do not convert them to grayscale; simply apply the same operations to all three channels and display the result as a color image.