Description
The goals of this assignment are to 1) expand your understanding of image transformations and filtering, and 2) have you gain practical experience with image feature extraction. Here, you will implement a Harris corner feature detector, which uses a measure of 2D “cornerness” to identify salient local regions in an input image. You will also implement a simple patch-based matching procedure to identify corresponding 2D feature points for a pair of images. Skeleton Python code
(described below) has been provided for this assignment, along with images for you to experiment with.
Reading
- Szeliski 1 (intro), 3.1.1-2, 3.2 (intro), 3.2.1, 3.3.1-2
- (optional) Szeliski 3.1.3-5, 3.2.2+, 3.3.3+, 3.5
- Harris, Chris, and Mike Stephens. “A combined corner and edge detector.” Alvey Vision Conference. Vol. 15. No. 50. 1988. [link]
- Lowe, David G. “Distinctive image features from scale-invariant keypoints.”
International Journal of Computer Vision 60.2 (2004): 91-110. [link]
(This is the SIFT feature paper. You will not use SIFT in this assignment, but you should read the paper and understand the basics of the method.)
Assignment
In this assignment, you will implement the following pipeline for feature extraction and matching (specific steps are described in detail below):
- Harris “cornerness” filter for a gray-value image
- Non-maximum suppression of cornerness scores
- Corner keypoint extraction
- Patch-based feature description
- One-to-one matching using SSD and NCC
Steps for Feature Detection (Harris Corner Detection)
The general steps of Harris corner detection are as follows:
- Convert input image 𝐼𝑅𝐺𝐵(𝑥, 𝑦) to grayscale image 𝐼(𝑥, 𝑦).
- Apply a “cornerness” filter ℎ(𝐼). This results in an image 𝑅(𝑥, 𝑦) with larger values corresponding to image points that are more likely to be corners. The filter ℎ is formed from a series of independent filter steps:
- Compute the image gradients 𝑋 = 𝐼𝑥(𝑥, 𝑦) and 𝑌 = 𝐼𝑦(𝑥, 𝑦). It is suggested you use the Sobel filter for increased robustness to noise.
- Compute the matrix-valued image 𝑀(𝑥, 𝑦), as defined by Harris and Stephens. For the Gaussian weighting, consider using SciPy’s gaussian_filter.
- Compute 𝑅(𝑥, 𝑦) = det(𝑀(𝑥, 𝑦)) − 𝑘 tr(𝑀(𝑥, 𝑦))², for some constant 𝑘.
- Apply non-maximum suppression to 𝑅(𝑥, 𝑦), keeping only pixel locations that have the strongest response within their 𝑤 × 𝑤 pixel neighborhood. Consider using a maximum filter for this operation.
- Return the (𝑥, 𝑦) coordinates of the strongest corner response maxima, according to some thresholding operation. Here, we will simply select as features the up-to-𝐾 strongest maxima with response 𝑅(𝑥, 𝑦) > 0. Note that the keypoints should have integer pixel positions.
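Putting the steps above together, a minimal NumPy/SciPy sketch of the detector might look like the following. The parameter defaults here are illustrative only; `sobel`, `gaussian_filter`, and `maximum_filter` come from `scipy.ndimage`, and the function name is a placeholder rather than part of the provided skeleton:

```python
import numpy as np
from scipy.ndimage import sobel, gaussian_filter, maximum_filter

def harris_keypoints(I, sigma=1.0, k=0.05, w=11, K=500):
    """Sketch of the Harris pipeline: cornerness R(x, y), then NMS keypoints."""
    # Image gradients (Sobel for robustness to noise)
    Ix = sobel(I, axis=1)  # x varies along columns
    Iy = sobel(I, axis=0)  # y varies along rows
    # Entries of M(x, y), each weighted by a Gaussian window
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # R = det(M) - k * tr(M)^2
    R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
    # Non-maximum suppression: keep pixels that are the max of their w x w window
    is_max = (R == maximum_filter(R, size=w))
    ys, xs = np.nonzero(is_max & (R > 0))
    # Keep the up-to-K strongest responses, as integer (x, y) coordinates
    order = np.argsort(-R[ys, xs])[:K]
    return np.column_stack([xs[order], ys[order]])
```

For example, running this on a binary image of a bright square should return keypoints at (or very near) the square's four corners, while edges and flat regions are rejected because their response 𝑅 is non-positive.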
Feature Description and Keypoint Matching
We will take a very simple approach to feature description: For each keypoint, take the 𝑛 × 𝑛 image patch of 𝐼(𝑥, 𝑦) centered at that keypoint. For keypoint matching between two images, you should implement two methods for comparing descriptors:
- Sum-of-squares difference (SSD)
- One minus normalized cross-correlation (1 – NCC; this assigns a distance of zero for identical descriptors)
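One way to sketch the patch descriptors and the two distance metrics in NumPy is shown below. The reflection padding for border keypoints and the small stabilizing constant in the NCC normalization are implementation choices of this sketch, not requirements of the assignment:

```python
import numpy as np

def get_descriptors(I, keypoints, n=7):
    """Sketch: flattened n x n patches centered at each (x, y) keypoint.
    Reflection padding near image borders is one possible edge-handling
    choice; the assignment leaves that decision to you."""
    r = n // 2
    Ipad = np.pad(I, r, mode="reflect")
    # After padding by r, the patch for keypoint (x, y) starts at row y, col x
    return np.array([Ipad[y:y + n, x:x + n].ravel()
                     for x, y in keypoints]).reshape(-1, n * n)

def match_ssd(D1, D2):
    """Match matrix of sum-of-squared differences: rows index the first
    image's descriptors, columns the second image's."""
    diff = D1[:, None, :] - D2[None, :, :]   # N1 x N2 x n^2 via broadcasting
    return np.sum(diff ** 2, axis=2)

def match_ncc(D1, D2):
    """Match matrix of 1 - NCC, so identical descriptors get distance 0."""
    def normalize(D):
        D = D - D.mean(axis=1, keepdims=True)  # zero-mean each patch
        # Small constant guards against division by zero for flat patches
        return D / (np.linalg.norm(D, axis=1, keepdims=True) + 1e-12)
    return 1.0 - normalize(D1) @ normalize(D2).T
```

Note that for a set of descriptors compared against itself, both metrics place (near-)zeros on the diagonal of the match matrix, which is a convenient sanity check.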
Given two images, compute the distance of every feature in the first image to every feature in the second image and store the result in a match matrix, with rows corresponding to first-image features and columns corresponding to second-image features.[1] Then, compute keypoint correspondences using one-to-one matching:
One-to-one Matching: For each keypoint in the first image, find the most similar keypoint in the second image. Repeat this for the keypoints in the second image against the first. Keep the feature correspondences that are mutual best matches.
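The mutual-best-match rule above can be sketched directly from the match matrix. This is a hypothetical standalone version of the matching step, assuming smaller distances mean more similar descriptors:

```python
import numpy as np

def compute_matches(dist):
    """Sketch: one-to-one matching via mutual best matches on a distance
    matrix with rows = first-image features, columns = second-image features."""
    best2 = np.argmin(dist, axis=1)  # best second-image match for each row i
    best1 = np.argmin(dist, axis=0)  # best first-image match for each column j
    # Keep (i, j) only when i and j each pick the other as their best match
    matches = [(i, j) for i, j in enumerate(best2) if best1[j] == i]
    return np.array(matches).reshape(-1, 2)
```

A feature in one image that is the "best match" of several features in the other image can be kept at most once under this rule, which is what makes the matching one-to-one.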
Note: While you could use SciPy’s cdist function to compute the pairwise scores (i.e., SSD and 1 – NCC), please implement the metrics on your own for this assignment.
Summary: Algorithm Parameters
- 𝜎 : standard deviation of the Gaussian filter when computing 𝑅(𝑥, 𝑦)
- 𝑘 : typical values range from 0.05 to 0.15
- 𝑤 : for non-maximum suppression, use a fixed window size of 𝑤 = 11px
- 𝐾 : maximum number of keypoints to return (note: only return points where 𝑅(𝑥, 𝑦) > 0)
- 𝑛 : for feature description, use a fixed window size of 𝑛 = 7px
- Matching method: either SSD or (one minus) NCC
Code
There are four files provided to help you get started:
- main.py: This file is already set up to extract and match features for two images, and to display the result. For usage, execute “python main.py --help”.
- py: This file contains skeleton code for feature extraction. You will need to fill in two functions (respecting the algorithm parameters already defined in the class):
  - compute_corner_response() : Given grayscale image 𝐼(𝑥, 𝑦), compute 𝑅(𝑥, 𝑦).
  - get_keypoints() : Given 𝑅(𝑥, 𝑦), apply non-maximum suppression and return the strongest up-to-𝐾 keypoints having 𝑅(𝑥, 𝑦) > 0. The 𝑁 selected keypoints are returned as an 𝑁 × 2 array, with each row giving the (𝑥, 𝑦) coordinate of a detected keypoint.
- py: This file contains skeleton code for feature matching. You will need to fill in four functions (respecting the algorithm parameters already defined in the constructor of the FeatureMatcher class):
  - get_descriptors() : Given an image and a set of keypoint locations, extract a feature descriptor for each keypoint. You should decide how to obtain descriptors for keypoints near the edge of the image. For an 𝑁 × 2 array of keypoints and a descriptor window of size 𝑛, this function should return an 𝑁 × 𝑛² array of keypoint descriptors, with each row consisting of a flattened image patch.
  - match_ssd() : Compute a match matrix for two sets of descriptors using SSD.
  - match_ncc() : Compute a match matrix for two sets of descriptors using 1 – NCC.
  - compute_matches() : Given a match matrix, compute one-to-one matches. Given 𝑀 selected matches, this function should return an 𝑀 × 2 array, where each row contains a pair of corresponding keypoint indices. For example, if the first row of the matches array contains indices (𝑖, 𝑗), the associated keypoints are the 𝑖th keypoint of the first image and the 𝑗th keypoint of the second image.
- py: This file is already set up to display keypoints and matches.
[1] For example, the first column will contain the distance from the first keypoint in the first image to every keypoint in the second image. (Note that the keypoint ordering for each image is arbitrary, but fixed.)