CS4243 Assignment 4

This lab has four parts. For each part, implement the outlined functions in lab4.py and run them in the notebook lab4.ipynb. For the Q&A in Part 4, write your answers directly into the notebook. Expected outputs are supplied for reference, but your results need not match them exactly.
Upload your completed lab4.py and lab4.ipynb to LumiNUS, zipped into a file named AXXX1_AXXX2.zip, where AXXX1 and AXXX2 are the student numbers of the group members. Submit one file per group. Missing files, incorrectly formatted code, code that does not run, etc. will be penalized.
Note: You are free to re-use functions from previous labs and to use any NumPy or OpenCV functions, except the built-in functions for mean-shift, e.g. cv2.meanShift. For k-means clustering, you may use KMeans from scikit-learn. Please do not import other high-level computer vision libraries. We provide default parameter values, but you are free to adjust them.
Objective

Figure 1: Overview of translation symmetry detection. (a) RGB image; (b) detected keypoints; (c) clustered points; (d) lattice model; (e) final output. From the original image (a), the detected keypoints (b) are clustered based on similar appearance (c). RANSAC is applied to find vector pairs representing the lattice structure (d) before scoring the proposals and generating the final lattice (e).
Part 1: Shi-Tomasi Keypoint Detection and Clustering (30%)
We cluster keypoints based on the appearance of a local patch of size h_p × w_p around each keypoint to find repeating patterns. Keypoints from the same cluster should be located at the same relative positions of a repeating pattern (see Fig. 1c, 1d: the clustered keypoints appear at the fence grid crossings).
o Patch-wise keypoint detection: Given an image of size (h, w), divide the image into patches of approximately 50×50 pixels and apply keypoint detection to each patch individually. Your image should have P·Q patches, where P ≈ h/50 and Q ≈ w/50.

o Quality decay: ρ thresholds the minimum acceptable quality of detected keypoints. For each patch, lower ρ progressively by a factor γ_ρ < 1. Stop lowering ρ when N_p, the required number of keypoints per patch, is found or when ρ falls below some threshold τ_ρ.
• cluster() groups the keypoint patches with mean-shift clustering with bandwidth h to produce a coarse set of clusters C_i, i = 1, …, N_c. To avoid N_c becoming too large, e.g. N_c > N/3, tune the bandwidth by gradually increasing h by a factor γ_h > 1. Limiting the number of clusters ensures that the resulting clusters are not too small, i.e. with too few points per cluster.
• Refine the mean-shift clusters with the following conditions:
o Discard small clusters, i.e. those with fewer than τ_1 points.
o Partition large clusters with more than τ_2 points by applying k-means with K = N_i / τ_2, where N_i is the number of points in the i-th cluster.

Part 2: Lattice Model Proposals and Evaluations (20%)
For a cluster of similar keypoints, we can solve for a lattice model, i.e. after applying some global transformation, the inlier keypoints should fall onto a uniformly spaced grid (see Figure 2a). The grid of the lattice model is defined by a starting point p_0 and its associated basis vector pair (t_1, t_2) (see yellow points in Figure 2a). Depending on the number of clusters present, many proposals will be generated. Use an appearance score (A-score) to evaluate the proposals. The A-score is the average per-pixel standard deviation among the aligned texels (see Figure 2c); it gives low scores to texels that are similar pixel-wise. We aim to find proposals with the lowest A-scores.

Figure 2: Overview of lattice model proposal and proposal scoring.
• get_proposal() applies RANSAC to find the most suitable basis vector pair for each cluster. For each cluster C_i, i = 1, …, N_c:
1. "Randomly" sample a triplet of points (the yellow points in Figure 2a), with priority given to points close to each other. Denote the three points as {a, b, c}, and define a as the point opposite the longest edge of the triangle formed by the three points.
2. Solve for the affine transformation M that maps {a, b, c} to the integer lattice positions {(0,0), (1,0), (0,1)}:

M = [ m1  m2  m3
      m4  m5  m6
      0   0   1  ].

To solve for M, use cv2.getAffineTransform.
3. Apply M to transform the X points in the cluster closest to a. Count the number of inliers (Figure 2a, red points) whose transformed coordinates fall within a threshold τ_a of an integer position (i, j) on the lattice grid. The remaining points are labelled as outliers (Figure 2b, blue points), i.e. those which are either too far from a or do not sit on the grid.
4. Repeat Steps 1-3 N_a times and return the triplet {a, b, c} with the largest number of inliers.
• find_texels() forms a set of texels {T_k, k = 1, …, K} of shape (U, V) based on the inlier keypoint set and their associated integer positions. Each texel is defined by 3 or 4 inlier keypoints at its corners. You are provided code in lab4.ipynb to project the texels from image space into uniform square patches. For each patch, we compute a perspective transformation and warp it with cv2.getPerspectiveTransform and cv2.warpPerspective (please be careful about the coordinate ordering expected by these functions). Normalize the intensity of each texel to mean 0.0 and standard deviation 1.0 before computing the A-score.
• score_proposal() computes an appearance score for the texels of a proposal by measuring the standard deviation of the aligned pixels. For a set of K texels, the score is defined as:

A-score = (1 / (U·V)) · Σ_{u=1..U} Σ_{v=1..V} std( T_1(u, v), …, T_k(u, v), …, T_K(u, v) ),

where T_k, k = 1, …, K are the texels and (u, v) are the local pixel indices within a texel. The denominator weighs the score by U·V, the number of pixels in the warped texel; K is the number of texels and std() is the standard deviation function. In the case of RGB texels, compute the A-score for each channel and then average over the three channels. Keep the 3 proposals with the lowest A-scores for further processing to generate the lattice.

Part 3: Generate Lattice (20%)
The proposals from Part 2 can be used to capture the repetitive patterns and generate the lattice:
Figure 3: Overview of lattice generation.
• template_match() estimates a template and performs template matching. The template is a patch T defined as the smallest rectangle that encloses the corner coordinates {a, b, c, d} of the proposal, where the point d can be predicted from {a, b, c} (see Figure 3a). Perform template matching with T over the entire image, as in Lab 1, and apply non-maximum suppression to isolate the local peaks similar to the template (see Figure 3b).
• maxima2grid() estimates 4 lattice grid points from each local maximum. The maximum coincides with the template center; based on the locations of {a, b, c, d} in the original template, estimate the analogous grid points {a′, b′, c′, d′} for each found maximum (see Figure 3c).
• refine_grid() merges any "duplicate" grid points from neighbouring maxima, e.g. b′ with the c′ of the maximum to its left, and a′ with the d′ of the neighbour below, by taking the mid-point of the two, to recover a set of unique lattice points. Interpolate missing grid points, e.g. based on weak maxima (see Figure 3d).
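The duplicate-merging step amounts to averaging grid points that fall within a small radius of each other. A minimal greedy sketch, with the helper name `merge_duplicates` and the tolerance as assumptions:

```python
import numpy as np

def merge_duplicates(points, tol=5.0):
    """Sketch of the merging step in refine_grid(): greedily average grid
    points that lie within tol pixels of each other into a single point."""
    pts = [np.asarray(p, float) for p in points]
    merged, used = [], np.zeros(len(pts), bool)
    for i, p in enumerate(pts):
        if used[i]:
            continue
        group, used[i] = [p], True
        for j in range(i + 1, len(pts)):
            if not used[j] and np.linalg.norm(pts[j] - p) < tol:
                group.append(pts[j])
                used[j] = True
        merged.append(np.mean(group, axis=0))   # mid-point of the duplicates
    return np.array(merged)
```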
• grid2latticeunit() converts each found lattice grid point from image coordinates to the integer lattice grid shown in Figure 2a. For example, the points {a, b, c} from the proposal correspond respectively to {(0,0), (1,0), (0,1)}. You can implement this any way you want, but clearly describe your intuition in the notebook. We show sample results in the notebook.
o Hint: Use the affine transformation to build the mapping. Consider applying multiple transformations locally, one region at a time, then extending them progressively to cover the entire image.
o Note that the lattice grid point location depends highly on the keypoint clusters. Grid points that coincide with any repeated patterns are all acceptable.
• draw_lattice() visualizes the grid points and their associated integer grids found by refine_grid() and grid2latticeunit() (see Figure 3e). This is provided for you.

Part 4: Exploratory Questions (30%)
• Patch-wise keypoint detection: Instead of detecting keypoints patch-wise, apply cv2.goodFeaturesToTrack to the entire image fence.jpg directly. Compare taking N = P·Q·N_p points for the entire image vs. N_p points for each of the P·Q patches. Briefly describe the difference in outcome and explain why the differences arise. What are the benefits of patch-wise keypoint detection for translation symmetry detection?
• Clustering: Part 1 groups the points first with mean-shift clustering and then refines the clusters with k-means. For the first stage, explain the benefits of applying mean-shift instead of k-means. For the refinement stage, why do we discard the small clusters? Please briefly describe the relationship between the radius c and the running time in an efficient implementation of mean-shift clustering.
• Affine transformation: In Part 2, an affine transformation and RANSAC are used to find proposals. In get_proposal(), only the points closest to a are transformed. Why don't we transform all points in the cluster when counting the number of inliers?
β€’ Different samples: Please apply your algorithm to at least 3 other images in the inputs folder. For each image, select two results you think are the best and append them to the notebook. In this question, you do not need to select the two results with the lowest A-score. Instead, you should notice that the A-score does not really give us the best results. Please briefly explain the disadvantages of A-score.
β€’ Hard samples: Generate lattices for wallpaper.jpg and house.jpg and append your results to the notebook. Explain your outcome based on your method for generating the lattices. For those methods which succeed, explain which aspect of the lattice generating algorithm allows it to work on these samples. For those methods which fail, explain why your algorithm fails and possibilities for improvement. Possibilities for failure include not detecting the full lattice structure (some repetitive patterns are not detected), or a distorted structure mis-aligned to the underlying image.