Description
Assignment overview
Three-dimensional (3D) object recognition is a technique for identifying objects in images or point clouds. The goal of such techniques is to teach a computer to gain a level of understanding of what an image contains. We can use a variety of machine learning or deep learning approaches for object recognition. In this assignment, you will work with two popular families of approaches: hand-crafted methods and deep transfer learning. Figure 1 shows the abstract architecture of these approaches.
Figure 1: Abstract architecture of (top) hand-crafted and (bottom) deep learning techniques for object recognition.

Cognitive science has revealed that humans learn to recognize object categories ceaselessly over time. This ability allows them to adapt to new environments by enhancing their knowledge through the accumulation of experiences and the conceptualization of new object categories. Taking this theory as inspiration, we seek to create an interactive object recognition system that can learn 3D object categories in an open-ended fashion. In this project, “open-ended” implies that the set of categories to be learned is not known in advance. The training instances are extracted from the on-line experiences of a robot, and thus become gradually available over time, rather than being completely available at the beginning of the learning process.
In this assignment, students have to optimize an open-ended learning approach for 3D object recognition and get familiar with the basic functionalities of ROS. We break this assignment down into two parts:
- The first part is about optimizing offline 3D object recognition systems, which take an object view as input and produce the category label as output (e.g., apple, mug, fork, etc.).
- The second part of this assignment is dedicated to testing object recognition approaches in an open-ended fashion. In this setting, the number of categories is not pre-defined in advance, and the knowledge of the agent/robot increases over time by interacting with a simulated teacher using three actions: teach, ask, and correct (see Fig. 2).

Figure 2: Abstract architecture for interaction between the simulated teacher and the learning agent.
Further details of these assignments are explained in the following sections. To make your life easier, we provide a virtual machine that has all the necessary programs, code, datasets, libraries, and packages. We also offer template code for each assignment.
If you are not familiar with the concept of ROS, please follow the beginner level of the ROS Tutorials. For all students, going over all the basic beginner-level tutorials is strongly recommended.
Z I recommend installing MATLAB on your machine, since the output of the experiments is automatically visualized in MATLAB. You can download it from the download portal or use an online version provided by the university. As an alternative, we also provide a Python script to visualize the generated MATLAB plots automatically.
Part I: Offline 3D object recognition setting (50%)
In this assignment, we assume that an object has already been segmented from the scene and we want to recognize its label. We intend to use an instance-based learning (IBL) approach to form new categories. From a general perspective, IBL approaches can be viewed as a combination of an object representation approach, a similarity measure, and a classification rule. Therefore, we represent an object category by storing the representations of the object views belonging to that category. Furthermore, the choice of object representation and similarity measure has an impact on the recognition performance, as shown in Fig. 3.
Figure 3: The components used in a 3D object recognition system.
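As a concrete illustration of the three IBL components, the following minimal Python sketch stores the histogram of every taught view and classifies a new view by the label of its nearest stored view (a 1-NN rule). The class and method names are our own, for illustration only; the actual C++ implementation in the RACE framework differs.

```python
import numpy as np

class InstanceBasedRecognizer:
    """A category is just the set of stored histograms of its taught views;
    recognition returns the label of the nearest stored view (1-NN rule)."""

    def __init__(self, distance):
        self.distance = distance   # dissimilarity between two histograms
        self.memory = []           # list of (histogram, category_label) pairs

    def teach(self, histogram, label):
        # "Learning" in IBL amounts to storing the view's representation.
        self.memory.append((np.asarray(histogram, dtype=float), label))

    def recognize(self, histogram):
        h = np.asarray(histogram, dtype=float)
        # Classification rule: label of the most similar stored view.
        return min(self.memory, key=lambda m: self.distance(h, m[0]))[1]

# Example with the Manhattan distance on normalized histograms:
manhattan = lambda a, b: float(np.abs(a - b).sum())
ibl = InstanceBasedRecognizer(manhattan)
ibl.teach([0.7, 0.2, 0.1], "apple")
ibl.teach([0.1, 0.3, 0.6], "mug")
print(ibl.recognize([0.6, 0.3, 0.1]))  # prints "apple"
```

Swapping the `distance` argument is all that is needed to compare different dissimilarity measures, which is exactly what the provided cross-validation code automates.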
In the case of the similarity measure, since the object representation module represents an object as a normalized histogram, the dissimilarity between two histograms can be computed by different distance functions. In this assignment, you need to select 5 out of 14 distance functions that are dissimilar from each other. This policy will increase the chance that different functions lead to different results. The following 14 functions have been implemented and exist in the RACE framework:
Euclidean, Manhattan, χ2, Pearson, Neyman, Canberra, KL divergence, symmetric KL divergence, Motyka, Cosine, Dice, Bhattacharyya, Gower, and Sorensen.
Z For the mathematical equations of these functions, we refer the reader to a comprehensive survey on distance/similarity measures provided by S. Cha (1).
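To make the selection concrete, here is a sketch of five of the listed functions, computed on normalized histograms. The exact variants (e.g., of the χ² distance) follow one common form from Cha's survey; the RACE implementation may use slightly different formulations, so treat these as reference implementations, not ground truth.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero and log(0) on sparse histograms

def euclidean(p, q):
    return float(np.sqrt(np.sum((p - q) ** 2)))

def chi_squared(p, q):
    # One common chi-square form; Cha's survey lists several variants.
    return float(np.sum((p - q) ** 2 / (p + q + EPS)))

def symmetric_kl(p, q):
    p, q = p + EPS, q + EPS
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def cosine_dissimilarity(p, q):
    return 1.0 - float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q) + EPS))

def bhattacharyya(p, q):
    return float(-np.log(np.sum(np.sqrt(p * q)) + EPS))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
for fn in (euclidean, chi_squared, symmetric_kl, cosine_dissimilarity, bhattacharyya):
    print(fn.__name__, fn(p, q))
```

All five return 0 (up to the epsilon guard) for identical histograms and grow as the histograms diverge, but they weight the bins differently, which is why dissimilar functions can lead to different recognition results.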
The main intuition behind using instance-based learning in this study is that IBL serves as a baseline approach for evaluating the object representations used in object recognition. More advanced approaches, e.g., SVM-based and Bayesian learning, can be easily adapted.
To examine the performance of an object recognition system, we provide a K-fold cross-validation procedure. K-fold cross-validation is one of the most widely used methods for estimating the generalization performance of a learning algorithm. In this evaluation protocol, K folds are randomly created by dividing the dataset into K equal-sized subsets, where each subset contains examples from all the categories. In each iteration, a single fold is used for testing, and the remaining K−1 folds are used as training data. We set K to 10, as is generally recommended in the literature. This type of evaluation is useful not only for parameter tuning but also for comparing the performance of your method with other approaches described in the literature.
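The protocol above can be sketched as follows; the stratified split and the `evaluate` callback are illustrative stand-ins for the provided ROS/C++ cross-validation code.

```python
import random
from collections import defaultdict

def stratified_kfold(dataset, k=10, seed=42):
    """dataset: list of (sample, category_label) pairs.  Returns k folds,
    each containing examples from all categories, as the protocol requires."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for sample, label in dataset:
        by_category[label].append((sample, label))
    folds = [[] for _ in range(k)]
    for items in by_category.values():
        rng.shuffle(items)
        for i, item in enumerate(items):
            folds[i % k].append(item)  # deal each category's views over the folds
    return folds

def cross_validate(dataset, evaluate, k=10):
    """evaluate(train_set, test_set) -> accuracy; returns the mean over k runs."""
    folds = stratified_kfold(dataset, k)
    scores = []
    for i in range(k):
        test_set = folds[i]
        train_set = [x for j in range(k) if j != i for x in folds[j]]
        scores.append(evaluate(train_set, test_set))
    return sum(scores) / k
```

Each of the K iterations holds out one fold for testing and trains on the rest, so every example is used for testing exactly once.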
L What we offer for this part
- Detailed instructions on how to run each of the experiments
- ROS-based C++ code for 10-fold cross-validation: we have implemented a set of object representation approaches and different distance functions for object recognition purposes. You need to study each approach in depth and optimize its parameters.
- ROS-based C++ code for K-fold cross-validation with various deep learning architectures as object representation and a set of distance functions for object recognition purposes. You need to study each approach in depth and optimize its parameters.
- Sample bash scripts for running a bunch of experiments based on the GOOD descriptor (hand-crafted) and the MobileNetV2 architecture (deep transfer learning); find them in rug_kfold_cross_validation/result
- A Python script to visualize the confusion matrix as the output. Run python3 matlab_plots_parser.py -p PATH_TO_EXP_DIR/ --offline to visualize the confusion matrix. You can use [-h] to see the instructions.
L How to run the experiments
We created a launch file for each of the mentioned object recognition algorithms. A launch file provides a convenient way to start up the roscore and multiple nodes, and to set the parameters’ values (read more about launch files here). Before running an experiment, check the following:
- You have to update the values of different parameters of the system in the launch file (e.g., rug_kfold_cross_validation/launch/kfold_cross_validation.launch).
Z You can also set the value of a parameter when you launch an experiment using the following command:
$ roslaunch package_name launch_file.launch parameter:=value
This option is useful for running a bunch of experiments using a bash/python script.
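For example, a small Python helper can generate the command lines for a parameter sweep. The package, launch file, and parameter names below are taken from the example commands in this document; extend the sweep values as needed.

```python
def roslaunch_cmd(package, launch_file, **params):
    """Build a 'roslaunch package file.launch name:=value ...' command line."""
    args = " ".join(f"{name}:={value}" for name, value in params.items())
    return f"roslaunch {package} {launch_file} {args}".strip()

# Sweep over K for the K-NN classifier, keeping everything else fixed.
commands = [
    roslaunch_cmd("rug_kfold_cross_validation",
                  "kfold_cross_validation_RGBD_deep_learning_descriptor.launch",
                  base_network="mobileNetV2",
                  K_for_KNN=k,
                  name_of_approach=f"TEST_K{k}")
    for k in (1, 3, 5)
]
for cmd in commands:
    print(cmd)  # pipe these into a shell, or execute them with subprocess.run
```

Printing the commands (rather than executing them directly) lets you inspect the sweep before handing it to a bash script.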
Z The system configuration is reported at the beginning of the report file of the experiment. Therefore, you can use it as a way to debug/double-check the system’s parameters.
For the hand-crafted object recognition approaches:
After adjusting all necessary parameters in the launch file, you can run an experiment using the following command:
$ roslaunch rug_kfold_cross_validation kfold_cross_validation_hand_crafted_descriptor.launch
For the deep transfer learning based object representation approaches:
After adjusting all necessary parameters in the launch file, you need to open three terminals and use the following commands to run a deep transfer learning based object recognition experiment:
í MobileNetV2 Architecture
$ roscore
$ rosrun rug_deep_feature_extraction multi_view_RGBD_object_representation.py mobileNetV2
$ roslaunch rug_kfold_cross_validation kfold_cross_validation_RGBD_deep_learning_descriptor.launch orthographic_image_resolution:=150 base_network:=mobileNetV2 K_for_KNN:=3 name_of_approach:=TEST
í VGG16 Architecture
$ roscore
$ rosrun rug_deep_feature_extraction multi_view_RGBD_object_representation.py vgg16_fc1
$ roslaunch rug_kfold_cross_validation kfold_cross_validation_RGBD_deep_learning_descriptor.launch orthographic_image_resolution:=150 base_network:=vgg16_fc1 K_for_KNN:=3 name_of_approach:=TEST
L What are the outputs of each experiment
- Results of an experiment, including a detailed summary (see Fig. 5) and a confusion matrix (see Fig. 4), will be saved in:
$HOME/student_ws/rug_kfold_cross_validation/result/experiment_1/
After each experiment, you need to either rename the experiment_1 folder or move it to another folder; otherwise its contents will be replaced by the results of a new experiment.
- We also report a summary of a bunch of experiments in a txt file in the following path (see Fig. 6):
rug_kfold_cross_validation/result/results_of_name_of_approach_experiments.txt
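Since experiment_1 is overwritten on every run, a small helper like the following can archive it first. The timestamped naming scheme is our own convention, not something the framework requires.

```python
import shutil
import time
from pathlib import Path

def archive_experiment(result_dir):
    """Rename result_dir/experiment_1 to a timestamped folder so the next
    run cannot overwrite it.  Returns the new path, or None if nothing to move."""
    src = Path(result_dir) / "experiment_1"
    if not src.exists():
        return None
    dst = src.with_name(time.strftime("experiment_%Y%m%d_%H%M%S"))
    shutil.move(str(src), str(dst))
    return dst

# Example (the result path follows the one quoted above; adjust if your
# workspace differs):
# archive_experiment(Path.home() / "student_ws/rug_kfold_cross_validation/result")
```

Run it between experiments, or call it at the top of a batch script so each run starts from a clean result folder.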
Figure 4: Confusion matrices showing how well each model performed in the object recognition task on the restaurant object dataset. In each cell of a confusion matrix, we present the percentage and the absolute number of predictions. Darker diagonal cells indicate better predictions by the models.
Figure 5: A detailed summary of an experiment: the system configuration is specified at the beginning of the file. A summary of the experiment is subsequently reported. Objects that are incorrectly classified are highlighted by a double dashed line, e.g., No. 9.
Figure 6: A summary of a bunch of experiments for the GOOD descriptor with different K and various distance functions: in these experiments, we first trained on all data. We then saved the perceptual memory to be used in other experiments.
Part II: Open-ended 3D object recognition setting (50%)

L What we offer for this part
- The simulated teacher code to assess the performance of your approach in open-ended settings.
- A set of MATLAB/Python scripts to visualize the progress of the agent (related to task #3):
$ python3 matlab_plots_parser.py -p PATH_TO_EXP_DIR/ --online
- A bash script for running a bunch of experiments (find it in the rug_simulated_user/result folder).
L How to run the experiments
Similar to the offline evaluation, we created a launch file for hand-crafted and deep transfer learning based algorithms.
However, before running an experiment, check the following items:
- You have to update the value of different parameters of the system in the relative launch file.
Z The system configuration is reported at the beginning of the report file of the experiment. Therefore, you can use it as a way to debug/double-check the system’s parameters.
For hand-crafted object representation approaches:
After setting a proper value for each of the system’s parameters, you can run an open-ended object recognition experiment using the following command:
$ roslaunch rug_simulated_user simulated_user_hand_crafted_descriptor.launch
For deep learning based object representation approaches:
Similar to the offline evaluation for deep learning based approaches, you need to open three terminals and use the following commands to run an open-ended object recognition experiment for a specific network architecture:
í MobileNetV2 Architecture
$ roscore
$ rosrun rug_deep_feature_extraction multi_view_RGBD_object_representation.py mobileNetV2
$ roslaunch rug_simulated_user simulated_user_RGBD_deep_learning_descriptor.launch orthographic_image_resolution:=150 base_network:=mobileNetV2 K_for_KNN:=7 name_of_approach:=TEST
í VGG16 Architecture
$ roscore
$ rosrun rug_deep_feature_extraction multi_view_RGBD_object_representation.py vgg16_fc1
$ roslaunch rug_simulated_user simulated_user_RGBD_deep_learning_descriptor.launch orthographic_image_resolution:=150 base_network:=vgg16_fc1 K_for_KNN:=7 name_of_approach:=TEST
To have a fair comparison, the order of introducing categories should be the same in both approaches. Therefore, we designed a Boolean parameter named random_sequence_generator that can be used for this purpose. Check out the script we have provided for more details.
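For reference, the teach/ask/correct interaction can be sketched as below. The agent interface (teach/recognize methods) and the data layout are assumptions for illustration; the actual protocol is implemented by the rug_simulated_user package over ROS.

```python
def run_simulated_teacher(agent, views_by_category, category_order):
    """views_by_category: dict mapping a label to a list of object views.
    category_order is fixed up front so that two approaches can be compared
    fairly (cf. the random_sequence_generator parameter)."""
    correct = 0
    asked = 0
    for category in category_order:
        views = views_by_category[category]
        agent.teach(views[0], category)          # teach: introduce the category
        for view in views[1:]:
            asked += 1
            prediction = agent.recognize(view)   # ask: test on a further view
            if prediction == category:
                correct += 1
            else:
                agent.teach(view, category)      # correct: supply the true label
    return correct, asked
```

The real simulated teacher additionally tracks the agent's recent accuracy and introduces a new category only once the current ones are learned well enough, which is what makes the setting open-ended.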
L What are the outputs of each experiment
- Results of an experiment, including a detailed summary and a set of MATLAB files (see Fig. 9), will be saved in:
$HOME/student_ws/rug_simulated_user/result/experiment_1/
After each experiment, you need to either rename the experiment_1 folder or move it to another folder; otherwise its contents will be replaced by the results of a new experiment.
- The system also reports a summary of a bunch of experiments as a txt file in the following path:
rug_simulated_user/result/results_of_name_of_approach_experiments.txt
Z Each time you run an experiment, the experiment results will be automatically appended to the log file. After running a set of 10 experiments, you have to report the content of the log file as a table in your report, compare the obtained results, and visualize the output of the best experiment for the hand-crafted and deep transfer learning approaches (as an example, see Fig. 9).



