Description
Software Testing, Quality Assurance, and Maintenance Project, Version 1.1[1]
Part (I) Building an Automated Bug Detection Tool
You will learn likely-invariants from call graphs in a way reminiscent to that in SOSP’01 paper discussed in class (Coverity). In particular, you will learn what pairs of functions are called together. You will then use these likely-invariants to automatically detect software bugs.
(a) Inferring Likely Invariants for Bug Detection (40 points)
Take a look at the following contrived code segment (also seen in lecture):
void scope1() {
    A(); B(); C(); D();
}
void scope2() {
    A(); C(); D();
}
void scope3() {
    A(); B(); B();
}
void scope4() {
    B(); D(); scope1();
}
void scope5() {
    B(); D(); A();
}
void scope6() {
    B(); D();
}
We can learn that functions A and B are called together three times: in scope1, scope3, and scope5. Function A is called four times: in scope1, scope2, scope3, and scope5. Since A and B are called together 3 times, we infer that the one time function A is called without B, in scope2, is a potential bug. We count B only once in scope3, although scope3 calls B two times.
We define support as the number of times a pair of functions appears together. Therefore, support({A, B}) is 3. We define confidence({A, B}, {A}) as support({A, B}) / support({A}), which is 3/4 = 75%. We set the thresholds for support and confidence to be T_SUPPORT and T_CONFIDENCE, whose default values are 3 and 65%. You only print bugs with confidence T_CONFIDENCE or more and with pair support T_SUPPORT times or more. For example, function B is called five times: in scope1, scope3, scope4, scope5, and scope6. The two times that function B is called without A are not printed as a bug, because confidence({A, B}, {B}) = 3/5 = 60%, which is lower than T_CONFIDENCE (65%). Please note that support({A, B}) and support({B, A}) are equal, and both are 3.
Perform intra-procedural analysis. For example, do not expand scope1 in scope4 to the four functions A, B, C, and D. Match function names only.
The sample output with the default support and confidence thresholds should be:
bug: A in scope2, pair: (A, B), support: 3, confidence: 75.00%
bug: A in scope3, pair: (A, D), support: 3, confidence: 75.00%
bug: B in scope3, pair: (B, D), support: 4, confidence: 80.00%
bug: D in scope2, pair: (B, D), support: 4, confidence: 80.00%
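To make the counting concrete, here is a minimal sketch of the support/confidence computation for the contrived example above. It is written in Java for illustration only (your tool may use any language); the class name PairSupport is hypothetical, and the string-keyed maps are used for brevity; the performance hints below suggest hashing names to integer IDs instead.

import java.util.*;

public class PairSupport {
    public static void main(String[] args) {
        // Distinct callees per scope (duplicates already collapsed, e.g. B in scope3).
        Map<String, Set<String>> callees = new HashMap<>();
        callees.put("scope1", new HashSet<>(Arrays.asList("A", "B", "C", "D")));
        callees.put("scope2", new HashSet<>(Arrays.asList("A", "C", "D")));
        callees.put("scope3", new HashSet<>(Arrays.asList("A", "B")));
        callees.put("scope4", new HashSet<>(Arrays.asList("B", "D", "scope1"))); // intra-procedural: no expansion
        callees.put("scope5", new HashSet<>(Arrays.asList("A", "B", "D")));
        callees.put("scope6", new HashSet<>(Arrays.asList("B", "D")));

        Map<String, Integer> single = new HashMap<>();
        Map<String, Integer> pair = new HashMap<>(); // key "X|Y" with X < Y
        for (Set<String> s : callees.values()) {
            List<String> fs = new ArrayList<>(s);
            Collections.sort(fs);
            for (int i = 0; i < fs.size(); i++) {
                single.merge(fs.get(i), 1, Integer::sum);
                for (int j = i + 1; j < fs.size(); j++)
                    pair.merge(fs.get(i) + "|" + fs.get(j), 1, Integer::sum);
            }
        }
        // support({A,B}) = 3; confidence({A,B},{A}) = 3/4 = 75%
        System.out.printf("support=%d confidence=%.2f%%%n",
            pair.get("A|B"), 100.0 * pair.get("A|B") / single.get("A"));
    }
}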
Generate call graphs using LLVM. Given a bitcode file, use LLVM opt to generate call graphs in plain text format, and then analyze the textual call graphs to generate function pairs and detect bugs. The opt tool is installed on the ECE machines (ecetesla0.uwaterloo.ca) as well as the CS machines. If you want to work on your own computer, you need to use 64-bit LLVM 3.0 to avoid compatibility issues. LLVM 3.0 is available as a package in many popular Linux distributions.
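For example, an invocation along the following lines should produce the textual call graph (the exact flag set may differ across LLVM versions, so check opt -help; the file names here are placeholders):

opt -print-callgraph program.bc > /dev/null 2> callgraph.txt

Note that the call graph printer writes to standard error, so redirect stderr to capture the text.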
A sample call graph in text format is shown as follows:
Call graph node for function: ‘ap_get_mime_headers_core’<<0x42ff770>> #uses=4
CS<0x4574458> calls function ‘ap_rgetline_core’
CS<0x4575958> calls external node
CS<0x4575a50> calls external node
CS<0x4575b20> calls function ‘apr_table_setn’
CS<0x45775a8> calls external node
CS<0x45776a0> calls external node
CS<0x4577770> calls function ‘apr_table_setn’
CS<0x45783b8> calls external node
CS<0x4579dc0> calls function ‘apr_table_setn’
CS<0x4579f78> calls function ‘strchr’
CS<0x457a948> calls external node
CS<0x457aa40> calls external node
CS<0x457ab10> calls function ‘apr_table_setn’
CS<0x457d9d0> calls function ‘apr_table_addn’
CS<0x457e4e8> calls function ‘apr_table_compress’

and

Call graph node <<null function>><<0x2411e40>> #uses=0
The following explanation should help you understand the call graph:
- The above call graph shows that the function ap_get_mime_headers_core calls the functions ap_rgetline_core, apr_table_setn, etc.
- #uses= gives the number of times the caller is called by other functions. For example, ap_get_mime_headers_core is called 4 times.
- CS denotes call site.
- 0x????? indicates the memory address of the data structure, i.e., call graph nodes and call sites.
- An external node denotes a function that is not implemented in this object file. Ignore external node functions.
- A call graph node of null function represents all external callers. Ignore null function nodes entirely.
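The following sketch (in Java, for illustration only) shows one way to extract caller/callee pairs from this textual format. It assumes that function names are wrapped in straight single quotes (the typographic quotes in the excerpt above are an artifact of document formatting) and that every CS line belongs to the most recent node header:

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class CallGraphParser {
    public static void main(String[] args) throws IOException {
        Pattern node = Pattern.compile("Call graph node for function: '([^']+)'");
        Pattern call = Pattern.compile("calls function '([^']+)'");
        Map<String, Set<String>> callees = new HashMap<>();
        String caller = null;
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            for (String line; (line = in.readLine()) != null; ) {
                Matcher m = node.matcher(line);
                if (m.find()) {                              // start of a new caller node
                    caller = m.group(1);
                    callees.putIfAbsent(caller, new HashSet<>());
                } else if (line.contains("<<null function>>")) {
                    caller = null;                           // ignore null-function nodes
                } else if (caller != null) {
                    Matcher c = call.matcher(line);
                    if (c.find())                            // "calls external node" never matches
                        callees.get(caller).add(c.group(1));
                }
            }
        }
        System.out.println(callees);
    }
}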
Performance. We will evaluate the performance and scalability of your program on a pass/fail basis. The largest program that we test will contain up to 20k nodes and 60k edges. Each test case will be given a timeout of two minutes; a timed-out test case will receive zero points. We do not recommend using multi-threading, because it provides at most a linear gain in performance.
Hints:
- (Very Important!) Do not use string comparisons; they will greatly impact performance. Use a hashing method to store and retrieve your mappings (see the sketch after these hints).
- Minimize the number of nested loops. Pruning an iteration in the outer loop is more helpful than in the inner loop.
- None of the provided test cases should take more than 5 seconds to run.
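As a concrete illustration of the first hint, a common approach is to hash each function name to a small integer exactly once and use integer IDs everywhere afterwards. A minimal sketch in Java (the class name is hypothetical):

import java.util.HashMap;
import java.util.Map;

class NameInterner {
    private final Map<String, Integer> ids = new HashMap<>();

    // Hashes the name once; every later comparison is a cheap int comparison.
    int intern(String name) {
        return ids.computeIfAbsent(name, k -> ids.size());
    }
}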
(b) Examining False Positives
Examine the output of your code for (a) on the Apache httpd server (test3 from part (a)) by running verify.sh. Do you think you found any real bugs? There are some false positives: a false positive is a line that says “bug …” in your output even though there is nothing wrong with the corresponding code. Write down two different fundamental reasons why false positives occur in general.
Identify 2 pairs of functions that are false positives for httpd, and explain why you think each of the two pairs you selected is a false positive. The two pairs should have a combined total of at least 10 “bug …” line locations (e.g., the first pair can have 4 and the second pair 6). For example, the following sample output contains 4 locations covering 3 pairs: (A, B), (A, D), and (B, D). You do not have to provide explanations for all 10 locations, but you should use at least one location to discuss each pair. Briefly explain why each pair is a false positive, with reference to the source code.
bug: A in scope2, pair: (A, B), support: 3, confidence: 75.00%
bug: A in scope3, pair: (A, D), support: 3, confidence: 75.00%
bug: B in scope3, pair: (B, D), support: 4, confidence: 80.00%
bug: D in scope2, pair: (B, D), support: 4, confidence: 80.00%
The source code of httpd has been provided on ecelinux for your convenience. It is located at: /opt/testing/apache_src.
This directory contains the source code, the bitcode file (.bc), the call graph file in plain text format (.txt), and the sample output (.out) for a run of httpd. At the end of the call graph, you’ll find the file path where each function is defined, which will help you find the actual function bodies.
If you find new bugs (bugs that have not been found by other people yet), you may receive bonus points. Check the corresponding bug databases of Apache[2] to make sure that the bug you found has not been reported there already. Clearly explain why you think it is a bug and provide evidence that it is a new bug.
(c) Inter-Procedural Analysis. (10 points)
One solution to reduce false positives is inter-procedural analysis. For example, one might expand the function scope1 in the function scope4 to the four functions A, B, C, and D. Implement this solution by adding an optional fourth command line parameter to pipair. We leave the algorithm’s implementation details up to you.
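For illustration, a single level of expansion could look like the following Java sketch (the helper name is hypothetical; whether the expanded scope's own name is kept, and how many levels to expand, are design decisions your report should justify):

import java.util.*;

class Expansion {
    // One-level inter-procedural expansion: every callee that is itself a
    // scope with a known body is replaced by that scope's callees.
    static Map<String, Set<String>> expandOnce(Map<String, Set<String>> callees) {
        Map<String, Set<String>> expanded = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : callees.entrySet()) {
            Set<String> out = new HashSet<>();
            for (String callee : e.getValue()) {
                Set<String> body = callees.get(callee);
                if (body != null)
                    out.addAll(body);  // e.g. scope1 in scope4 becomes A, B, C, D
                else
                    out.add(callee);   // leaf function: keep as-is
            }
            expanded.put(e.getKey(), out);
        }
        return expanded;
    }
}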
Write a report (up to 2 pages) to describe your algorithm and what you found. Run an experiment on your algorithm and report the details. For example, you can vary the numbers of levels that you expand as an experiment. Provide a concrete example to illustrate that your solution can do better than the default algorithm used in (a). We will read your report and code. If you only submit a report, you will receive at most 50% of the points for this part (c). Make sure your code is well documented and your report is understandable. Follow the submission protocol in part (a) and place the code inside the pi/partC directory.
Test 3 from part (a) utilizes the same program (httpd) as part (b). Therefore, you might find it convenient to use test 3 to verify your inter-procedural experiment.
Resources for students who would like to generate a call graph with function arguments (optional):
- Check out the following post: http://stackoverflow.com/questions/5373714/generate-calling-graph-for-c-code
- Once you have generated the call graph, use c++filt to demangle the function names (this takes care of function overloading). After that, use sed to transform the text, and you will get the call graph.
- It is important that you disable C/C++ optimizations during call graph generation. Add the following flags to “opt” to ensure there is no inlining and that “opt” does not optimize the function arguments: -disable-inlining -disable-opt
- You have to use clang++ to generate the bitcode.
Part (II) — Using a Static Bug Detection Tool
For this part of the project you will be using the static code analysis tool Coverity to find defects in source code.
We have acquired a Coverity license for student use. Occasionally, we will collect groups, generate Coverity credentials, and mail you a link which allows you to choose a password for your credentials.
(a) Resolving Bugs in Apache Commons (10 points)
In this task, you are to triage warning messages resulting from running Coverity static analysis on Apache Commons (a library of useful Java tools). We have already compiled and analyzed a version of Apache Commons for you. The pre-analyzed code is available on Coverity’s web interface at http://ecetesla0.uwaterloo.ca:8080 behind the firewall. Log in using username “student” and password “%Ccb2wXQ”. To view and triage warnings from the analysis, select the “Outstanding Defects” item in the navigation tree on the left-hand side of the page. Click on warnings to view details of the warning and the source code that triggered the warning.
The analysis results contain warnings from both Coverity and FindBugs static analysis tools.
Your task is to triage and repair these warnings. There are 20 warnings for you to triage and repair. Triage each warning by classifying it as one of the following: False Positive, Intentional or Bug. Do not use the built-in Triage tool. Write your classification and explanations directly in your report. Below are descriptions of each triage classification:
- False Positive: The warning does not indicate a fault or deficiency.
- Intentional: You believe the warning points to code that the developer intentionally created that is incorrect (contains a fault) or considered bad practice.
- Bug: The warning points to a defect (a fault or bad practice) in the code.
If you classify the warning as False Positive, provide an explanation as to why you believe this is the case. If you classify the warning as Intentional, explain how the code can be re-factored to remove the warning, or explain why it should be left as-is. If you classify the warning as Bug, provide the faulty lines (1–2 lines is often enough, including the file name and line numbers) and a description of a proposed bug fix.
The difference between False Positive/Intentional and Intentional/Bug is not always clear, so there is not necessarily one unique correct classification. Make sure you explain your choice clearly!
For details on each Coverity checker, read the checker documentation (see https://patricklam.ca/stqam/coverity/#reference_material). Information on FindBugs checkers can be found at http://findbugs.sourceforge.net/bugDescriptions.html.
The report for this part should be no more than 4 pages. Be concise.
(b) Analyzing Your Own Code (10 points)
Run Coverity on your own code from part (I)(a). See the instructions at https://patricklam.ca/stqam/coverity/ on how to analyze your code with the Coverity Connect interface. Once your project is uploaded to the web interface, you must make it inaccessible to other students: connect to http://ecetesla0.uwaterloo.ca:8080 with your credentials, click on “Configuration”, then “Projects & Streams”, and select your project. In the “Roles” tab, click on “add” and check “No Access” for the group “Students”.
You will learn more from having the tool find defects in code that you wrote, fixing the problems, and seeing the defects go away. Discuss two of the bugs detected by Coverity, using up to 1 page. If Coverity finds no bugs, what are the possible reasons?
Include the results from your analysis with your submission. Put them in directory pii.
You can find instructions on how to use Coverity to build and analyze your code at https://patricklam.ca/stqam/coverity/.
[1] updated URLs
[2] https://issues.apache.org/bugzilla/
Software Testing, Quality Assurance, and Maintenance Assignment 3, version 1.0
Getting set up
We will create a copy of the starter repo for you in your git.uwaterloo.ca account. You need to log in to git.uwaterloo.ca for that to work. There is a Vagrant VM description, but you shouldn’t need it.
Tools used: PMD; C++; Java plus Maven (all required); Valgrind (not strictly required but highly recommended).
Submission summary
Here’s what you need to submit in your fork of the repo. Be sure to commit and push your changes back to git.uwaterloo.ca.
- in directory q1, your modified a1q1-automarker.xml file and A1Q1Test.java;
- in directory q2, file icalendarlib-memory.diff as described in the question;
- in directory q3, either file bugreports.txt or bugreports.pdf;
- in directory shared/itext, files pom.xml and src/test/java/com/itextpdf/text/pdf/TaggedPdfTest.java;
- in directory q5, either file codereview.md or codereview.pdf.
Once again, you may choose to move the q? directories into the shared subdirectory if you want to access them in Vagrant. We’ll mark submissions either in the original place or under shared.
Question 1 (10 points)
A few months ago, I wrote an automarker for your Assignment 1 Question 1 submissions. You will know enough to write this automarker yourself. We’ll focus on just the technically challenging part.
I’ve included PMD in the a3 skeleton and provided a template PMD a1q1-automarker.xml file. Use the following command in your VM to run your automarker:
~/shared/pmd/bin/run.sh pmd -f text -d ~/shared/q1/A1Q1Test.java -R ~/shared/q1/a1q1-automarker.xml
Your task is to write a PMD rule that detects JUnit 4 test methods which contain no calls to assert methods whose arguments include mockCommandSender.getLastMessage(). JUnit 4 test methods have a @Test annotation. Calls to assert methods are Statements with a PrimaryPrefix descendant whose name starts with “assert”. Submit your a1q1-automarker.xml file in directory q1/.
In file q1/A1Q1Test.java, write test methods that show that your query works properly; these tests should show that your query flags methods that it should and doesn’t flag methods that it shouldn’t.
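For instance, the test class could contain methods like the following sketch (MockCommandSender and the behaviour under test are hypothetical placeholders):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class A1Q1Test {
    // Should NOT be flagged: asserts on mockCommandSender.getLastMessage().
    @Test
    public void greetingIsSent() {
        MockCommandSender mockCommandSender = new MockCommandSender();
        // ... exercise the code under test ...
        assertEquals("hello", mockCommandSender.getLastMessage());
    }

    // SHOULD be flagged: a @Test method with no assert call on the result.
    @Test
    public void noAssertions() {
        MockCommandSender mockCommandSender = new MockCommandSender();
        mockCommandSender.getLastMessage();
    }
}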
Question 2 (10 points)
Recall icalendarlib from Assignment 2. I’ve made some changes to it to make it more Valgrind-friendly and again placed it in shared/icalendarlib. The code contains 4 memory errors. Using valgrind, or otherwise, find the errors and submit the diff for your changes in file q2/icalendarlib-memory.diff. I believe that two of the errors are not reachable from the current main.cpp file. You’ll need to add more code to main.cpp if you want to trigger them.
Question 3 (10 points)
In this question, you will critique and improve an existing bug report in Mozilla and write a bug report from scratch.
- Read Mozilla bug report 112785 (https://bugzilla.mozilla.org/show_bug.cgi?id=112785). What are some problems with the initial bug report (as seen in the “Title” and the “Description”)? Identify four problems. How would you improve this bug report? (5 points)
- The class q3/HashTable.java is an implementation of a hash table using linear open addressing and division in Java. This program has a bug: the put function does not update the element associated with the given key if an entry with the same key already exists. The hash table implementation should always update the element, even if the element’s key already exists in the hash table. (See the comment in the put function.)
Write a good bug report for this bug using the Bugzilla bug report format. (5 points)
(Recommended exercise, not for marks: write a JUnit test that illustrates this bug.)
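Such a test might look like the following sketch (the constructor and method signatures are assumptions; adapt them to the provided HashTable class):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class HashTableTest {
    // Illustrates the bug: put should overwrite the value for an existing key.
    @Test
    public void putUpdatesValueForExistingKey() {
        HashTable table = new HashTable(11);  // assumed capacity constructor
        table.put(42, "first");
        table.put(42, "second");              // same key: must overwrite
        // Fails with the buggy put, which leaves "first" in place.
        assertEquals("second", table.get(42));
    }
}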
Question 4 (10 points)
In this question, add unit tests to the com.itextpdf.text.pdf.TaggedPdfTest class for the add(final Element o) method of the com.itextpdf.text.List class from iText (in shared/itext). Your solution must use mock objects (with a mock object library of your choice; modify your pom.xml accordingly). Your tests must kill the following mutants:
public boolean add(final Element o) {
    if (o instanceof ListItem) {
        ListItem item = (ListItem) o;
        if (numbered || lettered) { // mutant 1: || -> &&
            Chunk chunk = new Chunk(preSymbol, symbol.getFont());
            chunk.setAttributes(symbol.getAttributes());
            int index = first + list.size(); // mutant 2: index = first (NOTE EDIT), mutant 3: list -> item
            if (lettered)
                chunk.append(RomanAlphabetFactory.getString(index, lowercase));
            else
                chunk.append(String.valueOf(index));
            chunk.append(postSymbol);
            item.setListSymbol(chunk);
        } else {
            item.setListSymbol(symbol);
        }
        item.setIndentationLeft(symbolIndent, autoindent);
        item.setIndentationRight(0);
        return list.add(item); // mutant 4: list -> item
    } else if (o instanceof List) {
        List nested = (List) o;
        nested.setIndentationLeft(nested.getIndentationLeft() + symbolIndent);
        first--;
        return list.add(nested); // mutant 5: nested -> null
    }
    return false;
}
I’m aware that the unit tests that come with iText also kill the mutants. But they don’t use mock objects. As I wrote above, your solutions are required to use mock objects.
Here is an additional hint:
- Be aware of the EasyMock factory method notNull() and class Capture<T>.
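To illustrate the shape of such a test, here is a sketch that targets mutant 1 (it assumes EasyMock 3.x with class-mocking support and the iText 5 signatures shown above; your own tests and assertions will differ):

import static org.easymock.EasyMock.*;
import static org.junit.Assert.assertTrue;
import org.easymock.Capture;
import org.junit.Test;
import com.itextpdf.text.Chunk;
import com.itextpdf.text.List;
import com.itextpdf.text.ListItem;

public class TaggedPdfTest {
    // Kills mutant 1 (|| -> &&): a numbered-but-not-lettered list must still
    // generate a numeric symbol; under the mutant the default symbol is used.
    @Test
    public void addGeneratesNumberedSymbol() {
        ListItem item = createMock(ListItem.class);
        Capture<Chunk> symbol = new Capture<Chunk>();
        item.setListSymbol(capture(symbol));            // capture the generated chunk
        item.setIndentationLeft(anyFloat(), anyBoolean());
        item.setIndentationRight(0);
        replay(item);

        List list = new List(true, false);              // numbered, not lettered
        assertTrue(list.add(item));
        verify(item);
        assertTrue(symbol.getValue().getContent().startsWith("1"));
    }
}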
Question 5 (10 points)
In this question, you will perform code review. You may: 1) review your own code from the past; 2) review some code that a friend provides for you; or 3) review code that I suggest, namely the com.itextpdf.text.pdf.SimpleBookmark class from iText (which we saw in Question 4). The advantage of (1) and (2) is that you have the opportunity to get your questions about the code answered. The code may be in any language but should be between 500 and 1000 lines of code. You may also review a pull request of a similar magnitude.
Your task is to apply the code review checklist at https://jesseheines.com/~heines/91.462/Resources/CodeReviewChecklists/Binder_4documents_2016-03-20.pdf (formerly at Fog Creek, but that doesn’t seem to be on the Internet anymore). In particular, pick 3 questions from that list and answer them for the code that you are reviewing. Explain your answer in a couple of sentences, supporting it with examples from the code that you’re reviewing. Put your solution in either file q5/codereview.md (Markdown syntax preferred) or q5/codereview.pdf.