[SOLVED] VE280-Project Four

30.00 $

Category:

Description

Rate this product

Project Four: Binary Tree

II.   Introduction

Binary tree is a data structure that is commonly used in computer science. A typical binary tree structure is shown in the picture below:

 

https://en.wikipedia.org/wiki/Binary_tree

As the picture shows, each circle in the tree is a “tree node”. A tree node contains its own information, as well as “links” (pointers) to its child nodes (nodes linked below it). As the name of “binary tree” suggests, one tree node can have at most two children, namely, the left child and the right child. Notice that a node does not contain any information or link to its parent node (node linked above it). Therefore, the only way to search for a node from a binary tree is to visit nodes recursively from the root node (the top node in the tree).

Binary trees are often useful for searching and sorting, and you are going to learn more about it in Ve281. In this project, you will get familiar with binary trees by implementing some basic operations on them.  Also, you will explore an interesting application of binary tree: dictionarybased compression.

 

Part II. Dictionary-Based Compression

Information theory developed by Shannon provides the theoretical bound for any lossless compression algorithm. Also, Shannon created a simple coding algorithm called Shannon code which was proved to meet the theoretical bound in terms of mathematical expectation. However, researchers wanted to find an algorithm that could compress and decompress quickly in practice. In 1978, Ziv and Lempel developed an easy but powerful algorithm based on dynamic dictionary and its variation is still used in zip files nowadays. In Part II, we will use our binary tree to implement this algorithm.

In this project, for simplicity, we only consider strings with 0s and 1s, such as “11110100100111”. To compress a string, a binary tree to encode it is built iteratively. A path from the root node in this tree corresponds to a substring: moving to the left child corresponds to 0 while moving to the right child corresponds to 1. See the next figure for an example. The compression algorithm that builds this binary tree and the decompression algorithm are described next.

 Compression

The compression algorithm starts from a binary tree with only a root node labeled by #0. Initially, the whole string to be compressed is marked as unprocessed. At each step, given the current binary tree and the unprocessed part of the string, we first find the longest substring  that appears in the tree and then update the binary tree with the next extra character from the unprocessed string, if any. See the nextfigure showing an example with input string “11110100100111”.

The substring in gray corresponding to “11110” is the processed part and we have built the current binary tree (leftmost tree of figure) in the previous steps using that processed part. First, as shown in the figure above, we find the longest substring that occurs in the treeby reading the characters of the unprocessed part one by one. The longest substring is 10 marked in blue. Second, we add a new node into the tree by reading the next extra character from the unprocessed part. In this example, this character is 0 marked in red. Therefore, here, we add a left child. Please note that all the nodes are labeled according to the order in which they are added. This node is the fourth node that is added. So, we label it as #4. In addition to building this binary tree, the compression algorithm outputs the compressed string. This string is expanded every time a new node is added to the binary tree with a pair, (node_index, char), where node_index is the label of the new node’s parent and char determines whether the node is the left or right child. In this example, the pair is (#3, 0) (0 means left and 1 means right).

The figure below gives a complete example illustrating the algorithm for the input string “11110100100111”. For each step, the green string is the longest substring and the red one is the extra character. For the last part, the remaining string may  not be long enough to create a new node. So, we can directly output the node’s label corresponding to the remaining string. In this example, it is (#2,@). Here, @ represents empty. If there is no remaining string, just output (#0, @).

 

The pseudo code of the compression algorithm is given below:

Build a tree with node #0 w ← “”

while (read a character c) {     w’ ← w + c

if w’ exists in the tree         w ← w’     else      n ← node corresponding to w

add a child for n and output the child node information         w ← “”

}

Output the node corresponding to w

 

Decompression

The procedure for decompression is just the inverse of compression. We need to build the tree according to the pairs (node_index, char). Noting that the pairs are outputted in order, we only need to retrieve the original string by printing the path of the new node in order. The detail is again shown on an example illustrated in the next figure.

 

 

 

The pseudo code of the decompression algorithm is given below:

Build a tree with node #0 while (read a pair (node_index, c)) { n ← node labeled by parent;

add a child for n according to c (c is 0 or 1 or @) print the path from root to this child;

}