CS373 Homework 3: Perceptron and Naive Bayes Classifier (NBC)


1         Perceptron

1.1        Theory

  1. The perceptron model discussed in class is shown as:

         f(x) = 1   if Σj wjxj > 0
                0   if Σj wjxj ≤ 0

     The bias term is missing from this equation. Write the equation with the bias term included. Is a perceptron with a bias term more expressive (i.e., can it represent more classification scenarios) than one without? Why or why not?
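For reference (not the graded answer), the decision rule above is only a few lines of code; adding a bias term b shifts the threshold of the weighted sum. The function and argument names below are illustrative:

```python
def predict(w, b, x):
    # Perceptron decision rule with a bias term: fire (output 1) only
    # when the bias-shifted weighted sum crosses zero.
    # Setting b = 0 recovers the biasless rule shown above.
    s = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if s > 0 else 0
```

With b = 0 the decision boundary must pass through the origin; a nonzero b lets it sit anywhere, which is the crux of the expressiveness question.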

  2. Given below are four figures that show the distribution of data points with binary classes. The two colors denote the different classes. For each of these, reason whether the following would give a high (> 0.95) classification accuracy:
    • a perceptron without bias
    • a perceptron with bias
  3. (5 points) What is the update rule for the bias term b of a vanilla perceptron, given the learning rate γ and gold label y, when the classifier label doesn’t match the gold label during training? What is the update rule when the classifier label matches the gold label?
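For intuition on the mismatch-only nature of the vanilla perceptron update, here is an illustrative sketch (not the course's reference solution; the function name and `gamma` argument are assumptions):

```python
def perceptron_update(w, b, x, y, y_hat, gamma):
    # Vanilla perceptron: parameters change only on a misclassification.
    # The bias follows the same rule as a weight whose input is fixed at 1.
    if y != y_hat:
        sign = 1 if y == 1 else -1  # push the score toward the gold label
        w = [wj + gamma * sign * xj for wj, xj in zip(w, x)]
        b = b + gamma * sign
    return w, b
```

When the prediction matches the gold label, the parameters are returned unchanged.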

1.2        Implementation

You need to implement a vanilla perceptron and an averaged perceptron model for this part. Both models should be implemented with a bias term. This part can be completed by editing only perceptron.py, unless you plan to extract new features for the task. For each model, you need to initialize the parameters (__init__() method), learn them from the training data (fit() method), and use the learned parameters to classify new instances (predict() method). Note that __init__.py, main.py and classifier.py may be replaced during grading; do not make any modifications to these files. You need to follow the description of the models discussed in the lecture slides (link). Report the results obtained on the given test set for both models in your report. Submit your code with the hyperparameter settings (config.py) that produce the best performance.
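As a shape for the __init__/fit/predict contract described above, a minimal sketch of the vanilla model might look like the following; the hyperparameter names (n_features, n_iter, lr) are illustrative, not the names in the course's config.py:

```python
import numpy as np

class VanillaPerceptron:
    # Minimal sketch of the interface named in the handout; not the
    # reference solution. The averaged variant would additionally keep
    # a running sum of (w, b) over all updates and predict with the mean.

    def __init__(self, n_features, n_iter=10, lr=1.0):
        self.w = np.zeros(n_features)  # weights, one per feature
        self.b = 0.0                   # bias term
        self.n_iter = n_iter
        self.lr = lr

    def fit(self, X, y):
        for _ in range(self.n_iter):
            for xi, yi in zip(X, y):
                if self.predict_one(xi) != yi:  # update only on mistakes
                    sign = 1 if yi == 1 else -1
                    self.w += self.lr * sign * xi
                    self.b += self.lr * sign
        return self

    def predict_one(self, x):
        return 1 if np.dot(self.w, x) + self.b > 0 else 0

    def predict(self, X):
        return [self.predict_one(x) for x in X]
```

On a small linearly separable toy set this converges within a couple of passes; the graded models must follow the lecture-slide description exactly.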

2         Naive Bayes

2.1        Theory

  1. Given a text document d, which is a sequence of words w1,w2,…,wn, we want to compute P(c+|d) and P(c−|d). We use Bayes’ theorem to estimate these probabilities. Derive the equation for P(c+|d) in terms of P(d|c+) using Bayes’ theorem.

  2. To estimate P(d|c+) from the training data without making any assumptions, we would need impractically large amounts of data. Suppose the size of the vocabulary is V and the length of every document is exactly l (we can ensure this by padding shorter texts with dummy tokens). In a binary classification task, how many parameters do we need to learn in order to correctly estimate P(d|c+) for any given document without making independence assumptions?
  3. If we make the unigram assumption, that is, if we assume that the occurrence of each word in the document is independent of the other words, how many parameters do we need to learn to be able to estimate P(d|c+)?
  4. In a binary text classification task, state the equations you would use to estimate P(c+) and P(c−) from the training data (c+ and c− are the two classes for the classification problem).
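As a reminder of the standard estimator (your answer should state and justify it), the maximum-likelihood estimate of a class prior is simply its relative frequency in the training data:

```latex
P(c_+) \approx \frac{N_{c_+}}{N}, \qquad
P(c_-) \approx \frac{N_{c_-}}{N} = 1 - P(c_+),
```

where N_{c+} is the number of training documents labeled c+ and N is the total number of training documents.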

2.2        Implementation

You need to implement a Naive Bayes classifier for the given task. This part can be completed by editing only naive bayes.py, unless you plan to extract new features for the task. You need to initialize the parameters (__init__() method), learn them from the training data (fit() method), and use the learned parameters to classify new instances (predict() method). Note that __init__.py, main.py and classifier.py may be replaced during grading; do not make any modifications to these files. You need to follow the description of the model discussed in the lecture slides (link). Report the results obtained on the given test set in your report. Submit your code with the hyperparameter settings (config.py) that produce the best performance.
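As a starting point, a multinomial Naive Bayes with add-α smoothing can be sketched against the same __init__/fit/predict contract; the class name, the `alpha` hyperparameter, and the tokenized-list input format are assumptions, not the course's specification:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    # Illustrative sketch, not the reference solution. Follow the
    # lecture-slide description for the graded model.

    def __init__(self, alpha=1.0):
        self.alpha = alpha       # add-alpha (Laplace) smoothing constant
        self.log_prior = {}      # class -> log P(c)
        self.log_like = {}       # class -> {word -> log P(w|c)}
        self.vocab = set()

    def fit(self, docs, labels):
        # docs: list of token lists; labels: parallel list of class labels
        counts = defaultdict(Counter)
        class_counts = Counter(labels)
        for doc, c in zip(docs, labels):
            counts[c].update(doc)
            self.vocab.update(doc)
        V = len(self.vocab)
        for c in class_counts:
            self.log_prior[c] = math.log(class_counts[c] / len(docs))
            total = sum(counts[c].values())
            self.log_like[c] = {
                w: math.log((counts[c][w] + self.alpha) / (total + self.alpha * V))
                for w in self.vocab
            }
        return self

    def predict(self, docs):
        out = []
        for doc in docs:
            # Sum log-probabilities; words unseen in training are skipped.
            scores = {
                c: self.log_prior[c]
                + sum(self.log_like[c].get(w, 0.0) for w in doc)
                for c in self.log_prior
            }
            out.append(max(scores, key=scores.get))
        return out
```

Working in log space avoids underflow when multiplying many small per-word probabilities.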

3         Analysis

You need to perform additional experiments to answer the following questions. You don’t need to submit your code for this part. You only need to include your plots and discussions in your report. Make sure that the code you submit doesn’t include any changes you don’t want to be included, as that might affect your chances of getting extra credit.

  1. Remove the bias term from the Perceptron and Averaged Perceptron and report the performance of the models. Is the performance better or worse than that of the respective models with a bias term? Discuss in one sentence.
  2. Plot a graph of test accuracy vs. number of iterations for the Averaged Perceptron, for number of iterations = {1, 2, 5, 10, 20, 50}. Does the perceptron converge, i.e., does the training accuracy reach 100%? If yes, after how many iterations? If not, why not?
  3. Plot graphs of Naive Bayes test accuracy vs. vocabulary size and Averaged Perceptron test accuracy vs. vocabulary size, for vocabulary sizes = {100, 500, 1000, 5000, 10000, 20000}. How does the performance change? Discuss in two sentences.
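The experiments above all follow the same sweep-and-plot pattern, which can be sketched as below; the stand-in evaluator lambda is a placeholder for "train your model at this setting, then score it on the test set", and the helper names are assumptions:

```python
def accuracy(gold, pred):
    # Fraction of test labels the classifier got right.
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def sweep(eval_at, values):
    # Collect (setting, test accuracy) pairs; plot them afterwards,
    # e.g. with matplotlib: plt.plot(*zip(*results)).
    return [(v, eval_at(v)) for v in values]

# Stand-in evaluator: replace with "train AveragedPerceptron for n
# iterations, then compute accuracy on the test set".
results = sweep(lambda n: min(1.0, 0.6 + 0.05 * n), [1, 2, 5, 10, 20, 50])
```

The same loop with vocabulary sizes instead of iteration counts covers question 3.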

4         Time Limit

Your code must terminate within 2 minutes for all 3 models combined. If it doesn’t terminate within 2 minutes, you will be graded on the output your code generates at the end of the 2-minute time limit.

5         Extra Credit

Your submission will be evaluated on a hidden dataset of a similar nature, with similar size and splits. The top 10% of students in the class (17 in number) will be awarded 10 extra-credit points. You are not allowed to use any optimization libraries, but you may use feature-extraction software. If in doubt, ask a question on Piazza and the instructors will respond. Remember that the extra credit depends only on the results on the hidden dataset; overfitting the given dataset might prove counterproductive.