Description
Part 1: Find a least-squares binary classifier for the MNIST handwritten digit set, i.e. determine
whether an image $x$ is a chosen digit $d$ or not digit $d$. Use the label $y_i = 1$ if $x_i$ is
digit $d$ and $y_i = -1$ otherwise. Find $\theta$ and $b$ such that
$$ y_i \approx \hat{y}_i = \mathrm{sign}(\theta^T x_i + b) = \mathrm{sign}(\tilde{f}_i) $$
Note: The real-valued offset $b$ in the linear expression $\theta^T x_i + b$ can be “folded” into the
vector $\theta$. To do that, think of
$$ \theta^T x_i + b = \theta^T x_i + b \cdot 1 = [\theta_1 \; \theta_2 \; \dots \; \theta_N \; b] \cdot [x_i^1 \; x_i^2 \; \dots \; x_i^N \; 1]^T , $$
where the $i$-th image is written as a 1-D column vector of length $N = 289$, extended with a
trailing 1: $x_i = [x_i^1 \; x_i^2 \; \dots \; x_i^N \; 1]^T$.
Count the number of correctly identified handwritten digits using your classifier. You can use
either the same “training” set or the “test” set.
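The folding trick and the counting step in Part 1 can be sketched as follows. This is a minimal sketch, not a reference solution: the random arrays stand in for the real image matrix and labels, and the helper names `extend`, `fit_least_squares` and `predict` are hypothetical. `np.linalg.lstsq` returns the minimum-norm solution, which helps when some pixel columns are constant and the system is rank-deficient.

```python
import numpy as np

def extend(X):
    """Append a column of ones so the offset b is folded into theta."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_least_squares(X, y):
    """Return the extended vector [theta_1 ... theta_N b] minimizing the squared error."""
    theta_ext, *_ = np.linalg.lstsq(extend(X), y, rcond=None)
    return theta_ext

def predict(theta_ext, X):
    """Apply sign() to the linear score; a score of exactly 0 is mapped to +1."""
    return np.where(extend(X) @ theta_ext >= 0, 1, -1)

# Synthetic stand-in data (assumption): M "images" with N pixels each.
rng = np.random.default_rng(0)
M, N = 200, 16
X = rng.normal(size=(M, N))
w_true = rng.normal(size=N)
y = np.where(X @ w_true + 0.5 >= 0, 1, -1)   # labels in {+1, -1}

theta_ext = fit_least_squares(X, y)
n_correct = int(np.sum(predict(theta_ext, X) == y))
print(f"{n_correct} of {M} correctly identified")
```

With the real data, `X` would hold one flattened image per row and `y` would be +1 for the chosen digit and -1 for every other digit.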
Part 2: Modify the LS setting in Part 1 to include a regularization term with parameter $\lambda$:
$$ \sum_{i=1}^{M} (\tilde{f}_i - y_i)^2 + \lambda \|\theta\|^2 $$
Display the image corresponding to $\theta$ for different values of $\lambda$. Do you see any change?
Count the number of correctly identified handwritten digits using your classifier. Does that count
change as you vary $\lambda$? How does it compare with the result in Part 1?
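One possible way to solve the regularized problem is through its normal equations. The sketch below assumes the offset is folded into the parameter vector and penalized together with the other entries, which is a simplification the handout does not spell out. The data is synthetic and the name `fit_regularized` is hypothetical.

```python
import numpy as np

def fit_regularized(X, y, lam):
    """Solve (X_ext^T X_ext + lam * I) theta_ext = X_ext^T y for lam > 0."""
    X_ext = np.hstack([X, np.ones((X.shape[0], 1))])   # fold the offset in
    A = X_ext.T @ X_ext + lam * np.eye(X_ext.shape[1])
    return np.linalg.solve(A, X_ext.T @ y)

# Synthetic stand-in data (assumption), labels in {+1, -1}.
rng = np.random.default_rng(1)
M, N = 200, 16
X = rng.normal(size=(M, N))
y = np.where(X @ rng.normal(size=N) >= 0, 1, -1)
X_ext = np.hstack([X, np.ones((M, 1))])

lams = [0.01, 1.0, 100.0]
norms, counts = [], []
for lam in lams:
    theta_ext = fit_regularized(X, y, lam)
    y_hat = np.where(X_ext @ theta_ext >= 0, 1, -1)
    norms.append(float(np.linalg.norm(theta_ext)))
    counts.append(int(np.sum(y_hat == y)))
    print(f"lambda={lam}: norm={norms[-1]:.4f}, correct={counts[-1]} of {M}")
```

To display the parameter vector as an image, drop the last entry (the offset) and reshape the remaining entries to the image size.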
Part 3: Compute the 10 largest eigenvalues and the corresponding eigenvectors of the covariance
matrix for the selected digit $d$ of the “train” data set. Display the eigenvectors as 28×28 images.
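The eigen-decomposition in Part 3 could look roughly like the sketch below. It assumes the images of the selected digit are stacked as rows of a matrix `X_digit` (random numbers are used here as a stand-in) and uses `np.linalg.eigh`, which is suited to symmetric matrices and returns eigenvalues in ascending order. The name `top_eigenpairs` is hypothetical.

```python
import numpy as np

def top_eigenpairs(X_digit, n_top=10):
    """Return the n_top largest eigenvalues and eigenvectors of the covariance matrix."""
    C = np.cov(X_digit, rowvar=False)        # rows are samples, columns are pixels
    eigvals, eigvecs = np.linalg.eigh(C)     # ascending eigenvalues, orthonormal columns
    order = np.argsort(eigvals)[::-1][:n_top]
    return eigvals[order], eigvecs[:, order]

# Synthetic stand-in (assumption): 300 flattened 28x28 "images" of one digit.
rng = np.random.default_rng(2)
side = 28
X_digit = rng.random((300, side * side))

vals, vecs = top_eigenpairs(X_digit)
images = vecs.T.reshape(-1, side, side)      # one 28x28 array per eigenvector
print(vals)
```

Each entry of `images` can then be passed to an image display routine such as matplotlib's `imshow`.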
Part 4: Extend the linear formulation of Least Squares in Part 1 to a non-linear LS:
$$ \sum_{i=1}^{M} (\tilde{f}_i - y_i)^2 , $$
where $\tilde{f}_i = \theta^T \sigma(x_i) + b$ (instead of $\tilde{f}_i = \theta^T x_i + b$ as in Part 1) and
$\sigma(x)$ is a logistic function:
$$ \sigma(x) = \frac{1}{1 + e^{-k(x - x_0)}} , $$
where the parameter $k$ represents the steepness of the logistic curve and $x_0$ corresponds to
the midpoint of the sigmoid $\sigma(x)$ (cf. figure above with $x_0 = 0$ and $k = 1$).
The logistic function $\sigma(x)$ replaces the $\mathrm{sign}(x)$ used in the linear formulation, and since
$\sigma(x)$ takes values between 0 and 1 (vs. −1 and 1 for $\mathrm{sign}$), the labels should be: $y_i = 1$ if
$x_i$ is digit $d$ and $y_i = 0$ otherwise (vs. 1 and −1 used in the linear case).
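As a small illustration of the logistic function and the role of its two parameters, here is a sketch built on `scipy.special.expit`, which evaluates 1 / (1 + exp(-t)) in a numerically stable way. The function name `logistic` and its default values are assumptions chosen to match the case with midpoint 0 and rate 1.

```python
import numpy as np
from scipy.special import expit

def logistic(x, k=1.0, x0=0.0):
    """Logistic function 1 / (1 + exp(-k * (x - x0))), applied element-wise."""
    return expit(k * (np.asarray(x, dtype=float) - x0))

xs = np.linspace(-6.0, 6.0, 13)
for k in (0.5, 1.0, 5.0):
    # A larger k gives a sharper transition around the midpoint x0.
    print(k, np.round(logistic(xs, k=k), 3))
```

For large k the curve approaches a step from 0 to 1 at the midpoint, which is why it can stand in for a thresholding function when the labels are 1 and 0.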
You can use the ‘least_squares’ function available in scipy.optimize (with the ‘lm’ option for the
Levenberg-Marquardt algorithm) to find the parameters $\theta$. The rate $k$ of the logistic function
needs to be hand-picked, though. Experiment with different values of $k$.
Compare with results in Part 1.
Also, experiment with adding a regularization term, as in Part 2, penalizing large norms of $\theta$:
$$ \sum_{i=1}^{M} (\tilde{f}_i - y_i)^2 + \lambda \|\theta\|^2 , $$
and compare with results in Part 2.
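A possible setup for Part 4 with `scipy.optimize.least_squares` and `method="lm"` is sketched below. It rests on several assumptions that the handout leaves open: the logistic function is applied element-wise to the pixel vector, as the expression for the model is written; the penalty covers the offset as well; a score above 0.5 is counted as the digit; and the data is synthetic. The names `residuals` and `fit` are hypothetical. The penalty is expressed as extra residuals so that the sum of squared residuals equals the regularized objective.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.special import expit

def residuals(params, X, y, k, x0, lam):
    """Residual vector whose squared norm is sum (f_i - y_i)^2 + lam * ||params||^2."""
    theta, b = params[:-1], params[-1]
    f = expit(k * (X - x0)) @ theta + b          # assumed reading: theta^T sigma(x_i) + b
    return np.concatenate([f - y, np.sqrt(lam) * params])

def fit(X, y, k, x0=0.0, lam=0.0):
    """Run Levenberg-Marquardt from a zero starting point and return [theta, b]."""
    p0 = np.zeros(X.shape[1] + 1)
    return least_squares(residuals, p0, args=(X, y, k, x0, lam), method="lm").x

# Synthetic stand-in data (assumption): pixel values in [0, 1], labels in {1, 0}.
rng = np.random.default_rng(3)
M, N = 200, 16
X = rng.random((M, N))
scores = X @ rng.normal(size=N)
y = (scores > np.median(scores)).astype(float)

counts = {}
for k in (0.5, 1.0, 4.0):
    p = fit(X, y, k)
    y_hat = (expit(k * X) @ p[:-1] + p[-1] > 0.5).astype(float)
    counts[k] = int(np.sum(y_hat == y))
    print(f"k={k}: {counts[k]} of {M} correctly identified")

p_plain = fit(X, y, k=1.0, lam=0.0)
p_reg = fit(X, y, k=1.0, lam=10.0)
print(np.linalg.norm(p_plain), np.linalg.norm(p_reg))
```

If the logistic function is instead meant to wrap the whole linear score, only the line computing `f` in `residuals` and the matching prediction line need to change. Note that the ‘lm’ method requires at least as many residuals as parameters.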
Material for Part 3:



