Description
Task: Multiclass Classification
Framewise phoneme prediction from speech, e.g. M M M AH AH SH SH IH IH IH N N N N …
What is a phoneme?
A unit of speech sound in a language that can serve to distinguish one word from the other.
- bat / pat , bad / bed
- Machine Learning → M AH SH IH N L ER N IH NG
Data Preprocessing
Acoustic Features – MFCCs (Mel Frequency Cepstral Coefficients)
(Figure: each input is a block of MFCC frames with shape (11,39), paired with one label.)
More Information About the Data
Since each frame contains only 25 ms of speech, a single frame is unlikely to represent a complete phoneme.
(Figure: the previous and future frames around each frame are flattened into one input, which can be reshaped to (11,39).)
- Usually, a phoneme will span several frames
- Hint: post-processing may help
- Concatenate the neighboring frames for training
- In this HW, we concatenate the past and the future five frames for training (total 11 frames)
○ You may reshape the input (1,429) back to (11,39) to get separated 11 frames
○ Just remember that the label corresponds to the center frame
- Finding testing labels or doing human labeling is strictly prohibited!
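The reshape mentioned in the list above can be sketched as follows. This is a minimal illustration, assuming the 429 values are stored frame by frame (the 39 MFCCs of the earliest frame first); the helper names `split_frames` and `center_frame` and the stand-in array are hypothetical, not part of the provided sample code.

```python
import numpy as np

N_CONTEXT = 11           # 5 past frames + 1 center frame + 5 future frames
N_MFCC = 39              # MFCC dimensions per frame
CENTER = N_CONTEXT // 2  # index 5: the frame the label corresponds to


def split_frames(flat):
    """Reshape flattened inputs of shape (n, 429) into (n, 11, 39)."""
    flat = np.asarray(flat)
    return flat.reshape(-1, N_CONTEXT, N_MFCC)


def center_frame(flat):
    """Return only the center frame of each input, shape (n, 39)."""
    return split_frames(flat)[:, CENTER, :]


# Stand-in data: two flattened inputs filled with increasing numbers.
demo = np.arange(2 * N_CONTEXT * N_MFCC, dtype=np.float32).reshape(2, -1)
frames = split_frames(demo)
print(frames.shape)              # (2, 11, 39)
print(center_frame(demo).shape)  # (2, 39)
```

Keeping the 11 frames separate like this is only needed if the model treats the input as a short sequence; a plain fully connected network can consume the flat 429-dim vector directly.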
Introduction to Digital Speech Processing
Dataset & Data Format
- Dataset: TIMIT Acoustic-Phonetic Continuous Speech Corpus
○ Phonetically balanced for English
- Data Format (The TAs have already preprocessed the data): timit_11/
○ npy → training data (# of training frames, 11 x feature dim)
○ npy → framewise phoneme label (0-38)
○ npy → testing data (# of testing frames, 11 x feature dim)
- Acoustic features (39-dim MFCC)
○ Concatenate the past and the future five frames (feature dim = 11 x 39)
○ The phoneme label of each input corresponds to the center frame
- Using additional data is prohibited. If you do, your final grade will be multiplied by 0.9!
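To make the 11 x 39 feature layout concrete, here is a rough sketch of how such context windows could be built from one utterance of per-frame MFCCs. The TAs' actual preprocessing is not shown in this document; in particular, repeating the first and last frame at the utterance edges is an assumption made here, and `stack_context` is a hypothetical helper name.

```python
import numpy as np


def stack_context(frames, n_side=5):
    """Turn per-frame features (T, 39) into context windows (T, 11 * 39).

    Assumption: utterance edges are padded by repeating the first/last frame.
    """
    frames = np.asarray(frames)
    padded = np.pad(frames, ((n_side, n_side), (0, 0)), mode="edge")
    width = 2 * n_side + 1
    # Window i covers original frames i - n_side .. i + n_side.
    windows = [padded[i:i + width].reshape(-1) for i in range(frames.shape[0])]
    return np.stack(windows)


# Stand-in utterance: 4 frames of 39-dim features.
utt = np.arange(4 * 39, dtype=np.float32).reshape(4, 39)
stacked = stack_context(utt)
print(stacked.shape)  # (4, 429)
```

With this layout, row `i` reshaped to (11,39) has the original frame `i` at index 5, which is the frame the label refers to.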
| Class | Phoneme | Example | Class | Phoneme | Example | Class | Phoneme | Example |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | iy | beet | 13 | l | lay | 26 | dx | muddy |
| 1 | ih | bit | 14 | r | ray | 27 | g | gay |
| 2 | eh | bet | 15 | y | yacht | 28 | p | pea |
| 3 | ae | bat | 16 | w | way | 29 | t | tea |
| 4 | ah | but | 17 | er | bird | 30 | k | key |
| 5 | uw | boot | 18 | m | mom | 31 | z | zone |
| 6 | uh | book | 19 | n | noon | 32 | v | van |
| 7 | aa | bob | 20 | ng | sing | 33 | f | fin |
| 8 | ey | bait | 21 | ch | choke | 34 | th | thin |
| 9 | ay | bite | 22 | jh | joke | 35 | s | sea |
| 10 | oy | boy | 23 | dh | then | 36 | sh | she |
| 11 | aw | bout | 24 | b | bee | 37 | hh | hay |
| 12 | ow | boat | 25 | d | day | 38 | sil | silence/closure sounds |
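The class table can be used as a lookup from predicted class indices back to phoneme symbols. The sketch below is an illustration only: the `PHONEMES` list is copied from the table, while `collapse_frames` is a hypothetical helper that merges runs of identical framewise predictions into a phoneme sequence.

```python
from itertools import groupby

# Index in this list == class number in the table (0-38).
PHONEMES = [
    "iy", "ih", "eh", "ae", "ah", "uw", "uh", "aa", "ey", "ay",
    "oy", "aw", "ow", "l", "r", "y", "w", "er", "m", "n",
    "ng", "ch", "jh", "dh", "b", "d", "dx", "g", "p", "t",
    "k", "z", "v", "f", "th", "s", "sh", "hh", "sil",
]


def collapse_frames(class_ids):
    """Map framewise class indices to phonemes, merging consecutive repeats."""
    return [PHONEMES[c] for c, _ in groupby(class_ids)]


# Framewise predictions for "M M M AH AH SH SH IH IH IH N N N N".
framewise = [18, 18, 18, 4, 4, 36, 36, 1, 1, 1, 19, 19, 19, 19]
print(collapse_frames(framewise))  # ['m', 'ah', 'sh', 'ih', 'n']
```

Note that the homework itself is scored per frame, so collapsing is useful for inspecting predictions rather than for producing the submission.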
Sample Code
Colab Link:
https://colab.research.google.com/github/ga642381/ML2021-Spring/blob/main/HW02/HW02-1.ipynb
- Simple baseline
○ You should be able to pass the simple baseline using the sample code provided.
- Strong baseline
○ Model architecture (layers? dimension? activation function?)
○ Training (batch size? optimizer? learning rate? epoch?)
○ Tips (batch norm? dropout? regularization?)
2 Hessian Matrix
Task Introduction
Task: Hessian Matrix
Imagine we are training a neural network and want to find out whether the model has reached a “local minima like” point, a saddle point, or none of the above. We can make this decision by calculating the Hessian matrix.
What is the Hessian?
The Hessian is the matrix of second-order partial derivatives of the loss with respect to the model parameters. It is highly recommended to watch the lecture video before starting this part.
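In symbols, the definition can be written as below. The notation is an assumption made for this sketch: $L$ is the loss, $\theta \in \mathbb{R}^n$ the model parameters, and $\theta'$ a point where the gradient is (approximately) zero.

```latex
H_{ij} = \frac{\partial^2 L}{\partial \theta_i \, \partial \theta_j},
\qquad i, j = 1, \dots, n

% Near a point \theta' with zero gradient, the second-order Taylor expansion is
L(\theta) \approx L(\theta') + \frac{1}{2} (\theta - \theta')^{\top} H \, (\theta - \theta')
```

If every eigenvalue of $H$ is positive, the quadratic term is positive in all directions and $\theta'$ is a local minimum; if some eigenvalues are positive and some negative, it is a saddle point.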
The target function in this task is a one-variable sinc function.
You will get
- a model checkpoint trained by TA,
- a batch of training data,
- a loss function.
You will calculate the Hessian matrix and make the decision accordingly.
Gradient Norm / Minimum Ratio
1. Gradient Norm
In a normal training process, gradients are rarely exactly zero. In this homework, we regard a gradient norm less than 1e-3 as zero.
2. Minimum Ratio
For an ideal local minimum, all the eigenvalues of the Hessian matrix are greater than zero. We define the proportion of positive eigenvalues as the minimum ratio.
In this homework, if the minimum ratio is greater than 0.5 and the gradient norm is less than 1e-3, then we regard the model as being at a “local minima like” point.
In this homework, we assume that
- gradient norm < 1e-3 and minimum ratio > 0.5 => local minima like,
- gradient norm < 1e-3 and minimum ratio <= 0.5 => saddle point,
- gradient norm >= 1e-3 => none of the above.
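The decision rule above can be sketched in a few lines. This is not the provided sample code (which you should not modify); the function names are hypothetical, and the gradient and Hessian eigenvalues are assumed to be already available as plain lists of numbers.

```python
import math

GRAD_TOL = 1e-3        # gradient norms below this are regarded as zero
RATIO_THRESHOLD = 0.5  # minimum ratio must exceed this for "local minima like"


def gradient_norm(grad):
    """Euclidean (L2) norm of the gradient vector."""
    return math.sqrt(sum(g * g for g in grad))


def minimum_ratio(eigenvalues):
    """Proportion of Hessian eigenvalues that are strictly positive."""
    return sum(1 for e in eigenvalues if e > 0) / len(eigenvalues)


def classify(grad, eigenvalues):
    """Apply the homework's three-way rule."""
    if gradient_norm(grad) >= GRAD_TOL:
        return "none of the above"
    if minimum_ratio(eigenvalues) > RATIO_THRESHOLD:
        return "local minima like"
    return "saddle point"


# Tiny gradient, two of three eigenvalues positive (ratio 2/3).
print(classify([1e-4, 0.0], [2.0, 1.0, -0.5]))  # local minima like
```

A minimum ratio of exactly 0.5 falls on the saddle-point side, because the rule requires the ratio to be strictly greater than 0.5.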
Important Notice
- You don’t need to and shouldn’t change any part of the code.
- You may only use Colab to run the code. Otherwise, your result might differ due to differences in the environment.
- You will get a different checkpoint according to your student ID, so please make sure to fill in your student ID in the sample code correctly.
Sample Code
Colab Link:
https://colab.research.google.com/github/ga642381/ML2021-Spring/blob/main/HW02/HW02-2.ipynb
- After executing the sample code, you should get a result like this.
- Notice that each student will get a different answer, so your answer may differ from the example.
Choose your answer from local minima like, saddle point, or none of the above







