Description
Introduction to sequence to sequence
Sequence to sequence
Generate a sequence from another sequence
- Translation: text to text
- ASR (speech recognition): speech to text
- TTS (speech synthesis): text to speech
- and more…
Sequence to sequence
Often composed of an encoder and a decoder
- Encoder: encodes input sequence into a vector or sequence of vectors
- Decoder: decodes a sequence one token at a time, based on 1) encoder output and 2) previous decoded tokens
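As a concrete illustration, here is a minimal PyTorch sketch of this encoder-decoder pattern (GRU-based; all names and sizes are illustrative, not taken from the HW5 sample code):

```python
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, src):
        # encode the input sequence into a sequence of vectors
        outputs, hidden = self.rnn(self.embed(src))
        return outputs, hidden

class Decoder(nn.Module):
    def __init__(self, vocab_size, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_token, hidden):
        # predict the next token from 1) the encoder state carried in
        # `hidden` and 2) the previously decoded token
        out, hidden = self.rnn(self.embed(prev_token), hidden)
        return self.proj(out), hidden
```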
HW5: Machine Translation
Neural Machine Translation
We will translate from English to Traditional Chinese
- Cats are so cute. -> 貓咪真可愛。
A sentence's translation into another language usually has a different length.
Naturally, the seq2seq framework is applied to this task.
Training datasets
- Paired data
- TED2020: TED talks with transcripts translated by a global community of volunteers into more than 100 languages
○ We will use (en, zh-tw) aligned pairs
- Monolingual data
- More TED talks in traditional Chinese
Evaluation
- source: Cats are so cute.
- target: 貓咪真可愛。
- output: 貓好可愛。
BLEU
- Modified[1] n-gram precision (n=1~4)
- Brevity penalty: penalizes short hypotheses
○ c is the hypothesis length, r is the reference length
- The BLEU score is the geometric mean of the n-gram precisions, multiplied by the brevity penalty
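Written out, following the standard definition in [1]:

```latex
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
\qquad
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{4} \frac{1}{4}\log p_n\right)
```

where p_n is the modified n-gram precision. In practice the score is usually computed with a tool such as sacrebleu rather than by hand.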
Workflow
- Preprocessing
- download raw data
- clean and normalize
- remove bad data (too long/short); a filter sketch follows this list
- tokenization
- Training
- initialize a model
- train it with training data
- Testing
- generate translation of test data
- evaluate the performance
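For the "remove bad data" step, a minimal sketch of a length/ratio filter (thresholds, file names, and whitespace-based length counting are illustrative assumptions, not the sample code's exact logic):

```python
def keep_pair(src, tgt, min_len=1, max_len=1000, max_ratio=9.0):
    """Drop pairs that are empty, too long, or badly mismatched in length."""
    s, t = len(src.split()), len(tgt.split())
    if not (min_len <= s <= max_len and min_len <= t <= max_len):
        return False
    # a large length ratio usually indicates a misaligned pair
    return s / t <= max_ratio and t / s <= max_ratio

with open("train.en") as f_en, open("train.zh") as f_zh:
    pairs = [(en, zh) for en, zh in zip(f_en, f_zh) if keep_pair(en, zh)]
```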
Training tips
- Tokenize data with sub-word units
- Label smoothing regularization
- Learning rate scheduling
- Back-translation
- Tokenize data with sub-word units
- For one, it reduces the vocabulary size (subwords share common prefixes/suffixes)
○ For another, it alleviates the open vocabulary (unknown word) problem
○ example
■ ▁new ▁ways ▁of ▁making ▁electric ▁trans port ation ▁.
■ new ways of making electric transportation.
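The example above is typical subword output; a minimal sketch of learning and applying such a model with the sentencepiece library (file names and vocabulary size are illustrative):

```python
import sentencepiece as spm

# learn a subword vocabulary from the raw training text
spm.SentencePieceTrainer.train(
    input="train.en", model_prefix="spm_en",
    vocab_size=8000, model_type="unigram",
)

sp = spm.SentencePieceProcessor(model_file="spm_en.model")
pieces = sp.encode("new ways of making electric transportation.", out_type=str)
# pieces resemble: ['▁new', '▁ways', '▁of', '▁making', '▁electric',
#                   '▁trans', 'port', 'ation', '▁.']  (depends on training data)
```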
- Label smoothing regularization
- When calculating loss, reserve some probability for incorrect labels
○ Avoids overfitting
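A minimal sketch; recent PyTorch versions support this directly through the label_smoothing argument of CrossEntropyLoss:

```python
import torch
import torch.nn as nn

# reserve epsilon = 0.1 of the probability mass for incorrect labels
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 1000)            # (batch, vocab_size)
targets = torch.randint(0, 1000, (8,))   # gold token ids
loss = criterion(logits, targets)
```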
- Learning rate scheduling
- Linearly increase lr and then decay by inverse square root of steps
○ Stabilizes transformer training in the early stages
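A sketch of this schedule, following the formula from the original transformer paper (d_model and the warmup length are illustrative):

```python
def inverse_sqrt_lr(step, d_model=512, warmup=4000):
    """Linear warmup for `warmup` steps, then decay by 1/sqrt(step)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```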
Back-translation (BT)
Leverage monolingual data by creating synthetic translation data
- Train a translation system in the opposite direction
- Collect monolingual data on the target side and apply machine translation
- Use translated and original monolingual data as additional parallel data to train stronger translation systems
[Figure: back-translation converts target-side monolingual data into translated (synthetic) data, which is paired with the original monolingual data and added to the original parallel data]
Back-translation
Some points to note about back-translation
- Monolingual data should be in the same domain as the parallel corpus
- The performance of the backward model is critical
- You should increase model capacity (both forward and backward), since the amount of training data increases.
Requirements
You are encouraged to follow these tips to improve your performance and pass the 3 baselines.
- Train a simple RNN seq2seq to achieve translation
- Switch to transformer to boost performance
- Apply back-translation to further boost performance
Train a simple RNN seq2seq to achieve translation
- Running the sample code should pass the baseline!
Switch to transformer to boost performance
- Change the encoder/decoder architecture to transformer-based, according to the hints in the sample code
- RNNEncoder -> TransformerEncoder
○ RNNDecoder -> TransformerDecoder
- Change architecture configurations
- encoder_ffn_embed_dim -> 1024
○ encoder_layers/decoder_layers -> 4
○ uncomment the line: #add_transformer_args(arch_args) -> add_transformer_args(arch_args)
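Putting the hints together, a sketch of the configuration change (arch_args and add_transformer_args are names from the sample code; the real arch_args carries many more fields, so this is only a fragment):

```python
from argparse import Namespace

# transformer settings per the hints above; the actual arch_args in the
# sample code contains additional fields
arch_args = Namespace(
    encoder_ffn_embed_dim=1024,
    encoder_layers=4,
    decoder_layers=4,
)

# in the sample code, also uncomment this call, which fills in the
# remaining transformer defaults:
# add_transformer_args(arch_args)
```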
Apply back-translation to further boost performance
- Train a backward model by switching languages
- source_lang = "zh"
○ target_lang = "en"
- Remember to change the architecture to transformer-base
- Translate monolingual data with the backward model to obtain synthetic data
- complete the TODOs in the sample code.
○ all the TODOs can be completed using commands from earlier cells.
- Train a stronger forward model with the new data
- if done correctly, ~30 epochs on the new data should pass the baseline.
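Schematically, the monolingual-to-synthetic-pair step looks like this (a sketch only; backward_translate is a hypothetical stand-in for the trained backward model, and the actual TODOs reuse commands from earlier notebook cells):

```python
def make_synthetic_pairs(backward_translate, mono_zh, parallel_pairs):
    """Turn target-side monolingual text into extra (en, zh) training pairs.

    backward_translate: a zh -> en function (the trained backward model);
    hypothetical here -- the sample code does this step with notebook commands.
    """
    # the synthetic source side is machine-generated; the target side stays clean
    synthetic = [(backward_translate(zh), zh) for zh in mono_zh]
    return parallel_pairs + synthetic
```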
Expected Run Time
- on Colab with a Tesla T4
| Baseline | Details | Total time |
| --- | --- | --- |
| Simple | 2m15s x 30 epochs | 1hr 8m |
| Medium | 4m x 30 epochs | 2hr |
| Strong | 8m x 30 epochs (backward) + 1hr (back-translation) + 15m x 30 epochs (forward) | 12hr 30m |
- TA’s training curve https://wandb.ai/george0828zhang/hw5.seq2seq.ne
Regulation
- You should NOT plagiarize; if you use any other resource, you should cite it in the references. (*)
- You should NOT modify your prediction files manually.
- Do NOT share codes or prediction files with any living creatures.
- Do NOT use any approach to submit your results more than 5 times a day.
- Do NOT search for or use additional data or pre-trained models.
- Your final grade will be multiplied by 0.9 if you violate any of the above rules.
- Lee & the TAs reserve the right to change the rules & grades.
(*) Academic Ethics Guidelines for Researchers by the
Ministry of Science and Technology