Machine Learning HW5: Sequence to Sequence


Introduction to sequence to sequence

Sequence to sequence

Generate a sequence from another sequence

  • Translation: text to text
  • ASR: speech to text
  • TTS: text to speech

and more…

Sequence to sequence

Often composed of encoder and decoder

  • Encoder: encodes input sequence into a vector or sequence of vectors
  • Decoder: decodes a sequence one token at a time, based on 1) encoder output and 2) previous decoded tokens
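
A minimal sketch of this encode-then-decode loop; `toy_decoder` and the token names are illustrative stand-ins for a real trained model, not part of the sample code:

```python
BOS, EOS = "<s>", "</s>"

def greedy_decode(encoder_out, decoder_step, max_len=50):
    """Decode one token at a time, conditioning each step on
    1) the encoder output and 2) all previously decoded tokens."""
    tokens = [BOS]
    for _ in range(max_len):
        next_tok = decoder_step(encoder_out, tokens)
        tokens.append(next_tok)
        if next_tok == EOS:
            break
    return tokens[1:]  # drop BOS

def toy_decoder(encoder_out, prev_tokens):
    # Stub model: emits a fixed translation one token per step.
    target = ["貓咪", "真", "可愛", "。", EOS]
    return target[len(prev_tokens) - 1]

print(greedy_decode("<encoded source>", toy_decoder))
```

A real decoder would score the whole vocabulary at each step and pick the argmax (greedy decoding) or keep several candidates (beam search).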

HW5: Machine Translation

Neural Machine Translation

We will translate from English to Traditional Chinese

  • Cats are so cute. -> 貓咪真可愛。

A sentence is usually translated into another language with different length.

Naturally, the seq2seq framework is applied on this task.

Training datasets

  • Paired data
    • TED2020: TED talks with transcripts translated by a global community of volunteers into more than 100 languages
    • We will use (en, zh-tw) aligned pairs

  • Monolingual data
    • More TED talks in traditional Chinese

Evaluation

BLEU

  • Example
    • source: Cats are so cute.
    • target: 貓咪真可愛。
    • output: 貓好可愛。

  • Modified[1] n-gram precision (n=1~4)
  • Brevity penalty: penalizes short hypotheses
    • BP = 1 if c > r, else exp(1 - r/c), where c is the hypothesis length and r is the reference length

  • The BLEU score is the geometric mean of n-gram precision, multiplied by brevity penalty
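
As a sanity check, sentence-level BLEU can be sketched in plain Python. This is a simplified version; real evaluation should use an established scorer (e.g. sacrebleu), which also handles tokenization and smoothing:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Geometric mean of modified n-gram precisions (n=1..4),
    multiplied by the brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        overlap = sum((hyp & ref).values())      # clipped (modified) counts
        total = max(sum(hyp.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0                               # unsmoothed: any zero precision kills the score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    c, r = len(hypothesis), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    return bp * geo_mean

# Character-level example from the slide above:
print(bleu(list("貓好可愛。"), list("貓咪真可愛。")))
```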

Workflow

  1. Preprocessing
    • download raw data
    • clean and normalize
    • remove bad data (too long/short)
    • tokenization
  2. Training
    • initialize a model
    • train it with training data
  3. Testing
    • generate translation of test data
    • evaluate the performance

Training tips

  • Tokenize data with sub-word units
  • Label smoothing regularization
  • Learning rate scheduling
  • Back-translation

 

  • Tokenize data with sub-word units
    • For one, we can reduce the vocabulary size (common prefixes/suffixes)
    • For another, it alleviates the open vocabulary problem
    • Example
      • ▁new ▁ways ▁of ▁making ▁electric ▁trans port ation ▁.
      • new ways of making electric transportation.
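
In SentencePiece-style segmentation the "▁" marker denotes a word boundary, so detokenization is simple string surgery. A tiny sketch using the example pieces above:

```python
# Pieces from the example above; "▁" marks the start of a word.
pieces = ["▁new", "▁ways", "▁of", "▁making", "▁electric",
          "▁trans", "port", "ation", "▁."]

def detokenize(pieces):
    # Concatenate, then turn each boundary marker back into a space.
    return "".join(pieces).replace("▁", " ").strip()

print(detokenize(pieces))  # -> "new ways of making electric transportation ."
```

Note that "▁trans", "port", "ation" are glued back into one word because only the first piece carries the boundary marker.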

  • Label smoothing regularization
    • When calculating the loss, reserve some probability mass for incorrect labels
    • Avoids overfitting
  • Learning rate scheduling
    • Linearly increase the learning rate, then decay it by the inverse square root of the step number
    • Stabilizes training of transformers in the early stages
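
The schedule just described (linear warmup, then inverse-square-root decay) can be written as a small function; `peak_lr` and `warmup_steps` here are illustrative values, not necessarily the sample code's defaults:

```python
def inverse_sqrt_lr(step, peak_lr=5e-4, warmup_steps=4000):
    """Linear warmup to peak_lr over warmup_steps, then decay
    proportional to 1/sqrt(step). Continuous at the boundary."""
    step = max(step, 1)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # warmup phase
    return peak_lr * (warmup_steps / step) ** 0.5     # inverse-sqrt decay

# Halfway through warmup the lr is half the peak; at 4x the warmup
# steps it has decayed back to half the peak.
print(inverse_sqrt_lr(2000), inverse_sqrt_lr(16000))
```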

 

Back-translation (BT)

Leverage monolingual data by creating synthetic translation data

  1. Train a translation system in the opposite direction
  2. Collect monolingual data on the target side and apply machine translation
  3. Use the translated and original monolingual data as additional parallel data to train stronger translation systems
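
The three steps can be sketched as follows; `backward_translate` is a stand-in for the trained zh->en backward model, and the toy data is illustrative only:

```python
def combine_parallel(original_pairs, mono_targets, backward_translate):
    """Back-translate target-side monolingual sentences, pair each
    synthetic source with its original target, and append the result
    to the real parallel data."""
    synthetic = [(backward_translate(zh), zh) for zh in mono_targets]
    return original_pairs + synthetic

# Toy backward model (a lookup table) for illustration only.
fake_backward = {"貓咪真可愛。": "Cats are so cute."}

data = combine_parallel(
    [("I like tea.", "我喜歡茶。")],   # original (en, zh) pairs
    ["貓咪真可愛。"],                  # target-side monolingual data
    fake_backward.get,
)
print(data)
```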

[Diagram: target-side monolingual data is translated by the backward model; the resulting synthetic pairs are combined with the original parallel data.]

Back-translation

Some points to note about back-translation

  1. Monolingual data should be in the same domain as the parallel corpus
  2. The performance of the backward model is critical
  3. You should increase model capacity (both forward and backward), since the data amount is increased.

Requirements

You are encouraged to follow these tips to improve your performance in order to pass the 3 baselines.

  1. Train a simple RNN seq2seq to achieve translation
  2. Switch to a transformer to boost performance
  3. Apply back-translation to further boost performance

 

Train a simple RNN seq2seq to achieve translation

  • Running the sample code should pass the baseline!

Switch to a transformer to boost performance

  1. Change the encoder/decoder architecture to transformer-based, according to the hints in the sample code
    • RNNEncoder -> TransformerEncoder
    • RNNDecoder -> TransformerDecoder
  2. Change the architecture configurations
    • encoder_ffn_embed_dim -> 1024
    • encoder_layers/decoder_layers -> 4
    • #add_transformer_args(arch_args) -> add_transformer_args(arch_args) (i.e. uncomment the call)
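
Putting the listed changes together, a sketch in the `argparse.Namespace` style the sample code uses for configuration; only the fields mentioned above are shown, and any other fields in the notebook stay as they are:

```python
from argparse import Namespace

# Architecture overrides for the transformer baseline; the values
# come from the hints above.
arch_args = Namespace(
    encoder_ffn_embed_dim=1024,  # widen the feed-forward layer
    encoder_layers=4,
    decoder_layers=4,
)

# In the sample code this call is commented out for the RNN model;
# uncomment it so the remaining transformer defaults get filled in:
# add_transformer_args(arch_args)
```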

Apply back-translation to further boost performance

  1. Train a backward model by switching the languages
    • source_lang = "zh"
    • target_lang = "en"
  2. Remember to change the architecture to transformer-base
  3. Translate monolingual data with the backward model to obtain synthetic data
    • Complete the TODOs in the sample code
    • All the TODOs can be completed using commands from earlier cells
  4. Train a stronger forward model with the new data
    • If done correctly, ~30 epochs on the new data should pass the baseline

 

Expected Run Time

  • on Colab with a Tesla T4

    Baseline   Details                            Total time
    Simple     2m15s x 30 epochs                  1hr 8m
    Medium     4m x 30 epochs                     2hr
    Strong     8m x 30 epochs (backward)          12hr 30m
               + 1hr (back-translation)
               + 15m x 30 epochs (forward)
  • TA’s training curve https://wandb.ai/george0828zhang/hw5.seq2seq.ne

Regulation

  • You should NOT plagiarize; if you use any other resource, you should cite it in the references. (*)
  • You should NOT modify your prediction files manually.
  • Do NOT share code or prediction files with any living creatures.
  • Do NOT use any approaches to submit your results more than 5 times a day.
  • Do NOT search for or use additional data or pre-trained models.
  • Your final grade will be multiplied by 0.9 if you violate any of the above rules.
  • Lee & TAs reserve the right to change the rules & grades.

(*) Academic Ethics Guidelines for Researchers by the Ministry of Science and Technology