Machine Learning HW8: Anomaly Detection

  • Unsupervised anomaly detection in computer vision: determine whether a testing image belongs to the same class (distribution) as the training images.

[Figure: during training, the model sees only images from one (seen, normal) distribution; during testing, it must flag images from unseen distributions as anomalies.]

Data

  • Training set: about 140k human face images (size 64×64×3)
  • Testing set: another 10k images drawn from the same distribution as the training set (normal data, class label 0), plus 10k human face images from other distributions (anomalies, class label 1)
  • Notice: additional training data and pretrained models are prohibited
  • Data format: unpack with tar zxvf data-bin.tar.gz
  • data-bin/

○      trainingset.npy

○      testingset.npy
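The .npy files can be loaded directly with NumPy. A minimal sketch of loading and preprocessing, using a small synthetic stand-in file (the real trainingset.npy is assumed to be uint8 with shape [N, 64, 64, 3]; the scaling to [-1, 1] is a common convention, not mandated by the assignment):

```python
import numpy as np
import tempfile, os

# Stand-in for the real data-bin/trainingset.npy (assumed uint8, [N, 64, 64, 3]).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "trainingset.npy")
np.save(path, np.random.randint(0, 256, size=(8, 64, 64, 3), dtype=np.uint8))

train = np.load(path)            # shape (N, 64, 64, 3), dtype uint8
print(train.shape, train.dtype)

# Common preprocessing: scale pixel values from [0, 255] to [-1, 1] as float32.
x = train.astype(np.float32) / 255.0 * 2.0 - 1.0
```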

Method – Autoencoder


  • When to stop training? Stop when the MSE loss converges.
  • During inference, we compute the reconstruction error between the input image and its reconstruction.
  • This reconstruction error is referred to as the abnormality (anomaly score).
  • The abnormality of an image measures how likely it is that its distribution was unseen during training.
  • We therefore use the abnormality as our predicted value.
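The inference idea above can be sketched with a deliberately tiny linear autoencoder trained by plain gradient descent on synthetic data (this is not the course architecture; dimensions, learning rate, and data are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 20, 5, 500

# "Normal" data lives in a 5-dim subspace; "anomalies" fill all 20 dims.
U = rng.normal(size=(d, k)) / np.sqrt(k)
normal = U @ rng.normal(size=(k, n))
anomaly = rng.normal(size=(d, n))

# Linear autoencoder: encode with W1 (k x d), decode with W2 (d x k).
W1 = 0.1 * rng.normal(size=(k, d))
W2 = 0.1 * rng.normal(size=(d, k))
lr = 0.01
for _ in range(1000):
    H = W1 @ normal           # latent codes
    E = normal - W2 @ H       # reconstruction residual
    # Gradient descent on the mean squared reconstruction error.
    W2 += lr * (2 / n) * E @ H.T
    W1 += lr * (2 / n) * W2.T @ E @ normal.T

def anomaly_score(X):
    """Per-sample reconstruction MSE = abnormality."""
    R = X - W2 @ (W1 @ X)
    return (R ** 2).mean(axis=0)

s_norm = anomaly_score(normal).mean()
s_anom = anomaly_score(anomaly).mean()
print(s_norm, s_anom)  # anomalies should reconstruct worse
```

Because the autoencoder only learns to reconstruct the training (normal) distribution, out-of-distribution inputs retain a large residual, which is exactly what the anomaly score exploits.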

Accuracy score

  • Usually we compute accuracy scores for classification tasks
  • Here, our model functions as a sensor (or a detector) rather than a classifier
  • Thus, we need a threshold with respect to abnormality (usually the reconstruction error) to determine whether a piece of data is an anomaly
  • If we used accuracy score for this assignment, you would have to try every possible threshold for one single model to get a satisfactory score
  • However, what we want is a sensor that gets the highest accuracy on the average of every possible threshold
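The threshold dependence is easy to see on the five (score, label) rows used in the sample output later in this handout; accuracy swings between 0.6 and 0.8 depending on where the cutoff falls:

```python
# Five (score, label) pairs from the sample output; label 1 = anomaly.
samples = [(11383, 0), (256676, 1), (862365, 1), (152435, 0), (848171, 0)]

def accuracy_at(threshold):
    """Predict 'anomaly' when the score is >= threshold."""
    correct = sum((score >= threshold) == bool(label) for score, label in samples)
    return correct / len(samples)

for t in (100000, 200000, 500000, 860000):
    print(t, accuracy_at(t))
```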

Which sensor is better?

Metric – ROC_AUC score

  • A good sensor should

○          give high anomaly scores to the anomalies and low scores to the normal data

○          exhibit a large gap between the scores of the two groups

  • An ROC curve is suitable for our task.
  • Each point on the ROC curve stands for the true positive rate and false positive rate at a certain threshold.
  • The area under the ROC curve (AUC) is calculated to measure the overall ability of the model.

ROC_AUC score

https://en.wikipedia.org/wiki/Receiver_operating_characteristic


Kaggle

Metric: ROC_AUC score

Sample output:

ID   Anomaly score   Label
 0        11383        0
 1       256676        1
 2       862365        1
 3       152435        0
 4       848171        0

The same rows sorted by score (descending):

ID   Anomaly score   Label
 2       862365        1
 4       848171        0
 1       256676        1
 3       152435        0
 0        11383        0
https://towardsdatascience.com/how-to-calculate-use-the-auc-score-1fc85c9a8430

 

Scanning down the sorted list, accumulate false positive and true positive counts:

ID   Anomaly score   Label   fp (count)   tp (count)
 2       862365        1         0            1
 4       848171        0         1            1
 1       256676        1         1            2
 3       152435        0         2            2
 0        11383        0         3            2

After normalization (fp divided by 3 negatives, tp divided by 2 positives):

ID   Anomaly score   Label   fp         tp
 2       862365        1     0          0.5
 4       848171        0     0.333333   0.5
 1       256676        1     0.333333   1
 3       152435        0     0.666667   1
 0        11383        0     1          1

Area Under Curve: 0.5 × 1/3 + 1 × 2/3 = 5/6 ≈ 0.8333
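The same 5/6 value can be checked programmatically. ROC AUC equals the probability that a randomly chosen anomaly outscores a randomly chosen normal sample (a standard equivalence), sketched here in plain Python:

```python
# (score, label) pairs from the example above; label 1 = anomaly (positive).
samples = [(11383, 0), (256676, 1), (862365, 1), (152435, 0), (848171, 0)]
pos = [s for s, y in samples if y == 1]
neg = [s for s, y in samples if y == 0]

# AUC = P(score of a random positive > score of a random negative),
# counting ties as half a win.
wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
auc = wins / (len(pos) * len(neg))
print(auc)  # 5 of the 6 (positive, negative) pairs are ordered correctly
```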

 

Scoring

  • Code submission: 4 pt
  • Baselines: 6 pt (3 pt for the public ones and 3 pt for the private ones)

○       Simple public:   1 pt    (public score: 0.64046)

○       Medium public:   1 pt    (public score: 0.75719)

○       Strong public:   0.5 pt  (public score: 0.81304)

○       Boss public:     0.5 pt  (public score: 0.86590)

○       Simple private:  1 pt

○       Medium private:  1 pt

○       Strong private:  0.5 pt

○       Boss private:    0.5 pt

  • Bonus for submitting a report: 0.5 pt

Bonus

  • If you succeed in beating both boss baselines, you can get an extra 0.5 pt by submitting a brief report explaining your methods (in fewer than 100 English words), which will be made public to the whole class.
  • Report Template

Baseline guides

  • Simple

○       FCN autoencoder

  • Medium

○       CNN autoencoder

○       Try smaller models (fewer layers)

○       Smaller batch size

  • Strong

○       Add BatchNorm

○       Train for longer

  • Boss

○       Add an extra classifier

○       Sample random noise as anomaly images

○       Or one-class classification (OCC) with GANs: OCGAN, End-to-end OCC, paper pool for Anomaly Detection
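One way to read the "extra classifier" hint is to pair each batch of real faces (label 0, normal) with uniformly sampled noise images (label 1, anomaly) and train a classifier on the mix. A sketch of just the batch construction, with made-up shapes and a random stand-in for the real images:

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 16

# Stand-in for a batch of real training images scaled to [-1, 1].
real = rng.uniform(-1.0, 1.0, size=(batch, 64, 64, 3)).astype(np.float32)

# Random noise images serve as synthetic anomalies.
noise = rng.uniform(-1.0, 1.0, size=(batch, 64, 64, 3)).astype(np.float32)

x = np.concatenate([real, noise], axis=0)
y = np.concatenate([np.zeros(batch), np.ones(batch)])  # 0 = normal, 1 = anomaly

# Shuffle so the classifier doesn't see the labels in block order.
perm = rng.permutation(len(y))
x, y = x[perm], y[perm]
print(x.shape, y.shape)
```

The classifier's output on a test image can then be combined with (or used in place of) the reconstruction error as the anomaly score.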

Baseline training statistics

  • Simple

○       Number of parameters: 3176419

○       Training time on Colab: ~30 min

  • Medium

○       Number of parameters: 47355

○       Training time on Colab: ~30 min

  • Strong

○       Number of parameters: 47595

○       Training time on Colab: 4~5 hrs

  • Boss

○       Number of parameters: 4364140

○       Training time on Colab: 1.5~3 hrs

Strong baseline training curve