FIT3181: Deep Learning Neural Networks


FIT3181: Deep Learning (2024)

Deep Neural Networks

Due: 11:55pm Sunday, 8 September 2024

Important note: This is an individual assignment. It contributes 25% to your final mark. Read the assignment instructions carefully.

What to submit

This assignment is to be completed individually and submitted to the Moodle unit site. By the due date, you are required to submit one single zip file, named xxx_assignment01_solution.zip where xxx is your student ID, to the corresponding Assignment (Dropbox) in Moodle. You can use Google Colab to do Assignment 1, but you need to save it to a *.ipynb file to submit to the unit Moodle.

More importantly, if you use Google Colab to do this assignment, you need to first make a copy of this notebook on your Google Drive.

For example, if your student ID is 123456, then gather all of your assignment solution into a folder, create a zip file named 123456_assignment01_solution.zip and submit this file.

Within this zip folder, you must submit the following files:

1. Assignment01_solution.ipynb: this is your Python notebook solution source file.

2. Assignment01_output.html: this is the output of your Python notebook solution exported in HTML format.

3. Any extra files or folders needed to complete your assignment (e.g., images used in your answers).

Since the notebook is quite big to load and work with, one recommended option is to split the solution into three parts and work on them separately. In that case, replace Assignment01_solution.ipynb with three notebooks: Assignment01_Part1_solution.ipynb, Assignment01_Part2_solution.ipynb and Assignment01_Part3_solution.ipynb.

You can run your code on Google Colab. In this case, you have to make a copy of your Google Colab notebook, including the traces and progress of model training, before submitting.

Part 1: Theory and Knowledge Questions [Total marks for this part: 30 points]

The first part of this assignment is to demonstrate the knowledge in deep learning that you have acquired from the lecture and tutorial materials. Most of the content in this part is drawn from the lectures and tutorials from weeks 1 to 4. Going through these materials before attempting this part is highly recommended.

Question 1.1: Activation functions play an important role in modern Deep NNs. For each of the activation functions below, state its output range, find its derivative (show your steps), and plot the activation function and its derivative.

(b) Gaussian Error Linear Unit (GELU): GELU(x) = xΦ(x), where Φ(x) is the cumulative distribution function of the standard Gaussian distribution, i.e., Φ(x) = P(X ≤ x) where X ~ N(0, 1). The GELU activation function (the link for the main paper: arxiv.org/pdf/1606.08…) has been widely used in state-of-the-art Vision Transformers (e.g., here is the link for the main ViT paper: arxiv.org/pdf/2010.11…). [1.5 points]

Write your answer here. You can add more cells if needed.
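
As a hint for the plotting part, here is a minimal sketch (not the official solution) that evaluates GELU and its derivative using the exact Gaussian CDF. It relies on the identity GELU'(x) = Φ(x) + xφ(x), where φ is the standard Gaussian PDF; the grid range [-4, 4] is an arbitrary choice.

import math
import torch
import matplotlib.pyplot as plt

x = torch.linspace(-4, 4, 200)
Phi = 0.5 * (1 + torch.erf(x / math.sqrt(2)))           # standard Gaussian CDF
phi = torch.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)   # standard Gaussian PDF
gelu = x * Phi                                          # GELU(x) = x * Phi(x)
dgelu = Phi + x * phi                                   # GELU'(x) = Phi(x) + x * phi(x)

plt.plot(x, gelu, label="GELU(x)")
plt.plot(x, dgelu, label="GELU'(x)")
plt.legend()
plt.grid(True)
plt.show()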

Question 1.2: Assume that we feed a data point x with a ground-truth label y = 2 to the feed-forward neural network with the ReLU activation function as shown in the following figure.

(a) What is the numerical value of the latent representation h1(x)? [1 point]

(b) What is the numerical value of the latent representation h2(x)? [1 point]

(c) What is the numerical value of the logit h3(x)? [1 point]

(d) What are the corresponding prediction probabilities p(x)? [1 point]

(e) What is the predicted label ŷ? Is it a correct or an incorrect prediction? Recall that y = 2. [1 point]

(f) What is the cross-entropy loss caused by the feed-forward neural network at (x, y)? Recall that y = 2. [1 point]

(g) Why is the cross-entropy loss caused by the feed-forward neural network at (x, y) (i.e., CE(1_y, p(x))) always non-negative? When does this CE(1_y, p(x)) loss get the value 0? Note that you need to answer this question for a general pair (x, y) and a general feed-forward neural network with, for example, M = 4 classes. [1 point]

You must show both the formulas and the numerical results to earn full marks. Although it is optional, it is great if you show your PyTorch code for your computation.
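
Since the figure with the network's weights is not reproduced in this copy, the sketch below uses placeholder tensors; replace them with the input and parameters from the figure. It only illustrates the computation pattern for parts (a)-(f).

import torch

x = torch.tensor([1.0, -1.0])               # placeholder input; use the figure's values
y = torch.tensor(2)                         # ground-truth label y = 2
W1, b1 = torch.randn(4, 2), torch.zeros(4)  # placeholder parameters for h1
W2, b2 = torch.randn(4, 4), torch.zeros(4)  # placeholder parameters for h2
W3, b3 = torch.randn(4, 4), torch.zeros(4)  # placeholder parameters for the logits h3

h1 = torch.relu(W1 @ x + b1)                # (a) latent representation h1(x)
h2 = torch.relu(W2 @ h1 + b2)               # (b) latent representation h2(x)
h3 = W3 @ h2 + b3                           # (c) logits h3(x)
p = torch.softmax(h3, dim=0)                # (d) prediction probabilities p(x)
y_hat = torch.argmax(p)                     # (e) predicted label
ce = -torch.log(p[y])                       # (f) cross-entropy loss CE(1_y, p(x))
print(h1, h2, h3, p, y_hat, ce)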

Question 1.3:

For Question 1.3, you have two options:

·   (1) perform the forward propagation, backward propagation, and SGD update for one mini-batch (10 points), or

·   (2) manually implement a feed-forward neural network that can work on real tabular datasets (20 points).

You can choose either (1) or (2) to proceed.

Option 1 [Total marks for this option: 10 points]

Assume that we are constructing a multilayered feed-forward neural network for a classification problem with three classes, where the model parameters will be generated randomly using your student ID. The architecture of this network is 3(Input) → 5(ELU) → 3(output) as shown in the following figure. Note that the ELU has the same formula as the one in Q1.1.

We feed a batch X with the labels Y as shown in the figure. Answer the following questions. 

You need to show the formulas, numerical results, and your PyTorch code for your computation to earn full marks.

In  [  ]:

Out [3]:

<torch._C.Generator at 0x7dc439f98810>

In  [  ]:

# Code to generate random matrices and biases for W1, b1, W2, b2
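
# A possible continuation (the original cell body is not shown, so the details
# below are assumptions): seed the generator with your student ID, then draw
# the parameters of the 3(Input) -> 5(ELU) -> 3(output) network.
import torch

student_id = 123456                           # replace with your own student ID
torch.manual_seed(student_id)
W1 = torch.randn(5, 3, requires_grad=True)    # hidden layer: 5 units, 3 inputs
b1 = torch.randn(5, requires_grad=True)
W2 = torch.randn(3, 5, requires_grad=True)    # output layer: 3 units, 5 inputs
b2 = torch.randn(3, requires_grad=True)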

Forward propagation

(a) What is the value of h̄1(x) (the pre-activation values of h1)? [0.5 point]

In  [  ]:

(b) What is the value of h1(x)? [0.5 point]

In  [  ]:

(c) What is the predicted value ŷ? [0.5 point]

In  [  ]:

(d) Suppose that we use the cross-entropy (CE) loss. What is the value of the CE loss l incurred by the mini-batch? [0.5 point]

In  [  ]:

Backward propagation

(e) What are the derivatives ∂l/∂h̄2, ∂l/∂W2, and ∂l/∂b2? [3 points]

In  [  ]:

(f) What are the derivatives ∂l/∂h1, ∂l/∂h̄1, ∂l/∂W1, and ∂l/∂b1? [3 points]

In  [  ]:
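
If you want to cross-check your manual derivations, PyTorch's autograd can produce the same derivatives. A minimal sketch, assuming the parameters W1, b1, W2, b2 generated above and placeholder tensors standing in for the batch X and labels Y from the figure:

import torch.nn.functional as F

X = torch.randn(4, 3)              # placeholder mini-batch; use the figure's X
Y = torch.randint(0, 3, (4,))      # placeholder labels; use the figure's Y

H1_bar = X @ W1.T + b1             # pre-activations of the hidden layer
H1 = F.elu(H1_bar)                 # ELU activations
logits = H1 @ W2.T + b2            # pre-activations of the output layer
loss = F.cross_entropy(logits, Y)  # mean CE loss over the mini-batch
loss.backward()
print(W2.grad, b2.grad, W1.grad, b1.grad)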

SGD update

(g) Assume that we use SGD with learning rate η = 0.01 to update the model parameters. What are the values of W2, b2 and W1, b1 after the update? [2 points]

In  [  ]:
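
A sketch of one plain SGD step with η = 0.01, applied after loss.backward() from the previous cell; the in-place update under torch.no_grad() is one common way to write it:

eta = 0.01
with torch.no_grad():
    for param in (W1, b1, W2, b2):
        param -= eta * param.grad   # p <- p - eta * dl/dp
        param.grad.zero_()          # clear gradients for the next step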

Option 2 [Total marks for this option: 20 points]

In  [  ]:

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

In Option 2, you need to implement a feed-forward NN manually using PyTorch and its auto-differentiation. You then manually train the model on the MNIST dataset.

We first download the MNIST dataset and preprocess it.

In  [  ]:
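
A minimal sketch of the download/preprocessing step using torchvision; the root path "./data" is an arbitrary choice, and the dataset names match the cells below.

transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root="./data", train=False, download=True, transform=transform)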

Each data point has dimension [28, 28]. We need to flatten it to a vector to input to our FFN.

In  [  ]:

train_dataset.data = train_dataset.data.view(-1, 28 * 28)
test_dataset.data = test_dataset.data.view(-1, 28 * 28)
train_data, train_labels = train_dataset.data, train_dataset.targets
test_data, test_labels = test_dataset.data, test_dataset.targets
print(train_data.shape, train_labels.shape)
print(test_data.shape, test_labels.shape)

In  [  ]:

train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

Develop the feed-forward neural network

(a) You need to develop the class MyLinear with the following skeleton. You need to declare the weight matrix and bias of this linear layer. [3 points]

In  [  ]:
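
Since the skeleton itself is not reproduced in this copy, here is one plausible version; the attribute names weight and bias and the initialization scale are assumptions, not the official skeleton.

import torch

class MyLinear(torch.nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        # Declare the weight matrix and bias as trainable parameters.
        self.weight = torch.nn.Parameter(0.01 * torch.randn(output_size, input_size))
        self.bias = torch.nn.Parameter(torch.zeros(output_size))

    def forward(self, x):
        return x @ self.weight.T + self.bias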

(b) You need to develop the class MyFFN with the following skeleton. [7 points]

In  [  ]:
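
Again, the official skeleton is not reproduced here; the sketch below is only constrained by the usage in the cell further down (the constructor arguments, create_FFN, and printability), so its method bodies are assumptions.

class MyFFN(torch.nn.Module):
    def __init__(self, input_size, num_classes, hidden_sizes, act):
        super().__init__()
        self.input_size = input_size
        self.num_classes = num_classes
        self.hidden_sizes = hidden_sizes
        self.act = act                      # activation class, e.g. torch.nn.ReLU

    def create_FFN(self):
        # Stack MyLinear layers and activations according to hidden_sizes.
        sizes = [self.input_size] + self.hidden_sizes
        layers = []
        for in_size, out_size in zip(sizes[:-1], sizes[1:]):
            layers += [MyLinear(in_size, out_size), self.act()]
        layers.append(MyLinear(sizes[-1], self.num_classes))
        self.layers = torch.nn.Sequential(*layers)

    def forward(self, x):
        return self.layers(x)

    def updateSGD(self, lr=0.01):
        # One manual SGD step using the gradients accumulated by backward().
        with torch.no_grad():
            for param in self.parameters():
                param -= lr * param.grad
                param.grad.zero_()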

In  [  ]:

myFFN = MyFFN(input_size=28 * 28, num_classes=10, hidden_sizes=[100, 100], act=torch.nn.ReLU)
myFFN.create_FFN()
print(myFFN)

(c) Write the code to evaluate the accuracy of the current myFFN model on a data loader (e.g., train_loader or test_loader). [2.5 points]

In  [  ]:
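
A minimal sketch, assuming the model expects flattened float inputs of shape [batch, 784]:

def compute_accuracy(model, data_loader):
    """Fraction of correctly classified examples in the data loader."""
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in data_loader:
            logits = model(images.view(images.shape[0], -1).float())
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.shape[0]
    return correct / total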

(d) Write the code to evaluate the loss of the current myFFN model on a data loader (e.g., train_loader or test_loader). [2.5 points]

In  [  ]:

def compute_loss(model, data_loader):
    """This function computes the loss of the model on a data loader"""
    # Your code here

Train on the MNIST data for 50 epochs using updateSGD.

In  [  ]:

(e) Implement the function updateSGDMomentum in the class and train the model with this optimizer for 50 epochs. You can update the corresponding function in the MyFFN class. [2.5 points]

In  [  ]:
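
A sketch of the classical momentum rule v ← μv − η∇l and p ← p + v; the buffer name velocities and the method signature are assumptions, not part of the given skeleton.

def updateSGDMomentum(self, lr=0.01, momentum=0.9):
    # Lazily create one velocity buffer per parameter.
    if not hasattr(self, "velocities"):
        self.velocities = [torch.zeros_like(p) for p in self.parameters()]
    with torch.no_grad():
        for v, param in zip(self.velocities, self.parameters()):
            v.mul_(momentum).sub_(lr * param.grad)   # v = momentum * v - lr * grad
            param += v
            param.grad.zero_()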

(f) Implement the function updateAdagrad in the class and train the model with this optimizer for 50 epochs. You can update the corresponding function in the MyFFN class. [2.5 points]

In  [  ]:
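
Similarly, a sketch of the Adagrad rule, which accumulates squared gradients and scales each step by 1/(√accumulator + ε); the buffer name sq_grads is an assumption.

def updateAdagrad(self, lr=0.01, eps=1e-8):
    # Lazily create one squared-gradient accumulator per parameter.
    if not hasattr(self, "sq_grads"):
        self.sq_grads = [torch.zeros_like(p) for p in self.parameters()]
    with torch.no_grad():
        for s, param in zip(self.sq_grads, self.parameters()):
            s += param.grad ** 2
            param -= lr * param.grad / (s.sqrt() + eps)
            param.grad.zero_()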

Part 2: Deep Neural Networks (DNN) [Total marks for this part: 25 points]

The second part of this assignment is to demonstrate the basic knowledge in deep learning that you have acquired from the lecture and tutorial materials. Most of the content in this part is drawn from the tutorials covered in weeks 1 to 2. Going through these materials before attempting this part is highly recommended.

In the second part of this assignment, you are going to work with the FashionMNIST dataset for an image recognition task. It has the exact same format as MNIST (70,000 grayscale images of 28 × 28 pixels each, with 10 classes), but the images represent fashion items rather than handwritten digits, so each class is more diverse, and the problem is significantly more challenging than MNIST.

In  [  ]:

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

torch.manual_seed(1234)

Load the FashionMNIST dataset using torchvision

In  [  ]:
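
A minimal loading sketch with torchvision; the root path and batch size are assumptions consistent with the earlier cells, and the subset/validation split reflected in the output below is not reproduced here.

transform = transforms.ToTensor()
train_dataset = datasets.FashionMNIST(root="./data", train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root="./data", train=False, download=True, transform=transform)
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)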

torch.Size([60000, 28, 28]) torch.Size([60000]) torch.Size([10000, 28, 28]) torch.Size([10000])
torch.Size([60000, 784]) torch.Size([60000])
torch.Size([10000, 784]) torch.Size([10000])

Number of training samples: 18827
Number of training samples: 16944
Number of validation samples: 1883

Question 2.1: Write the code to visualize a mini-batch in train_loader, including its images and labels. [5 points]

In  [  ]:
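
One possible approach (the 4 × 8 grid is an arbitrary choice) is to take one batch from train_loader and show each image with its label:

import matplotlib.pyplot as plt

images, labels = next(iter(train_loader))
fig, axes = plt.subplots(4, 8, figsize=(12, 6))
for ax, img, lbl in zip(axes.flat, images, labels):
    ax.imshow(img.view(28, 28), cmap="gray")
    ax.set_title(int(lbl))
    ax.axis("off")
plt.tight_layout()
plt.show()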

Question 2.2: Write the code for the feed-forward neural net using PyTorch [5 points]

We now develop a feed-forward neural network with the architecture 784 → 40(ReLU) → 30(ReLU) → 10(softmax). You can choose your own way to implement your network and an optimizer of interest. You should train the model for 50 epochs and evaluate the trained model on the test set.

In  [  ]:
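
One possible implementation sketch; the plain SGD optimizer with learning rate 0.1 is a free choice, and the final softmax is folded into nn.CrossEntropyLoss rather than written as a layer.

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 40), nn.ReLU(),
    nn.Linear(40, 30), nn.ReLU(),
    nn.Linear(30, 10),                  # softmax is applied inside the CE loss
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(50):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images.view(images.shape[0], -1)), labels)
        loss.backward()
        optimizer.step()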

Question 2.3: Tuning hyper-parameters with grid search [5 points]

Assume that you need to tune the number of neurons in the first and second hidden layers, n1 ∈ {20, 40} and n2 ∈ {20, 40}, and the activation function act ∈ {sigmoid, tanh, relu}. The network has the architecture pattern 784 → n1(act) → n2(act) → 10(softmax), where n1, n2, and act range over their grids. Write the code to tune the hyper-parameters n1, n2, and act. Note that you can freely choose the optimizer and learning rate of interest for this task.
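
A grid-search sketch using itertools.product; train_and_evaluate is a hypothetical helper standing in for your own training routine from Question 2.2, assumed to train the model and return its validation accuracy.

from itertools import product
import torch.nn as nn

acts = {"sigmoid": nn.Sigmoid, "tanh": nn.Tanh, "relu": nn.ReLU}
best_acc, best_config = 0.0, None
for n1, n2, act_name in product([20, 40], [20, 40], acts):
    model = nn.Sequential(
        nn.Linear(784, n1), acts[act_name](),
        nn.Linear(n1, n2), acts[act_name](),
        nn.Linear(n2, 10),
    )
    # train_and_evaluate is a hypothetical helper: train the model, then
    # return its accuracy on the validation set.
    val_acc = train_and_evaluate(model, train_loader, val_loader)
    if val_acc > best_acc:
        best_acc, best_config = val_acc, (n1, n2, act_name)
print(best_config, best_acc)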