1. If we replace the total loss over the minibatch with the average of the minibatch loss, how do we need to change the learning rate?
Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input x and target y.
The mean operation still operates over all the elements, and divides by n.
The division by nn can be avoided if one sets reduction = 'sum'.
- Parameters:
- reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
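To make the three reduction modes concrete, here is a small illustration (the tensors are arbitrary toy values, not from the book's example):

```python
import torch
from torch import nn

pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 1.0, 1.0])

print(nn.MSELoss(reduction='none')(pred, target))  # tensor([0., 1., 4.]) – per-element losses
print(nn.MSELoss(reduction='mean')(pred, target))  # tensor(1.6667) – sum divided by n
print(nn.MSELoss(reduction='sum')(pred, target))   # tensor(5.) – plain sum
```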
# Default reduction='mean': average loss over the minibatch
loss = nn.MSELoss()
lr = 0.03
# reduction='sum': total loss over the minibatch, so scale lr down by batch_size
loss = nn.MSELoss(reduction='sum')
lr = 0.03 / batch_size
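Why dividing by batch_size works: the gradient of the summed loss is exactly batch_size times the gradient of the averaged loss, so scaling the learning rate down by batch_size leaves the update lr * grad unchanged. A minimal sketch (toy data and a fresh nn.Linear layer, purely for illustration):

```python
import torch
from torch import nn

X, y = torch.randn(10, 2), torch.randn(10, 1)
net = nn.Linear(2, 1)

# Gradient under the averaged loss.
net.zero_grad()
nn.MSELoss(reduction='mean')(net(X), y).backward()
grad_mean = net.weight.grad.clone()

# Gradient under the summed loss.
net.zero_grad()
nn.MSELoss(reduction='sum')(net(X), y).backward()
grad_sum = net.weight.grad.clone()

# The summed loss scales the gradient by the batch size (here 10).
print(torch.allclose(grad_sum, grad_mean * X.shape[0]))  # True
```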
2. Review the deep learning framework documentation: which loss functions and initialization methods do they provide? Replace the original loss with the Huber loss, i.e. a loss that uses a squared term when the element-wise error is small and a delta-scaled L1 term otherwise.
Loss Functions
| Loss | Description |
|---|---|
| nn.L1Loss | Creates a criterion that measures the mean absolute error (MAE) between each element in the input x and target y. |
| nn.MSELoss | Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input x and target y. |
| nn.CrossEntropyLoss | This criterion computes the cross entropy loss between input logits and target. |
| nn.CTCLoss | The Connectionist Temporal Classification loss. |
| nn.NLLLoss | The negative log likelihood loss. |
| nn.PoissonNLLLoss | Negative log likelihood loss with Poisson distribution of target. |
| nn.GaussianNLLLoss | Gaussian negative log likelihood loss. |
| nn.KLDivLoss | The Kullback-Leibler divergence loss. |
| nn.BCELoss | Creates a criterion that measures the Binary Cross Entropy between the target and the input probabilities. |
| nn.BCEWithLogitsLoss | This loss combines a Sigmoid layer and the BCELoss in one single class. |
| nn.MarginRankingLoss | Creates a criterion that measures the loss given inputs x1, x2, two 1D mini-batch or 0D Tensors, and a label 1D mini-batch or 0D Tensor y (containing 1 or -1). |
| nn.HingeEmbeddingLoss | Measures the loss given an input tensor x and a labels tensor y (containing 1 or -1). |
| nn.MultiLabelMarginLoss | Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input x (a 2D mini-batch Tensor) and output y (a 2D Tensor of target class indices). |
| nn.HuberLoss | Creates a criterion that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise. |
| nn.SmoothL1Loss | Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. |
| nn.SoftMarginLoss | Creates a criterion that optimizes a two-class classification logistic loss between input tensor x and target tensor y (containing 1 or -1). |
| nn.MultiLabelSoftMarginLoss | Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input x and target y of size (N, C). |
| nn.CosineEmbeddingLoss | Creates a criterion that measures the loss given input tensors x1, x2 and a Tensor label y with values 1 or -1. |
| nn.MultiMarginLoss | Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input x (a 2D mini-batch Tensor) and output y (a 1D tensor of target class indices, 0 ≤ y ≤ x.size(1) − 1). |
| nn.TripletMarginLoss | Creates a criterion that measures the triplet loss given input tensors x1, x2, x3 and a margin with a value greater than 0. |
| nn.TripletMarginWithDistanceLoss | Creates a criterion that measures the triplet loss given input tensors a, p, and n (representing anchor, positive, and negative examples, respectively), and a nonnegative, real-valued function ("distance function") used to compute the relationship between the anchor and positive example ("positive distance") and the anchor and negative example ("negative distance"). |
# Huber loss: squared error for small residuals, delta-scaled absolute error otherwise
loss = nn.HuberLoss()
lr = 0.03
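A minimal sketch of swapping the loss in the training loop, with toy synthetic data standing in for the book's data pipeline (the variable names and the full-batch loop are illustrative, not the book's exact code):

```python
import torch
from torch import nn

# Toy synthetic regression data: y = Xw + b + noise.
true_w, true_b = torch.tensor([[2.0], [-3.4]]), 4.2
features = torch.randn(1000, 2)
labels = features @ true_w + true_b + 0.01 * torch.randn(1000, 1)

net = nn.Sequential(nn.Linear(2, 1))
loss = nn.HuberLoss()  # squared term for small errors, delta-scaled L1 term otherwise
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

for epoch in range(3):
    trainer.zero_grad()
    l = loss(net(features), labels)
    l.backward()
    trainer.step()
    print(f'epoch {epoch + 1}, loss {l.item():f}')
```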
3. How can we access the gradients of the linear regression model?
# The gradients from the most recent backward pass are stored in each parameter's .grad
wg = net[0].weight.grad
print(f'w grad: {wg}')
bg = net[0].bias.grad
print(f'b grad: {bg}')
w grad: tensor([[ 0.0009, -0.0017]])
b grad: tensor([-0.0008])
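For models with more layers, iterating over named_parameters() gives the same information without indexing into the Sequential container (a sketch assuming the same trained net as above):

```python
# Every parameter tensor stores the gradient from the most recent backward() in .grad.
for name, param in net.named_parameters():
    print(name, param.grad)
```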