Notes of Andrew Ng’s Machine Learning —— (15) Anomaly Detection
Density Estimation
Problem Motivation
Anomaly Detection is a type of machine learning problem.
Imagine that you're a manufacturer of aircraft engines. As your aircraft engines roll off the assembly line, you're doing quality assurance testing, and as part of that testing you measure features of each engine, like the heat generated, the vibrations and so on. So, if you have manufactured $m$ aircraft engines, you now have a dataset of $x^{(1)}$ through $x^{(m)}$, and can plot your data. Then, given a new engine $x_{test}$, you want to know: is $x_{test}$ anomalous? This problem is called Anomaly Detection.
Here is what we are going to do:
Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$

Is $x_{test}$ anomalous?
To solve this problem, we will train a model for $p(x)$ (i.e. a model for the probability of $x$, where $x$ are these features of, say, aircraft engines). We're then going to say that, for the new aircraft engine, if $p(x_{test})$ is less than some threshold $\varepsilon$, we flag this as an anomaly:

$$\begin{cases} p(x_{test}) < \varepsilon & \text{flag anomaly} \\ p(x_{test}) \geq \varepsilon & \text{OK} \end{cases}$$

This problem of estimating the distribution $p(x)$ is sometimes called the problem of density estimation.
Anomaly detection example
Fraud detection:

- $x^{(i)}$ = features of user $i$'s activities
- Model $p(x)$ from data.
- Identify unusual users by checking which have $p(x) < \varepsilon$
Manufacturing; monitoring computers in a data center:

- $x^{(i)}$ = features of machine $i$
- $x_1$ = memory use, $x_2$ = number of disk accesses/sec, $x_3$ = CPU load, $x_4$ = CPU load/network traffic, ...
Gaussian Distribution
The Gaussian distribution is also called the normal distribution.

If $x$ is a distributed Gaussian with mean $\mu$ and variance $\sigma^2$, we will write it as:

$$x \sim \mathcal{N}(\mu, \sigma^2)$$

In this case, the probability density of $x$ is:

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
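As a quick sanity check of this density formula, here is a small Python/NumPy sketch (the course itself uses Octave; the function name is my own):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x; note sigma2 is the variance, not the std."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# The density peaks at x = mu; a larger variance makes the curve flatter.
p_peak = gaussian_pdf(0.0, mu=0.0, sigma2=1.0)   # 1/sqrt(2*pi), about 0.3989
p_flat = gaussian_pdf(0.0, mu=0.0, sigma2=4.0)   # flatter curve, about 0.1995
```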
Here are some examples of the Gaussian distribution:

We can see that the "center" of these plots is actually the value of $\mu$. And as $\sigma$ increases, the plot becomes more and more "flat"; conversely, it will be "thin & tall" if $\sigma$ is small.
Parameter estimation
Given a dataset $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, we can get our $\mu$ and $\sigma^2$ by:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2$$

P.S. this is what people tend to use when handling a machine learning problem; it's a little different from what we normally use in statistics, which divides by $m-1$ instead of $m$.
Anomaly detection algorithm
1. Choose features $x_j$ that you think might be indicative of anomalous examples.
2. Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$ (this is the vectorized version):
   $$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2$$
3. Given a new example $x$, compute $p(x)$:
   $$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\sigma_j} \exp\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right)$$
4. Flag an anomaly if $p(x) < \varepsilon$.
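The steps above can be sketched in Python/NumPy (the course itself uses Octave; all function names, the toy data, and the hand-picked threshold here are my own):

```python
import numpy as np

def estimate_gaussian(X):
    """Fit mu_j and sigma_j^2 for each feature (vectorized, dividing by m)."""
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).mean(axis=0)
    return mu, sigma2

def p(X, mu, sigma2):
    """p(x): product over features of the univariate Gaussian densities."""
    densities = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return densities.prod(axis=1)

# Toy data: 100 normal examples around (5, 5), plus one obvious outlier.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=1.0, size=(100, 2))
mu, sigma2 = estimate_gaussian(X_train)

epsilon = 1e-4                      # threshold chosen by hand for this toy data
x_new = np.array([[20.0, 20.0]])    # far away from all the training data
is_anomaly = p(x_new, mu, sigma2) < epsilon
```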
Building an Anomaly Detection System
Developing and Evaluating an Anomaly Detection System
The importance of real-number evaluation

When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating our learning algorithm.

Assume we have some labeled data, of anomalous and non-anomalous examples ($y = 0$ if normal, $y = 1$ if anomalous).
- Training set: $x^{(1)}, x^{(2)}, \dots, x^{(m)}$ (assume normal examples/not anomalous)
- Cross validation set: $(x_{cv}^{(1)}, y_{cv}^{(1)}), \dots, (x_{cv}^{(m_{cv})}, y_{cv}^{(m_{cv})})$
- Test set: $(x_{test}^{(1)}, y_{test}^{(1)}), \dots, (x_{test}^{(m_{test})}, y_{test}^{(m_{test})})$
Aircraft engines motivating example
Say, what we have is:

- 10000 good (normal) engines ($y = 0$)
- 20 flawed engines (anomalous) ($y = 1$)

Then we are going to choose:

- Training set: 6000 good engines
- CV: 2000 good engines ($y = 0$), 10 anomalous ($y = 1$)
- Test: 2000 good engines ($y = 0$), 10 anomalous ($y = 1$)
Algorithm evaluation
Fit model $p(x)$ on training set $\{x^{(1)}, \dots, x^{(m)}\}$.

On a cross validation/test example $x$, predict:

$$y = \begin{cases} 1 & \text{if } p(x) < \varepsilon \text{ (anomaly)} \\ 0 & \text{if } p(x) \geq \varepsilon \text{ (normal)} \end{cases}$$

Possible evaluation metrics:

- True positive, false positive, false negative, true negative
- Precision/Recall
- $F_1$-score

Can also use the cross validation set to choose the parameter $\varepsilon$.
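One way to sketch this evaluation in Python/NumPy (function names and the toy numbers are my own): sweep candidate thresholds over the CV set and keep the one with the best $F_1$-score.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 from true/false positives and negatives (y = 1 means anomaly)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv):
    """Try many thresholds between min and max p(x); keep the best-F1 one."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        f1 = f1_score(y_cv, (p_cv < eps).astype(int))
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Toy CV set: the anomalies have much smaller p(x) than the normal examples.
p_cv = np.array([0.30, 0.25, 0.28, 0.31, 0.001, 0.002])
y_cv = np.array([0,    0,    0,    0,    1,     1])
eps, f1 = select_epsilon(p_cv, y_cv)
```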
Anomaly Detection vs. Supervised Learning
There are some more examples:

- (Anomaly Detection) You run a power utility (supplying electricity to customers) and want to monitor your electric plants to see if any one of them might be behaving strangely.
- (Supervised Learning) You run a power utility and want to predict tomorrow's expected demand for electricity (so that you can plan to ramp up an appropriate amount of generation capacity).
- (Anomaly Detection) A computer vision / security application, where you examine video images to see if anyone in your company's parking lot is acting in an unusual way.
- (Supervised Learning) A computer vision application, where you examine an image of a person entering your retail store to determine if the person is male or female.
Choosing What Features to Use
Preprocess non-gaussian features

We can use `hist` in Octave to plot the histogram of our data. If we find that a feature is non-gaussian, we can try to apply a transform such as $\log(x + c)$ or $x^{\frac{1}{c}}$ (try different $c$ for a better result) to make it look more gaussian. For example:
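A Python/NumPy illustration of this idea (my own example, not from the course): heavily right-skewed data becomes much closer to symmetric, and hence more gaussian-looking, after a $\log(x + 1)$ transform, which we can check with the sample skewness.

```python
import numpy as np

# Right-skewed data (an exponential feature) is clearly not Gaussian;
# after log(x + 1) the skewness drops sharply (try different constants c).
rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=10_000)

def skewness(v):
    """Sample skewness: near 0 for a symmetric (e.g. Gaussian) distribution."""
    return np.mean(((v - v.mean()) / v.std()) ** 3)

skew_raw = skewness(x)              # exponential data: skewness around 2
skew_log = skewness(np.log(x + 1))  # much closer to 0 after the transform
```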
Error analysis for anomaly detection
What we want is:

- $p(x)$ large for normal examples $x$.
- $p(x)$ small for anomalous examples $x$.

And the most common problem is:

$p(x)$ is comparable (say, both large) for normal and anomalous examples.
Suppose our anomaly detection algorithm is performing poorly and outputs a large value of $p(x)$ for many normal examples and for many anomalous examples in the cross validation dataset. What is most likely to help is to try coming up with more features to distinguish between the normal and the anomalous examples.
Multivariate Gaussian distribution
Multivariate Gaussian Distribution
What we still want to do is:

- Given $x \in \mathbb{R}^n$.
- Don't model $p(x_1), p(x_2), \dots$ separately.
- Model $p(x)$ all in one go.

The multivariate gaussian is able to do this:

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$

P.s. $|\Sigma|$ is the determinant of $\Sigma$.
Where we need the parameters $\mu \in \mathbb{R}^n$ and the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$, which can be computed in Octave as:

```
Sigma = 1/m * (X - mu)' * (X - mu);  % note: plain 1/m * X' * X is only correct if X is already mean-centered
```
Given a training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, we can fit the parameters:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu\right)\left(x^{(i)} - \mu\right)^T$$
Here are lots of examples:
Anomaly Detection using the Multivariate Gaussian Distribution
Using the Multivariate Gaussian Distribution
1. Fit model $p(x)$ by setting:
   $$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu\right)\left(x^{(i)} - \mu\right)^T$$
2. Given a new example $x$, compute:
   $$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
3. Flag an anomaly if $p(x) < \varepsilon$.
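These steps can be sketched in Python/NumPy (names and toy data are my own). The example uses two strongly correlated features, so a point that breaks the correlation gets a tiny $p(x)$ even though each coordinate alone looks normal:

```python
import numpy as np

def fit_multivariate_gaussian(X):
    """Fit mu and the full covariance matrix Sigma (dividing by m)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / X.shape[0]
    return mu, Sigma

def multivariate_p(X, mu, Sigma):
    """p(x; mu, Sigma) for each row of X."""
    n = mu.size
    Xc = X - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
    exponent = -0.5 * np.sum(Xc @ inv * Xc, axis=1)  # quadratic form per row
    return norm * np.exp(exponent)

# Correlated toy data: x2 tracks x1, so Sigma captures the correlation.
rng = np.random.default_rng(1)
x1 = rng.normal(0.0, 1.0, 500)
x2 = x1 + rng.normal(0.0, 0.1, 500)
X_train = np.column_stack([x1, x2])

mu, Sigma = fit_multivariate_gaussian(X_train)
# (2, -2) breaks the correlation, so its density is far below that of (2, 2).
p_vals = multivariate_p(np.array([[2.0, 2.0], [2.0, -2.0]]), mu, Sigma)
```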
Relationship to original model
Original model:

$$p(x) = p(x_1; \mu_1, \sigma_1^2) \times p(x_2; \mu_2, \sigma_2^2) \times \cdots \times p(x_n; \mu_n, \sigma_n^2)$$

As we can see, the contours of the original model are always axis-aligned.

This model actually corresponds to a special case of a multivariate Gaussian distribution where the covariance matrix is diagonal:

$$\Sigma = \begin{bmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_n^2 \end{bmatrix}$$
Original model vs. Multivariate gaussian
Original model:

- Manually create features to capture anomalies where $x_1, x_2$ take unusual combinations of values (e.g. $x_3 = \frac{x_1}{x_2}$).
- Computationally cheaper (alternatively, scales better to large $n$).
- OK even if $m$ (training set size) is small.

Multivariate gaussian:

- Automatically captures correlations between features (no need to create an extra feature like $x_3 = \frac{x_1}{x_2}$).
- Computationally more expensive.
- Must have $m > n$ and no redundant features (need to ensure that no feature is a linear combination of the others), or else $\Sigma$ is non-invertible.
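A small Python/NumPy illustration of the last point (my own example, not from the course): with a linearly dependent feature, the fitted $\Sigma$ has deficient rank and zero determinant, so $\Sigma^{-1}$ does not exist.

```python
import numpy as np

# If one feature is a linear combination of another (here x2 = 2 * x1),
# the fitted covariance matrix is singular and cannot be inverted.
rng = np.random.default_rng(7)
x1 = rng.normal(size=100)
X = np.column_stack([x1, 2 * x1])    # linearly dependent columns
mu = X.mean(axis=0)
Xc = X - mu
Sigma = (Xc.T @ Xc) / X.shape[0]

rank = np.linalg.matrix_rank(Sigma)  # 1, not 2: Sigma is non-invertible
```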