Applications
- credit scoring
- medical diagnosis
- handwritten character recognition
Why not regression?
- regression penalizes examples that are "too correct" (large outputs on the correct side still incur squared error)
- multiple classes encoded as 1, 2, 3 impose a false ordering (classes 1 and 2 are not necessarily closer than classes 1 and 3)
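The "too correct" problem above can be seen with two toy numbers (all values hypothetical): with targets encoded as +1/-1, an output of +10 for a +1 example is a perfect classification, yet squared error punishes it far more than a borderline output of +0.9.

```python
import numpy as np

# Two predictions for the same true label y = +1:
# 0.9 is barely correct, 10.0 is confidently ("too") correct.
outputs = np.array([0.9, 10.0])
squared_loss = (outputs - 1.0) ** 2
print(squared_loss)  # the confidently correct point dominates the loss
```

A regressor trained on such data shifts its boundary to appease the far-away, already-correct points, hurting classification accuracy.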
Ideal alternatives
- the ideal loss counts misclassified examples (0-1 loss), but it is not differentiable -> gradient descent cannot be applied!
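Why gradient descent fails on the ideal loss: the misclassification count is piecewise constant in the parameters, so small parameter changes leave it unchanged and the gradient is zero almost everywhere. A minimal sketch on hypothetical toy data:

```python
import numpy as np

# Ideal loss: L(w) = number of n with sign(w . x_n) != y_n.
# It is piecewise constant in w, so tiny steps in w never change the count.
def zero_one_loss(w, X, y):
    return int(np.sum(np.sign(X @ w) != y))

X = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.], [2., 1.], [1., 2.]])
w = np.array([1.0, -1.0])
y = np.sign(X @ w)  # labels consistent with w, so the loss at w is 0

for eps in (0.0, 1e-6, 1e-3):
    print(eps, zero_one_loss(w + eps, X, y))  # count never moves
```

Since the loss surface is flat between jumps, there is no gradient signal to follow, which motivates the probabilistic approach below.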
Two classes
Probabilistic generative model - Bayes' Theorem
- estimating the probabilities from training data
- if we assume all dimensions are independent -> Naive Bayes Classifier
- these probabilities (class priors and class-conditional likelihoods) are easily calculated from training data
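For binary features, "easily calculated from training data" literally means counting frequencies. A minimal Bernoulli Naive Bayes sketch on hypothetical data (the Laplace smoothing constant is my addition, to avoid zero probabilities):

```python
import numpy as np

# Toy training set: 5 examples, 3 binary features, 2 classes.
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 0],
              [0, 0, 0]])
y = np.array([1, 1, 1, 0, 0])

def fit(X, y):
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}          # P(C)
    # With independence assumed, P(x|C) factorizes, so each per-feature
    # probability is a smoothed frequency counted in the training data.
    likelihoods = {c: (X[y == c].sum(axis=0) + 1) / (np.sum(y == c) + 2)
                   for c in classes}                         # P(x_i=1|C)
    return priors, likelihoods

def predict(x, priors, likelihoods):
    scores = {}
    for c in priors:
        p = likelihoods[c]
        scores[c] = priors[c] * np.prod(p ** x * (1 - p) ** (1 - x))
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}  # posterior via Bayes

priors, likelihoods = fit(X, y)
print(predict(np.array([1, 0, 1]), priors, likelihoods))
```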
- Assume that the points are sampled from a Gaussian distribution (it's okay to use other distributions as well. e.g., Bernoulli distribution for binary features.)
- Find the Gaussian distribution behind each class (Maximum likelihood)
Easily overfits if each class gets its own covariance matrix, since a covariance matrix has (# features x # features) parameters -> share the same covariance matrix across classes.
- Find the probability of the new point based on the estimated distribution
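The three steps above can be sketched end to end (toy 2-D data, two classes; all numbers hypothetical): fit a Gaussian mean to each class by maximum likelihood, pool one shared covariance weighted by class sizes, then score a new point with the posterior from Bayes' theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0, 0], 1.0, size=(50, 2))   # class 1 samples
X2 = rng.normal([3, 3], 1.0, size=(30, 2))   # class 2 samples

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)  # MLE means
n1, n2 = len(X1), len(X2)
# Shared covariance: size-weighted average of the per-class MLE covariances,
# which curbs the (# features)^2 parameter blow-up of separate covariances.
cov1 = (X1 - mu1).T @ (X1 - mu1) / n1
cov2 = (X2 - mu2).T @ (X2 - mu2) / n2
cov = (n1 * cov1 + n2 * cov2) / (n1 + n2)

def gaussian(x, mu, cov):
    d = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(cov, diff))

def posterior_c1(x):
    # Bayes' theorem with class frequencies as priors
    p1 = n1 / (n1 + n2) * gaussian(x, mu1, cov)
    p2 = n2 / (n1 + n2) * gaussian(x, mu2, cov)
    return p1 / (p1 + p2)

print(posterior_c1(np.array([0.0, 0.0])))  # near 1: close to class-1 mean
print(posterior_c1(np.array([3.0, 3.0])))  # near 0: close to class-2 mean
```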
Finally
-> With a shared covariance matrix, the posterior simplifies to P(C1|x) = sigma(w.x + b) -- so how about we directly find w and b?
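The jump to "directly find w and b" rests on the standard simplification of the generative posterior (a known derivation, not spelled out in the notes above):

```latex
% The posterior is a sigmoid of a log-ratio of the two class terms:
\begin{align*}
P(C_1 \mid x)
  &= \frac{P(x \mid C_1)\,P(C_1)}{P(x \mid C_1)\,P(C_1) + P(x \mid C_2)\,P(C_2)}
   = \frac{1}{1 + e^{-z}} = \sigma(z), \\
z &= \ln \frac{P(x \mid C_1)\,P(C_1)}{P(x \mid C_2)\,P(C_2)}.
\end{align*}
% With Gaussians sharing one covariance $\Sigma$, the quadratic terms in $x$
% cancel, leaving a linear function of $x$:
\begin{align*}
z = \underbrace{(\mu_1 - \mu_2)^{\top} \Sigma^{-1}}_{w^{\top}}\, x
    \;\underbrace{-\; \tfrac{1}{2}\mu_1^{\top}\Sigma^{-1}\mu_1
    \;+\; \tfrac{1}{2}\mu_2^{\top}\Sigma^{-1}\mu_2
    \;+\; \ln \tfrac{N_1}{N_2}}_{b}.
\end{align*}
```

Since only w and b survive, estimating the full Gaussians is a detour, which motivates learning w and b directly (the discriminative approach).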