这是我参与8月更文挑战的第1天,活动详情查看:8月更文挑战
Machine Learning definition
Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
——Arthur Samuel(1959)
Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by p, improves with experience E.
——Tom Mitchell(1998)
一个适当的学习问题定义如下:计算机程序从经验E中学习,解决某一任务T,进行某一性能度量P,通过P测定在T上的表现因经验E而提高。
监督学习
在监督学习中,给出一个数据集,且每个样本都指出正确答案,使用算法预测出“正确答案”。
Supervised Learning: We gave the algorithm a data set,in which the right answers were given. The task of the algorithm was to just produce more of these right answers.
- Regression: Predict continuous valued output(回归问题:预测一个连续值输出)
- Classification: Discrete valued output(分类问题:预测离散值输出)
回归问题
已知数据元素学习时间和测试分数,简单拟合出一个一元函数。拟合出公式之后,随意给出一个数据即可预测出对应的另一个数据,比如已知考75分可以推测复习的3h左右,已知复习3h可以推测出考75分左右。
离散问题
已知不同的身高体重和性别,图中明显看出由于身高体重不同而划分出的性别差异。之后给定一个身高体重数据便可以推测其性别。
例题:
You're running a company and you want to develop learning algorithms to address each of two problems
Problem 1:You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months
Problem 2:Youd like software to examine individual customer accounts and for each account decide if it has been hacked /compromised
答案: Treat problem 1 as a regression problem, problem 2 as a classification problem.
p1预测销量,肯定是对历史销量数据进行拟合从而得到一个曲线模型而进行预测,因此是回归问题。
p2检测账号,数据是离散的,安全或者不安全,因此是分类问题。
无监督学习
在无监督学习中,给出数据集,不进行数据区分。程序自动对输入的数据进行分类或分群,以寻找数据的模型和规律。
In Unsupervised Learning, the data that doesn't have any labels,or that all has the same labels or really no labels.
- clustering algorithm 聚类
将数据进行聚类,作为一个曾经的生物学学生,我第一反应就是聚类在生信中应用及其广泛。生信中的聚类,给定DNA序列,就可以自动划分为不同的物种。下图为一个热图。
例题:
Of the following examples, which would you address using an unsupervised learning algorithm?
-
Given email labeled as spam/ not spam learn a spam filter
-
Given a set of news articles found on the web, group them into set of articles about the same story
-
Given a database of customer data, automatically discover market segments and group customers into different market segments
-
Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.
答案:
- Given a set of news articles found on the web, group them into set of articles about the same story
- Given a database of customer data, automatically discover market segments and group customers into different market segments
1.区分是否是垃圾邮件:监督学习-离散
4.区分病人是否患糖尿病:监督学习-离散
我是萝莉安。梦想是做个合格的程序媛。