Introduction to Machine Learning 01


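How do dynamic programming and machine learning fundamentally differ in the way they solve problems?

Dynamic programming is an algorithmic strategy whose core idea is to reuse the results of subproblems to solve larger ones: the problem is broken down into smaller subproblems, each is solved once, and their answers are combined into the answer to the original problem. It is mainly suited to optimization problems, especially those in which several decisions depend on one another.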
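
As a minimal sketch of this idea, the classic coin-change problem can serve as an example (the problem and the name `min_coins` are assumptions chosen for illustration, not part of the original text). The answer for each amount is built from the already computed answers for smaller amounts:

```python
def min_coins(coins, amount):
    """Return the fewest coins needed to make `amount`, or None if impossible."""
    INF = float("inf")
    # best[a] holds the optimal answer to the subproblem "make amount a"
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            # Reuse the stored result for the smaller subproblem a - c
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return None if best[amount] == INF else best[amount]


# A greedy choice (largest coin first) would use 11+1+1+1+1, i.e. 5 coins;
# the table finds 5+5+5 instead.
print(min_coins([1, 5, 11], 15))
```

The decisions depend on one another here: taking the largest coin first looks good locally but leads to a worse total, which is exactly the situation where tabulating subproblem results pays off.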

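Machine learning, by contrast, builds a model from empirical data and then uses that model to predict and analyze new data. At its core, a machine learning algorithm trains a model on a training dataset and then applies the trained model to data it has not seen before.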
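
The train-then-predict workflow can be sketched in a few lines. The data below is made up for illustration (roughly following y = 2x + 1), and a least-squares straight line fitted with `np.polyfit` stands in for "the model"; real projects would typically use richer models and far more data:

```python
import numpy as np

# Hypothetical training data: inputs x and observed outputs y
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# "Training": fit a straight line to the data by least squares.
# np.polyfit returns the coefficients from the highest degree down.
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def predict(x):
    """Apply the fitted line to a new input."""
    return slope * x + intercept


# "Prediction": apply the model to an input that was not in the training set
print(slope, intercept, predict(5.0))
```

Nothing in the code states the rule y = 2x + 1; the slope and intercept are estimated from the samples alone.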

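The essential difference is that dynamic programming is an algorithmic strategy based on fixed rules and a fixed order of computation, whereas machine learning is an inductive, statistical method based on data. Dynamic programming requires knowing the structure of the problem and the properties of its optimal solution in advance, whereas machine learning learns regularities and patterns from the data itself.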


What are supervised learning, unsupervised learning, and gradient descent?

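  1. Supervised learning: supervised learning trains a model on existing training samples that contain both inputs and outputs. Each training sample consists of an input (features) and an output (label); by learning the relationship between the two, the model becomes able to predict the output for new inputs. Common supervised learning tasks include regression and classification.
  2. Unsupervised learning: unlike supervised learning, unsupervised learning has no pre-labeled outputs. Instead, it learns the intrinsic structure and regularities of the input data in order to group, cluster, or reduce the dimensionality of the data. The model's goal is to discover patterns and relationships in the data rather than to predict a labeled output for new data. Common unsupervised learning algorithms include K-means clustering, hierarchical clustering, and density-based clustering.
  3. Gradient descent: gradient descent is one of the most commonly used optimization algorithms for the optimization problems that arise in machine learning and artificial intelligence. It repeatedly adjusts the model's parameters so as to minimize the value of the loss function. We choose an initial point and then move step by step in the direction opposite to the gradient of the loss function, with the step size set by the learning rate, so that we approach a minimum. Common variants include batch gradient descent, stochastic gradient descent, and mini-batch gradient descent.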
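
The three gradient descent variants differ only in how many samples are used for each update. A minimal sketch, assuming a supervised linear regression task with mean squared error as the loss (the function `minibatch_gd`, its default values, and the toy data are all hypothetical and chosen for illustration):

```python
import numpy as np


def minibatch_gd(X, y, batch_size, alpha=0.1, epochs=200, seed=0):
    """Fit y ~ X @ w + b by minimizing mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        order = rng.permutation(n)  # visit the samples in a new order each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = X[idx] @ w + b - y[idx]           # residuals on this batch
            grad_w = 2 * X[idx].T @ err / len(idx)  # d(MSE)/dw on the batch
            grad_b = 2 * err.mean()                 # d(MSE)/db on the batch
            w -= alpha * grad_w                     # step against the gradient
            b -= alpha * grad_b
    return w, b


# Hypothetical labeled data generated from y = 3x - 2 (features X, labels y)
X = np.linspace(-1, 1, 20).reshape(-1, 1)
y = 3 * X[:, 0] - 2

# batch_size = all samples -> batch GD; 1 -> stochastic GD; in between -> mini-batch
for name, bs in [("batch", 20), ("mini-batch", 5), ("stochastic", 1)]:
    w, b = minibatch_gd(X, y, batch_size=bs)
    print(name, w[0], b)
```

Because the toy labels contain no noise, all three variants should end up close to w = 3 and b = -2; with noisy data the stochastic variant would keep fluctuating around the minimum unless the learning rate is reduced over time.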

Sample code for the gradient descent algorithm:

import numpy as np
import matplotlib.pyplot as plt
import random
from icecream import ic

def func(x):
    return 10 * x**2 + 32*x + 9

def gradient(x):
    return 20 *x + 32

if __name__ == '__main__':
    x = np.linspace(-10, 10)
    steps = []
    # To make the effect easy to see, the initial x is set to 10
    x_star = 10
    # x_star = random.choice(x)
    alpha = 1e-3
    print("learning rate: %f" % alpha)
    for i in range(100):
        # Update the current point: subtract the learning rate times the gradient
        x_star = x_star - alpha * gradient(x_star)
        steps.append(x_star)
        ic(x_star, func(x_star))
    fig, ax = plt.subplots()
    ax.plot(x, func(x))
    for i, s in enumerate(steps):
        ax.annotate(str(i+1), (s, func(s)))
    plt.show()
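
The choice of learning rate matters a great deal in the sample above. For `func(x) = 10*x**2 + 32*x + 9` the update becomes x ← (1 − 20α)x − 32α, so the distance to the minimum at x = −1.6 is multiplied by |1 − 20α| at every step, and the iteration converges only for 0 < α < 0.1. The sketch below checks this numerically; `run` is a hypothetical helper and the list of learning rates is an arbitrary choice for illustration:

```python
def gradient(x):
    return 20 * x + 32  # derivative of 10*x**2 + 32*x + 9


def run(alpha, x0=10.0, n_steps=100):
    """Run n_steps of gradient descent from x0 and return the final x."""
    x = x0
    for _ in range(n_steps):
        x -= alpha * gradient(x)
    return x


for alpha in (1e-3, 1e-2, 5e-2, 0.11):
    print(alpha, run(alpha))
```

With α = 1e-3, as in the sample, 100 steps are not enough to reach the minimum; α = 1e-2 gets there comfortably; α = 0.11 lies outside the stable range, so the iterates move further away at every step.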

[Figure: example plot from a run]

[Figure: run output]

Sample code for the K-means algorithm:

import random
import math
from pylab import mpl
import re
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
from collections import defaultdict

mpl.rcParams['font.sans-serif'] = ['FangSong']  # set the default font (needed to render the Chinese city names)
mpl.rcParams['axes.unicode_minus'] = False  # fix the minus sign '-' being shown as a square in saved figures

coordination_source = """
{name:'兰州', geoCoord:[103.73, 36.03]},
{name:'嘉峪关', geoCoord:[98.17, 39.47]},
{name:'西宁', geoCoord:[101.74, 36.56]},
{name:'成都', geoCoord:[104.06, 30.67]},
{name:'石家庄', geoCoord:[114.48, 38.03]},
{name:'昆明', geoCoord:[102.73, 25.04]},
{name:'贵阳', geoCoord:[106.71, 26.57]},
{name:'武汉', geoCoord:[114.31, 30.52]},
{name:'郑州', geoCoord:[113.65, 34.76]},
{name:'济南', geoCoord:[117, 36.65]},
{name:'南京', geoCoord:[118.78, 32.04]},
{name:'合肥', geoCoord:[117.27, 31.86]},
{name:'杭州', geoCoord:[120.19, 30.26]},
{name:'南昌', geoCoord:[115.89, 28.68]},
{name:'福州', geoCoord:[119.3, 26.08]},
{name:'广州', geoCoord:[113.23, 23.16]},
{name:'长沙', geoCoord:[113, 28.21]},
{name:'海口', geoCoord:[110.35, 20.02]},
{name:'沈阳', geoCoord:[123.38, 41.8]},
{name:'长春', geoCoord:[125.35, 43.88]},
{name:'哈尔滨', geoCoord:[126.63, 45.75]},
{name:'太原', geoCoord:[112.53, 37.87]},
{name:'西安', geoCoord:[108.95, 34.27]},
{name:'台湾', geoCoord:[121.30, 25.03]},
{name:'北京', geoCoord:[116.46, 39.92]},
{name:'上海', geoCoord:[121.48, 31.22]},
{name:'重庆', geoCoord:[106.54, 29.59]},
{name:'天津', geoCoord:[117.2, 39.13]},
{name:'呼和浩特', geoCoord:[111.65, 40.82]},
{name:'南宁', geoCoord:[108.33, 22.84]},
{name:'西藏', geoCoord:[91.11, 29.97]},
{name:'银川', geoCoord:[106.27, 38.47]},
{name:'乌鲁木齐', geoCoord:[87.68, 43.77]},
{name:'香港', geoCoord:[114.17, 22.28]},
{name:'澳门', geoCoord:[113.54, 22.19]}
"""


def get_city_location_features():
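    # The brackets are escaped so that [ and ] match literally;
    # the decimal part is optional so that values such as 117 also match
    pattern = re.compile(r"name:'(\w+)',\s+geoCoord:\[(\d+(?:\.\d+)?),\s(\d+(?:\.\d+)?)\]")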
    city_location = {
        '香港': (114.17, 22.28)
    }
    for line in coordination_source.split('\n'):
        city_info = pattern.findall(line)
        if not city_info: continue
        city, long, lat = city_info[0]
        long, lat = float(long), float(lat)
        city_location[city] = (long, lat)
    print(city_location)
    print("Feature extraction finished")
    return city_location


def draw_city(city_location):
    city_graph = nx.Graph()
    city_graph.add_nodes_from(list(city_location.keys()))
    nx.draw(city_graph, city_location, with_labels=True, node_size=30)


def get_long_lat_vector(city_location):
    all_x = []
    all_y = []
    for _, location in city_location.items():
        x, y = location
        all_x.append(x)
        all_y.append(y)
    return all_x, all_y


def get_random_center(all_x, all_y):
    r_x = random.uniform(min(all_x), max(all_x))
    r_y = random.uniform(min(all_y), max(all_y))
    return r_x, r_y


# Randomly pick K center points within the range of the features
def get_centers(all_x, all_y):
    K = 5
    centers = dict()
    for i in range(K):
        centers[i] = get_random_center(all_x, all_y)
    print(centers)
    return centers


def geo_distance(origin, destination):
    # Great-circle (haversine) distance in km; each point is (longitude, latitude) in degrees
    lon1, lat1 = origin
    lon2, lat2 = destination
    radius = 6371  # km

    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) * math.sin(dlat / 2) +
         math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) *
         math.sin(dlon / 2) * math.sin(dlon / 2))
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    d = radius * c
    return d


def iterate_once(centers, closet_points, threshold=5):
    # Move each center to the mean of the points assigned to it. A move of at
    # most `threshold` km counts as "no change", which lets kmeans() stop.
    have_changed = False
    for c in closet_points:
        former_center = centers[c]
        neighbors = closet_points[c]
        neighbors_center = np.mean(neighbors, axis=0)
        if geo_distance(neighbors_center, former_center) > threshold:
            centers[c] = neighbors_center
            have_changed = True

    return centers, have_changed


def kmeans(all_x, all_y, k, threshold=5):
    # Start from k random centers inside the bounding box of the data
    centers = {'{}'.format(i + 1): get_random_center(all_x, all_y) for i in range(k)}

    changed = True
    while changed:
        closet_points = defaultdict(list)

        # Assign every point to its nearest center
        for x, y in zip(all_x, all_y):
            closet_c, closet_dis = min([(c, geo_distance((x, y), centers[c])) for c in centers], key=lambda t: t[1])
            closet_points[closet_c].append([x, y])

        # Recompute the centers; stop once none of them moves noticeably
        centers, changed = iterate_once(centers, closet_points, threshold)
        print('iteration')
    return centers

def draw_cities(cities, color=None):
    city_graph = nx.Graph()
    city_graph.add_nodes_from(list(cities.keys()))
    nx.draw(city_graph, cities, node_color=color, with_labels=True, node_size=30)

def draws_cities_with_station():
    plt.figure(1, figsize=(12, 12))
    # Parse the city coordinates once and reuse them for clustering and drawing
    city_location = get_city_location_features()
    all_x, all_y = get_long_lat_vector(city_location)
    city_location_with_station = {
        'energy-station-{}'.format(i): position for i, position in kmeans(all_x, all_y, 5, 5).items()}
    draw_cities(city_location_with_station, color='green')
    draw_cities(city_location, color='red')

if __name__ == '__main__':
    city_location = get_city_location_features()
    draw_city(city_location)
    all_x, all_y = get_long_lat_vector(city_location)
    centers = get_centers(all_x, all_y)
    # draws_cities_with_station()
    plt.show()  # display the figure when run as a script
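
The sample above alternates two steps: assign every point to its nearest center, then move every center to the mean of its points. A compact sketch of that loop is shown below under simplifying assumptions that differ from the sample: plain Euclidean distance replaces the great-circle distance, the first k points serve as the initial centers instead of random ones, and the name `kmeans_sketch` and the toy points are hypothetical.

```python
import numpy as np


def kmeans_sketch(points, k, max_iter=100):
    """Return (centers, labels) after alternating assignment and update steps."""
    points = np.asarray(points, dtype=float)
    centers = points[:k].copy()  # assumption: the first k points are the initial centers
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):  # no center moved: converged
            break
        centers = new_centers
    return centers, labels


# Two well-separated toy groups, one near (0, 0) and one near (10, 10)
toy_points = [[0, 0], [10, 10], [0, 1], [1, 0], [10, 11], [11, 10]]
toy_centers, toy_labels = kmeans_sketch(toy_points, k=2)
print(toy_centers)
print(toy_labels)
```

Taking the first k points as initial centers keeps the example deterministic; as in the sample, random initialization is the more common choice, and the result of K-means can depend on it.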
