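Problem 1

Problem restatement

Given CSV files for 352 cities, (1) find the highest rating among all 35,200 attractions, (2) count how many attractions received that rating, and (3) rank the cities by their number of attractions with the highest rating (BS) and list the top 10.

Approach

Iterate over every CSV file in the current directory and read the rating and name from each row. If the name has already been processed, skip the row (deduplication); otherwise add it to a set. If the rating exists and is not "--", convert it to a float. If the rating is greater than the current highest rating, update the highest rating and reset the related counters. If the rating equals the current highest rating, increment the highest-rating count and the count for the corresponding city. Finally, sort the cities by their number of top-rated attractions and print the top 10.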
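The single-pass bookkeeping described above can be tried on a few in-memory rows before touching the CSV files. This is a minimal sketch, assuming each row is a (city, name, rating text) tuple; top_rated_by_city and sample_rows are hypothetical names used only for illustration.

```python
from collections import defaultdict


def top_rated_by_city(rows):
    """Return (highest rating, {city: number of attractions with that rating})."""
    highest = 0.0
    per_city = defaultdict(int)
    seen = set()
    for city, name, score_text in rows:
        if name in seen:          # deduplicate by attraction name
            continue
        seen.add(name)
        if not score_text or score_text == "--":
            continue              # missing rating
        try:
            score = float(score_text)
        except ValueError:
            continue              # malformed rating
        if score > highest:       # new maximum: reset the per-city counts
            highest = score
            per_city.clear()
            per_city[city] = 1
        elif score == highest:    # ties with the maximum are counted
            per_city[city] += 1
    return highest, dict(per_city)


# Hypothetical sample data
sample_rows = [
    ("A", "Park", "4.5"),
    ("A", "Museum", "5.0"),
    ("B", "Tower", "5.0"),
    ("B", "Park", "5.0"),   # duplicate name, skipped
    ("B", "Lake", "--"),    # missing rating, skipped
]
highest, per_city = top_rated_by_city(sample_rows)
print(highest, per_city)  # 5.0 {'A': 1, 'B': 1}
```

Clearing the per-city counts whenever a new maximum appears is what lets one pass over the data suffice.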

Code walkthrough

import os
import csv
from collections import defaultdict
import matplotlib.pyplot as plt
import matplotlib

# Use the SimHei font so that Chinese text renders correctly
matplotlib.rcParams['font.sans-serif'] = ['SimHei']
matplotlib.rcParams['axes.unicode_minus'] = False

def process_csv_files():
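    # Highest rating found so far.
    highest_score = 0
    # Number of attractions with the highest rating.
    highest_score_count = 0
    # defaultdict mapping each city to its number of top-rated attractions.
    city_scores = defaultdict(int)
    # Names of attractions that have already been processed.
    processed_names = set()

    # Iterate over all CSV files in the current directory.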
    for filename in os.listdir('.'):
        if filename.endswith('.csv'):
            # Use the file name without its extension as the city name.
            city_name = os.path.splitext(filename)[0]
            with open(filename, 'r', encoding='utf-8') as file:
                # Read the CSV file with each row as a dictionary.
                reader = csv.DictReader(file)
                # Read the name and rating fields from each row.
                for row in reader:
                    name = row.get('名字', '')
                    score = row.get('评分', '')

                    # Skip names that have already been processed.
                    if name in processed_names:
                        continue

                    # Record the name as processed.
                    processed_names.add(name)

                    # If the rating exists and is not "--", try to convert it to a float.
                    if score and score != '--':
                        try:
                            score = float(score)
                            # A rating above the current maximum becomes the new maximum and resets the counters.
                            if score > highest_score:
                                highest_score = score
                                highest_score_count = 1
                                city_scores.clear()
                                city_scores[city_name] = 1
                            # A rating equal to the current maximum increments the total and the city's count.
                            elif score == highest_score:
                                highest_score_count += 1
                                city_scores[city_name] += 1
                        except ValueError:
                            continue

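    print(f"Highest rating: {highest_score}")
    print(f"Number of attractions with the highest rating: {highest_score_count}")

    # Sort cities by their number of top-rated attractions.
    sorted_cities = sorted(city_scores.items(), key=lambda x: x[1], reverse=True)

    print("\nTop 10 cities by number of top-rated attractions:")
    for i, (city, count) in enumerate(sorted_cities[:10], 1):
        print(f"{i}. {city}: {count} attractions")

    # Return the top 10, matching the chart title.
    return sorted_cities[:10]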

def plot_bar_chart(top_cities):
    cities = [city for city, count in top_cities]
    counts = [int(count) for city, count in top_cities]

    plt.figure(figsize=(10, 6))
    plt.barh(cities, counts, color='lightblue')
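    plt.xlabel('Number of top-rated attractions')
    plt.ylabel('City')
    plt.title('Top 10 cities by number of top-rated attractions')
    # Invert the y-axis so the top-ranked city appears first.
    plt.gca().invert_yaxis()

    # Use integer ticks on the x-axis.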
    max_count = max(counts)
    plt.xticks(range(0, max_count + 1))

    plt.show()

if __name__ == "__main__":
    top_cities = process_csv_files()
    plot_bar_chart(top_cities)

Worth noting:

# Using defaultdict
city_scores = defaultdict(int)
city_scores.clear()
city_scores[city_name] = 1

sorted_cities = sorted(city_scores.items(), key=lambda x: x[1], reverse=True)

key=lambda x: x[1] is equivalent to

def get_score(x):
    return x[1]
sorted_cities = sorted(city_scores.items(), key=get_score, reverse=True)

lambda x: creates an anonymous function whose parameter is x.

The lambda is called once for each element of the iterable.

Each x is a (city name, count) tuple, for example ("New York", 85).

x[1] is the second element of the tuple (index 1), which here is the city's number of top-rated attractions.
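The two idioms above fit together in a few lines. A small sketch with made-up city labels (city_counts and ranked are illustrative names): defaultdict(int) starts every missing key at 0, and sorted with key=lambda x: x[1] orders the (city, count) pairs by count.

```python
from collections import defaultdict

city_counts = defaultdict(int)
for city in ["A", "B", "A", "C", "A", "B"]:
    city_counts[city] += 1  # a missing key starts at int() == 0

# Sort the (city, count) pairs by count, largest first.
ranked = sorted(city_counts.items(), key=lambda x: x[1], reverse=True)
print(ranked)  # [('A', 3), ('B', 2), ('C', 1)]
```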

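Problem 2

Problem restatement

Taking into account city size, environmental protection, cultural heritage, transport convenience, climate, food, and other factors, and treating each city as a single location, recommend the 50 cities most attractive to foreign tourists.

Approach and modeling

Influencing factors

City size: population

Environmental protection: AQI, green coverage, wastewater, waste gas, garbage

Cultural heritage: cultural facilities, historical sites, museums

Transport convenience: coverage, density, mileage

Climate: temperature, annual precipitation, number of days suitable for travel

Food: catering revenue

(The mapping between factors and indicators need not be one-to-one; the write-up can elaborate on this.)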

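Entropy weight method

The entropy weight method determines indicator weights and is commonly used in multi-indicator decision analysis.

(1) Data processing: predict the missing catering revenue with a linear regression model, then normalize the data.

(2) Determine the weights with the entropy weight method.

Compute the entropy:

Define the function calculate_entropy to compute the entropy of each indicator.

The entropy measures how dispersed the values of an indicator are.

Compute the weights:

Compute the weight of each indicator from its entropy.

The weight reflects the importance of the indicator.

(3) Why normalized data is used when computing the entropy:

Min-max normalization maps every indicator to the range 0 to 1. Dividing each column by its sum then turns it into a probability distribution (non-negative values that sum to 1), which is the precondition for computing entropy.

Normalization removes the effect of different units or scales on the entropy, which makes the results comparable.

Min-max normalization produces exact zeros and ln(0) is undefined, so the code adds a small constant (1e-8) inside the logarithm to keep the computation numerically stable.

Dividing by ln(n) standardizes the entropy to the range [0, 1], which makes the entropies of different indicators easy to compare.

(4) The entropy weight method in mathematical form:

Data standardization:
- For each indicator $j$ and sample $i$, standardize the raw value $x_{ij}$ to $p_{ij}$: $p_{ij} = \frac{x_{ij}}{\sum_{i=1}^{n} x_{ij}}$, where $n$ is the number of samples.
Compute the entropy:
- Compute the entropy $E_j$ of each indicator $j$: $E_j = -k \sum_{i=1}^{n} p_{ij} \ln(p_{ij})$, where $k = \frac{1}{\ln(n)}$ is a constant that standardizes the entropy.
Compute the divergence coefficient:
- The divergence coefficient $d_j$ reflects how much the indicator's information varies: $d_j = 1 - E_j$
Compute the weights:
- Compute the weight $w_j$ of each indicator from the divergence coefficients: $w_j = \frac{d_j}{\sum_{j=1}^{m} d_j}$, where $m$ is the number of indicators.

The entropy weight method determines the relative importance of each indicator in the decision from its information entropy. The smaller the entropy, the more the indicator varies across samples, the more information it carries, and the larger its weight.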
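The four formulas can be checked on a tiny matrix. The following is a sketch under stated assumptions, not the script used for the contest: entropy_weights is a hypothetical helper, the input is assumed to be non-negative with positive column sums, and zero proportions are handled with np.where (treating 0 * ln 0 as 0) rather than the 1e-8 offset that calculate_entropy uses.

```python
import numpy as np


def entropy_weights(x):
    """Entropy weights for an (n_samples, m_indicators) array of non-negative values."""
    n = x.shape[0]
    p = x / x.sum(axis=0)                       # p_ij = x_ij / sum_i x_ij
    safe_p = np.where(p > 0, p, 1.0)            # ln(1) = 0, so zero entries contribute 0
    e = -(p * np.log(safe_p)).sum(axis=0) / np.log(n)   # E_j with k = 1 / ln(n)
    d = 1.0 - e                                 # d_j = 1 - E_j
    return d / d.sum()                          # w_j = d_j / sum_j d_j


# Column 0 is constant (no information); column 1 varies across samples.
x = np.array([
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0],
])
w = entropy_weights(x)
print(w)
```

The constant column has entropy 1 and therefore a weight of essentially 0, so all of the weight goes to the column that actually distinguishes the samples.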

Code walkthrough

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Configure a Chinese font
plt.rcParams['font.sans-serif'] = ['SimHei']  # Use SimHei as the default font
plt.rcParams['axes.unicode_minus'] = False  # Prevent the minus sign '-' from rendering as a box in saved figures

# Read the Excel file
df = pd.read_excel("D:/Mathematical modeling/华数杯/Q2data_1.xlsx")

# Extract urban population (column B) and catering revenue (column Q)
X = df.iloc[:, 1].values.reshape(-1, 1)  # urban population
y = df.iloc[:, 16].values  # catering revenue

# Create the linear regression model
model = LinearRegression()

# Preprocessing: drop missing values
# Boolean mask marking the non-NaN elements of y
mask = ~np.isnan(y)
X_clean = X[mask]
# Keep the elements of y where mask is True.
y_clean = y[mask]

# Fit the linear regression model
model.fit(X_clean, y_clean)

# Predict catering revenue for every city (including those with missing values)
y_predict = model.predict(X)

# df_predicted is a new DataFrame with two columns: Actual and Predicted
df_predicted = pd.DataFrame({'Actual': y, 'Predicted': y_predict})

# Fill missing catering revenue with the prediction: apply checks each row and uses Predicted where Actual is NaN.
df_predicted['Final'] = df_predicted.apply(lambda row: row['Predicted'] if pd.isna(row['Actual']) else row['Actual'], axis=1)

# Write the final values back to the original DataFrame
df.iloc[:, 16] = df_predicted['Final']

# Extract the data columns (B through U)
data = df.iloc[:, 1:].values

# Min-max normalize each column so its values lie between 0 and 1
def normalize(data):
    return (data - np.min(data, axis=0)) / (np.max(data, axis=0) - np.min(data, axis=0))

normalized_data = normalize(data)

# Compute the entropy of each indicator
def calculate_entropy(data):
    p = data / np.sum(data, axis=0)
    e = -np.sum(p * np.log(p + 1e-8), axis=0) / np.log(len(data))
    return e

entropy = calculate_entropy(normalized_data)

# Compute the weights
weights = (1 - entropy) / np.sum(1 - entropy)

# Compute the composite score
scores = np.sum(normalized_data * weights, axis=1)

# Combine the scores with the city names
result = pd.DataFrame({'City': df.iloc[:, 0], 'Score': scores})

# Sort and keep the top 50 cities
top_50 = result.sort_values('Score', ascending=False).head(50)

# Print the result
print(top_50)

# Save the result to an Excel file
top_50.to_excel('top_50_cities.xlsx', index=False)

# Save the updated full data set to a new Excel file
df.to_excel('updated_Q2data_1.xlsx', index=False)

# Visualize the result
plt.figure(figsize=(12, 8))
plt.bar(top_50['City'], top_50['Score'], color='c', alpha=0.6)  # set color and transparency
plt.title("The 50 cities most attractive to foreign tourists")
plt.xlabel("City")
plt.ylabel("Score")
plt.xticks(rotation=45, ha='right')  # rotation angle and alignment
plt.ylim(0, top_50['Score'].max() * 1.1)
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.2f}'))
plt.tight_layout()
plt.show()

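Worth noting:

# Extract urban population (column B) and catering revenue (column Q)
X = df.iloc[:, 1].values.reshape(-1, 1)  # urban population
y = df.iloc[:, 16].values  # catering revenue

This code extracts two columns from the DataFrame df: urban population and catering revenue.

df.iloc:

iloc is a pandas indexer that selects data by integer position.
df.iloc[:, 1] selects all rows (the :) and the second column (indexing starts at 0, so the second column has index 1).

.values:

A pandas attribute that converts a DataFrame or Series to a NumPy array.

.reshape(-1, 1):

A NumPy method that changes the shape of an array.
-1 lets NumPy infer the number of rows, and 1 makes the data a single column.

X = df.iloc[:, 1].values.reshape(-1, 1):

This line extracts all values of the second column (urban population) and turns them into a two-dimensional array with one value per row.

y = df.iloc[:, 16].values:

This line extracts all values of the 17th column (catering revenue) as a one-dimensional array.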
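The shapes are easiest to see on a small frame. A minimal sketch with a hypothetical three-row DataFrame (df_demo and its column names are made up for illustration):

```python
import pandas as pd

# Hypothetical demo data: column 0 is a label, columns 1 and 2 are numeric.
df_demo = pd.DataFrame({
    "city": ["A", "B", "C"],
    "population": [100, 200, 300],
    "revenue": [1.5, 2.5, 3.5],
})

X = df_demo.iloc[:, 1].values.reshape(-1, 1)  # second column as a 2-D column vector
y = df_demo.iloc[:, 2].values                 # third column as a 1-D array
print(X.shape, y.shape)  # (3, 1) (3,)
```

scikit-learn estimators such as LinearRegression expect a two-dimensional feature array, which is why X is reshaped while y is left one-dimensional.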

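# Preprocessing: drop missing values
# Boolean mask marking the non-NaN elements of y
mask = ~np.isnan(y)
X_clean = X[mask]
# Keep the elements of y where mask is True.
y_clean = y[mask]

np.isnan(y):

This function checks whether each element of y is NaN (Not a Number). It returns a boolean array in which True marks a NaN element.

~:

The bitwise NOT operator. Applied to a boolean array it flips True and False element by element, so ~np.isnan(y) is a new boolean array in which True marks the elements of y that are not NaN.

X_clean = X[mask]:

The boolean mask selects the elements of X at the positions where mask is True. This drops the rows of X that correspond to NaN entries in y, so X and y stay aligned.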
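A small sketch with made-up arrays shows how the same mask keeps the features and the target aligned:

```python
import numpy as np

# Hypothetical data: the second and fourth targets are missing.
y = np.array([1.0, np.nan, 3.0, np.nan])
X = np.array([[10.0], [20.0], [30.0], [40.0]])

mask = ~np.isnan(y)        # True where y has a real value
X_clean = X[mask]          # rows of X whose target is present
y_clean = y[mask]

print(mask)                      # [ True False  True False]
print(X_clean.ravel(), y_clean)  # [10. 30.] [1. 3.]
```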

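DataFrame

A pandas DataFrame is a two-dimensional, size-mutable, labeled data structure. It resembles a spreadsheet or an SQL table, with rows and columns. A DataFrame can be created in several ways, for example from a dictionary, a list, or a NumPy array, and it provides powerful tools for data manipulation and analysis.

For example, a simple DataFrame created from a dictionary: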

import pandas as pd

data = {
    '姓名': ['张三', '李四', '王五'],
    '年龄': [25, 30, 22],
    '城市': ['北京', '上海', '广州']
}

df = pd.DataFrame(data)
print(df)

This prints:

   姓名  年龄   城市
0  张三  25   北京
1  李四  30   上海
2  王五  22   广州

DataFrame offers a rich set of methods for filtering, sorting, aggregating, and other operations.
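The three kinds of operation mentioned above look like this on a small frame. A minimal sketch with a hypothetical table (people and its columns are made up for illustration):

```python
import pandas as pd

# Hypothetical demo data
people = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "age": [25, 30, 22, 30],
    "city": ["X", "Y", "X", "Y"],
})

adults = people[people["age"] >= 25]                 # filtering with a boolean condition
by_age = people.sort_values("age", ascending=False)  # sorting by a column
mean_age = people.groupby("city")["age"].mean()      # aggregating per group

print(adults["name"].tolist())  # ['Ann', 'Bob', 'Dee']
print(mean_age.to_dict())       # {'X': 23.5, 'Y': 30.0}
```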

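Problem 3

Problem restatement

Starting from Guangzhou with 144 hours available, visit as many cities as possible while keeping the overall travel experience as good as possible, and give a concrete itinerary.

Output: the concrete itinerary, including the total time spent, the total cost of tickets and transport, and the number of attractions that can be visited.

Choose among the 50 cities from Problem 2 and consider only high-speed rail for transport (each city counts as a single point, as selected in Problem 2).

Approach and modeling

A greedy algorithm plans the trip. The core idea is to pick, at each step, the next city with the best "value for money" while staying within the total time limit. The choice considers each city's attractiveness (score), how hard it is to reach (distance), and its recommended visit time.

Code walkthrough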

import pandas as pd
from collections import defaultdict

# Read the data
cities_df = pd.read_excel("top50.xlsx")
rail_df = pd.read_excel("high_speed_rail_info_with_time_and_cost.xlsx")

# Build the city information dictionary
city_info = {row['City']: {'score': row['Score'], 'travel_time': row['travel_time']} for _, row in cities_df.iterrows()}

# Build an adjacency list for the connections between cities
graph = defaultdict(dict)
for _, row in rail_df.iterrows():
    from_city, to_city = row['from_city'], row['to_city']
    distance, time, cost = row['distance_km'], row['time_hours'], row['cost_cny']
    graph[from_city][to_city] = {'distance': distance, 'time': time, 'cost': cost}
    graph[to_city][from_city] = {'distance': distance, 'time': time, 'cost': cost}  # assume the connection is bidirectional


def plan_trip(start_city, total_time):
    current_city = start_city
    remaining_time = total_time
    visited_cities = [start_city]
    total_score = 0
    itinerary_details = []

    while remaining_time > 0:
        best_next_city = None
        best_score = -float('inf')

        for next_city, info in graph[current_city].items():
            if next_city in visited_cities:
                continue

            travel_time = info['time']
            city_time = city_info[next_city]['travel_time']
            total_required_time = travel_time + city_time

            if total_required_time > remaining_time:
                continue

            # Score: city rating divided by distance (the +1 avoids division by zero)
            score = city_info[next_city]['score'] / (info['distance'] + 1)

            if score > best_score:
                best_score = score
                best_next_city = next_city

        if best_next_city is None:
            break

        travel_time = graph[current_city][best_next_city]['time']
        city_time = city_info[best_next_city]['travel_time']
        cost = graph[current_city][best_next_city]['cost']

        visited_cities.append(best_next_city)
        total_score += city_info[best_next_city]['score']
        remaining_time -= (travel_time + city_time)

        itinerary_details.append({
            'from': current_city,
            'to': best_next_city,
            'travel_time': travel_time,
            'city_time': city_time,
            'distance': graph[current_city][best_next_city]['distance'],
            'cost': cost
        })

        current_city = best_next_city

    return visited_cities, total_score, itinerary_details


# Run the planner
start_city = "广州市"
total_time = 84  # hours; the problem statement allows 144 hours, this run budgets 84

itinerary, total_score, details = plan_trip(start_city, total_time)

print("Itinerary:")
total_travel_time = 0
total_city_time = 0
total_distance = 0
total_cost = 0
for i, detail in enumerate(details):
    print(f"{i + 1}. {detail['from']} -> {detail['to']}:")
    print(f"   Travel time: {detail['travel_time']:.2f} hours")
    print(f"   Visit time: {detail['city_time']:.2f} hours")
    print(f"   Distance: {detail['distance']:.2f} km")
    print(f"   Cost: {detail['cost']:.2f} CNY")
    total_travel_time += detail['travel_time']
    total_city_time += detail['city_time']
    total_distance += detail['distance']
    total_cost += detail['cost']

print("\nSummary:")
print(f"Cities visited: {len(itinerary)}")
print(f"Total travel time: {total_travel_time:.2f} hours")
print(f"Total visit time: {total_city_time:.2f} hours")
print(f"Total distance: {total_distance:.2f} km")
print(f"Total time used: {total_travel_time + total_city_time:.2f} hours")
print(f"Total cost: {total_cost:.2f} CNY")

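Worth noting:

city_info = {row['City']: {'score': row['Score'], 'travel_time': row['travel_time']} for _, row in cities_df.iterrows()}

Iterate over every row of cities_df.
For each row, use the city name as the key and create a dictionary holding that city's score and recommended visit time.
Combine these entries into one large dictionary, city_info.

For example, if cities_df contains the following data:

   City   Score  travel_time
0  北京    95     48
1  上海    90     36

the resulting city_info dictionary looks like this:

{
    '北京': {'score': 95, 'travel_time': 48},
    '上海': {'score': 90, 'travel_time': 36}
}

graph = defaultdict(dict)

Creates a default dictionary graph whose default value is an empty dictionary. It stores the connections between cities.

graph[from_city][to_city] = {'distance': distance, 'time': time, 'cost': cost}

Adds an edge from from_city to to_city to the graph. The edge attributes are distance, time, and cost.

graph[to_city][from_city] = {'distance': distance, 'time': time, 'cost': cost}

Adds the reverse edge, from to_city to from_city. This assumes that rail connections are bidirectional with identical attributes.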
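The adjacency structure can be built and queried without any Excel file. A minimal sketch with two hypothetical rail links (rail_rows and the city labels are made up for illustration):

```python
from collections import defaultdict

# Hypothetical rail links: (from_city, to_city, distance_km, time_hours, cost_cny)
rail_rows = [
    ("A", "B", 100.0, 0.5, 50.0),
    ("B", "C", 300.0, 1.2, 140.0),
]

graph = defaultdict(dict)
for from_city, to_city, distance, time, cost in rail_rows:
    edge = {"distance": distance, "time": time, "cost": cost}
    graph[from_city][to_city] = edge
    graph[to_city][from_city] = dict(edge)  # reverse edge with the same attributes

print(sorted(graph["B"]))       # ['A', 'C']
print(graph["C"]["B"]["time"])  # 1.2
```

Because the outer dictionary is a defaultdict(dict), graph[from_city][to_city] works even the first time a city appears; no explicit check for a missing key is needed.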

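best_score = -float('inf')

The best score starts at negative infinity, so the first feasible candidate always replaces it.

        for next_city, info in graph[current_city].items():

info is a dictionary with the travel information from the current city (current_city) to a candidate next city (next_city).

The info dictionary contains the following key-value pairs:

'time': travel time from the current city to the next city.

'distance': distance from the current city to the next city.

'cost': travel cost from the current city to the next city.

score = city_info[next_city]['score'] / (info['distance'] + 1)

The score combines the city's rating and the distance (the farther the city, the lower the score).

How plan_trip works

Function definition --> initialize variables --> main loop --> look for the best next city --> check whether a city is reachable --> compute the city's score --> update the best next city --> check whether a next city was found --> update the trip state --> append the itinerary details --> update the current city --> return the result

The function uses a greedy algorithm to plan the trip: it always picks the locally best next city, until the time runs out or no reachable city remains.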
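The greedy loop can be traced by hand on a three-city toy network. This is a sketch under stated assumptions, not the planner above: greedy_trip is a hypothetical, trimmed-down variant that takes the graph and city data as arguments, and all distances, times, and scores are invented.

```python
def greedy_trip(graph, city_info, start, total_time):
    """Repeatedly move to the unvisited neighbor with the best score/(distance+1)."""
    current, remaining = start, total_time
    visited = [start]
    while remaining > 0:
        best, best_score = None, float("-inf")
        for nxt, edge in graph.get(current, {}).items():
            if nxt in visited:
                continue
            needed = edge["time"] + city_info[nxt]["visit_time"]
            if needed > remaining:          # not enough time left for this city
                continue
            score = city_info[nxt]["score"] / (edge["distance"] + 1)
            if score > best_score:
                best, best_score = nxt, score
        if best is None:                    # nothing reachable remains
            break
        remaining -= graph[current][best]["time"] + city_info[best]["visit_time"]
        visited.append(best)
        current = best
    return visited, remaining


# Hypothetical toy network
graph = {
    "S": {"A": {"distance": 99, "time": 1}, "B": {"distance": 399, "time": 2}},
    "A": {"S": {"distance": 99, "time": 1}, "B": {"distance": 199, "time": 1}},
    "B": {"S": {"distance": 399, "time": 2}, "A": {"distance": 199, "time": 1}},
}
city_info = {
    "A": {"score": 50, "visit_time": 4},
    "B": {"score": 80, "visit_time": 5},
}
route, time_left = greedy_trip(graph, city_info, "S", 12)
print(route, time_left)  # ['S', 'A', 'B'] 1
```

From S the nearer city A wins (50 / 100 = 0.5 against 80 / 400 = 0.2) even though B has the higher rating, which shows how strongly the distance term shapes the greedy choice.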

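Problem 4

Problem restatement

Problem 4 extends Problem 3 by taking cost into account. Output the total cost of tickets and transport, the total time spent, and the number of cities that can be visited.

Approach and modeling

Only the core decision step of the algorithm needs to change.

score_per_hour = (score - travel_cost / 100) / total_time_needed

This computes the score per hour, the key decision metric. score is the city's raw score. travel_cost / 100 is a penalty for the travel cost; dividing by 100 is presumably meant to bring the cost to a scale comparable with the score. Subtracting the penalty from the score and dividing by the total time gives the net score per hour.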
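A worked example makes the trade-off concrete. The numbers below are invented for illustration, and score_per_hour is wrapped in a small hypothetical helper whose cost_scale parameter stands for the 100 in the formula:

```python
def score_per_hour(score, travel_cost, travel_time, visit_time, cost_scale=100):
    """Net score per hour: (score - cost penalty) / (travel time + visit time)."""
    total_time_needed = travel_time + visit_time
    return (score - travel_cost / cost_scale) / total_time_needed


# Hypothetical candidates: a cheap nearby city and an expensive distant one.
near = score_per_hour(score=60, travel_cost=200, travel_time=1, visit_time=7)
far = score_per_hour(score=90, travel_cost=1000, travel_time=4, visit_time=12)
print(near, far)  # 7.25 5.0
```

Here the nearby city wins, (60 - 2) / 8 = 7.25 against (90 - 10) / 16 = 5.0, although its raw score is lower. How much the penalty matters depends on the scale of the scores, so the divisor would need retuning if the scores were on a different scale.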

Code listing

import pandas as pd
import networkx as nx

# Read the Excel files
top_50_df = pd.read_excel('top50.xlsx')
rail_info_df = pd.read_excel('high_speed_rail_info_with_time_and_cost.xlsx')

# Create the city graph
G = nx.Graph()

# Add the edges to the graph
for _, row in rail_info_df.iterrows():
    G.add_edge(row['from_city'], row['to_city'],
               time=row['time_hours'],
               cost=row['cost_cny'])

# Build the city score dictionary and the visit duration dictionary
city_scores = dict(zip(top_50_df['City'], top_50_df['Score']))
city_travel_times = dict(zip(top_50_df['City'], top_50_df['travel_time']))


def plan_trip(start_city, time_limit):
    current_city = start_city
    remaining_time = time_limit
    path = [current_city]
    total_score = city_scores.get(current_city, 0)
    total_cost = 0

    # Account for the visit time in the start city first
    remaining_time -= city_travel_times.get(current_city, 0)

    while remaining_time > 0:
        best_next_city = None
        best_score_per_hour = -float('inf')

        for neighbor in G.neighbors(current_city):
            if neighbor in path:
                continue

            edge_data = G.get_edge_data(current_city, neighbor)
            travel_time = edge_data['time']
            travel_cost = edge_data['cost']
            city_visit_time = city_travel_times.get(neighbor, 0)

            total_time_needed = travel_time + city_visit_time

            # Allow a city that uses exactly the remaining time, as in Problem 3
            if total_time_needed <= remaining_time:
                score = city_scores.get(neighbor, 0)
                score_per_hour = (score - travel_cost / 100) / total_time_needed

                if score_per_hour > best_score_per_hour:
                    best_score_per_hour = score_per_hour
                    best_next_city = neighbor

        if best_next_city is None:
            break

        edge_data = G.get_edge_data(current_city, best_next_city)
        city_visit_time = city_travel_times.get(best_next_city, 0)
        remaining_time -= (edge_data['time'] + city_visit_time)
        total_score += city_scores.get(best_next_city, 0) - edge_data['cost'] / 100
        total_cost += edge_data['cost']
        path.append(best_next_city)
        current_city = best_next_city

    return total_score, path, time_limit - remaining_time, total_cost


# Run the planner
start_city = "广州市"
time_limit = 84  # hours

total_score, optimal_path, total_time, total_cost = plan_trip(start_city, time_limit)

# Print the result
print(f"Best path: {' -> '.join(optimal_path)}")
print(f"Total time: {total_time:.2f} hours")
print(f"Total cost: {total_cost:.2f} CNY")

# Detailed itinerary
print("\nDetailed itinerary:")
current_time = 0
for i in range(len(optimal_path)):
    city = optimal_path[i]
    print(f"City: {city}")
    print(f"Arrival time: {current_time:.2f} hours")

    visit_time = city_travel_times.get(city, 0)
    print(f"Visit time: {visit_time:.2f} hours")

    current_time += visit_time

    if i < len(optimal_path) - 1:
        next_city = optimal_path[i + 1]
        edge_data = G.get_edge_data(city, next_city)
        travel_time = edge_data['time']
        travel_cost = edge_data['cost']

        print(f"Travel time to {next_city}: {travel_time:.2f} hours")
        print(f"Travel cost to {next_city}: {travel_cost:.2f} CNY")

        current_time += travel_time

    print()

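Problem 5

Problem restatement

The entry point is unrestricted. The tourist wants to visit as many mountains as possible while keeping the total cost of tickets and transport as low as possible.

Give a concrete itinerary, including the total time spent, the total cost of tickets and transport, and the number of attractions that can be visited.

Approach and modeling

Because the entry point is unrestricted, a greedy algorithm builds the initial route: it picks a random starting point and then repeatedly moves to the nearest remaining attraction. Since the number of cities grows from 50 to 352, the number of possible routes grows enormously, so a local search algorithm is used to improve the route.

Local search is a heuristic for combinatorial optimization problems. It finds an approximate solution by gradually improving one or more candidate solutions.

Steps of the local search algorithm:

Initialization: pick a random initial solution from the solution space.
Move: search the neighborhood of the current solution for an improved solution.
Evaluation: use an objective function to measure the quality of a solution.
Replacement: if the solution found is better than the current one, replace the current solution with it.
Termination: stop when the maximum number of iterations is reached or no better solution has been found for a long time.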
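The five steps can be sketched on a toy routing problem. This is an illustration under stated assumptions rather than the contest code: the objective is plain path length instead of score and cost, the neighborhood move is a random segment reversal, and route_length, local_search, and the points are hypothetical.

```python
import math
import random


def route_length(route, points):
    """Objective: total straight-line length of the path through the points."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(route, route[1:]))


def local_search(route, points, max_iterations=200, seed=0):
    rng = random.Random(seed)                 # seeded for reproducibility
    best = list(route)                        # initialization
    best_len = route_length(best, points)
    for _ in range(max_iterations):           # termination: iteration budget
        i, j = sorted(rng.sample(range(len(best)), 2))
        candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]  # move
        cand_len = route_length(candidate, points)                 # evaluation
        if cand_len < best_len:                                    # replacement
            best, best_len = candidate, cand_len
    return best, best_len


# Hypothetical points on a line, visited in a deliberately poor order.
points = {0: (0, 0), 1: (3, 0), 2: (1, 0), 3: (2, 0)}
start = [0, 1, 2, 3]
best, best_len = local_search(start, points)
print(route_length(start, points), best_len)
```

Each candidate is generated from the best route found so far, so improvements accumulate over the iterations instead of always restarting from the initial route.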

Code walkthrough

import pandas as pd
from geopy.distance import geodesic
import random

# Read the Excel files
df = pd.read_excel('mountain_spots.xlsx')
city_scores = pd.read_excel('city_scores.xlsx')

# Add the city scores to df
df = df.merge(city_scores, on='城市', how='left')

# Cities without a score get the mean score as a default
df['总评分'] = df['总评分'].fillna(df['总评分'].mean())


# Distance between two points (in kilometers)
def calculate_distance(lat1, lon1, lat2, lon2):
    return geodesic((lat1, lon1), (lat2, lon2)).kilometers


# High-speed rail travel time and cost
def calculate_train_time_and_cost(distance):
    speed = 300  # assume an average high-speed rail speed of 300 km/h
    time = distance / speed
    cost = distance * 0.45  # assume 0.45 CNY per kilometer
    return time, cost


# Route statistics, including the city scores
def calculate_route_stats(route):
    total_time = 0
    total_cost = 0
    spots_visited = len(route)
    total_score = 0

    for i in range(len(route)):
        spot = df.loc[route[i]]
        total_score += spot['总评分']
        # Every spot on the route takes 3 hours to visit and a 120 CNY ticket,
        # matching the per-spot totals in the final printout.
        total_time += 3
        total_cost += 120

        if i < len(route) - 1:
            next_spot = df.loc[route[i + 1]]
            distance = calculate_distance(spot['纬度'], spot['经度'], next_spot['纬度'], next_spot['经度'])
            train_time, train_cost = calculate_train_time_and_cost(distance)

            # Add the train leg to the next spot.
            total_time += train_time
            total_cost += train_cost

    return total_time, total_cost, spots_visited, total_score


# Build the initial route with a greedy algorithm
def greedy_initial_route():
    remaining_spots = set(df.index)
    route = [random.choice(list(remaining_spots))]
    remaining_spots.remove(route[0])

    while remaining_spots and len(route) < len(df):
        current_spot = df.loc[route[-1]]
        nearest_spot = min(remaining_spots, key=lambda x: calculate_distance(
            current_spot['纬度'], current_spot['经度'],
            df.loc[x]['纬度'], df.loc[x]['经度']
        ))
        route.append(nearest_spot)
        remaining_spots.remove(nearest_spot)

        time, _, _, _ = calculate_route_stats(route)
        if time > 84:
            route.pop()
            break

    return route


# Local search that also takes the city scores into account
def local_search(route, max_iterations=100):
    best_route = route
    best_time, best_cost, best_spots, best_score = calculate_route_stats(route)

    # Reversing a segment needs at least two spots.
    if len(route) < 2:
        return best_route, best_time, best_cost, best_spots, best_score

    for _ in range(max_iterations):
        # Reverse a random segment of the best route found so far.
        i, j = sorted(random.sample(range(len(best_route)), 2))
        new_route = best_route[:i] + best_route[i:j + 1][::-1] + best_route[j + 1:]
        new_time, new_cost, new_spots, new_score = calculate_route_stats(new_route)

        if new_time <= 84 and (new_score > best_score or (new_score == best_score and new_cost < best_cost)):
            best_route = new_route
            best_time, best_cost, best_spots, best_score = new_time, new_cost, new_spots, new_score

    return best_route, best_time, best_cost, best_spots, best_score

# Main routine: repeat greedy construction plus local search and keep the best result
def find_best_route(num_iterations=100):
    best_route = None
    best_time = float('inf')
    best_cost = float('inf')
    best_spots = 0
    best_score = 0

    for _ in range(num_iterations):
        initial_route = greedy_initial_route()
        route, time, cost, spots, score = local_search(initial_route)

        if score > best_score or (score == best_score and cost < best_cost):
            best_route = route
            best_time = time
            best_cost = cost
            best_spots = spots
            best_score = score

    return best_route, best_time, best_cost, best_spots, best_score


# Run the program and print the result
best_route, best_time, best_cost, best_spots, best_score = find_best_route()

print("Best route:")
total_cost = 0
total_time = 0
total_score = 0

for i, spot_id in enumerate(best_route):
    spot = df.loc[spot_id]
    print(f"\n{i + 1}. {spot['城市']} ({spot['景点名字']})")

    # Add the ticket price and visit time for this spot
    total_cost += 120
    total_time += 3
    total_score += spot['总评分']
    print("   Ticket: 120 CNY")
    print("   Visit time: 3 hours")
    print(f"   City score: {spot['总评分']:.2f}")
    print(f"   Cumulative cost: {total_cost:.2f} CNY")
    print(f"   Cumulative time: {total_time:.2f} hours")
    print(f"   Cumulative score: {total_score:.2f}")

    # Unless this is the last spot, add the transport cost and time to the next spot
    if i < len(best_route) - 1:
        next_spot = df.loc[best_route[i + 1]]
        distance = calculate_distance(spot['纬度'], spot['经度'], next_spot['纬度'], next_spot['经度'])
        train_time, train_cost = calculate_train_time_and_cost(distance)

        total_cost += train_cost
        total_time += train_time

        print(f"\n   To the next stop {next_spot['城市']} ({next_spot['景点名字']}):")
        print(f"   - Transport cost: {train_cost:.2f} CNY")
        print(f"   - Transport time: {train_time:.2f} hours")
        print(f"   Cumulative cost: {total_cost:.2f} CNY")
        print(f"   Cumulative time: {total_time:.2f} hours")

print(f"\nTotal cost: {total_cost:.2f} CNY")
print(f"Total time: {total_time:.2f} hours")
print(f"Total score: {total_score:.2f}")
print(f"Number of attractions visited: {best_spots}")

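Worth noting:

df = df.merge(city_scores, on='城市', how='left'): this line merges two DataFrames, adding the city scores to the original DataFrame.

(1) df.merge(): the pandas method for merging DataFrames.

(2) city_scores: the other DataFrame, which holds the city scores.

(3) on='城市': the column to merge on. Both DataFrames have a column named "城市" (city), and rows are matched on it.

(4) how='left': the type of merge. 'left' performs a left join, which means:

All rows of the left DataFrame (df) are kept.
If the right DataFrame (city_scores) has no matching city, the corresponding columns are filled with NaN.

(5) Finally, the merged result is assigned back to df, which updates the original DataFrame.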
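The left join and the mean fill can be seen on three rows. A minimal sketch with made-up data (spots, scores, and the values are hypothetical; the column names follow the script):

```python
import pandas as pd

# Hypothetical data: city "C" has no entry in the score table.
spots = pd.DataFrame({"城市": ["A", "B", "C"], "景点名字": ["m1", "m2", "m3"]})
scores = pd.DataFrame({"城市": ["A", "B"], "总评分": [0.5, 1.0]})

merged = spots.merge(scores, on="城市", how="left")
print(merged["总评分"].isna().tolist())  # [False, False, True]

# Fill the missing score with the mean of the known scores.
merged["总评分"] = merged["总评分"].fillna(merged["总评分"].mean())
print(merged["总评分"].tolist())  # [0.5, 1.0, 0.75]
```

Assigning the result of fillna back to the column avoids relying on inplace=True applied to a column selection, which recent pandas versions warn about.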

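greedy_initial_route() implements a greedy algorithm that generates the initial route.

def greedy_initial_route():
   # Create a set with the indices of all spots.
   remaining_spots = set(df.index)
   # Pick a random starting point as the first spot of the route.
   route = [random.choice(list(remaining_spots))]
   # Remove the starting point from the set of spots still to visit.
   remaining_spots.remove(route[0])

   # Loop while unvisited spots remain and the route is shorter than the total number of spots.
   while remaining_spots and len(route) < len(df):
       # Look up the last spot on the current route.
       current_spot = df.loc[route[-1]]
       # Find the unvisited spot nearest to the current one, using calculate_distance for the distance between two points.
       nearest_spot = min(remaining_spots, key=lambda x: calculate_distance(
           current_spot['纬度'], current_spot['经度'],
           df.loc[x]['纬度'], df.loc[x]['经度']
       ))
       # Append the nearest spot to the route.
       route.append(nearest_spot)
       # Remove it from the set of spots still to visit.
       remaining_spots.remove(nearest_spot)

       # Compute the total time of the current route.
       time, _, _, _ = calculate_route_stats(route)
       # If the total time exceeds 84 hours, drop the spot just added and stop. This keeps the route within the time limit.
       if time > 84:
           route.pop()
           break

   return route

The core idea is to always move to the spot nearest to the current position while keeping the total time within the limit. The method is simple and may not reach the global optimum, but it produces an initial solution quickly, which serves as the starting point for further optimization.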

Huashu Cup (华数杯) 2024 Problem C: a retrospective on my first mathematical modeling contest
