A Computer Science Showcase: Building a Global Water Usage Big Data Visualization and Analysis System with Python + Django


Preface

1. Development Tools

  • Big data frameworks: Hadoop + Spark (Hive is not used in this build; customization supported)
  • Development languages: Python + Java (both versions supported)
  • Backend frameworks: Django + Spring Boot (Spring + SpringMVC + MyBatis) (both versions supported)
  • Frontend: Vue + ElementUI + ECharts + HTML + CSS + JavaScript + jQuery
  • Key technologies: Hadoop, HDFS, Spark, Spark SQL, Pandas, NumPy
  • Database: MySQL

2. System Overview

The big-data-based global water usage visualization and analysis system is a comprehensive water-resource analysis platform built on modern big data technology. It uses the Hadoop distributed storage framework as its foundation, combining the Spark compute engine with Spark SQL for efficient processing and analysis of massive water usage datasets. Python is the primary development language: the backend is built with the Django web framework, while the frontend pairs Vue.js with the ElementUI component library and the ECharts visualization library to deliver an intuitive, user-friendly interface. Data is stored in a MySQL relational database, and large-scale datasets are managed through the HDFS distributed file system.

Core functionality covers basic modules such as the home page, personal information, and user management, with an emphasis on professional analysis modules: global water usage management, a large-screen visualization dashboard, multidimensional correlation clustering analysis, multi-country horizontal comparison, in-depth analysis of key countries, global water usage time-series analysis, and water scarcity attribution analysis.

Using Python scientific-computing libraries such as Pandas and NumPy for data preprocessing and statistical analysis, the system mines water usage patterns and trends across countries from multiple angles, providing a scientific basis for water-resource management decisions. Leveraging the distributed computing power of the big data stack, it processes massive water usage data in near real time and renders it as dynamic visualizations, forming a complete big data solution that integrates data collection, storage, computation, analysis, and visualization.
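The Pandas/NumPy preprocessing described above (total usage, per-capita usage, GDP-per-unit-water efficiency) can be sketched on a tiny sample. The column names below are assumptions chosen to match the `global_water_data` schema used in the source code later in this post; the numbers are illustrative only:

```python
import pandas as pd

# Hypothetical sample rows matching the global_water_data schema used below
df = pd.DataFrame({
    "country": ["CN", "CN", "US"],
    "year": [2022, 2023, 2023],
    "agricultural_usage": [400.0, 410.0, 180.0],
    "industrial_usage": [130.0, 135.0, 210.0],
    "domestic_usage": [80.0, 82.0, 60.0],
    "population": [1.41e9, 1.41e9, 3.3e8],
    "gdp": [1.8e13, 1.9e13, 2.5e13],
})

# Derived metrics used throughout the system
df["total_usage"] = df[["agricultural_usage", "industrial_usage", "domestic_usage"]].sum(axis=1)
df["per_capita"] = df["total_usage"] / df["population"]
df["efficiency"] = df["gdp"] / df["total_usage"]  # GDP produced per unit of water

# Per-country aggregates for the statistics panels
stats = df.groupby("country")["total_usage"].agg(["sum", "mean"])
```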

3. System Demo

Demo video: Building a Global Water Usage Big Data Visualization and Analysis System with Python + Django

4. Screenshots

Screenshots: visualization dashboard, multi-country horizontal comparison, multidimensional correlation clustering analysis, global water usage time-series analysis, water scarcity attribution analysis, global water usage management, user management, and login pages.

5. Source Code Highlights



# Feature 1: Global water usage management - core data processing and statistics
def process_global_water_usage(self, request):
    country = request.GET.get('country', 'all')
    year_range = request.GET.get('year_range', '2010-2023')
    usage_type = request.GET.get('type', 'total')
    
    start_year, end_year = map(int, year_range.split('-'))
    
    # Query the data via Spark SQL (note: interpolating request parameters into
    # SQL is an injection risk; `country` should be validated against a whitelist)
    spark_query = f"""
        SELECT country, year, agricultural_usage, industrial_usage, 
               domestic_usage, (agricultural_usage + industrial_usage + domestic_usage) as total_usage,
               population, gdp, water_stress_level
        FROM global_water_data 
        WHERE year BETWEEN {start_year} AND {end_year}
        {f"AND country = '{country}'" if country != 'all' else ''}
        ORDER BY year DESC, total_usage DESC
    """
    
    raw_data = self.spark.sql(spark_query).collect()
    
    # Data preprocessing and statistics
    processed_data = []
    country_stats = {}
    
    for row in raw_data:
        country_name = row['country']
        if country_name not in country_stats:
            country_stats[country_name] = {
                'total_usage': 0, 'avg_stress_level': 0, 
                'growth_rate': 0, 'efficiency_score': 0
            }
        
        # Water-use efficiency = GDP / total usage
        efficiency = row['gdp'] / max(row['total_usage'], 1)
        country_stats[country_name]['efficiency_score'] += efficiency
        country_stats[country_name]['total_usage'] += row['total_usage']
        country_stats[country_name]['avg_stress_level'] += row['water_stress_level']
        
        processed_data.append({
            'country': country_name,
            'year': row['year'],
            'agricultural': row['agricultural_usage'],
            'industrial': row['industrial_usage'],
            'domestic': row['domestic_usage'],
            'total': row['total_usage'],
            'per_capita': row['total_usage'] / max(row['population'], 1),
            'stress_level': row['water_stress_level'],
            'efficiency': efficiency
        })
    
    # Compute per-country averages (rankings can be derived from these)
    for country_name in country_stats:
        data_count = len([d for d in processed_data if d['country'] == country_name])
        country_stats[country_name]['avg_stress_level'] /= data_count
        country_stats[country_name]['efficiency_score'] /= data_count
    
    return {
        'data': processed_data,
        'statistics': country_stats,
        'total_countries': len(country_stats),
        'data_range': f"{start_year}-{end_year}"
    }

# Feature 2: Multidimensional correlation clustering - core clustering logic
def perform_multidimensional_clustering(self, request):
    dimensions = request.POST.get('dimensions', 'usage,gdp,population,stress').split(',')
    cluster_count = int(request.POST.get('clusters', 5))
    algorithm = request.POST.get('algorithm', 'kmeans')
    
    # Fetch the multidimensional data
    spark_query = """
        SELECT country, AVG(agricultural_usage + industrial_usage + domestic_usage) as avg_usage,
               AVG(gdp) as avg_gdp, AVG(population) as avg_population,
               AVG(water_stress_level) as avg_stress, AVG(precipitation) as avg_precipitation,
               AVG(renewable_water_resources) as avg_renewable
        FROM global_water_data 
        WHERE year >= 2018
        GROUP BY country
        HAVING COUNT(*) >= 3
    """
    
    raw_data = self.spark.sql(spark_query).collect()
    
    # Build the multidimensional feature matrix
    countries = []
    feature_matrix = []
    
    col_map = {
        'usage': 'avg_usage', 'gdp': 'avg_gdp', 'population': 'avg_population',
        'stress': 'avg_stress', 'precipitation': 'avg_precipitation',
        'renewable': 'avg_renewable'
    }
    for row in raw_data:
        countries.append(row['country'])
        # Build features in the order the caller requested, so column indices
        # line up with `dimensions` when computing cluster centers below
        feature_matrix.append([float(row[col_map[dim]]) for dim in dimensions if dim in col_map])
    
    # Standardize features (z-score); guard against zero-variance columns
    feature_array = np.array(feature_matrix)
    stds = np.std(feature_array, axis=0)
    stds[stds == 0] = 1.0
    normalized_features = (feature_array - np.mean(feature_array, axis=0)) / stds
    
    # Run the selected clustering algorithm (fall back to KMeans for unknown
    # names, so `clusterer` is always defined)
    if algorithm == 'hierarchical':
        from sklearn.cluster import AgglomerativeClustering
        clusterer = AgglomerativeClustering(n_clusters=cluster_count)
    else:
        from sklearn.cluster import KMeans
        clusterer = KMeans(n_clusters=cluster_count, random_state=42, n_init=10)
    
    cluster_labels = clusterer.fit_predict(normalized_features)
    
    # Analyze the clustering results
    clusters = {}
    for i, country in enumerate(countries):
        cluster_id = int(cluster_labels[i])
        if cluster_id not in clusters:
            clusters[cluster_id] = {
                'countries': [], 'center': [], 'characteristics': {},
                'size': 0, 'avg_features': {}
            }
        
        clusters[cluster_id]['countries'].append(country)
        clusters[cluster_id]['size'] += 1
        
        # Accumulate feature values to compute cluster centers
        for j, dim in enumerate(dimensions):
            if dim not in clusters[cluster_id]['avg_features']:
                clusters[cluster_id]['avg_features'][dim] = 0
            clusters[cluster_id]['avg_features'][dim] += feature_matrix[i][j]
    
    # Compute per-cluster feature averages and generate descriptions
    for cluster_id in clusters:
        size = clusters[cluster_id]['size']
        for dim in clusters[cluster_id]['avg_features']:
            clusters[cluster_id]['avg_features'][dim] /= size
        
        # Generate a human-readable characterization for the cluster
        avg_usage = clusters[cluster_id]['avg_features'].get('usage', 0)
        avg_gdp = clusters[cluster_id]['avg_features'].get('gdp', 0)
        avg_stress = clusters[cluster_id]['avg_features'].get('stress', 0)
        
        if avg_usage > 1000 and avg_gdp > 20000:
            clusters[cluster_id]['characteristics']['type'] = 'developed, high water consumption'
        elif avg_usage < 500 and avg_stress > 0.4:
            clusters[cluster_id]['characteristics']['type'] = 'water-scarce'
        elif avg_gdp < 10000 and avg_usage > 800:
            clusters[cluster_id]['characteristics']['type'] = 'developing, high water consumption'
        else:
            clusters[cluster_id]['characteristics']['type'] = 'balanced development'
    
    return {
        'clusters': clusters,
        'total_clusters': len(clusters),
        'algorithm': algorithm,
        'dimensions': dimensions,
        'silhouette_score': self.calculate_silhouette_score(normalized_features, cluster_labels)
    }

# Feature 3: Global water usage time-series analysis - core forecasting logic
def analyze_global_water_time_series(self, request):
    target_country = request.GET.get('country', 'global')
    predict_years = int(request.GET.get('predict_years', 5))
    analysis_type = request.GET.get('type', 'total_usage')
    
    # Fetch the time-series data
    if target_country == 'global':
        spark_query = """
            SELECT year, SUM(agricultural_usage + industrial_usage + domestic_usage) as total_usage,
                   AVG(water_stress_level) as avg_stress, SUM(population) as total_population
            FROM global_water_data 
            WHERE year BETWEEN 2000 AND 2023
            GROUP BY year ORDER BY year
        """
    else:
        spark_query = f"""
            SELECT year, (agricultural_usage + industrial_usage + domestic_usage) as total_usage,
                   water_stress_level as avg_stress, population as total_population
            FROM global_water_data 
            WHERE country = '{target_country}' AND year BETWEEN 2000 AND 2023
            ORDER BY year
        """
    
    raw_data = self.spark.sql(spark_query).collect()
    
    # Build the time series
    years = [row['year'] for row in raw_data]
    values = [float(row[analysis_type]) for row in raw_data]
    
    # Time-series analysis and trend extraction
    time_series = pd.Series(values, index=years)
    
    # Moving averages
    ma_3 = time_series.rolling(window=3).mean()
    ma_5 = time_series.rolling(window=5).mean()
    
    # Trend analysis via linear regression
    X = np.array(years).reshape(-1, 1)
    y = np.array(values)
    
    from sklearn.linear_model import LinearRegression
    trend_model = LinearRegression()
    trend_model.fit(X, y)
    
    trend_values = trend_model.predict(X)
    trend_slope = trend_model.coef_[0]
    
    # Residuals around the trend (also used for volatility and change points below;
    # computed unconditionally so trend_analysis never hits an undefined name)
    residuals = y - trend_values
    
    # Simple seasonal-pattern detection (only with enough data points)
    seasonal_pattern = {}
    if len(values) >= 12:
        for i, residual in enumerate(residuals):
            season_key = i % 4  # assume a 4-year cycle
            if season_key not in seasonal_pattern:
                seasonal_pattern[season_key] = []
            seasonal_pattern[season_key].append(residual)
    
    # Forecast future values
    future_years = list(range(max(years) + 1, max(years) + predict_years + 1))
    future_X = np.array(future_years).reshape(-1, 1)
    future_predictions = trend_model.predict(future_X)
    
    # Apply the seasonal adjustment
    adjusted_predictions = []
    for i, pred in enumerate(future_predictions):
        season_adj = 0
        if seasonal_pattern:
            season_key = (len(values) + i) % 4
            if season_key in seasonal_pattern:
                season_adj = np.mean(seasonal_pattern[season_key])
        adjusted_predictions.append(pred + season_adj)
    
    # Compute prediction intervals
    residual_std = np.std(y - trend_values)
    confidence_upper = [pred + 1.96 * residual_std for pred in adjusted_predictions]
    confidence_lower = [pred - 1.96 * residual_std for pred in adjusted_predictions]
    
    # Change-point detection
    change_points = []
    for i in range(1, len(values) - 1):
        if abs(values[i] - values[i-1]) > 2 * residual_std:
            change_points.append({
                'year': years[i],
                'change': values[i] - values[i-1],
                'type': 'increase' if values[i] > values[i-1] else 'decrease'
            })
    
    # Trend characteristics
    trend_analysis = {
        'slope': float(trend_slope),
        'direction': 'increasing' if trend_slope > 0 else 'decreasing',
        'volatility': float(np.std(residuals)),
        'r_squared': float(trend_model.score(X, y)),
        'avg_annual_change': float(np.mean(np.diff(values))),
        'max_value': {'year': years[np.argmax(values)], 'value': float(max(values))},
        'min_value': {'year': years[np.argmin(values)], 'value': float(min(values))}
    }
    
    return {
        'historical_data': {
            'years': years,
            'values': values,
            'ma_3': ma_3.fillna(0).tolist(),
            'ma_5': ma_5.fillna(0).tolist(),
            'trend': trend_values.tolist()
        },
        'predictions': {
            'years': future_years,
            'values': adjusted_predictions,
            'upper_bound': confidence_upper,
            'lower_bound': confidence_lower
        },
        'trend_analysis': trend_analysis,
        'change_points': change_points,
        'seasonal_pattern': seasonal_pattern,
        'analysis_type': analysis_type,
        'target': target_country
    }
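The trend-plus-interval approach used above (linear fit, residual standard deviation, 95% bounds) can be exercised standalone on synthetic data. Everything here, including the synthetic series and its slope, is illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic annual series: linear growth of ~12 units/year plus noise
years = np.arange(2000, 2024)
rng = np.random.default_rng(42)
values = 500 + 12.0 * (years - 2000) + rng.normal(0, 5, len(years))

# Fit the trend, exactly as the analysis function does
X = years.reshape(-1, 1)
model = LinearRegression().fit(X, values)
trend = model.predict(X)
residual_std = np.std(values - trend)

# Forecast 5 years ahead with 95% bounds (+/- 1.96 residual stds)
future = np.arange(2024, 2029).reshape(-1, 1)
preds = model.predict(future)
upper = preds + 1.96 * residual_std
lower = preds - 1.96 * residual_std
```

Note that these bounds only reflect scatter around the fitted line; they understate uncertainty far from the observed range.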

6. System Documentation


Conclusion

