Big-Data-Based Individual Tobacco and Alcohol Addiction Data Analysis System | A 7-Day Development Guide Powered by a Hadoop+Spark Dual Engine


💖💖Author: 计算机毕业设计江挽 💙💙About me: I spent years teaching professional computer science courses and genuinely enjoy teaching. My languages include Java, WeChat Mini Programs, Python, Golang, and Android, and my projects span big data, deep learning, websites, mini programs, Android apps, and algorithms. I also do custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and know a few plagiarism-reduction techniques. I like sharing solutions to problems I hit during development and talking shop, so feel free to ask me anything about code and technology! 💛💛A word of thanks: thank you all for your attention and support! 💜💜 Website projects · Android/Mini Program projects · Big data projects · Deep learning projects

Introduction to the Big-Data-Based Individual Tobacco and Alcohol Addiction Data Analysis System

The individual tobacco and alcohol addiction data analysis system is an intelligent analytics platform built on big data technology. It uses a Hadoop+Spark dual-engine architecture to process and mine large volumes of addiction data efficiently. By combining the Python data science ecosystem with Java enterprise frameworks, the system forms a complete pipeline from data collection and storage through processing to visualization. Its core features cover user management, addiction history analysis, demographic analysis, health and lifestyle analysis, risk assessment, and large-screen dashboard visualization, enabling full-spectrum data modeling and intelligent analysis of individual tobacco and alcohol addiction behavior. The front end uses a Vue + ElementUI + ECharts stack for a modern interactive interface; the back end supports either Django or Spring Boot; data is stored in a MySQL relational database, with HDFS handling large-scale datasets. Spark SQL optimizes complex queries, while Pandas and NumPy handle scientific computation, giving researchers and medical institutions a scientific addiction-assessment and decision-support tool and upgrading a traditional static management system into an intelligent data analysis platform.
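The risk-assessment feature mentioned above boils down to bucketing each measurement into a 1–3 sub-score and averaging the sub-scores into a composite level. A minimal pure-Python sketch of that scoring logic (the threshold values mirror the ones used in the Spark implementation later in this post; the function names are illustrative):

```python
def consumption_risk(amount: float) -> int:
    """Daily consumption sub-score: >20 units is high (3), >10 is medium (2), else low (1)."""
    if amount > 20:
        return 3
    if amount > 10:
        return 2
    return 1

def risk_level(sub_scores: list) -> str:
    """Average the 1-3 sub-scores and map the composite to a label."""
    composite = sum(sub_scores) / len(sub_scores)
    if composite >= 2.5:
        return "High"
    if composite >= 1.5:
        return "Medium"
    return "Low"
```

The same bucketing pattern is applied to frequency, mood/stress, and health indicators, so the composite is simply the mean of four such sub-scores.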

Demo Video of the Big-Data-Based Individual Tobacco and Alcohol Addiction Data Analysis System

Demo video

Screenshots of the Big-Data-Based Individual Tobacco and Alcohol Addiction Data Analysis System


Code from the Big-Data-Based Individual Tobacco and Alcohol Addiction Data Analysis System

from pyspark.sql import SparkSession
from pyspark.sql.functions import *  # note: shadows Python builtins such as sum/max with Spark column functions
from pyspark.sql.types import *
from pyspark.sql.window import Window  # required for the lag() window in the trend analysis below
import numpy as np
import mysql.connector
from django.http import JsonResponse
from django.views import View
import json
spark = SparkSession.builder.appName("AddictionAnalysis").config("spark.sql.adaptive.enabled", "true").getOrCreate()
class AddictionHistoryAnalysis(View):
    def post(self, request):
        user_id = json.loads(request.body).get('user_id')
        connection = mysql.connector.connect(host='localhost', database='addiction_db', user='root', password='password')
        cursor = connection.cursor()
        cursor.execute("SELECT * FROM addiction_records WHERE user_id = %s", (user_id,))
        records = cursor.fetchall()
        df = spark.createDataFrame(records, ["id", "user_id", "substance_type", "consumption_amount", "consumption_date", "mood_score", "stress_level"])
        df_with_features = df.withColumn("days_since_start", datediff(col("consumption_date"), lit("2024-01-01")))
        df_with_features = df_with_features.withColumn("weekly_pattern", date_format(col("consumption_date"), "E"))
        df_with_features = df_with_features.withColumn("consumption_trend", lag(col("consumption_amount")).over(Window.partitionBy("user_id").orderBy("consumption_date")))
        df_with_features = df_with_features.withColumn("trend_direction", when(col("consumption_amount") > col("consumption_trend"), "increasing").when(col("consumption_amount") < col("consumption_trend"), "decreasing").otherwise("stable"))
        weekly_stats = df_with_features.groupBy("weekly_pattern").agg(avg("consumption_amount").alias("avg_consumption"), count("*").alias("frequency"), avg("mood_score").alias("avg_mood"))
        monthly_trends = df_with_features.withColumn("month", date_format(col("consumption_date"), "yyyy-MM")).groupBy("month", "substance_type").agg(sum("consumption_amount").alias("total_consumption"), avg("stress_level").alias("avg_stress"))
        correlation_data = df_with_features.select("consumption_amount", "mood_score", "stress_level").toPandas()
        correlation_matrix = correlation_data.corr()
        risk_indicators = df_with_features.withColumn("high_consumption", when(col("consumption_amount") > 10, 1).otherwise(0)).withColumn("high_stress", when(col("stress_level") > 7, 1).otherwise(0))
        risk_summary = risk_indicators.agg(sum("high_consumption").alias("high_consumption_days"), sum("high_stress").alias("high_stress_days"), count("*").alias("total_days"))
        # Convert Spark Row objects to dicts so JsonResponse can serialize them
        result_data = {"weekly_patterns": [row.asDict() for row in weekly_stats.collect()], "monthly_trends": [row.asDict() for row in monthly_trends.collect()], "correlation_matrix": correlation_matrix.to_dict(), "risk_summary": risk_summary.collect()[0].asDict()}
        cursor.close()
        connection.close()
        return JsonResponse(result_data)
class DemographicAnalysis(View):
    def post(self, request):
        age_range = json.loads(request.body).get('age_range', [18, 65])
        connection = mysql.connector.connect(host='localhost', database='addiction_db', user='root', password='password')
        cursor = connection.cursor()
        # Select explicit columns so the result matches the 10-column schema below
        cursor.execute("SELECT u.id, u.name, u.age, u.gender, u.education, u.income_level, ar.id, ar.substance_type, ar.consumption_amount, ar.consumption_date FROM users u LEFT JOIN addiction_records ar ON u.id = ar.user_id WHERE u.age BETWEEN %s AND %s", (age_range[0], age_range[1]))
        user_records = cursor.fetchall()
        columns = ["user_id", "name", "age", "gender", "education", "income_level", "record_id", "substance_type", "consumption_amount", "consumption_date"]
        df = spark.createDataFrame(user_records, columns)
        df_cleaned = df.filter(col("record_id").isNotNull())
        age_groups = df_cleaned.withColumn("age_group", when(col("age") < 25, "18-24").when(col("age") < 35, "25-34").when(col("age") < 45, "35-44").when(col("age") < 55, "45-54").otherwise("55+"))
        demographic_stats = age_groups.groupBy("age_group", "gender", "education").agg(count("*").alias("user_count"), avg("consumption_amount").alias("avg_consumption"), sum("consumption_amount").alias("total_consumption"))
        gender_analysis = df_cleaned.groupBy("gender", "substance_type").agg(count("*").alias("record_count"), avg("consumption_amount").alias("avg_amount"), stddev("consumption_amount").alias("std_consumption"))
        education_impact = df_cleaned.groupBy("education").agg(countDistinct("user_id").alias("unique_users"), avg("consumption_amount").alias("avg_consumption"), max("consumption_amount").alias("max_consumption"))
        income_correlation = df_cleaned.groupBy("income_level", "substance_type").agg(count("*").alias("frequency"), avg("consumption_amount").alias("avg_consumption"))
        cross_analysis = age_groups.groupBy("age_group", "gender", "substance_type").agg(count("*").alias("instances"), avg("consumption_amount").alias("mean_consumption"))  # age_group only exists on age_groups, not df_cleaned
        high_risk_demographics = demographic_stats.filter(col("avg_consumption") > 15).orderBy(desc("avg_consumption"))
        pandas_df = df_cleaned.select("age", "consumption_amount", "income_level").toPandas()
        pandas_df['income_numeric'] = pandas_df['income_level'].map({'low': 1, 'medium': 2, 'high': 3})
        age_consumption_corr = pandas_df['age'].corr(pandas_df['consumption_amount'])
        income_consumption_corr = pandas_df['income_numeric'].corr(pandas_df['consumption_amount'])
        # Convert Rows to dicts and NumPy scalars to floats for JSON serialization
        result = {"demographic_stats": [row.asDict() for row in demographic_stats.collect()], "gender_analysis": [row.asDict() for row in gender_analysis.collect()], "education_impact": [row.asDict() for row in education_impact.collect()], "income_correlation": [row.asDict() for row in income_correlation.collect()], "cross_analysis": [row.asDict() for row in cross_analysis.collect()], "correlations": {"age_consumption": float(age_consumption_corr), "income_consumption": float(income_consumption_corr)}, "high_risk_groups": [row.asDict() for row in high_risk_demographics.collect()]}
        cursor.close()
        connection.close()
        return JsonResponse(result)
class RiskAssessmentAnalysis(View):
    def post(self, request):
        assessment_params = json.loads(request.body)
        user_id = assessment_params.get('user_id')
        time_period = assessment_params.get('time_period', 30)
        connection = mysql.connector.connect(host='localhost', database='addiction_db', user='root', password='password')
        cursor = connection.cursor()
        cursor.execute("""SELECT ar.*, u.age, u.gender, hr.sleep_hours, hr.exercise_frequency, hr.mental_health_score 
                         FROM addiction_records ar 
                         LEFT JOIN users u ON ar.user_id = u.id 
                         LEFT JOIN health_records hr ON ar.user_id = hr.user_id 
                         WHERE ar.user_id = %s AND ar.consumption_date >= DATE_SUB(CURDATE(), INTERVAL %s DAY)""", (user_id, time_period))
        risk_data = cursor.fetchall()
        columns = ["record_id", "user_id", "substance_type", "consumption_amount", "consumption_date", "mood_score", "stress_level", "age", "gender", "sleep_hours", "exercise_frequency", "mental_health_score"]
        df = spark.createDataFrame(risk_data, columns)
        consumption_risk = df.withColumn("daily_consumption_risk", when(col("consumption_amount") > 20, 3).when(col("consumption_amount") > 10, 2).otherwise(1))
        frequency_risk = consumption_risk.groupBy("user_id").agg(count("*").alias("consumption_days")).withColumn("frequency_risk", when(col("consumption_days") > time_period * 0.8, 3).when(col("consumption_days") > time_period * 0.5, 2).otherwise(1))
        mood_stress_risk = df.withColumn("psychological_risk", when((col("mood_score") < 3) | (col("stress_level") > 8), 3).when((col("mood_score") < 5) | (col("stress_level") > 6), 2).otherwise(1))
        health_risk = df.withColumn("health_risk", when((col("sleep_hours") < 6) | (col("exercise_frequency") < 2) | (col("mental_health_score") < 4), 3).when((col("sleep_hours") < 7) | (col("exercise_frequency") < 3) | (col("mental_health_score") < 6), 2).otherwise(1))
        substance_patterns = df.groupBy("substance_type").agg(sum("consumption_amount").alias("total_amount"), count("*").alias("frequency"), avg("consumption_amount").alias("avg_amount"))
        escalation_pattern = df.withColumn("month_year", date_format(col("consumption_date"), "yyyy-MM")).groupBy("month_year").agg(avg("consumption_amount").alias("monthly_avg")).orderBy("month_year")
        escalation_df = escalation_pattern.toPandas()
        if len(escalation_df) > 1:
            escalation_trend = np.polyfit(range(len(escalation_df)), escalation_df['monthly_avg'], 1)[0]
        else:
            escalation_trend = 0
        risk_scores = consumption_risk.join(frequency_risk, "user_id").join(mood_stress_risk.select("user_id", "psychological_risk").distinct(), "user_id").join(health_risk.select("user_id", "health_risk").distinct(), "user_id")
        final_risk = risk_scores.withColumn("composite_risk_score", (col("daily_consumption_risk") + col("frequency_risk") + col("psychological_risk") + col("health_risk")) / 4.0)
        risk_level = final_risk.withColumn("risk_level", when(col("composite_risk_score") >= 2.5, "High").when(col("composite_risk_score") >= 1.5, "Medium").otherwise("Low"))
        recommendations = []
        # Aggregate the mean composite score rather than taking the first row
        avg_risk = final_risk.agg(avg("composite_risk_score").alias("avg_risk")).collect()[0]["avg_risk"]
        if avg_risk >= 2.5:
            recommendations.extend(["Seek professional medical help", "Consider joining an addiction support group", "Set a strict consumption-limit plan"])
        elif avg_risk >= 1.5:
            recommendations.extend(["Reduce consumption frequency", "Add healthy alternative activities", "Monitor consumption patterns regularly"])
        else:
            recommendations.extend(["Maintain the current good state", "Keep up the healthy lifestyle"])
        result = {"risk_assessment": [row.asDict() for row in risk_level.collect()], "substance_patterns": [row.asDict() for row in substance_patterns.collect()], "escalation_trend": float(escalation_trend), "recommendations": recommendations, "detailed_scores": [row.asDict() for row in final_risk.collect()]}
        cursor.close()
        connection.close()
        return JsonResponse(result)
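The escalation-trend step in RiskAssessmentAnalysis fits a first-degree polynomial to the monthly averages with np.polyfit; the sign of the slope indicates whether consumption is rising or falling. The same step, extracted as a standalone helper (the function name is illustrative):

```python
import numpy as np

def escalation_trend(monthly_avgs: list) -> float:
    """Slope of a least-squares line over month index; positive means consumption is escalating."""
    if len(monthly_avgs) < 2:
        # Not enough months to fit a trend, matching the guard in the view above
        return 0.0
    slope = np.polyfit(range(len(monthly_avgs)), monthly_avgs, 1)[0]
    return float(slope)
```

Fitting against the month index assumes the monthly averages are already sorted chronologically, which the orderBy("month_year") in the Spark query guarantees.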

Documentation of the Big-Data-Based Individual Tobacco and Alcohol Addiction Data Analysis System

