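Big-Data-Based Gym Member Workout Data Analysis System | Processing Fitness Data with Hadoop+Spark vs. Traditional Database Queries: A World of Difference in Graduation-Project Tech Selection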

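💖💖Author: 计算机毕业设计江挽 💙💙About me: I spent many years teaching computer science training courses and still love teaching. My main languages are Java, WeChat Mini Programs, Python, Golang, and Android, and I have built projects in big data, deep learning, websites, mini programs, Android apps, and algorithms. I also take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know some techniques for lowering text-similarity check scores. I like sharing solutions to problems I run into during development and discussing technology, so feel free to ask me any questions about code! 💛💛A few words: thank you all for following and supporting me! 💜💜 Website projects | Android/Mini Program projects | Big data projects | Deep learning projects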

Introduction to the Big-Data-Based Gym Member Workout Data Analysis System

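The Gym Member Workout Data Analysis System is an analytics platform built on a big-data stack, with Hadoop distributed storage and the Spark processing engine at its core. The backend is written in Python with the Django framework; the frontend uses Vue.js with the ElementUI component library and the Echarts charting library. Workout data is stored in the HDFS distributed file system and queried and aggregated with Spark SQL, and Pandas and NumPy are used for further data processing. The system has eight functional modules: home page, user and permission management, gym member data management, system announcements, member portrait analysis, behavior preference analysis, workout effect analysis, and health indicator analysis. From large volumes of member workout records it derives exercise habits (preferred time slots, favorite equipment, weekly and seasonal patterns), workout effects (training volume, strength and cardio progress, changes in weight, body fat, and muscle mass), and health indicators such as BMI. These results support the gym's operational decisions and give members personalized fitness feedback and health monitoring.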
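To make the analysis modules more concrete, here is a minimal pure-Python sketch of the kind of aggregation the behavior preference module hands to Spark SQL: group workout records by time slot, count them, and average their duration, sorted by count. The field names `workout_time_slot` and `workout_duration` mirror the columns used in the code listing; the function name `time_slot_preference` and the sample records (including the slot labels "evening" and "morning") are hypothetical. In the real system this would be a Spark `groupBy().agg()` over data in HDFS rather than a Python loop.

```python
from collections import defaultdict


def time_slot_preference(records):
    """Group records by time slot; return (slot, count, avg_duration) sorted by count, descending."""
    totals = defaultdict(lambda: [0, 0.0])  # slot -> [number of workouts, summed duration]
    for rec in records:
        bucket = totals[rec["workout_time_slot"]]
        bucket[0] += 1
        bucket[1] += rec["workout_duration"]
    rows = [(slot, n, round(total / n, 2)) for slot, (n, total) in totals.items()]
    # Like orderBy(desc("workout_count")); ties are broken by slot name so the order is deterministic
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows


# Hypothetical sample records
records = [
    {"workout_time_slot": "evening", "workout_duration": 60},
    {"workout_time_slot": "evening", "workout_duration": 45},
    {"workout_time_slot": "morning", "workout_duration": 30},
]
print(time_slot_preference(records))  # [('evening', 2, 52.5), ('morning', 1, 30.0)]
```

The logic is the same as the Spark version; Spark's advantage is that the grouping runs in parallel across partitions once the record count outgrows a single machine.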

Big-Data-Based Gym Member Workout Data Analysis System: Demo Video

Demo video

Big-Data-Based Gym Member Workout Data Analysis System: Demo Screenshots

[8 system screenshots]

Big-Data-Based Gym Member Workout Data Analysis System: Code

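from datetime import datetime

import numpy as np
from pyspark.sql import SparkSession
# Import Spark SQL functions explicitly: a wildcard import would shadow Python's built-in
# round(), min() and abs(), which the helpers below apply to plain numbers.
# Note that sum in this module is Spark's aggregate function, not Python's built-in.
from pyspark.sql.functions import (avg, col, count, current_date, date_format, date_sub,
                                   desc, hour, month, sum, weekofyear, when)

spark = (
    SparkSession.builder
    .appName("FitnessDataAnalysis")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)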

def member_portrait_analysis(member_id):
    # Filter with Column expressions rather than formatting member_id into a SQL string,
    # which would be open to SQL injection.
    member_df = spark.table("fitness_members").filter(col("member_id") == member_id)
    workout_df = spark.table("workout_records").filter(col("member_id") == member_id)
    equipment_df = spark.table("equipment_usage").filter(col("member_id") == member_id)
    member_rows = member_df.collect()
    if not member_rows:
        return None  # no such member
    member_info = member_rows[0]
    total_workouts = workout_df.count()
    # avg() returns None when the member has no workouts; fall back to 0
    avg_duration = workout_df.agg(avg("workout_duration")).collect()[0][0] or 0
    top_slot = workout_df.groupBy("workout_time_slot").count().orderBy(desc("count")).limit(1).collect()
    preferred_time = top_slot[0]["workout_time_slot"] if top_slot else None
    equipment_preference = equipment_df.groupBy("equipment_type").count().orderBy(desc("count")).limit(5).collect()
    monthly_frequency = workout_df.withColumn("month", date_format("workout_date", "yyyy-MM")).groupBy("month").count().orderBy("month").collect()
    fitness_level = calculate_fitness_level(avg_duration, total_workouts, member_info["age"])
    # BMI = weight (kg) / height (m)^2; height is stored in centimetres
    bmi = member_info["weight"] / ((member_info["height"] / 100) ** 2)
    # Normal BMI range: 18.5 up to (but not including) 25
    health_status = "Normal" if 18.5 <= bmi < 25 else "Needs attention"
    workout_consistency = calculate_consistency(monthly_frequency)
    portrait_data = {
        "member_id": member_id,
        "total_workouts": total_workouts,
        "avg_duration": round(avg_duration, 2),
        "preferred_time": preferred_time,
        "equipment_preference": [{"equipment": item["equipment_type"], "usage_count": item["count"]} for item in equipment_preference],
        "monthly_frequency": [{"month": item["month"], "count": item["count"]} for item in monthly_frequency],
        "fitness_level": fitness_level,
        "bmi": round(bmi, 2),
        "health_status": health_status,
        "workout_consistency": workout_consistency
    }
    return portrait_data

def behavior_preference_analysis():
    workout_df = spark.sql("SELECT * FROM workout_records")
    equipment_df = spark.sql("SELECT * FROM equipment_usage")
    time_preference = workout_df.groupBy("workout_time_slot").agg(count("*").alias("workout_count"), avg("workout_duration").alias("avg_duration")).orderBy(desc("workout_count"))
    equipment_popularity = equipment_df.groupBy("equipment_type").agg(count("*").alias("usage_count"), avg("usage_duration").alias("avg_usage_time")).orderBy(desc("usage_count"))
    weekly_pattern = workout_df.withColumn("weekday", date_format("workout_date", "EEEE")).groupBy("weekday").count().orderBy(desc("count"))
    age_behavior = workout_df.join(spark.sql("SELECT member_id, age FROM fitness_members"), "member_id").withColumn("age_group", when(col("age") < 25, "Youth").when(col("age") < 40, "Middle-aged").otherwise("Middle-aged and senior")).groupBy("age_group").agg(avg("workout_duration").alias("avg_duration"), count("*").alias("workout_count"))
    seasonal_trend = workout_df.withColumn("season", when(month("workout_date").isin([12, 1, 2]), "Winter").when(month("workout_date").isin([3, 4, 5]), "Spring").when(month("workout_date").isin([6, 7, 8]), "Summer").otherwise("Autumn")).groupBy("season").count().orderBy(desc("count"))
    duration_distribution = workout_df.withColumn("duration_range", when(col("workout_duration") < 30, "Short").when(col("workout_duration") < 60, "Medium").otherwise("Long")).groupBy("duration_range").count()
    member_retention = calculate_member_retention(workout_df)
    peak_hours = workout_df.withColumn("hour", hour("workout_start_time")).groupBy("hour").count().orderBy(desc("count")).limit(3)
    preference_data = {
        "time_preference": [{"time_slot": row["workout_time_slot"], "count": row["workout_count"], "avg_duration": round(row["avg_duration"], 2)} for row in time_preference.collect()],
        "equipment_popularity": [{"equipment": row["equipment_type"], "usage_count": row["usage_count"], "avg_time": round(row["avg_usage_time"], 2)} for row in equipment_popularity.collect()],
        "weekly_pattern": [{"weekday": row["weekday"], "count": row["count"]} for row in weekly_pattern.collect()],
        "age_behavior": [{"age_group": row["age_group"], "avg_duration": round(row["avg_duration"], 2), "count": row["workout_count"]} for row in age_behavior.collect()],
        "seasonal_trend": [{"season": row["season"], "count": row["count"]} for row in seasonal_trend.collect()],
        "duration_distribution": [{"range": row["duration_range"], "count": row["count"]} for row in duration_distribution.collect()],
        "member_retention": member_retention,
        "peak_hours": [{"hour": row["hour"], "count": row["count"]} for row in peak_hours.collect()]
    }
    return preference_data

def workout_effect_analysis(member_id, start_date, end_date):
    # Filter with Column expressions instead of formatting user input into SQL strings (avoids SQL injection);
    # between() is inclusive on both ends, like SQL BETWEEN
    workout_df = spark.table("workout_records").filter((col("member_id") == member_id) & col("workout_date").between(start_date, end_date))
    health_df = spark.table("health_indicators").filter((col("member_id") == member_id) & col("record_date").between(start_date, end_date))
    equipment_df = spark.table("equipment_usage").filter((col("member_id") == member_id) & col("usage_date").between(start_date, end_date))
    total_workouts = workout_df.count()
    total_duration = workout_df.agg(sum("workout_duration")).collect()[0][0] or 0
    total_calories = workout_df.agg(sum("calories_burned")).collect()[0][0] or 0
    avg_heart_rate = workout_df.agg(avg("avg_heart_rate")).collect()[0][0]
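    # Count the analysis window inclusively (BETWEEN includes both endpoints), so a
    # one-day window counts as 1 day instead of 0 and never divides by zero
    period_days = (datetime.strptime(end_date, "%Y-%m-%d") - datetime.strptime(start_date, "%Y-%m-%d")).days + 1
    workout_frequency = total_workouts / (period_days / 7) if period_days > 0 else 0
    # The filter values are the category labels stored in the data:
    # "力量训练" = strength training, "有氧训练" = cardio training
    strength_progress = calculate_strength_progress(equipment_df.filter(col("equipment_category") == "力量训练"))
    cardio_progress = calculate_cardio_progress(equipment_df.filter(col("equipment_category") == "有氧训练"))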
    weight_change = calculate_weight_change(health_df)
    body_fat_change = calculate_body_fat_change(health_df)
    muscle_mass_change = calculate_muscle_mass_change(health_df)
    weekly_progress = workout_df.withColumn("week", weekofyear("workout_date")).groupBy("week").agg(count("*").alias("weekly_workouts"), sum("workout_duration").alias("weekly_duration"), sum("calories_burned").alias("weekly_calories")).orderBy("week")
    improvement_rate = calculate_improvement_rate(workout_df, health_df)
    consistency_score = calculate_workout_consistency_score(workout_df)
    goal_achievement = calculate_goal_achievement(member_id, workout_df, health_df)
    effect_data = {
        "member_id": member_id,
        "analysis_period": {"start_date": start_date, "end_date": end_date},
        "total_workouts": total_workouts,
        "total_duration": total_duration,
        "total_calories": total_calories,
        "avg_heart_rate": round(avg_heart_rate, 2) if avg_heart_rate else 0,
        "workout_frequency": round(workout_frequency, 2),
        "strength_progress": strength_progress,
        "cardio_progress": cardio_progress,
        "weight_change": weight_change,
        "body_fat_change": body_fat_change,
        "muscle_mass_change": muscle_mass_change,
        "weekly_progress": [{"week": row["week"], "workouts": row["weekly_workouts"], "duration": row["weekly_duration"], "calories": row["weekly_calories"]} for row in weekly_progress.collect()],
        "improvement_rate": improvement_rate,
        "consistency_score": consistency_score,
        "goal_achievement": goal_achievement
    }
    return effect_data

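def calculate_fitness_level(avg_duration, total_workouts, age):
    # Rule-based level from average session length and number of workouts; age is not used yet
    if avg_duration >= 60 and total_workouts >= 20:
        return "Advanced"
    elif avg_duration >= 45 and total_workouts >= 10:
        return "Intermediate"
    else:
        return "Beginner"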

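def calculate_consistency(monthly_data):
    # Monthly workout counts with a standard deviation below 3 are treated as stable
    if len(monthly_data) < 2:
        return "Insufficient data"
    counts = [item["count"] for item in monthly_data]
    std_dev = np.std(counts)
    return "Stable" if std_dev < 3 else "Unstable"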

def calculate_member_retention(workout_df):
    last_month = workout_df.filter(col("workout_date") >= date_sub(current_date(), 30))
    active_members = last_month.select("member_id").distinct().count()
    total_members = workout_df.select("member_id").distinct().count()
    return round((active_members / total_members) * 100, 2) if total_members > 0 else 0

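def calculate_strength_progress(strength_df):
    if strength_df.count() == 0:
        return {"progress": "No data", "improvement": 0}
    # Compare the weight used in the earliest and the most recent strength record
    first_record = strength_df.orderBy("usage_date").first()
    last_record = strength_df.orderBy(desc("usage_date")).first()
    first_weight = first_record["weight_used"]
    if not first_weight:  # a zero or missing starting weight gives no meaningful percentage
        return {"progress": "No data", "improvement": 0}
    improvement = ((last_record["weight_used"] - first_weight) / first_weight) * 100
    return {"progress": "Improved" if improvement > 0 else "Maintained", "improvement": round(improvement, 2)}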

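def calculate_cardio_progress(cardio_df):
    if cardio_df.count() == 0:
        return {"progress": "No data", "improvement": 0}
    # Compare the mean session length of the five earliest and the five most recent records.
    # usage_duration is the duration column of equipment_usage (as in behavior_preference_analysis).
    avg_early = cardio_df.orderBy("usage_date").limit(5).agg(avg("usage_duration")).collect()[0][0]
    avg_recent = cardio_df.orderBy(desc("usage_date")).limit(5).agg(avg("usage_duration")).collect()[0][0]
    if not avg_early or avg_recent is None:
        return {"progress": "No data", "improvement": 0}
    improvement = ((avg_recent - avg_early) / avg_early) * 100
    return {"progress": "Improved" if improvement > 0 else "Maintained", "improvement": round(improvement, 2)}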

def calculate_weight_change(health_df):
    # Compare the earliest and the latest weight record in the window
    if health_df.count() < 2:
        return {"change": "Insufficient data", "value": 0}
    first_weight = health_df.orderBy("record_date").first()["weight"]
    last_weight = health_df.orderBy(desc("record_date")).first()["weight"]
    change = last_weight - first_weight
    return {"change": "Decreased" if change < 0 else "Increased" if change > 0 else "Unchanged", "value": round(abs(change), 2)}

def calculate_body_fat_change(health_df):
    # Compare the earliest and the latest body-fat-rate record in the window
    if health_df.count() < 2:
        return {"change": "Insufficient data", "value": 0}
    first_bf = health_df.orderBy("record_date").first()["body_fat_rate"]
    last_bf = health_df.orderBy(desc("record_date")).first()["body_fat_rate"]
    change = last_bf - first_bf
    return {"change": "Decreased" if change < 0 else "Increased" if change > 0 else "Unchanged", "value": round(abs(change), 2)}

def calculate_muscle_mass_change(health_df):
    # Compare the earliest and the latest muscle-mass record in the window
    if health_df.count() < 2:
        return {"change": "Insufficient data", "value": 0}
    first_mm = health_df.orderBy("record_date").first()["muscle_mass"]
    last_mm = health_df.orderBy(desc("record_date")).first()["muscle_mass"]
    change = last_mm - first_mm
    return {"change": "Increased" if change > 0 else "Decreased" if change < 0 else "Unchanged", "value": round(abs(change), 2)}

def calculate_improvement_rate(workout_df, health_df):
    # Heuristic score: (workout count / 30) * 10 plus 5 points per health record, capped at 100
    workout_improvement = workout_df.count() / 30
    health_records = health_df.count()
    return round(min(workout_improvement * 10 + health_records * 5, 100), 2)

def calculate_workout_consistency_score(workout_df):
    total_days = workout_df.select("workout_date").distinct().count()
    total_workouts = workout_df.count()
    return round((total_days / total_workouts) * 100, 2) if total_workouts > 0 else 0

def calculate_goal_achievement(member_id, workout_df, health_df):
    # Assumes a fixed goal of 3 workouts per week over a four-week window;
    # member_id and health_df are not used yet
    weekly_goal = 3
    actual_weekly = workout_df.count() / 4
    achievement_rate = (actual_weekly / weekly_goal) * 100
    return {"rate": round(min(achievement_rate, 100), 2), "status": "Achieved" if achievement_rate >= 100 else "Not achieved"}
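The progress and frequency metrics in the listing reduce to two small formulas. The following sketch isolates them in plain Python so their edge cases (a zero or missing baseline, a one-day analysis window) can be checked without a Spark cluster. The function names `percent_change` and `workouts_per_week` are hypothetical, and returning 0.0 for a missing baseline and counting the date window inclusively are assumed conventions, not necessarily the project's.

```python
from datetime import date


def percent_change(first, last):
    """Relative change from first to last, in percent; 0.0 when the baseline is missing or zero."""
    if not first:  # None or 0: no meaningful baseline to divide by
        return 0.0
    return round((last - first) / first * 100, 2)


def workouts_per_week(total_workouts, start, end):
    """Average workouts per week over [start, end], counting both endpoints as days."""
    days = (end - start).days + 1  # inclusive window, so start == end is one day
    if days <= 0:
        raise ValueError("end must not be before start")
    return round(total_workouts / (days / 7), 2)


print(percent_change(40, 50))  # 25.0
print(percent_change(0, 50))   # 0.0
print(workouts_per_week(12, date(2024, 1, 1), date(2024, 1, 28)))  # 3.0
```

Counting the window inclusively matches SQL `BETWEEN`, which includes both endpoints, so the denominator covers the same days as the filtered records.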

Big-Data-Based Gym Member Workout Data Analysis System: Documentation

[Documentation screenshot]
