Only CS Majors Understand the Pain: A Hadoop + Spark Graduate Employment Data Analysis System Comes to the Rescue


💖💖 Author: 计算机毕业设计小途 💙💙 About me: I have long taught computer science training courses and genuinely enjoy teaching. My languages include Java, WeChat Mini Programs, Python, Golang, and Android; my projects span big data, deep learning, websites, mini programs, Android apps, and algorithms. I also take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know a few tricks for reducing similarity-check scores. I like sharing solutions to problems I run into during development and discussing technology, so feel free to ask me anything about code! 💛💛 A word of thanks: thank you all for your attention and support! 💜💜 Website projects · Android/mini-program projects · Big data projects · Deep learning projects


Introduction to the Graduate Employment Data Analysis and Visualization System

The Big-Data-Based Graduate Employment Data Analysis and Visualization System is a data analysis platform built on a modern big data stack. It uses the Hadoop Distributed File System (HDFS) as the underlying storage layer and the Spark processing engine for fast computation and analysis over large volumes of employment data. The backend is built on the Spring Boot microservice framework, with SpringMVC and MyBatis handling the business-logic and persistence layers; the frontend uses the Vue.js reactive framework with the ElementUI component library, and ECharts renders the charts. Core modules include the home page, personal center, user and permission management, unified management of graduate employment records, and a powerful visualization dashboard that mines and presents the data along multiple dimensions: overall employment statistics, career prospects by major, comparison across education levels, and correlations among employment factors. The system runs complex queries with Spark SQL, preprocesses and summarizes data with Pandas and NumPy, and stores structured data in MySQL, forming a complete big data solution covering collection, storage, processing, analysis, and presentation, and giving university career-guidance offices and decision makers scientific data support and intuitive analytical results.
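To illustrate the Pandas preprocessing and statistics step mentioned above, here is a minimal local sketch. The sample rows and column names (`major`, `employment_status`, `salary`) are illustrative assumptions mirroring the table used later in the code section, not the system's real data; in the actual system these rows come from HDFS/MySQL via Spark.

```python
import pandas as pd

# Hypothetical sample of the graduate employment table.
df = pd.DataFrame({
    "major": ["CS", "CS", "Math", "Math"],
    "employment_status": ["已就业", "未就业", "已就业", "已就业"],  # 已就业 = employed, 未就业 = unemployed
    "salary": [9000, None, 7000, 8000],
})

# Employment rate per major: share of rows whose status is "已就业" (employed),
# expressed as a percentage rounded to two decimals.
rate = (df["employment_status"].eq("已就业")
          .groupby(df["major"]).mean().mul(100).round(2))
print(rate.to_dict())  # → {'CS': 50.0, 'Math': 100.0}
```

The same per-group ratio is what the Spark jobs below compute at scale with `groupBy(...).agg(...)`.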

Graduate Employment Data Analysis and Visualization System Demo Video

Demo video

Graduate Employment Data Analysis and Visualization System Demo Screenshots

Graduate employment data management.png

Login page.png

Employment overview statistics.png

Employment factor correlation analysis.png

Data dashboard.png

Employment analysis by education level.png

User management.png

Employment analysis by major.png

Graduate Employment Data Analysis and Visualization System Code Showcase

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, avg, sum, when, desc, month
import pandas as pd
import numpy as np
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

# Shared SparkSession with adaptive query execution enabled
spark = (SparkSession.builder
         .appName("GraduateEmploymentAnalysis")
         .config("spark.sql.adaptive.enabled", "true")
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
         .getOrCreate())

@csrf_exempt
def employment_overview_analysis(request):
    # Load the graduate employment table from MySQL into a Spark DataFrame
    employment_df = spark.read.jdbc(url="jdbc:mysql://localhost:3306/employment_db", table="graduate_employment_data", properties={"user": "root", "password": "password", "driver": "com.mysql.cj.jdbc.Driver"})
    total_graduates = employment_df.count()
    # employment_status values: 已就业 = employed, 未就业 = unemployed, 升学深造 = further study
    employed_graduates = employment_df.filter(col("employment_status") == "已就业").count()
    unemployed_graduates = employment_df.filter(col("employment_status") == "未就业").count()
    further_study = employment_df.filter(col("employment_status") == "升学深造").count()
    employment_rate = round((employed_graduates / total_graduates) * 100, 2) if total_graduates > 0 else 0
    further_study_rate = round((further_study / total_graduates) * 100, 2) if total_graduates > 0 else 0
    # Average salary of employed graduates (only the average is used in the response)
    salary_stats = employment_df.filter(col("employment_status") == "已就业").agg(avg("salary").alias("avg_salary")).collect()
    avg_salary = round(salary_stats[0]['avg_salary'], 2) if salary_stats and salary_stats[0]['avg_salary'] else 0
    # Monthly graduation trend: per-month graduate and employed counts
    monthly_trend = employment_df.groupBy(month(col("graduation_date")).alias("month")).agg(count("*").alias("graduate_count"), sum(when(col("employment_status") == "已就业", 1).otherwise(0)).alias("employed_count")).orderBy("month").collect()
    trend_data = [{"month": row['month'], "graduate_count": row['graduate_count'], "employed_count": row['employed_count'], "employment_rate": round((row['employed_count'] / row['graduate_count']) * 100, 2)} for row in monthly_trend]
    # Top 10 industries among employed graduates
    industry_distribution = employment_df.filter(col("employment_status") == "已就业").groupBy("industry").agg(count("*").alias("count")).orderBy(desc("count")).limit(10).collect()
    industry_data = [{"industry": row['industry'], "count": row['count'], "percentage": round((row['count'] / employed_graduates) * 100, 2) if employed_graduates > 0 else 0} for row in industry_distribution]
    result_data = {"total_graduates": total_graduates, "employed_graduates": employed_graduates, "unemployed_graduates": unemployed_graduates, "further_study": further_study, "employment_rate": employment_rate, "further_study_rate": further_study_rate, "avg_salary": avg_salary, "monthly_trend": trend_data, "industry_distribution": industry_data}
    return JsonResponse({"status": "success", "data": result_data})

@csrf_exempt
def major_employment_analysis(request):
    # Load the graduate employment table from MySQL into a Spark DataFrame
    employment_df = spark.read.jdbc(url="jdbc:mysql://localhost:3306/employment_db", table="graduate_employment_data", properties={"user": "root", "password": "password", "driver": "com.mysql.cj.jdbc.Driver"})
    # Per-major totals, employed/further-study counts, and average salary of the employed
    major_stats = employment_df.groupBy("major").agg(count("*").alias("total_count"), sum(when(col("employment_status") == "已就业", 1).otherwise(0)).alias("employed_count"), sum(when(col("employment_status") == "升学深造", 1).otherwise(0)).alias("further_study_count"), avg(when(col("employment_status") == "已就业", col("salary")).otherwise(None)).alias("avg_salary")).collect()
    major_analysis_data = []
    for row in major_stats:
        total = row['total_count']
        employed = row['employed_count']
        further_study = row['further_study_count']
        employment_rate = round((employed / total) * 100, 2) if total > 0 else 0
        further_study_rate = round((further_study / total) * 100, 2) if total > 0 else 0
        avg_salary = round(row['avg_salary'], 2) if row['avg_salary'] else 0
        major_analysis_data.append({"major": row['major'], "total_graduates": total, "employed_count": employed, "employment_rate": employment_rate, "further_study_rate": further_study_rate, "avg_salary": avg_salary})
    # Top 10 majors by employment rate
    major_analysis_data.sort(key=lambda x: x['employment_rate'], reverse=True)
    top_majors = major_analysis_data[:10]
    # Salary comparison restricted to majors with at least 10 employed samples
    major_salary_comparison = employment_df.filter(col("employment_status") == "已就业").groupBy("major").agg(avg("salary").alias("avg_salary"), count("*").alias("sample_size")).filter(col("sample_size") >= 10).orderBy(desc("avg_salary")).limit(15).collect()
    salary_comparison_data = [{"major": row['major'], "avg_salary": round(row['avg_salary'], 2), "sample_size": row['sample_size']} for row in major_salary_comparison]
    # Industry distribution of employed graduates, grouped per major
    major_industry_distribution = employment_df.filter(col("employment_status") == "已就业").groupBy("major", "industry").agg(count("*").alias("count")).collect()
    industry_dist_data = {}
    for row in major_industry_distribution:
        industry_dist_data.setdefault(row['major'], []).append({"industry": row['industry'], "count": row['count']})
    result_data = {"major_employment_stats": top_majors, "salary_comparison": salary_comparison_data, "major_industry_distribution": industry_dist_data}
    return JsonResponse({"status": "success", "data": result_data})

@csrf_exempt
def employment_factor_correlation_analysis(request):
    # Load the graduate employment table from MySQL into a Spark DataFrame
    employment_df = spark.read.jdbc(url="jdbc:mysql://localhost:3306/employment_db", table="graduate_employment_data", properties={"user": "root", "password": "password", "driver": "com.mysql.cj.jdbc.Driver"})
    # Employment rate and average salary broken down by education level
    education_employment = employment_df.groupBy("education_level").agg(count("*").alias("total_count"), sum(when(col("employment_status") == "已就业", 1).otherwise(0)).alias("employed_count"), avg(when(col("employment_status") == "已就业", col("salary")).otherwise(None)).alias("avg_salary")).collect()
    education_analysis = [{"education_level": row['education_level'], "total_count": row['total_count'], "employed_count": row['employed_count'], "employment_rate": round((row['employed_count'] / row['total_count']) * 100, 2), "avg_salary": round(row['avg_salary'], 2) if row['avg_salary'] else 0} for row in education_employment]
    # Same breakdown by gender
    gender_employment = employment_df.groupBy("gender").agg(count("*").alias("total_count"), sum(when(col("employment_status") == "已就业", 1).otherwise(0)).alias("employed_count"), avg(when(col("employment_status") == "已就业", col("salary")).otherwise(None)).alias("avg_salary")).collect()
    gender_analysis = [{"gender": row['gender'], "total_count": row['total_count'], "employed_count": row['employed_count'], "employment_rate": round((row['employed_count'] / row['total_count']) * 100, 2), "avg_salary": round(row['avg_salary'], 2) if row['avg_salary'] else 0} for row in gender_employment]
    # Top 20 regions by number of employed graduates
    region_employment = employment_df.groupBy("graduation_region").agg(count("*").alias("total_count"), sum(when(col("employment_status") == "已就业", 1).otherwise(0)).alias("employed_count"), avg(when(col("employment_status") == "已就业", col("salary")).otherwise(None)).alias("avg_salary")).orderBy(desc("employed_count")).limit(20).collect()
    region_analysis = [{"region": row['graduation_region'], "total_count": row['total_count'], "employed_count": row['employed_count'], "employment_rate": round((row['employed_count'] / row['total_count']) * 100, 2), "avg_salary": round(row['avg_salary'], 2) if row['avg_salary'] else 0} for row in region_employment]
    # Salary against skill level among the employed
    skill_correlation = employment_df.filter(col("employment_status") == "已就业").groupBy("skill_level").agg(count("*").alias("count"), avg("salary").alias("avg_salary")).orderBy("skill_level").collect()
    skill_analysis = [{"skill_level": row['skill_level'], "count": row['count'], "avg_salary": round(row['avg_salary'], 2)} for row in skill_correlation]
    # Employment rate with vs. without internship experience
    internship_correlation = employment_df.groupBy("internship_experience").agg(count("*").alias("total_count"), sum(when(col("employment_status") == "已就业", 1).otherwise(0)).alias("employed_count")).collect()
    internship_analysis = [{"internship_experience": row['internship_experience'], "total_count": row['total_count'], "employed_count": row['employed_count'], "employment_rate": round((row['employed_count'] / row['total_count']) * 100, 2)} for row in internship_correlation]
    # Pull a small projection into Pandas; corr() only considers numeric columns
    pandas_df = employment_df.select("salary", "skill_level", "internship_experience", "education_level").filter(col("employment_status") == "已就业").toPandas()
    correlation_matrix = pandas_df.select_dtypes(include=[np.number]).corr().to_dict()
    result_data = {"education_analysis": education_analysis, "gender_analysis": gender_analysis, "region_analysis": region_analysis, "skill_analysis": skill_analysis, "internship_analysis": internship_analysis, "correlation_matrix": correlation_matrix}
    return JsonResponse({"status": "success", "data": result_data})
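Note that the correlation step only covers numeric columns, because `DataFrame.corr()` drops non-numeric ones, so categorical factors such as education level contribute nothing unless they are encoded first. A minimal local sketch of such an ordinal encoding follows; the sample rows and the `edu_order` mapping are illustrative assumptions, not the system's real data (in the system, the frame would come from `employment_df.toPandas()` as in the view above).

```python
import pandas as pd

# Hypothetical local sample of employed graduates.
pdf = pd.DataFrame({
    "salary": [6000, 8000, 12000, 15000],
    "education_level": ["专科", "本科", "硕士", "博士"],   # associate, bachelor, master, doctorate
    "internship_experience": ["无", "有", "有", "有"],     # 无 = none, 有 = yes
})

# Map categorical factors to ordinal/binary codes so corr() can use them.
edu_order = {"专科": 0, "本科": 1, "硕士": 2, "博士": 3}
pdf["education_code"] = pdf["education_level"].map(edu_order)
pdf["internship_code"] = pdf["internship_experience"].eq("有").astype(int)

# Pearson correlation over the now-numeric columns
corr = pdf[["salary", "education_code", "internship_code"]].corr()
```

With this encoding the matrix captures how salary moves with education level and internship experience, instead of silently reducing to salary alone.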

Graduate Employment Data Analysis and Visualization System Documentation

Documentation.png
