【Big Data】Lagou (拉勾网) Computer Job Recruitment Data Analysis System | Computer Science Graduation Project | Hadoop+Spark Environment Setup | Data Science and Big Data Technology | Source Code + Documentation + Walkthrough Included


一、About the Author

💖💖 Author: 计算机编程果茶熊 💙💙 About me: I spent years teaching computer-science training courses and worked as a programming instructor. I love teaching and am proficient in Java, WeChat Mini Programs, Python, Golang, Android, and several other IT fields. I also take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and know a few techniques for reducing text-similarity scores. I enjoy sharing solutions to problems I run into during development and talking shop, so feel free to ask me anything about code! 💛💛 A note of thanks: thank you all for your attention and support! 💜💜 Website projects | Android/Mini Program projects | Big-data projects | Graduation project topic selection 💕💕 Contact 计算机编程果茶熊 at the end of this article to get the source code

二、System Overview

Big-data framework: Hadoop + Spark (Hive requires custom modification)
Development languages: Java + Python (both versions available)
Database: MySQL
Back-end frameworks: SpringBoot (Spring + SpringMVC + MyBatis) + Django (both versions available)
Front end: Vue + ECharts + HTML + CSS + JavaScript + jQuery

The Lagou Computer Job Recruitment Data Analysis System is an intelligent recruitment-analytics platform built on big-data technology. It uses the Hadoop + Spark distributed computing stack to process large volumes of job-posting data, with a Python back end built on Django to provide a stable service layer. The front end uses Vue + ElementUI + ECharts to give users an intuitive interface and rich data visualizations. On the data-processing side, the system stores large-scale recruitment data in the Hadoop distributed file system HDFS, uses Spark and Spark SQL for efficient data cleaning, transformation, and analytical computation, and applies Pandas and NumPy for fine-grained processing and statistics. Core features cover user management, recruitment-data management, salary-level analysis, job-demand analysis, employment-market analysis, hot-position analysis, and a visualization dashboard. Structured results are stored in MySQL, providing a full picture of the computer-job recruitment market: job seekers can see market trends, and employers can use the insights to shape reasonable recruitment strategies.
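As a rough illustration of the Pandas-based cleaning step described above, deduplicating and normalizing raw postings might look like the following sketch. The column names (`position_name`, `company_name`, `city`) are assumptions for illustration, not the project's actual schema:

```python
import pandas as pd

def clean_postings(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate and malformed recruitment records (hypothetical schema)."""
    # the same posting often appears multiple times after repeated crawls
    df = df.drop_duplicates(subset=["position_name", "company_name", "city"])
    # rows missing required fields cannot be analyzed
    df = df.dropna(subset=["position_name", "city"])
    # normalize stray whitespace in the city field
    df["city"] = df["city"].str.strip()
    return df.reset_index(drop=True)

postings = pd.DataFrame({
    "position_name": ["Java工程师", "Java工程师", None],
    "company_name": ["A", "A", "B"],
    "city": [" 北京", " 北京", "上海"],
})
cleaned = clean_postings(postings)  # one valid, deduplicated record remains
```

In the real pipeline this step would run on the Spark DataFrame before any per-view conversion to Pandas, but the logic is the same.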

三、Video Walkthrough

Lagou Computer Job Recruitment Data Analysis System

四、Feature Screenshots

(Feature screenshots omitted.)

五、Code Excerpts


from pyspark.sql import SparkSession
from pyspark.sql.functions import col, avg, count, desc, asc, when, sum as spark_sum
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType
import pandas as pd
import numpy as np
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
import json

spark = SparkSession.builder \
    .appName("LaGouRecruitmentAnalysis") \
    .config("spark.sql.adaptive.enabled", "true") \
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true") \
    .getOrCreate()

def salary_analysis(request):
    df = spark.read.option("header", "true").csv("hdfs://localhost:9000/recruitment_data/salary_data.csv")
    df = (df.withColumn("min_salary", col("min_salary").cast(IntegerType()))
            .withColumn("max_salary", col("max_salary").cast(IntegerType()))
            .withColumn("avg_salary", (col("min_salary") + col("max_salary")) / 2))
    city_salary_stats = df.groupBy("city").agg(avg("avg_salary").alias("avg_salary"), count("*").alias("job_count")).orderBy(desc("avg_salary"))
    position_salary_stats = df.groupBy("position_name").agg(avg("avg_salary").alias("avg_salary"), count("*").alias("job_count")).filter(col("job_count") >= 10).orderBy(desc("avg_salary"))
    experience_salary_stats = df.groupBy("work_experience").agg(avg("avg_salary").alias("avg_salary"), count("*").alias("job_count")).orderBy(desc("avg_salary"))
    company_size_salary = df.groupBy("company_size").agg(avg("avg_salary").alias("avg_salary"), count("*").alias("job_count")).orderBy(desc("avg_salary"))
    salary_distribution = (df.select(
            when(col("avg_salary") < 8000, "0-8k")
            .when((col("avg_salary") >= 8000) & (col("avg_salary") < 15000), "8k-15k")
            .when((col("avg_salary") >= 15000) & (col("avg_salary") < 25000), "15k-25k")
            .when((col("avg_salary") >= 25000) & (col("avg_salary") < 40000), "25k-40k")
            .otherwise("40k+").alias("salary_range"))
        .groupBy("salary_range").count().orderBy("salary_range"))
    education_salary_stats = df.groupBy("education_requirement").agg(avg("avg_salary").alias("avg_salary"), count("*").alias("job_count")).orderBy(desc("avg_salary"))
    city_salary_pandas = city_salary_stats.toPandas()
    position_salary_pandas = position_salary_stats.toPandas()
    experience_salary_pandas = experience_salary_stats.toPandas()
    company_size_pandas = company_size_salary.toPandas()
    salary_dist_pandas = salary_distribution.toPandas()
    education_salary_pandas = education_salary_stats.toPandas()
    trend_analysis = df.groupBy("publish_date").agg(avg("avg_salary").alias("daily_avg_salary"), count("*").alias("daily_job_count")).orderBy("publish_date")
    trend_pandas = trend_analysis.toPandas()
    # work_experience and education_requirement are string columns, so only the
    # numeric avg_salary column can be correlated directly; encode the categorical
    # columns first if a full correlation matrix is needed
    correlation_matrix = df.select("avg_salary", "work_experience", "education_requirement").toPandas()
    correlation_result = correlation_matrix.corr(numeric_only=True)
    result_data = {
        "city_salary": city_salary_pandas.to_dict('records'),
        "position_salary": position_salary_pandas.to_dict('records'),
        "experience_salary": experience_salary_pandas.to_dict('records'),
        "company_size_salary": company_size_pandas.to_dict('records'),
        "salary_distribution": salary_dist_pandas.to_dict('records'),
        "education_salary": education_salary_pandas.to_dict('records'),
        "salary_trend": trend_pandas.to_dict('records'),
        "correlation_analysis": correlation_result.to_dict(),
    }
    return JsonResponse(result_data)
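The view above assumes `min_salary` and `max_salary` already arrive as numeric columns. Lagou postings typically publish salary as a range string such as `15k-25k`, so a preprocessing step is needed somewhere upstream. A minimal pure-Python sketch, assuming that `Nk-Mk` format (the exact field format in the real dataset may differ):

```python
import re

def parse_salary_range(text):
    """Parse a Lagou-style salary string like '15k-25k' into monthly CNY bounds.

    Returns (min_salary, max_salary) in yuan, or (None, None) when the string
    is not a parseable range (e.g. '面议', 'salary negotiable').
    """
    m = re.match(r"(\d+)k-(\d+)k", text.strip(), re.IGNORECASE)
    if not m:
        return None, None
    return int(m.group(1)) * 1000, int(m.group(2)) * 1000

parse_salary_range("15k-25k")  # -> (15000, 25000)
parse_salary_range("面议")      # -> (None, None)
```

In the Spark pipeline the same logic could be wrapped in a UDF, or expressed with `regexp_extract`, before the casts at the top of `salary_analysis`.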

def job_demand_analysis(request):
    df = spark.read.option("header", "true").csv("hdfs://localhost:9000/recruitment_data/job_demand_data.csv")
    df = df.withColumn("job_count", col("job_count").cast(IntegerType())).withColumn("view_count", col("view_count").cast(IntegerType()))
    skill_demand = df.groupBy("skill_tags").agg(spark_sum("job_count").alias("total_demand"), count("*").alias("company_count")).orderBy(desc("total_demand"))
    city_demand = df.groupBy("city").agg(spark_sum("job_count").alias("total_jobs"), count("*").alias("company_count"), avg("job_count").alias("avg_jobs_per_company")).orderBy(desc("total_jobs"))
    position_demand = df.groupBy("position_category").agg(spark_sum("job_count").alias("total_demand"), count("*").alias("company_count"), avg("view_count").alias("avg_views")).orderBy(desc("total_demand"))
    company_type_demand = df.groupBy("company_type").agg(spark_sum("job_count").alias("total_jobs"), count("*").alias("company_count")).orderBy(desc("total_jobs"))
    experience_demand = df.groupBy("experience_level").agg(spark_sum("job_count").alias("total_demand"), count("*").alias("position_count")).orderBy(desc("total_demand"))
    education_demand = df.groupBy("education_requirement").agg(spark_sum("job_count").alias("total_demand"), count("*").alias("position_count")).orderBy(desc("total_demand"))
    monthly_demand_trend = df.groupBy("publish_month").agg(spark_sum("job_count").alias("monthly_demand"), count("*").alias("monthly_positions")).orderBy("publish_month")
    hot_skills = df.filter(col("job_count") > 50).groupBy("skill_tags").agg(avg("view_count").alias("avg_attention"), spark_sum("job_count").alias("demand_volume")).orderBy(desc("avg_attention"))
    demand_growth = df.groupBy("publish_quarter").agg(spark_sum("job_count").alias("quarterly_demand")).orderBy("publish_quarter")
    growth_rate = demand_growth.toPandas()
    growth_rate['growth_rate'] = growth_rate['quarterly_demand'].pct_change() * 100
    skill_demand_pandas = skill_demand.toPandas()
    city_demand_pandas = city_demand.toPandas()
    position_demand_pandas = position_demand.toPandas()
    company_type_pandas = company_type_demand.toPandas()
    experience_demand_pandas = experience_demand.toPandas()
    education_demand_pandas = education_demand.toPandas()
    monthly_trend_pandas = monthly_demand_trend.toPandas()
    hot_skills_pandas = hot_skills.toPandas()
    regional_analysis = df.groupBy("region", "city").agg(spark_sum("job_count").alias("regional_demand")).orderBy(desc("regional_demand"))
    regional_pandas = regional_analysis.toPandas()
    result_data = {
        "skill_demand": skill_demand_pandas.to_dict('records'),
        "city_demand": city_demand_pandas.to_dict('records'),
        "position_demand": position_demand_pandas.to_dict('records'),
        "company_type_demand": company_type_pandas.to_dict('records'),
        "experience_demand": experience_demand_pandas.to_dict('records'),
        "education_demand": education_demand_pandas.to_dict('records'),
        "monthly_trend": monthly_trend_pandas.to_dict('records'),
        "hot_skills": hot_skills_pandas.to_dict('records'),
        "growth_analysis": growth_rate.to_dict('records'),
        "regional_analysis": regional_pandas.to_dict('records'),
    }
    return JsonResponse(result_data)
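The quarterly growth calculation in `job_demand_analysis` relies on Pandas `pct_change`. The same arithmetic spelled out in plain Python, with invented sample values, makes the definition explicit:

```python
def growth_rates(values):
    """Percentage change between consecutive values, like pct_change() * 100."""
    rates = [None]  # the first period has no predecessor to compare against
    for prev, cur in zip(values, values[1:]):
        rates.append((cur - prev) / prev * 100)
    return rates

quarterly_demand = [1000, 1200, 1080]  # illustrative quarterly job counts
growth_rates(quarterly_demand)  # -> [None, 20.0, -10.0]
```

`pct_change` returns `NaN` rather than `None` for the first row, which is worth normalizing before JSON serialization since `NaN` is not valid JSON.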

def hot_position_analysis(request):
    df = spark.read.option("header", "true").csv("hdfs://localhost:9000/recruitment_data/hot_position_data.csv")
    df = df.withColumn("application_count", col("application_count").cast(IntegerType())).withColumn("view_count", col("view_count").cast(IntegerType())).withColumn("salary_avg", col("salary_avg").cast(FloatType()))
    hot_positions = (df.filter(col("application_count") > 100)
                       .groupBy("position_name")
                       .agg(spark_sum("application_count").alias("total_applications"),
                            avg("salary_avg").alias("avg_salary"),
                            spark_sum("view_count").alias("total_views"),
                            count("*").alias("job_openings"))
                       .orderBy(desc("total_applications")))
    competition_ratio = (df.withColumn("competition_ratio", col("application_count") / col("view_count") * 100)
                           .groupBy("position_name")
                           .agg(avg("competition_ratio").alias("avg_competition"),
                                spark_sum("application_count").alias("total_apps"),
                                spark_sum("view_count").alias("total_views"))
                           .filter(col("total_views") > 1000)
                           .orderBy(desc("avg_competition")))
    emerging_positions = df.filter(col("publish_date") >= "2024-01-01").groupBy("position_name").agg(count("*").alias("new_openings"), avg("salary_avg").alias("avg_salary"), spark_sum("application_count").alias("total_interest")).filter(col("new_openings") >= 5).orderBy(desc("total_interest"))
    city_hot_positions = df.groupBy("city", "position_name").agg(spark_sum("application_count").alias("city_applications"), avg("salary_avg").alias("city_avg_salary")).filter(col("city_applications") > 50).orderBy(desc("city_applications"))
    skill_popularity = df.groupBy("required_skills").agg(spark_sum("application_count").alias("skill_applications"), count("*").alias("positions_requiring"), avg("salary_avg").alias("skill_avg_salary")).filter(col("positions_requiring") >= 10).orderBy(desc("skill_applications"))
    company_hot_positions = df.groupBy("company_name").agg(spark_sum("application_count").alias("company_total_apps"), count("*").alias("total_positions"), avg("salary_avg").alias("company_avg_salary")).filter(col("total_positions") >= 5).orderBy(desc("company_total_apps"))
    position_trend = df.groupBy("publish_month", "position_name").agg(spark_sum("application_count").alias("monthly_applications")).orderBy("publish_month", desc("monthly_applications"))
    salary_vs_popularity = df.groupBy("salary_range").agg(avg("application_count").alias("avg_applications"), count("*").alias("position_count")).orderBy(desc("avg_applications"))
    experience_hot_positions = df.groupBy("experience_requirement").agg(spark_sum("application_count").alias("exp_total_apps"), avg("salary_avg").alias("exp_avg_salary"), count("*").alias("exp_position_count")).orderBy(desc("exp_total_apps"))
    hot_positions_pandas = hot_positions.toPandas()
    competition_pandas = competition_ratio.toPandas()
    emerging_pandas = emerging_positions.toPandas()
    city_hot_pandas = city_hot_positions.toPandas()
    skill_popularity_pandas = skill_popularity.toPandas()
    company_hot_pandas = company_hot_positions.toPandas()
    trend_pandas = position_trend.toPandas()
    salary_popularity_pandas = salary_vs_popularity.toPandas()
    experience_hot_pandas = experience_hot_positions.toPandas()
    market_heat_index = (df.select((col("application_count") * 0.4 + col("view_count") * 0.3 + col("salary_avg") * 0.3).alias("heat_index"),
                                   "position_name", "city")
                           .groupBy("position_name", "city")
                           .agg(avg("heat_index").alias("market_heat"))
                           .orderBy(desc("market_heat")))
    heat_index_pandas = market_heat_index.toPandas()
    result_data = {
        "hot_positions": hot_positions_pandas.to_dict('records'),
        "competition_analysis": competition_pandas.to_dict('records'),
        "emerging_positions": emerging_pandas.to_dict('records'),
        "city_hot_positions": city_hot_pandas.to_dict('records'),
        "skill_popularity": skill_popularity_pandas.to_dict('records'),
        "company_hot_positions": company_hot_pandas.to_dict('records'),
        "position_trend": trend_pandas.to_dict('records'),
        "salary_vs_popularity": salary_popularity_pandas.to_dict('records'),
        "experience_hot_positions": experience_hot_pandas.to_dict('records'),
        "market_heat_index": heat_index_pandas.to_dict('records'),
    }
    return JsonResponse(result_data)
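The market heat index in `hot_position_analysis` weights applications, views, and average salary at 0.4/0.3/0.3. In isolation (weights taken from the code above, sample numbers invented):

```python
def heat_index(applications, views, salary_avg,
               w_apps=0.4, w_views=0.3, w_salary=0.3):
    """Weighted popularity score mirroring the Spark expression above."""
    return applications * w_apps + views * w_views + salary_avg * w_salary

score = heat_index(200, 1500, 18000.0)
# 200*0.4 + 1500*0.3 + 18000*0.3 = 80 + 450 + 5400 = 5930.0
```

Note that the raw salary term dwarfs the application and view counts, so the index is dominated by salary unless each component is normalized (e.g. min-max scaled) before weighting; the sketch keeps the code's original unnormalized form.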

六、Documentation Excerpts

(Documentation screenshot omitted.)

七、END

💕💕 Contact 计算机编程果茶熊 to get the source code