Design and Implementation of Douban Movie Data Visualization Analysis | A Senior Student's Lifeline: Hands-On Experience Building a Douban Movie Data Visualization Analysis System with Spark + Django


💖💖 Author: Computer Graduation Project Jerry 💙💙 About me: I have long worked as a computer science instructor and genuinely enjoy teaching. My strongest languages are Java, WeChat Mini Program, Python, Golang, and Android, and my projects span big data, deep learning, websites, mini programs, Android apps, and algorithms. I also take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and know some techniques for reducing similarity-check scores. I enjoy sharing solutions to problems I run into during development and discussing technology, so feel free to ask me about anything code-related! 💛💛 A word of thanks: thank you all for your attention and support! 💜💜 Website projects · Android/Mini Program projects · Big data projects · Deep learning projects · Recommended graduation project topics

Design and Implementation of Douban Movie Data Visualization Analysis: Introduction

The Spark + Django Douban movie data visualization analysis system is a comprehensive platform that integrates big data processing, data analysis, and visual presentation. The system uses Hadoop + Spark as its core big data framework, applying distributed computation to process and analyze large volumes of Douban movie data efficiently. The backend exposes RESTful APIs built on Django, while the frontend uses a Vue + ElementUI + Echarts stack for interactive data visualization. The main functional modules are user management, movie information management, dedicated analysis of action movies, animation statistics, sensitive-word filtering, and system administration. Spark SQL runs multi-dimensional statistical analysis over movie data stored in MySQL, Pandas and NumPy handle data preprocessing and feature extraction, and Echarts renders the results as bar, pie, and line charts to present key indicators such as rating distribution, genre preferences, and release trends. The overall architecture is clean and the technology stack is mature and stable: the system can meet the real-time analysis demands of large-scale movie data while giving users a friendly data-query and visualization experience.
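As a minimal sketch of the Pandas preprocessing step described above (the `rating` column name and the four rating levels are taken from the Spark SQL queries later in this post; the helper name and sample data are illustrative, not from the project):

```python
import numpy as np
import pandas as pd

def preprocess_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Clean raw movie rows and derive a rating-level feature."""
    df = df.copy()
    # Coerce rating to numeric and drop rows without a usable score.
    df["rating"] = pd.to_numeric(df["rating"], errors="coerce")
    df = df.dropna(subset=["rating"])
    # Bucket ratings into the same four levels used by the Spark SQL CASE expression.
    bins = [-np.inf, 5.0, 7.0, 9.0, np.inf]
    labels = ["较差", "一般", "良好", "优秀"]
    df["rating_level"] = pd.cut(df["rating"], bins=bins, labels=labels, right=False)
    return df

movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "rating": [9.2, "6.5", None, 4.8],
})
clean = preprocess_movies(movies)
print(clean[["title", "rating_level"]])
```

The cleaned frame can then be handed to `spark.createDataFrame` exactly as the views below do, with the level buckets already materialized as a column.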

Design and Implementation of Douban Movie Data Visualization Analysis: Demo Video

Demo video

Design and Implementation of Douban Movie Data Visualization Analysis: Screenshots

[System screenshots]

Design and Implementation of Douban Movie Data Visualization Analysis: Code

from pyspark.sql import SparkSession
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from .models import MovieInfo, MovieAnalysis
import json
import pandas as pd

# One SparkSession shared by all views; adaptive query execution tunes the aggregations below.
spark = SparkSession.builder.appName("DoubanMovieAnalysis").config("spark.sql.adaptive.enabled", "true").getOrCreate()

@csrf_exempt
def movie_data_analysis(request):
    # Pull all movie rows out of MySQL via the ORM and register them as a Spark view.
    movie_data = MovieInfo.objects.all().values()
    df = pd.DataFrame(list(movie_data))
    spark_df = spark.createDataFrame(df)
    spark_df.createOrReplaceTempView("movies")
    # Bucket ratings into four quality levels and count each bucket.
    rating_distribution = spark.sql("SELECT CASE WHEN rating >= 9.0 THEN '优秀' WHEN rating >= 7.0 THEN '良好' WHEN rating >= 5.0 THEN '一般' ELSE '较差' END as rating_level, COUNT(*) as count FROM movies GROUP BY rating_level ORDER BY count DESC").collect()
    # Average rating and movie count per genre.
    genre_analysis = spark.sql("SELECT genre, AVG(rating) as avg_rating, COUNT(*) as movie_count FROM movies WHERE genre IS NOT NULL GROUP BY genre ORDER BY avg_rating DESC").collect()
    # Yearly release counts and average ratings for the trend chart.
    year_trend = spark.sql("SELECT YEAR(release_date) as year, COUNT(*) as count, AVG(rating) as avg_rating FROM movies WHERE release_date IS NOT NULL GROUP BY YEAR(release_date) ORDER BY year").collect()
    top_movies = spark.sql("SELECT title, rating, director, genre FROM movies WHERE rating >= 8.5 ORDER BY rating DESC LIMIT 20").collect()
    # Use row["count"] rather than row.count: Row inherits tuple, so row.count is the tuple method.
    rating_data = [{"rating_level": row.rating_level, "count": row["count"]} for row in rating_distribution]
    genre_data = [{"genre": row.genre, "avg_rating": float(row.avg_rating), "movie_count": row.movie_count} for row in genre_analysis]
    year_data = [{"year": row.year, "count": row["count"], "avg_rating": float(row.avg_rating)} for row in year_trend]
    top_data = [{"title": row.title, "rating": float(row.rating), "director": row.director, "genre": row.genre} for row in top_movies]
    analysis_result = {"rating_distribution": rating_data, "genre_analysis": genre_data, "year_trend": year_data, "top_movies": top_data}
    return JsonResponse({"success": True, "data": analysis_result})

@csrf_exempt
def visualization_statistics(request):
    # Guard clause: the original fell through with no response for non-POST requests.
    if request.method != 'POST':
        return JsonResponse({"success": False, "message": "POST required"})
    params = json.loads(request.body)
    chart_type = params.get('chart_type', 'rating')
    # Re-register the movie table so each request works on fresh ORM data.
    movie_queryset = MovieInfo.objects.all().values()
    movies_df = pd.DataFrame(list(movie_queryset))
    spark_movies = spark.createDataFrame(movies_df)
    spark_movies.createOrReplaceTempView("movie_stats")
    if chart_type == 'rating':
        # Histogram over whole-number rating buckets.
        chart_data = spark.sql("SELECT FLOOR(rating) as rating_range, COUNT(*) as count FROM movie_stats WHERE rating IS NOT NULL GROUP BY FLOOR(rating) ORDER BY rating_range").collect()
        result_data = [{"name": f"{row.rating_range}分", "value": row["count"]} for row in chart_data]
    elif chart_type == 'genre':
        # Top ten genres by movie count, for the pie chart.
        chart_data = spark.sql("SELECT genre, COUNT(*) as count FROM movie_stats WHERE genre IS NOT NULL GROUP BY genre ORDER BY count DESC LIMIT 10").collect()
        result_data = [{"name": row.genre, "value": row["count"]} for row in chart_data]
    elif chart_type == 'director':
        # Directors with at least three movies, ranked by average rating.
        chart_data = spark.sql("SELECT director, COUNT(*) as movie_count, AVG(rating) as avg_rating FROM movie_stats WHERE director IS NOT NULL GROUP BY director HAVING COUNT(*) >= 3 ORDER BY avg_rating DESC LIMIT 15").collect()
        result_data = [{"name": row.director, "value": row.movie_count, "rating": float(row.avg_rating)} for row in chart_data]
    else:
        # Fallback: release counts per month for the most recent twelve months of data.
        chart_data = spark.sql("SELECT DATE_FORMAT(release_date, 'yyyy-MM') as month, COUNT(*) as count FROM movie_stats WHERE release_date IS NOT NULL GROUP BY DATE_FORMAT(release_date, 'yyyy-MM') ORDER BY month DESC LIMIT 12").collect()
        result_data = [{"name": row.month, "value": row["count"]} for row in chart_data]
    return JsonResponse({"success": True, "chart_data": result_data, "chart_type": chart_type})

@csrf_exempt
def sensitive_word_filter(request):
    if request.method != 'POST':
        return JsonResponse({"success": False, "message": "POST required"})
    data = json.loads(request.body)
    content = data.get('content', '')
    movie_comments = MovieInfo.objects.filter(description__icontains=content).values('title', 'description', 'rating')
    comments_df = pd.DataFrame(list(movie_comments))
    if comments_df.empty:
        return JsonResponse({"success": False, "message": "No matching movie data found"})
    spark_comments = spark.createDataFrame(comments_df)
    spark_comments.createOrReplaceTempView("comments_table")
    # The keywords come from a fixed internal list, so interpolating them into SQL is
    # acceptable here; never splice user input into these strings.
    sensitive_keywords = ['暴力', '血腥', '恐怖', '政治', '宗教', '敏感', '禁止', '违法']
    keyword_conditions = " OR ".join([f"description LIKE '%{keyword}%'" for keyword in sensitive_keywords])
    # Tag every matching movie with a high/medium/low risk level.
    filtered_comments = spark.sql(f"SELECT title, description, rating, CASE WHEN {keyword_conditions} THEN 'high' WHEN description LIKE '%争议%' OR description LIKE '%不当%' THEN 'medium' ELSE 'low' END as risk_level FROM comments_table").collect()
    high_risk_movies = spark.sql(f"SELECT COUNT(*) as count FROM comments_table WHERE {keyword_conditions}").collect()[0]["count"]
    risk_distribution = spark.sql(f"SELECT CASE WHEN {keyword_conditions} THEN 'high' WHEN description LIKE '%争议%' OR description LIKE '%不当%' THEN 'medium' ELSE 'low' END as risk_level, COUNT(*) as count FROM comments_table GROUP BY risk_level").collect()
    # Per-keyword hit counts for the frequency chart.
    keyword_frequency = {}
    for keyword in sensitive_keywords:
        count = spark.sql(f"SELECT COUNT(*) as count FROM comments_table WHERE description LIKE '%{keyword}%'").collect()[0]["count"]
        if count > 0:
            keyword_frequency[keyword] = count
    filtered_data = [{"title": row.title, "description": row.description[:100], "rating": float(row.rating), "risk_level": row.risk_level} for row in filtered_comments]
    risk_stats = [{"level": row.risk_level, "count": row["count"]} for row in risk_distribution]
    return JsonResponse({"success": True, "filtered_movies": filtered_data, "high_risk_count": high_risk_movies, "risk_distribution": risk_stats, "keyword_frequency": keyword_frequency})
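The risk-level classification in `sensitive_word_filter` lives inside a Spark SQL CASE expression, which makes it awkward to unit-test. The same logic can be restated as plain Python, useful for testing the keyword lists before they reach SQL (the function name and the medium-risk list factored out here are illustrative; the keywords match the ones in the view above):

```python
# Fixed keyword lists mirroring the Spark SQL CASE WHEN branches above.
SENSITIVE_KEYWORDS = ['暴力', '血腥', '恐怖', '政治', '宗教', '敏感', '禁止', '违法']
MEDIUM_KEYWORDS = ['争议', '不当']

def classify_risk(description: str) -> str:
    """Return 'high' if any sensitive keyword appears in the description,
    'medium' for controversial wording, otherwise 'low'."""
    if any(k in description for k in SENSITIVE_KEYWORDS):
        return 'high'
    if any(k in description for k in MEDIUM_KEYWORDS):
        return 'medium'
    return 'low'

print(classify_risk("包含暴力镜头的动作片"))
```

Keeping one function as the single source of truth for this rule also makes it easier to keep the three SQL statements in the view consistent when the keyword lists change.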

Design and Implementation of Douban Movie Data Visualization Analysis: Documentation

[Documentation screenshot]
