[Data Analysis] A Big-Data-Based Stock Market Data Visualization and Analysis Platform | Recommended Big Data Graduation Project Topics · Hands-On Big Data Projects · Visualization Dashboards · Hadoop · Spark · Java · Python


💖💖Author: 计算机毕业设计杰瑞 💙💙About me: I spent years teaching computer science training courses and still enjoy teaching. I work in Java, WeChat Mini Programs, Python, Golang, and Android, and my projects span big data, deep learning, websites, mini programs, Android apps, and algorithms. I also take on custom project development, code walkthroughs, thesis-defense coaching, and document writing, and know a few plagiarism-reduction techniques. I like sharing solutions to problems I hit during development and talking shop, so feel free to ask me anything about code and technology! 💛💛A word of thanks: thank you all for your attention and support! 💜💜 Website projects · Android/mini-program projects · Big data projects · Deep learning projects · Graduation project topic recommendations

Introduction to the Big-Data-Based Stock Market Data Visualization and Analysis Platform

The big-data-based stock market data visualization and analysis platform is a comprehensive analysis system built on the Hadoop + Spark big data stack, designed for efficient collection, storage, analysis, and visual presentation of massive volumes of stock market data. The backend uses Python + Django (a Java + Spring Boot version is also available), and the frontend is built with Vue + ElementUI + ECharts. Raw data is stored on HDFS, Hadoop's distributed file system; Spark and Spark SQL handle fast processing and analysis; and Pandas and NumPy are used for data cleaning and computation. The platform provides basic modules for the home page, personal profile management, user management, and stock data management; its core features are the visualization dashboard and analytical charts, which render multi-dimensional data such as price trends, trading-volume changes, and market volatility intuitively with ECharts so users can quickly spot market trends. By combining big data technology with financial data analysis, the system can process millions to tens of millions of historical records while presenting results on a live dashboard, offering a complete technical solution for stock market analysis.
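The indicator pipeline described above (price change, change rate, turnover, moving averages) can be illustrated with a small, self-contained pandas sketch. The sample prices below are hypothetical and only show the arithmetic; the platform itself computes the same columns on Spark:

```python
import pandas as pd

# Hypothetical daily OHLC rows for one stock (illustrative values only)
df = pd.DataFrame({
    "trade_date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04",
                                  "2024-01-05", "2024-01-08"]),
    "open_price":  [10.0, 10.2, 10.1, 10.5, 10.4],
    "close_price": [10.2, 10.1, 10.5, 10.4, 10.8],
    "volume":      [1200, 1500, 900, 2000, 1700],
})

# Same derived columns the platform builds with Spark SQL functions
df["price_change"] = df["close_price"] - df["open_price"]
df["price_change_rate"] = df["price_change"] / df["open_price"] * 100
df["amount"] = df["volume"] * df["close_price"]          # turnover
df["ma5"] = df["close_price"].rolling(window=5).mean()   # NaN until 5 rows exist

print(df[["trade_date", "price_change", "price_change_rate", "ma5"]])
```

Note that `rolling(window=5)` yields NaN for the first four rows, whereas the Spark version's `rowsBetween(-4, 0)` averages over however many rows are available; that edge-of-window difference is worth keeping in mind when cross-checking results.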

Demo Video of the Big-Data-Based Stock Market Data Visualization and Analysis Platform

Demo video

Demo Screenshots of the Big-Data-Based Stock Market Data Visualization and Analysis Platform

(platform screenshots)

Code Showcase of the Big-Data-Based Stock Market Data Visualization and Analysis Platform

from pyspark.sql import SparkSession
# Note: sum/max/min below are Spark SQL functions and intentionally shadow the Python builtins.
from pyspark.sql.functions import col, avg, sum, max, min, count, lag, when
from pyspark.sql.window import Window
from django.http import JsonResponse
from django.views import View
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import json

# Shared Spark session; executor/driver memory sized for a small demo cluster
spark = (
    SparkSession.builder
    .appName("StockDataAnalysis")
    .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
    .config("spark.executor.memory", "2g")
    .config("spark.driver.memory", "1g")
    .getOrCreate()
)

class StockDataProcessView(View):
    def post(self, request):
        stock_code = request.POST.get('stock_code')
        start_date = request.POST.get('start_date')
        end_date = request.POST.get('end_date')
        if not all([stock_code, start_date, end_date]):
            return JsonResponse({'code': 400, 'message': 'stock_code, start_date and end_date are required'})
        hdfs_path = f"hdfs://localhost:9000/stock_data/{stock_code}/*.csv"
        df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load(hdfs_path)
        df = df.filter((col("trade_date") >= start_date) & (col("trade_date") <= end_date))
        df = df.withColumn("price_change", col("close_price") - col("open_price"))
        df = df.withColumn("price_change_rate", (col("close_price") - col("open_price")) / col("open_price") * 100)
        df = df.withColumn("amount", col("volume") * col("close_price"))
        window_spec = Window.partitionBy("stock_code").orderBy("trade_date")
        df = df.withColumn("prev_close", lag("close_price", 1).over(window_spec))
        df = df.withColumn("ma5", avg("close_price").over(window_spec.rowsBetween(-4, 0)))
        df = df.withColumn("ma10", avg("close_price").over(window_spec.rowsBetween(-9, 0)))
        df = df.withColumn("ma20", avg("close_price").over(window_spec.rowsBetween(-19, 0)))
        df = df.na.fill(0, subset=["prev_close", "ma5", "ma10", "ma20"])
        processed_data = df.select("trade_date", "open_price", "close_price", "high_price", "low_price", "volume", "price_change", "price_change_rate", "amount", "ma5", "ma10", "ma20").orderBy("trade_date")
        # collect() pulls the processed rows to the driver; acceptable for one stock's date range
        result_list = [row.asDict() for row in processed_data.collect()]
        pandas_df = pd.DataFrame(result_list)
        pandas_df['trade_date'] = pd.to_datetime(pandas_df['trade_date'])
        pandas_df = pandas_df.sort_values('trade_date')
        pandas_df['volume_ma5'] = pandas_df['volume'].rolling(window=5).mean()
        pandas_df['price_volatility'] = (pandas_df['high_price'] - pandas_df['low_price']) / pandas_df['open_price'] * 100
        correlation_matrix = pandas_df[['close_price', 'volume', 'price_change_rate']].corr()
        statistics = {
            'max_price': float(pandas_df['close_price'].max()),
            'min_price': float(pandas_df['close_price'].min()),
            'avg_price': float(pandas_df['close_price'].mean()),
            'total_volume': int(pandas_df['volume'].sum()),
            'avg_volume': float(pandas_df['volume'].mean()),
            'max_change_rate': float(pandas_df['price_change_rate'].max()),
            'min_change_rate': float(pandas_df['price_change_rate'].min()),
            'positive_days': int((pandas_df['price_change'] > 0).sum()),
            'negative_days': int((pandas_df['price_change'] < 0).sum()),
            'correlation': correlation_matrix.to_dict()
        }
        final_result = {
            'code': 200,
            'message': 'Stock data processed successfully',
            'data': json.loads(pandas_df.to_json(orient='records', date_format='iso')),
            'statistics': statistics
        }
        return JsonResponse(final_result)

class VisualAnalysisView(View):
    def get(self, request):
        analysis_type = request.GET.get('type')
        time_range = request.GET.get('range', '30')
        hdfs_path = "hdfs://localhost:9000/stock_data/all/*.csv"
        df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load(hdfs_path)
        end_date = datetime.now()
        start_date = end_date - timedelta(days=int(time_range))
        df = df.filter((col("trade_date") >= start_date.strftime('%Y-%m-%d')) & (col("trade_date") <= end_date.strftime('%Y-%m-%d')))
        if analysis_type == 'market_overview':
            daily_stats = df.groupBy("trade_date").agg(
                count("stock_code").alias("stock_count"),
                sum("volume").alias("total_volume"),
                sum(col("volume") * col("close_price")).alias("total_amount"),
                avg("price_change_rate").alias("avg_change_rate")
            ).orderBy("trade_date")
            rise_fall_stats = df.groupBy("trade_date").agg(
                sum(when(col("close_price") > col("open_price"), 1).otherwise(0)).alias("rise_count"),
                sum(when(col("close_price") < col("open_price"), 1).otherwise(0)).alias("fall_count"),
                sum(when(col("close_price") == col("open_price"), 1).otherwise(0)).alias("flat_count")
            ).orderBy("trade_date")
            daily_data = daily_stats.join(rise_fall_stats, "trade_date")
            result_data = [row.asDict() for row in daily_data.collect()]
        elif analysis_type == 'sector_analysis':
            sector_stats = df.groupBy("sector").agg(
                count("stock_code").alias("stock_count"),
                avg("close_price").alias("avg_price"),
                sum("volume").alias("total_volume"),
                avg("price_change_rate").alias("avg_change_rate"),
                max("close_price").alias("max_price"),
                min("close_price").alias("min_price")
            ).orderBy(col("avg_change_rate").desc())
            result_data = [row.asDict() for row in sector_stats.collect()]
        elif analysis_type == 'volume_analysis':
            volume_rank = df.groupBy("stock_code", "stock_name").agg(
                sum("volume").alias("total_volume"),
                avg("volume").alias("avg_volume"),
                sum(col("volume") * col("close_price")).alias("total_amount")
            ).orderBy(col("total_volume").desc()).limit(50)
            result_data = [row.asDict() for row in volume_rank.collect()]
        elif analysis_type == 'price_distribution':
            price_ranges = df.withColumn("price_range",
                when(col("close_price") < 10, "0-10")
                .when((col("close_price") >= 10) & (col("close_price") < 20), "10-20")
                .when((col("close_price") >= 20) & (col("close_price") < 50), "20-50")
                .when((col("close_price") >= 50) & (col("close_price") < 100), "50-100")
                .otherwise("100+")
            )
            distribution = price_ranges.groupBy("price_range").agg(
                count("stock_code").alias("stock_count"),
                avg("volume").alias("avg_volume")
            ).orderBy("price_range")
            result_data = [row.asDict() for row in distribution.collect()]
        else:
            # Unknown analysis type: return an empty result instead of raising NameError below
            result_data = []
        pandas_result = pd.DataFrame(result_data)
        if not pandas_result.empty and 'trade_date' in pandas_result.columns:
            pandas_result['trade_date'] = pd.to_datetime(pandas_result['trade_date'])
            pandas_result = pandas_result.sort_values('trade_date')
        numpy_array = pandas_result.select_dtypes(include=[np.number]).values
        if numpy_array.size > 0:
            # mean/std describe the first numeric column; max/min span the whole numeric matrix
            trend_analysis = {
                'mean': float(np.mean(numpy_array, axis=0)[0]),
                'std': float(np.std(numpy_array, axis=0)[0]),
                'max': float(np.max(numpy_array)),
                'min': float(np.min(numpy_array))
            }
        else:
            trend_analysis = {'mean': 0, 'std': 0, 'max': 0, 'min': 0}
        response_data = {
            'code': 200,
            'message': 'Visualization analysis completed',
            'type': analysis_type,
            'data': json.loads(pandas_result.to_json(orient='records', date_format='iso')),
            'trend': trend_analysis
        }
        return JsonResponse(response_data)

class BigScreenDataView(View):
    def get(self, request):
        hdfs_realtime_path = "hdfs://localhost:9000/stock_data/realtime/*.csv"
        hdfs_history_path = "hdfs://localhost:9000/stock_data/history/*.csv"
        realtime_df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load(hdfs_realtime_path)
        history_df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load(hdfs_history_path)
        today = datetime.now().strftime('%Y-%m-%d')
        today_data = realtime_df.filter(col("trade_date") == today)
        market_summary = today_data.agg(
            count("stock_code").alias("total_stocks"),
            sum(when(col("price_change_rate") > 0, 1).otherwise(0)).alias("rise_count"),
            sum(when(col("price_change_rate") < 0, 1).otherwise(0)).alias("fall_count"),
            sum("volume").alias("total_volume"),
            sum(col("volume") * col("close_price")).alias("total_amount"),
            avg("price_change_rate").alias("avg_change_rate")
        ).collect()[0].asDict()
        top_gainers = today_data.select("stock_code", "stock_name", "close_price", "price_change_rate", "volume").orderBy(col("price_change_rate").desc()).limit(10)
        top_losers = today_data.select("stock_code", "stock_name", "close_price", "price_change_rate", "volume").orderBy(col("price_change_rate").asc()).limit(10)
        top_volume = today_data.select("stock_code", "stock_name", "close_price", "volume", "price_change_rate").orderBy(col("volume").desc()).limit(10)
        last_30_days = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d')
        trend_data = history_df.filter((col("trade_date") >= last_30_days) & (col("trade_date") <= today))
        daily_trend = trend_data.groupBy("trade_date").agg(
            avg("close_price").alias("avg_price"),
            sum("volume").alias("daily_volume"),
            sum(col("volume") * col("close_price")).alias("daily_amount"),
            count(when(col("price_change_rate") > 0, 1)).alias("rise_count")
        ).orderBy("trade_date")
        sector_performance = today_data.groupBy("sector").agg(
            count("stock_code").alias("count"),
            avg("price_change_rate").alias("avg_change"),
            sum("volume").alias("sector_volume")
        ).orderBy(col("avg_change").desc())
        window_spec = Window.partitionBy("stock_code").orderBy("trade_date")
        hot_stocks = trend_data.withColumn("volume_change", (col("volume") - lag("volume", 1).over(window_spec)) / lag("volume", 1).over(window_spec) * 100)
        hot_ranking = hot_stocks.groupBy("stock_code", "stock_name").agg(
            avg("volume_change").alias("avg_volume_change"),
            sum("volume").alias("period_volume"),
            avg("price_change_rate").alias("avg_price_change")
        ).orderBy(col("avg_volume_change").desc()).limit(20)
        gainers_list = [row.asDict() for row in top_gainers.collect()]
        losers_list = [row.asDict() for row in top_losers.collect()]
        volume_list = [row.asDict() for row in top_volume.collect()]
        trend_list = [row.asDict() for row in daily_trend.collect()]
        sector_list = [row.asDict() for row in sector_performance.collect()]
        hot_list = [row.asDict() for row in hot_ranking.collect()]
        trend_df = pd.DataFrame(trend_list)
        if not trend_df.empty:
            trend_df['trade_date'] = pd.to_datetime(trend_df['trade_date'])
            trend_df = trend_df.sort_values('trade_date')
            trend_df['volume_ma5'] = trend_df['daily_volume'].rolling(window=5).mean()
            trend_df['price_ma5'] = trend_df['avg_price'].rolling(window=5).mean()
        big_screen_result = {
            'code': 200,
            'message': 'Big-screen data loaded successfully',
            'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'market_summary': market_summary,
            'top_gainers': gainers_list,
            'top_losers': losers_list,
            'top_volume': volume_list,
            'daily_trend': json.loads(trend_df.to_json(orient='records', date_format='iso')) if not trend_df.empty else trend_list,
            'sector_performance': sector_list,
            'hot_stocks': hot_list
        }
        return JsonResponse(big_screen_result)
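The lag-based `volume_change` that `BigScreenDataView` uses to rank hot stocks, `(volume - lag(volume, 1)) / lag(volume, 1) * 100`, is the day-over-day percentage change of volume. It can be cross-checked in plain pandas, where the same quantity is `pct_change() * 100` (toy volumes below are hypothetical):

```python
import pandas as pd

# Toy daily volumes for one stock (hypothetical values)
volume = pd.Series([1000, 1200, 900, 1800], name="volume")

# Spark:  (volume - lag(volume, 1)) / lag(volume, 1) * 100
# pandas equivalent: percentage change between consecutive rows
volume_change = volume.pct_change() * 100

print(volume_change.tolist())  # first entry is NaN (no previous day)
```

As in the Spark version, the first row has no predecessor, so it comes out null; the aggregation over `avg("volume_change")` simply skips it.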

Documentation of the Big-Data-Based Stock Market Data Visualization and Analysis Platform

(documentation screenshot)
