1. About the Author
💖💖 Author: 计算机编程果茶熊 💙💙 About me: I taught computer science in professional training programs for many years and have worked as a programming instructor. I love teaching and am proficient in Java, WeChat Mini Programs, Python, Golang, Android, and several other IT directions. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I also know some techniques for reducing plagiarism-check scores. I enjoy sharing solutions to problems I run into during development and talking shop about technology, so if you have any questions about code, feel free to ask! 💛💛 A few words: thank you all for your attention and support! 💜💜 Website projects · Android/Mini Program projects · Big data projects · CS graduation project topics 💕💕 Contact 计算机编程果茶熊 at the end of this post to get the source code.
2. System Overview
Big data framework: Hadoop + Spark (Hive supported with custom modifications) · Development languages: Java + Python (both versions supported) · Database: MySQL · Backend frameworks: SpringBoot (Spring + SpringMVC + MyBatis) + Django (both versions supported) · Frontend: Vue + ECharts + HTML + CSS + JavaScript + jQuery
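For the Python/Django version, here is a minimal sketch of the Django database settings implied by the stack above. The schema name, host, port, and credentials mirror the JDBC options used in the code showcase below; the rest is an assumption for illustration, not the project's actual configuration file:

# settings.py (sketch) -- assumed Django database config; the host, port,
# database name, and demo credentials mirror the JDBC URL in the views below.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "category_db",   # same schema the Spark JDBC reader queries
        "HOST": "localhost",
        "PORT": "3306",
        "USER": "root",
        "PASSWORD": "123456",    # demo credentials from the code showcase
    }
}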
The big-data-based 1688 product category relationship analysis and visualization system is an integrated platform covering data collection, processing, analysis, and visualization. It is built on the Hadoop distributed storage architecture and the Spark big data computing engine; the backend service is developed in Python on the Django web framework, and the frontend uses Vue.js together with ElementUI and ECharts to implement the user interface. The core modules include category data management, site-wide macro structure analysis, core category feature analysis, category hierarchy analysis, and category association pattern analysis. Massive volumes of category data are stored in the HDFS distributed file system, queried and computed efficiently with Spark SQL, and preprocessed and statistically analyzed with Pandas and NumPy. The system can mine the complex associations among product categories on the 1688 platform, reveal the underlying structure of the category taxonomy through multi-dimensional analysis, and present the results in intuitive charts, providing data support for product management and operational decision-making on e-commerce platforms. The overall architecture is clear and the technology stack is complete, demonstrating both a practical application of big data technology and modern web development practice.
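As a quick illustration of the HDFS-plus-Spark-plus-Pandas pipeline described above, here is a minimal sketch of loading category data from HDFS, aggregating it with Spark SQL on the cluster, and handing the small result set to Pandas/NumPy on the driver. The HDFS path and column names are assumptions for illustration only:

from pyspark.sql import SparkSession
import numpy as np

spark = SparkSession.builder.appName("CategoryPreprocessing").getOrCreate()

# Hypothetical HDFS location for the crawled 1688 category data.
df = spark.read.csv("hdfs://namenode:9000/data/1688/categories.csv",
                    header=True, inferSchema=True)
df.createOrReplaceTempView("categories")

# Aggregate on the cluster with Spark SQL, then bring the small result
# set back to the driver as a Pandas DataFrame for NumPy-based statistics.
per_level = spark.sql(
    "SELECT level, COUNT(*) AS cnt FROM categories GROUP BY level ORDER BY level"
).toPandas()
print("mean categories per level:", np.mean(per_level["cnt"]))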
3. Big-Data-Based 1688 Product Category Relationship Analysis and Visualization System: Video Walkthrough
Lost on a big data graduation project topic? A complete Python + Django walkthrough of the 1688 product category analysis system
4. Big-Data-Based 1688 Product Category Relationship Analysis and Visualization System: Feature Showcase
5. Big-Data-Based 1688 Product Category Relationship Analysis and Visualization System: Code Showcase
from pyspark.sql import SparkSession
from django.http import JsonResponse

# Shared SparkSession for all analysis views. Adaptive query execution lets
# Spark coalesce shuffle partitions at runtime for these aggregation-heavy jobs.
spark = (
    SparkSession.builder
    .appName("1688CategoryAnalysis")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

def analyze_macro_structure(request):
    """Site-wide macro structure analysis of the category tree."""
    try:
        # Load the category table from MySQL over JDBC and register it for SQL.
        category_df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/category_db").option("dbtable", "category_data").option("user", "root").option("password", "123456").load()
        category_df.createOrReplaceTempView("categories")
        # Overall taxonomy size and how categories spread across levels.
        total_categories = spark.sql("SELECT COUNT(*) as total FROM categories").collect()[0]["total"]
        level_distribution = spark.sql("SELECT level, COUNT(*) as count FROM categories GROUP BY level ORDER BY level").collect()
        # Children per parent node; the derived table carries an explicit alias.
        parent_child_stats = spark.sql("SELECT parent_id, COUNT(*) as child_count FROM categories WHERE parent_id IS NOT NULL GROUP BY parent_id").collect()
        avg_children_per_parent = spark.sql("SELECT AVG(child_count) as avg_children FROM (SELECT parent_id, COUNT(*) as child_count FROM categories WHERE parent_id IS NOT NULL GROUP BY parent_id) t").collect()[0]["avg_children"]
        # Top-level categories ranked by sub-category count, plus tree depth.
        top_level_categories = spark.sql("SELECT category_name, COUNT(*) as subcategory_count FROM categories WHERE level = 1 GROUP BY category_name ORDER BY subcategory_count DESC LIMIT 10").collect()
        category_depth = spark.sql("SELECT MAX(level) as max_depth FROM categories").collect()[0]["max_depth"]
        # Leaf categories are nodes that never appear as another node's parent.
        leaf_categories = spark.sql("SELECT COUNT(*) as leaf_count FROM categories c1 WHERE NOT EXISTS (SELECT 1 FROM categories c2 WHERE c2.parent_id = c1.category_id)").collect()[0]["leaf_count"]
        # Branching factor of the upper levels; computed but not yet returned.
        branching_factor = spark.sql("SELECT category_id, (SELECT COUNT(*) FROM categories c2 WHERE c2.parent_id = categories.category_id) as branch_count FROM categories WHERE level < 3").collect()
        # Category-name length statistics and active/inactive split.
        category_name_length_stats = spark.sql("SELECT AVG(LENGTH(category_name)) as avg_name_length, MAX(LENGTH(category_name)) as max_name_length, MIN(LENGTH(category_name)) as min_name_length FROM categories").collect()[0]
        active_vs_inactive = spark.sql("SELECT status, COUNT(*) as count FROM categories GROUP BY status").collect()
        result_data = {
            "total_categories": total_categories,
            "level_distribution": [{"level": row["level"], "count": row["count"]} for row in level_distribution],
            "avg_children_per_parent": float(avg_children_per_parent) if avg_children_per_parent else 0,
            "top_level_categories": [{"name": row["category_name"], "count": row["subcategory_count"]} for row in top_level_categories],
            "category_depth": category_depth,
            "leaf_categories": leaf_categories,
            "avg_name_length": float(category_name_length_stats["avg_name_length"]) if category_name_length_stats["avg_name_length"] else 0,
            "active_vs_inactive": [{"status": row["status"], "count": row["count"]} for row in active_vs_inactive],
        }
        return JsonResponse({"status": "success", "data": result_data})
    except Exception as e:
        return JsonResponse({"status": "error", "message": str(e)})

def analyze_core_category_features(request):
    """Per-category feature analysis joining categories with their products."""
    try:
        category_df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/category_db").option("dbtable", "category_data").option("user", "root").option("password", "123456").load()
        product_df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/category_db").option("dbtable", "product_data").option("user", "root").option("password", "123456").load()
        category_df.createOrReplaceTempView("categories")
        product_df.createOrReplaceTempView("products")
        # Product count, average price, and total sales per category.
        category_product_stats = spark.sql("SELECT c.category_id, c.category_name, COUNT(p.product_id) as product_count, AVG(p.price) as avg_price, SUM(p.sales_volume) as total_sales FROM categories c LEFT JOIN products p ON c.category_id = p.category_id GROUP BY c.category_id, c.category_name").collect()
        # Categories carrying more than 1000 products.
        high_volume_categories = spark.sql("SELECT c.category_name, COUNT(p.product_id) as product_count FROM categories c JOIN products p ON c.category_id = p.category_id GROUP BY c.category_name HAVING COUNT(p.product_id) > 1000 ORDER BY product_count DESC").collect()
        # Price spread and dispersion within each category.
        price_range_analysis = spark.sql("SELECT c.category_name, MIN(p.price) as min_price, MAX(p.price) as max_price, AVG(p.price) as avg_price, STDDEV(p.price) as price_stddev FROM categories c JOIN products p ON c.category_id = p.category_id GROUP BY c.category_name").collect()
        # Top 20 categories by total sales volume.
        sales_performance = spark.sql("SELECT c.category_name, SUM(p.sales_volume) as total_sales, AVG(p.sales_volume) as avg_sales, COUNT(p.product_id) as product_count FROM categories c JOIN products p ON c.category_id = p.category_id GROUP BY c.category_name ORDER BY total_sales DESC LIMIT 20").collect()
        # Monthly new-product trend over the last 12 months. ADD_MONTHS replaces
        # MySQL's DATE_SUB ... INTERVAL syntax, which Spark SQL does not accept.
        category_growth_trend = spark.sql("SELECT c.category_name, DATE_FORMAT(p.create_time, 'yyyy-MM') as month, COUNT(p.product_id) as monthly_new_products FROM categories c JOIN products p ON c.category_id = p.category_id WHERE p.create_time >= ADD_MONTHS(CURRENT_DATE, -12) GROUP BY c.category_name, DATE_FORMAT(p.create_time, 'yyyy-MM') ORDER BY month").collect()
        # Supplier diversity: distinct suppliers relative to product count.
        supplier_distribution = spark.sql("SELECT c.category_name, COUNT(DISTINCT p.supplier_id) as unique_suppliers, COUNT(p.product_id) as total_products, COUNT(DISTINCT p.supplier_id) / COUNT(p.product_id) as supplier_diversity_ratio FROM categories c JOIN products p ON c.category_id = p.category_id GROUP BY c.category_name").collect()
        # Growth trend, competition index, and seasonality are computed for
        # drill-down but not yet included in the JSON response.
        category_competition_index = spark.sql("SELECT c.category_name, COUNT(DISTINCT p.supplier_id) as supplier_count, COUNT(p.product_id) as product_count, AVG(p.price) as avg_price, (COUNT(DISTINCT p.supplier_id) * 1.0 / COUNT(p.product_id)) as competition_ratio FROM categories c JOIN products p ON c.category_id = p.category_id GROUP BY c.category_name HAVING COUNT(p.product_id) > 50").collect()
        seasonal_analysis = spark.sql("SELECT c.category_name, MONTH(p.create_time) as month, COUNT(p.product_id) as monthly_products, SUM(p.sales_volume) as monthly_sales FROM categories c JOIN products p ON c.category_id = p.category_id GROUP BY c.category_name, MONTH(p.create_time) ORDER BY c.category_name, month").collect()
        result_data = {
            "category_product_stats": [{"category_id": row["category_id"], "name": row["category_name"], "product_count": row["product_count"], "avg_price": float(row["avg_price"]) if row["avg_price"] else 0, "total_sales": row["total_sales"] if row["total_sales"] else 0} for row in category_product_stats],
            "high_volume_categories": [{"name": row["category_name"], "count": row["product_count"]} for row in high_volume_categories],
            "price_analysis": [{"name": row["category_name"], "min_price": float(row["min_price"]) if row["min_price"] else 0, "max_price": float(row["max_price"]) if row["max_price"] else 0, "avg_price": float(row["avg_price"]) if row["avg_price"] else 0} for row in price_range_analysis],
            "sales_performance": [{"name": row["category_name"], "total_sales": row["total_sales"], "avg_sales": float(row["avg_sales"]) if row["avg_sales"] else 0} for row in sales_performance],
            "supplier_distribution": [{"name": row["category_name"], "unique_suppliers": row["unique_suppliers"], "diversity_ratio": float(row["supplier_diversity_ratio"]) if row["supplier_diversity_ratio"] else 0} for row in supplier_distribution],
        }
        return JsonResponse({"status": "success", "data": result_data})
    except Exception as e:
        return JsonResponse({"status": "error", "message": str(e)})

def analyze_category_association_patterns(request):
    """Mine co-purchase and similarity patterns between categories from orders."""
    try:
        category_df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/category_db").option("dbtable", "category_data").option("user", "root").option("password", "123456").load()
        product_df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/category_db").option("dbtable", "product_data").option("user", "root").option("password", "123456").load()
        order_df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/category_db").option("dbtable", "order_items").option("user", "root").option("password", "123456").load()
        category_df.createOrReplaceTempView("categories")
        product_df.createOrReplaceTempView("products")
        order_df.createOrReplaceTempView("order_items")
        # Category pairs bought within the same order; the c1.category_id <
        # c2.category_id filter deduplicates the (A,B)/(B,A) mirror pairs.
        co_purchase_analysis = spark.sql("SELECT c1.category_name as category_a, c2.category_name as category_b, COUNT(*) as co_occurrence_count, COUNT(*) * 1.0 / (SELECT COUNT(*) FROM order_items) as association_strength FROM order_items oi1 JOIN order_items oi2 ON oi1.order_id = oi2.order_id AND oi1.product_id != oi2.product_id JOIN products p1 ON oi1.product_id = p1.product_id JOIN products p2 ON oi2.product_id = p2.product_id JOIN categories c1 ON p1.category_id = c1.category_id JOIN categories c2 ON p2.category_id = c2.category_id WHERE c1.category_id < c2.category_id GROUP BY c1.category_name, c2.category_name HAVING COUNT(*) > 10 ORDER BY co_occurrence_count DESC LIMIT 50").collect()
        # Pairwise self-join across products of different categories; note this
        # is expensive on large product tables.
        similar_categories = spark.sql("SELECT c1.category_name as category_a, c2.category_name as category_b, ABS(AVG(p1.price) - AVG(p2.price)) as price_difference, CORR(p1.sales_volume, p2.sales_volume) as sales_correlation FROM products p1 JOIN products p2 ON p1.category_id != p2.category_id JOIN categories c1 ON p1.category_id = c1.category_id JOIN categories c2 ON p2.category_id = c2.category_id GROUP BY c1.category_name, c2.category_name HAVING CORR(p1.sales_volume, p2.sales_volume) > 0.5").collect()
        # Directed co-occurrence counts between categories within the same order.
        category_transition_patterns = spark.sql("SELECT c1.category_name as from_category, c2.category_name as to_category, COUNT(*) as transition_count FROM (SELECT oi1.product_id as prod1, oi2.product_id as prod2, oi1.order_id FROM order_items oi1 JOIN order_items oi2 ON oi1.order_id = oi2.order_id WHERE oi1.product_id != oi2.product_id) transitions JOIN products p1 ON transitions.prod1 = p1.product_id JOIN products p2 ON transitions.prod2 = p2.product_id JOIN categories c1 ON p1.category_id = c1.category_id JOIN categories c2 ON p2.category_id = c2.category_id GROUP BY c1.category_name, c2.category_name HAVING COUNT(*) > 5 ORDER BY transition_count DESC").collect()
        # Rule-based clustering of categories by price and sales volume.
        category_cluster_analysis = spark.sql("SELECT c.category_name, AVG(p.price) as avg_price, AVG(p.sales_volume) as avg_sales, COUNT(p.product_id) as product_count, COUNT(DISTINCT p.supplier_id) as supplier_count, CASE WHEN AVG(p.price) > 500 AND AVG(p.sales_volume) > 100 THEN 'high_value_high_volume' WHEN AVG(p.price) > 500 AND AVG(p.sales_volume) <= 100 THEN 'high_value_low_volume' WHEN AVG(p.price) <= 500 AND AVG(p.sales_volume) > 100 THEN 'low_value_high_volume' ELSE 'low_value_low_volume' END as cluster_type FROM categories c JOIN products p ON c.category_id = p.category_id GROUP BY c.category_name").collect()
        # Most frequent cross-category pairings and their combined price level.
        cross_category_popularity = spark.sql("SELECT c1.category_name, c2.category_name as related_category, COUNT(*) as cross_appearances, AVG(p1.price + p2.price) as combined_avg_price FROM order_items oi1 JOIN order_items oi2 ON oi1.order_id = oi2.order_id AND oi1.product_id != oi2.product_id JOIN products p1 ON oi1.product_id = p1.product_id JOIN products p2 ON oi2.product_id = p2.product_id JOIN categories c1 ON p1.category_id = c1.category_id JOIN categories c2 ON p2.category_id = c2.category_id WHERE c1.category_id != c2.category_id GROUP BY c1.category_name, c2.category_name ORDER BY cross_appearances DESC LIMIT 100").collect()
        # Monthly association counts; computed but not yet returned.
        seasonal_association = spark.sql("SELECT c1.category_name, c2.category_name, MONTH(oi1.order_time) as month, COUNT(*) as monthly_associations FROM order_items oi1 JOIN order_items oi2 ON oi1.order_id = oi2.order_id AND oi1.product_id != oi2.product_id JOIN products p1 ON oi1.product_id = p1.product_id JOIN products p2 ON oi2.product_id = p2.product_id JOIN categories c1 ON p1.category_id = c1.category_id JOIN categories c2 ON p2.category_id = c2.category_id WHERE c1.category_id != c2.category_id GROUP BY c1.category_name, c2.category_name, MONTH(oi1.order_time) HAVING COUNT(*) > 3").collect()
        result_data = {
            "co_purchase_patterns": [{"category_a": row["category_a"], "category_b": row["category_b"], "co_occurrence": row["co_occurrence_count"], "strength": float(row["association_strength"])} for row in co_purchase_analysis],
            "similar_categories": [{"category_a": row["category_a"], "category_b": row["category_b"], "price_diff": float(row["price_difference"]) if row["price_difference"] else 0, "correlation": float(row["sales_correlation"]) if row["sales_correlation"] else 0} for row in similar_categories],
            "transition_patterns": [{"from_category": row["from_category"], "to_category": row["to_category"], "count": row["transition_count"]} for row in category_transition_patterns],
            "cluster_analysis": [{"category": row["category_name"], "avg_price": float(row["avg_price"]) if row["avg_price"] else 0, "avg_sales": float(row["avg_sales"]) if row["avg_sales"] else 0, "cluster_type": row["cluster_type"]} for row in category_cluster_analysis],
            "cross_popularity": [{"category": row["category_name"], "related": row["related_category"], "appearances": row["cross_appearances"]} for row in cross_category_popularity],
        }
        return JsonResponse({"status": "success", "data": result_data})
    except Exception as e:
        return JsonResponse({"status": "error", "message": str(e)})
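To round out the backend picture, here is a short sketch of how the three views above could be exposed as JSON endpoints in the project's URL configuration. The module path "analysis.views" and the route names are assumptions for illustration; only the view functions come from the code above:

# urls.py (sketch) -- hypothetical routes; "analysis.views" is an assumed
# module path for the view functions shown in the code showcase.
from django.urls import path
from analysis import views

urlpatterns = [
    path("api/analysis/macro-structure/", views.analyze_macro_structure),
    path("api/analysis/core-features/", views.analyze_core_category_features),
    path("api/analysis/association-patterns/", views.analyze_category_association_patterns),
]

The Vue + ECharts frontend can then fetch these endpoints and feed the "data" payload of each response directly into its chart options.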
6. Big-Data-Based 1688 Product Category Relationship Analysis and Visualization System: Documentation Showcase
7. END
💛💛 A few words: thank you all for your attention and support! 💜💜 Website projects · Android/Mini Program projects · Big data projects · CS graduation project topics 💕💕 Contact 计算机编程果茶熊 at the end of this post to get the source code.