Google Merchandise Store Customer Behavior Analysis
Topic: User Path Analysis (Identifying and Optimizing Critical Conversion and Loss Paths)
Background and Purpose
This analysis is based on the Google Analytics Sample Dataset, which describes customer behavior on an e-commerce website, the Google Merchandise Store. On such a site, customers move from browsing to purchasing as they navigate its pages.
Analysis Methods
User Path Feature Analysis
Using a Sankey diagram and a funnel chart, identify high-traffic paths (e.g., “yt” to “about”) and high-loss nodes (e.g., the “about” page, with a 10% loss rate). Then, using the transition probability matrix, verify that “yt” to “about” to “copyright” is the highest-probability path and check that it is stable.
Conversion and Loss Correlation Analysis
Analyze how users' time on key pages and bounce rate relate to their subsequent behavior;
evaluate how well page content matches user motivation (e.g., whether the “about” page lacks conversion guidance).
Data Overview
Data source: Google Analytics Sample Dataset
Key fields:
User identifiers:
fullVisitorId: unique user identifier, used to track a user's behavior;
visitId: unique visit identifier, used to distinguish individual sessions.
Behavior data:
totals:visits: number of visits (sessions);
totals:hits: number of hits (page views and other interactions);
totals:timeOnSite: time spent on the site, in seconds.
Path data:
trafficSource:referralPath: referral path (an assumed proxy, used to infer which pages a user moved through);
device:browser: the user's browser (e.g., Chrome, Safari).
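The write-up does not show how referralPath values were turned into page-to-page jumps. A minimal sketch, assuming the values look like /yt/about/copyright/ and that each pair of adjacent path segments counts as one jump; path_to_pages and path_to_jumps are hypothetical helper names:

```python
from typing import Optional


def path_to_pages(referral_path: Optional[str]) -> list[str]:
    """Split a referral path such as '/yt/about/copyright/' into page names."""
    if not referral_path:  # missing or empty path: nothing to infer
        return []
    # Drop the empty strings produced by leading/trailing slashes.
    return [segment for segment in referral_path.split("/") if segment]


def path_to_jumps(referral_path: Optional[str]) -> list[tuple[str, str]]:
    """Treat each pair of adjacent segments as one assumed page-to-page jump."""
    pages = path_to_pages(referral_path)
    return list(zip(pages, pages[1:]))


print(path_to_pages("/yt/about/copyright/"))  # ['yt', 'about', 'copyright']
print(path_to_jumps("/yt/about/copyright/"))  # [('yt', 'about'), ('about', 'copyright')]
print(path_to_jumps(None))                    # []
```

Under this reading, a path with a single segment (e.g. /yt/) contributes a page but no jump.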
Data handling:
Sample size: 60 rows (the representativeness of such a small sample should be discussed, and the sample should be expanded in future work);
Cleaning steps: de-duplication, removal of empty values, and outlier checks;
Assumption check: because page-path information is missing, trafficSource:referralPath is used as the basis for inferring paths, which still needs to be verified against session logs.
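A minimal sketch of the three cleaning steps, assuming the rows have already been flattened into dictionaries keyed by the field names above. The session key (fullVisitorId, visitId), the 1.5 × IQR outlier rule on totals:timeOnSite, the hypothetical clean_sessions function, and the sample rows are all illustrative assumptions rather than the original pipeline; outliers are flagged, not dropped, so they can be inspected.

```python
import statistics
from typing import Any

Row = dict[str, Any]


def clean_sessions(rows: list[Row], required: tuple[str, ...]) -> tuple[list[Row], list[Row]]:
    """Return (clean_rows, outlier_rows) after de-duplication and empty-value removal."""
    # Step 1: de-duplicate on the assumed session key (fullVisitorId, visitId), keeping the first copy.
    seen: set[tuple[Any, Any]] = set()
    unique: list[Row] = []
    for row in rows:
        key = (row.get("fullVisitorId"), row.get("visitId"))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Step 2: drop rows where any required field (plus timeOnSite) is missing or empty.
    needed = (*required, "totals:timeOnSite")
    clean = [r for r in unique if all(r.get(f) not in (None, "") for f in needed)]

    # Step 3: flag, but keep, timeOnSite values outside the 1.5 * IQR fences.
    times = [r["totals:timeOnSite"] for r in clean]
    if len(times) < 2:  # quartiles need at least two data points
        return clean, []
    q1, _, q3 = statistics.quantiles(times, n=4, method="inclusive")
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    outliers = [r for r in clean if not low <= r["totals:timeOnSite"] <= high]
    return clean, outliers


# Invented sample rows: one duplicate, one empty timeOnSite, one extreme timeOnSite.
rows = [
    {"fullVisitorId": "u1", "visitId": 1, "totals:timeOnSite": 100, "trafficSource:referralPath": "/yt/about/"},
    {"fullVisitorId": "u1", "visitId": 1, "totals:timeOnSite": 100, "trafficSource:referralPath": "/yt/about/"},
    {"fullVisitorId": "u2", "visitId": 2, "totals:timeOnSite": 120, "trafficSource:referralPath": "/yt/about/copyright/"},
    {"fullVisitorId": "u3", "visitId": 3, "totals:timeOnSite": None, "trafficSource:referralPath": "/analytics/web/"},
    {"fullVisitorId": "u4", "visitId": 4, "totals:timeOnSite": 110, "trafficSource:referralPath": "/permissions/using-the-logo.html"},
    {"fullVisitorId": "u5", "visitId": 5, "totals:timeOnSite": 130, "trafficSource:referralPath": "/yt/about/"},
    {"fullVisitorId": "u6", "visitId": 6, "totals:timeOnSite": 5000, "trafficSource:referralPath": "/yt/"},
]
clean, outliers = clean_sessions(rows, required=("fullVisitorId", "visitId", "trafficSource:referralPath"))
print(len(clean), [r["fullVisitorId"] for r in outliers])  # 5 ['u6']
```

The "inclusive" quartile method is chosen here because it behaves more sensibly on very small samples like the 60-row one; with the default "exclusive" method the extreme value can inflate Q3 enough to escape the fence.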
Analysis Indicators and Construction
Core indicators:
Conversion rate: the final proportion of users who move from browsing to placing an order;
Loss rate: the proportion of users who drop off at a key node (e.g., the “about” page);
Path concentration: the share of traffic on high-frequency paths (e.g., “yt” to “about”);
Transition probability: the probability of jumping from one page to another (e.g., the “yt” to “about” probability is 1).
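The first three core indicators can be sketched as plain ratios. This sketch assumes a hypothetical per-session boolean for "placed an order" (none of the listed fields records one), defines a page's loss rate as the share of sessions that reach the page and whose inferred path stops there, and defines path concentration as one jump's share of all inferred jumps. The function names and paths are invented for illustration.

```python
from collections import Counter
from typing import Optional


def path_to_pages(referral_path: Optional[str]) -> list[str]:
    """Assumed parsing of referralPath: '/yt/about/' -> ['yt', 'about']."""
    return [segment for segment in (referral_path or "").split("/") if segment]


def conversion_rate(placed_order: list[bool]) -> float:
    """Share of sessions that ended in an order (the flag is a hypothetical input)."""
    return sum(placed_order) / len(placed_order) if placed_order else 0.0


def loss_rate(paths: list[Optional[str]], page: str) -> float:
    """Share of sessions reaching `page` whose inferred path stops there."""
    sequences = [path_to_pages(p) for p in paths]
    reached = [seq for seq in sequences if page in seq]
    stopped = [seq for seq in reached if seq[-1] == page]
    return len(stopped) / len(reached) if reached else 0.0


def path_concentration(paths: list[Optional[str]], jump: tuple[str, str]) -> float:
    """Share of all inferred page-to-page jumps that are `jump`."""
    jumps: Counter = Counter()
    for p in paths:
        pages = path_to_pages(p)
        jumps.update(zip(pages, pages[1:]))
    total = sum(jumps.values())
    return jumps[jump] / total if total else 0.0


paths = ["/yt/about/", "/yt/about/copyright/", "/yt/about/copyright/", "/analytics/web/", None]
print(conversion_rate([False, True, False, False, False]))  # 0.2
print(round(loss_rate(paths, "about"), 3))                  # 0.333
print(path_concentration(paths, ("yt", "about")))           # 0.5
```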
Descriptive Indicators
At first glance, customers appear engaged with the website and willing to stay for a while. Every customer visited at least once (Figure 1), customers recorded about 3.77 hits (page views and other interactions) on average (Figure 2), and they spent about 296.10 seconds on the site on average (Figure 3). This may partly reflect the site's design, but it could also mean that customers already knew something about the website, or felt some affinity for it, before visiting.
Figure 1 Visits per customer (x-axis: user index)
Figure 2 Hits per customer (x-axis: user index)
Figure 3 Time on site per customer, in seconds (x-axis: user index)
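A sketch of how per-customer averages like those above could be computed. It assumes each row is one session, that values are first summed per fullVisitorId and then averaged across customers, and that a missing totals:timeOnSite counts as 0 seconds; per_user_averages is a hypothetical name and the rows are invented, so the output does not reproduce 3.77 or 296.10.

```python
from collections import defaultdict
from statistics import mean

FIELDS = ("totals:visits", "totals:hits", "totals:timeOnSite")


def per_user_averages(rows: list[dict]) -> dict[str, float]:
    """Sum each field per fullVisitorId, then average the per-user totals (rounded to 2 dp)."""
    per_user: dict = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for row in rows:
        user = per_user[row["fullVisitorId"]]
        for field in FIELDS:
            user[field] += row.get(field) or 0  # assumption: a missing value counts as 0
    return {field: round(mean(u[field] for u in per_user.values()), 2) for field in FIELDS}


rows = [
    {"fullVisitorId": "u1", "totals:visits": 1, "totals:hits": 3, "totals:timeOnSite": 200},
    {"fullVisitorId": "u1", "totals:visits": 1, "totals:hits": 5, "totals:timeOnSite": 400},
    {"fullVisitorId": "u2", "totals:visits": 1, "totals:hits": 2, "totals:timeOnSite": None},
    {"fullVisitorId": "u3", "totals:visits": 1, "totals:hits": 4, "totals:timeOnSite": 300},
]
print(per_user_averages(rows))  # visits about 1.33, hits about 4.67, timeOnSite 300
```

Averaging per session instead of per customer would give different numbers whenever a customer has several sessions, so the choice should match how Figures 1 to 3 were built.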
Visualization Tools
Sankey diagram (Figure 4): shows how traffic is distributed across user paths;
Funnel chart (Figure 5): quantifies the loss rate at each step.
In the customer path and conversion analysis, the trafficSource:referralPath field was used as a proxy for page-to-page jumps, because no other field in the dataset describes customers' browsing behavior. Under this assumption, each page can be treated as one step on the way from browsing to purchasing, and a customer takes a single action at each step, which makes the field usable for describing customer behavior on the site. In the Sankey diagram (Figure 4), the “yt” page has the most incoming flow, meaning it was viewed by the most customers, while the “about” page has the most outgoing flow, meaning it sent the most customers on to other pages. The “yt” to “about” link is therefore the most important path, and it is plausibly the common access path for customers on this website. The funnel chart (Figure 5) shows that “yt” and “about” also have the highest loss rate, at 10%. This suggests a second hypothesis: customers are reluctant to move on to the next step and lack motivation to do so. Possible remedies include adding promotions or activities at this step, advertising the next step more prominently, or offering rewards.
Figure 4 Sankey Diagram
Figure 5 Funnel Chart
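The inputs to both charts can be derived from the inferred jumps. A minimal sketch, assuming the same segment-splitting of referralPath as a path proxy; the funnel counts a session at step k only if its inferred path starts with the first k steps, which is one possible funnel definition among several. The function names and paths are hypothetical.

```python
from collections import Counter
from typing import Optional


def path_to_pages(referral_path: Optional[str]) -> list[str]:
    """Assumed parsing of referralPath: '/yt/about/' -> ['yt', 'about']."""
    return [segment for segment in (referral_path or "").split("/") if segment]


def sankey_links(paths: list[Optional[str]]) -> Counter:
    """Aggregate inferred jumps into {(source, target): count}, the raw input of a Sankey diagram."""
    links: Counter = Counter()
    for p in paths:
        pages = path_to_pages(p)
        links.update(zip(pages, pages[1:]))
    return links


def to_sankey_arrays(links: Counter) -> tuple[list[str], list[int], list[int], list[int]]:
    """Convert link counts into node labels plus index-based source/target/value lists."""
    labels = sorted({page for pair in links for page in pair})
    index = {label: i for i, label in enumerate(labels)}
    pairs = sorted(links)
    return (
        labels,
        [index[source] for source, _ in pairs],
        [index[target] for _, target in pairs],
        [links[pair] for pair in pairs],
    )


def funnel(paths: list[Optional[str]], steps: list[str]) -> list[tuple[str, int, float]]:
    """Per step: sessions whose path starts with steps[:k], and the loss rate from the previous step."""
    sequences = [path_to_pages(p) for p in paths]
    reached = [sum(seq[:k] == steps[:k] for seq in sequences) for k in range(1, len(steps) + 1)]
    result = []
    for k, (step, count) in enumerate(zip(steps, reached)):
        previous = reached[k - 1] if k > 0 else count
        loss = 1 - count / previous if previous else 0.0
        result.append((step, count, loss))
    return result


paths = ["/yt/about/", "/yt/about/copyright/", "/yt/about/copyright/", "/yt/", "/analytics/web/"]
labels, source, target, value = to_sankey_arrays(sankey_links(paths))
print(labels, source, target, value)
for step, count, loss in funnel(paths, ["yt", "about", "copyright"]):
    print(step, count, f"{loss:.0%}")  # yt 4 0%, about 3 25%, copyright 2 33%
```

If Plotly is the plotting library, the four lists fit plotly.graph_objects.Sankey(node=dict(label=labels), link=dict(source=source, target=target, value=value)); the original does not say which tool drew Figures 4 and 5.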
Transition Probability Matrix Analysis
The transition probability matrix shows that “analytics” to “web”, “items” to “c10b14f9a69ff71b1b7a”, “permissions” to “using-the-logo.html”, and “yt” to “about” have the highest transition probability, equal to 1, and that “about” to “copyright” has a probability of 0.5. These five links are therefore the most important links on this website, suggesting that customers are most interested in, and most familiar with, them. This behavior may lead customers on to other links and, eventually, to placing an order. The matrix also shows that “yt” to “about” to “copyright” is the highest-probability chain, which suggests that this route has a high click-through rate and is convenient and direct for most customers.
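A minimal sketch of the transition probability matrix, P(next | current) = count(current → next) / count(current → any), built from adjacent referralPath segments (an assumption about how paths were inferred), plus a greedy walk that repeatedly follows the most probable next page. Greedy following is a simple heuristic and is not guaranteed to find the globally most probable chain. The function names and paths (including the “policies” page) are invented, so the probabilities differ from the figures above.

```python
from collections import Counter, defaultdict
from typing import Optional


def path_to_pages(referral_path: Optional[str]) -> list[str]:
    """Assumed parsing of referralPath: '/yt/about/' -> ['yt', 'about']."""
    return [segment for segment in (referral_path or "").split("/") if segment]


def transition_matrix(paths: list[Optional[str]]) -> dict[str, dict[str, float]]:
    """Row-normalised jump counts: P(next | current) = n(current -> next) / n(current -> any)."""
    counts: defaultdict = defaultdict(Counter)
    for p in paths:
        pages = path_to_pages(p)
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: n / total for nxt, n in row.items()}
    return matrix


def most_likely_chain(matrix: dict[str, dict[str, float]], start: str,
                      max_len: int = 10) -> tuple[list[str], float]:
    """Greedily follow the most probable next page; return the chain and its product probability."""
    chain, prob = [start], 1.0
    while chain[-1] in matrix and len(chain) < max_len:
        nxt, p = max(matrix[chain[-1]].items(), key=lambda item: item[1])
        if nxt in chain:  # stop rather than loop forever on a cycle
            break
        chain.append(nxt)
        prob *= p
    return chain, prob


paths = [
    "/yt/about/copyright/",
    "/yt/about/copyright/",
    "/yt/about/policies/",
    "/analytics/web/",
    "/permissions/using-the-logo.html",
]
matrix = transition_matrix(paths)
print(matrix["yt"], matrix["about"])
chain, prob = most_likely_chain(matrix, "yt")
print(chain, round(prob, 3))  # ['yt', 'about', 'copyright'] 0.667
```

Treating the chain probability as the product of its step probabilities is a first-order Markov assumption; with only 60 rows, a bootstrap over sessions would be one way to check how stable the top chain is.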
Conclusion and Suggestions
This analysis shows that customer behavior follows a pattern of viewing certain pages, notably “yt” and “about”, and that customers may need more time or more motivation, such as promotional information, to move on to purchasing. The transition probabilities show that “yt” to “about” to “copyright” is the chain customers follow most often, yet the “about” page has a 10% loss rate, so its guidance design needs improvement. Some suggestions for this issue:
1. Retain and engage more customers on the “about” page before they move to the next page, for example with customer-referral activities or activities that involve more consumers, such as a pop-up prompt “View the ‘about’ page to get more coupons”, to motivate more customers.
2. Add popular product recommendations on the “yt” page to attract customers to click through to the “about” page; this improves content matching and could encourage customers to spend more time on the “yt” page.
3. Encourage more customers to join activities on the “yt” page, such as “Reward participants with coupons” or “Participate in activities to win product gifts”, to keep more customers on the “yt” to “about” path and move them toward purchasing, increasing the conversion rate.
4. Add more steps or actions to the “about” page, for example inviting customers to explore best-selling products and running time-limited activities such as “Complete a questionnaire to earn points”, to reduce the loss rate.
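Finally, some reflections after completing this small case study. I think this user behavior analysis approach applies not only to online e-commerce and retail but also to content platforms and mobile apps: path analysis can provide clearer, more reliable evidence about conversion nodes, and a more concrete, actionable basis for optimization, in areas such as content optimization, engagement design, interactive features, and reward mechanisms. For the code, I used AI tools in a semi-manual, semi-automated way for data cleaning, matrix construction, visualization, and sorting out the logical framework, and the results were decent. However, as a beginner I still cannot apply AI flexibly across many scenarios, and, limited by my own coding skills, my AI assistant could only handle the initial logic framework and chart polishing; the detailed implementation was still hard to optimize and was done by hand. Next, I would like to try a workflow of initial design, then automated requirement refinement, then automated coding. Discussion is welcome.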
Feel free to follow the WeChat official account “陈留的数据分析小岛”; I am happy to discuss or collaborate.