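This post is a detailed walkthrough of ggplot2, the plotting system for R.
All code in this post runs as-is once the ggplot2 and tidyverse packages are loaded, and every dataset used ships with R or with ggplot2.
Exercises with worked solutions are included at the end for anyone who wants to practice.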
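The ggplot2 plotting system: introduction
ggplot2 is a modern data visualization tool created by Hadley Wickham in 2005 and distributed as an R package. It implements The Grammar of Graphics, the graphics grammar proposed by the statistician Leland Wilkinson, which breaks a plot down into semantic components such as scales and layers. ggplot2 has become one of the most popular plotting tools worldwide and has been ported to other platforms, including Python.
ggplot2 terminology
data: the dataset to plot
mapping: maps variables in the data to aesthetics
aes: aesthetic attributes, such as x and y position, color, and size (x, y, color, fill, alpha, size, linetype)
geom_xxx: geometric objects, which determine the plot type (geom_point, geom_line, geom_bar, geom_boxplot, ...)
stat_xxx: statistical transformations (stat_bin, stat_count, stat_density, stat_identity, ...)
coord_xxx: coordinate systems (coord_cartesian, coord_polar, coord_flip, ...)
scale_xxx: scale adjustments (scale_color_gradient, scale_fill_gradient, scale_x_continuous, ...)
facet_xxx: faceting (facet_wrap, facet_grid, facet_null)
guides: legend adjustments
theme: theme adjustments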
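To make the terms above concrete, here is a small sketch that uses most of them in a single plot. The color values, legend labels, and choice of theme are my own illustrative picks, not something prescribed by ggplot2.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                                         # data
            mapping = aes(x = factor(cyl), fill = factor(am))) +   # mapping / aes
  geom_bar(stat = "count", position = "dodge") +                   # geom + stat
  scale_fill_manual(values = c("0" = "grey60", "1" = "steelblue"),
                    labels = c("automatic", "manual")) +           # scale
  coord_flip() +                                                   # coord
  facet_wrap(~ gear) +                                             # facet
  guides(fill = guide_legend(title = "Transmission")) +            # guides
  theme_minimal()                                                  # theme
p
```

Each `+` adds one component, so you can comment out any line (other than the first two) and still get a valid plot.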
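The ggplot2 plotting system: layers
Layers in ggplot2
ggplot2 builds a plot by stacking layers on top of each other. The minimal structure of a ggplot2 plot is: data + mapping = aes(...) + a geometric object geom_xxx.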
# If ggplot2 is not installed yet, install it with:
# install.packages("ggplot2")
# Load the package
library(ggplot2)
# The individual layers of a ggplot2 plot
# Data layer
ggplot(data = mtcars)
# Data layer + mapping layer
ggplot(data = mtcars) + aes(x = wt, y = mpg)
# The two are usually merged by passing the mapping argument:
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))
# Data layer + mapping layer + geometric object
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + geom_point()
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + geom_line()
# Data layer + mapping layer + geometric object 1 + geometric object 2
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + geom_point() + geom_line()
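Because a plot is just an object that layers get added to, you can also store an unfinished plot in a variable and keep building on it. The sketch below uses variable names of my own choosing and inspects the `layers` element of the plot object to show how each `geom_xxx` call adds one layer.

```r
library(ggplot2)

# Data + mapping only: nothing is drawn yet, so there are no layers
base <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# Each geom added with + becomes one more layer
with_points <- base + geom_point()
with_both   <- with_points + geom_line()

length(base$layers)         # 0
length(with_points$layers)  # 1
length(with_both$layers)    # 2

# Printing the object draws it
with_both
```

Reusing `base` this way keeps the data and mapping in one place when you want to compare several plot types.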
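The ggplot2 plotting system: geometric objects
Plot types (geometric objects)
What we usually call the plot type (scatter plot, line chart, bar chart, ...) is controlled in ggplot2 by geometric objects (geom_xxx). Below I go through the most common ones using mtcars, a dataset that ships with R, with code for each: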
# Common plot types
# Scatter plot
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point(color = "red", size = 5)
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point(color = "red", size = 5, alpha = 0.3)
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point(color = "red", size = 5, alpha = 0.3, shape = 17)
# Line chart. Lines have no shape attribute. When a color is written as a hex string,
# an optional seventh and eighth digit set the transparency (alpha)
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line()
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line(color = "orange")
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line(color = "orange", linetype = 2)
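# From ggplot2 3.4.0 on, line thickness is set with linewidth; older versions use size instead
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line(color = "orange", linewidth = 2)
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line(color = "orange", linewidth = 2, alpha = 0.4)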
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line(color = rgb(0.8, 0.9, 0.2))
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line(color = "#FF0000")
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line(color = "#FF000044")
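If the 8-digit hex form looks opaque, base R can decode it for you. The following is a small sketch using `col2rgb()` and `rgb()` from base R; the variable names are mine.

```r
library(ggplot2)

# Decode an 8-digit hex color into red, green, blue and alpha (each 0-255)
channels <- col2rgb("#FF000044", alpha = TRUE)
channels["alpha", 1]   # 68, which is 0x44, roughly 27% opaque

# Build the same color with rgb(); here every channel is on the 0-255 scale
semi_red <- rgb(255, 0, 0, alpha = 68, maxColorValue = 255)
semi_red               # "#FF000044"

# The resulting string can be used anywhere a color is expected
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line(color = semi_red)
```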
# Bar chart
ggplot(data = mtcars, mapping = aes(x = cyl)) +
  geom_bar()
ggplot(data = mtcars, mapping = aes(x = factor(cyl))) +
  geom_bar()
# Note the difference between x = cyl and x = factor(cyl) in the two plots above:
# a numeric cyl gives a continuous axis, while factor(cyl) gives one discrete category per value
ggplot(data = mtcars, mapping = aes(x = factor(cyl))) +
  geom_bar(color = "pink")
ggplot(data = mtcars, mapping = aes(x = factor(cyl))) +
  geom_bar(fill = "pink")
ggplot(data = mtcars, mapping = aes(x = factor(cyl))) +
  geom_bar(fill = "pink", width = 0.2)
# Histogram
ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram()
ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram(bins = 6)
ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram(bins = 6, color = "white")
ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram(bins = 6, color = "blue", fill = "white")
# Box plot
ggplot(data = mtcars, mapping = aes(y = mpg)) +
  geom_boxplot()
ggplot(data = mtcars, mapping = aes(y = mpg)) +
  geom_boxplot(fill = "darkgreen", alpha = 0.4)
# There is no dedicated pie chart geom
# Area charts (covered later)
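Although there is no pie chart geom, a common workaround is to draw a single stacked bar and bend it into a circle with coord_polar. This is a minimal sketch of that approach; the legend title and `theme_void()` are my own choices.

```r
library(ggplot2)

pie <- ggplot(data = mtcars, mapping = aes(x = "", fill = factor(cyl))) +
  geom_bar(width = 1) +         # one stacked bar holding the counts per cyl
  coord_polar(theta = "y") +    # map the bar height to the angle
  labs(fill = "Cylinders") +
  theme_void()                  # drop the axes, which mean little on a pie
pie
```

The constant `x = ""` puts every car into the same bar, so the slice sizes come purely from the counts of each `cyl` value.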
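The ggplot2 plotting system: mappings
A mapping describes how each variable in the dataset corresponds to a visual element of the plot (x axis, y axis, shape, size, color, transparency).
Variables come in several types (numeric, nominal, ordinal), and each needs a visual element suited to expressing it.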
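The variable type decides which kind of scale ggplot2 picks. As a rough sketch (the variable names are mine), compare mapping the numeric `cyl` with mapping `factor(cyl)`, and compare both with setting a fixed color outside `aes()`:

```r
library(ggplot2)

class(mtcars$cyl)   # "numeric"

# Numeric variable: ggplot2 uses a continuous color gradient
p_cont <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg, color = cyl)) +
  geom_point()

# Factor: ggplot2 uses a discrete scale with one color per level
p_disc <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Setting, not mapping: a constant outside aes() applies to every point
# and produces no legend
p_set <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point(color = "blue")

# Number of distinct colors actually drawn in the discrete version
n_colors <- length(unique(ggplot_build(p_disc)$data[[1]]$colour))
n_colors            # 3, one per cylinder count
```

As a rule of thumb, wrap a numeric code such as `cyl` in `factor()` whenever the numbers are really category labels.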
# Mapping variables to visual elements
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg, color = cyl)) +
  geom_point()
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg, color = factor(cyl), size = wt)) +
  geom_point()
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line() +
  geom_point(aes(color = factor(cyl), size = wt))
ggplot(data = mtcars, mapping = aes(x = factor(cyl))) +
  geom_bar()
ggplot(data = mtcars, mapping = aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar()
ggplot(data = mtcars, mapping = aes(x = factor(cyl), fill = cyl)) +
  geom_bar() # Note the difference from fill = factor(cyl)
In particular:
## Stacked bar chart, colored by am
ggplot(data = mtcars, mapping = aes(x = factor(cyl), fill = factor(am))) +
  geom_bar()
## position controls how overlapping objects are arranged. It is not limited to bars,
## but it is used most often with them; "dodge" places the bars side by side
ggplot(data = mtcars, mapping = aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "dodge")
## "fill" stretches every bar to the same height, so the plot shows proportions
ggplot(data = mtcars, mapping = aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "fill")
ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram(bins = 10, color = "white")
## Distribution of mpg, with the three cylinder counts shown in three fill colors
ggplot(data = mtcars, mapping = aes(x = mpg, fill = factor(cyl))) +
  geom_histogram(bins = 10, color = "white")
ggplot(data = mtcars, mapping = aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()
ggplot(data = mtcars, mapping = aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot()
ggplot(data = mtcars, mapping = aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot(alpha = 0.5)
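To see what the position adjustments are working with, it can help to compute the numbers behind the bars by hand. The sketch below uses base R's `table()` and `prop.table()` for that, and then shows position on two non-bar geoms; the jitter width is an arbitrary value I picked.

```r
library(ggplot2)

# Counts behind the stacked and dodged bars: rows are cyl, columns are am
counts <- table(cyl = mtcars$cyl, am = mtcars$am)
counts

# Proportions within each cyl group: this is what position = "fill" displays
props <- prop.table(counts, margin = 1)
props

# position on other geoms: jitter spreads overlapping points horizontally
ggplot(data = mtcars, mapping = aes(x = factor(cyl), y = mpg)) +
  geom_point(position = position_jitter(width = 0.2, height = 0))

# Box plots split by am are dodged side by side within each cyl
ggplot(data = mtcars, mapping = aes(x = factor(cyl), y = mpg, fill = factor(am))) +
  geom_boxplot(position = "dodge")
```

Each row of `props` sums to 1, which is why every bar reaches the same height under `position = "fill"`.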
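Exercises
Now that you have the material above, here are two small exercises:
Problem 1:
ggplot2 ships with a dataset called mpg, which contains information about cars (much like the mtcars dataset).
Requirements:
(1) Using the mpg dataset, draw 10 plots with ggplot2
(2) Between them, the plots must cover the five geoms geom_point, geom_line, geom_bar, geom_histogram, and geom_boxplot (or combinations of them)
(3) Use a variety of aesthetic mappings in the plots
Problem 2:
Reproduce the following three figures (also based on the mpg dataset)
Solutions
For Problem 1, if people run into trouble I can write a separate post that walks through it. Below is the code for Problem 2; have a look once you have tried it yourself, and feel free to raise any questions.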
library(ggplot2)
library(tidyverse)  # provides the %>% pipe (and also loads ggplot2)
Figure 1:
mpg %>%
  ggplot(aes(x = cty, y = displ, color = class)) +
  geom_point()
Figure 2:
mpg %>%
  ggplot(aes(x = factor(year), fill = class)) +
  geom_bar(position = "dodge")
Figure 3:
mpg %>%
  ggplot(aes(x = cty, fill = factor(year))) +
  geom_histogram(bins = 10, color = "black")