Visualization: The ggplot2 Plotting System in RStudio, Explained (Part 1)


This article is part of Juejin's "Newcomer Creation Gift" (新人创作礼) event.

This time I bring you a walkthrough of the ggplot2 plotting system for R, used here from RStudio, and it is very detailed!

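All code in this post runs as-is once the ggplot2 and tidyverse packages are loaded. Every dataset used comes preinstalled: mtcars ships with base R, and mpg ships with ggplot2.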

At the end there are exercises with worked answers. Give them a try if you are interested!

ggplot2 Plotting System: Introduction

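ggplot2 is a modern data visualization tool, created by Hadley Wickham in 2005 and distributed as an R package. It implements The Grammar of Graphics, the system proposed by the statistician Leland Wilkinson, which breaks a plot down into semantic components (such as scales and layers). ggplot2 has become one of the most popular plotting tools in the world, and it has been ported to other platforms (such as Python).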

ggplot2 terminology

  • data: the dataset to plot
  • mapping: maps variables in the data to aesthetics
  • aes: aesthetic attributes, such as x and y position, color and size (x, y, color, fill, alpha, size, linetype)
  • geom_xxx: geometric objects, which determine the plot type (geom_point, geom_line, geom_bar, geom_boxplot, ...)
  • stat_xxx: statistical transformations (stat_bin, stat_count, stat_density, stat_identity, ...)
  • coord_xxx: coordinate systems (coord_cartesian, coord_polar, coord_flip, ...)
  • scale_xxx: scales (scale_color_gradient, scale_fill_gradient, scale_x_continuous, ...)
  • facet_xxx: faceting (facet_wrap, facet_grid, facet_null)
  • guides: adjust legends
  • theme: adjust the theme
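To see how these pieces fit together, here is a sketch that uses one component from each category on mtcars. The specific choices (a linear smoother, the Set1 palette, faceting by am, theme_bw) are illustrative assumptions, not something the terms above require.

```r
library(ggplot2)

# One plot that touches each term from the list above
p <- ggplot(data = mtcars,                                   # data
            mapping = aes(x = wt, y = mpg,                   # mapping / aes
                          color = factor(cyl))) +
  geom_point(size = 2) +                                     # geom_xxx: points
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # stat_xxx: linear fit per cyl group
  coord_cartesian(xlim = c(1, 6)) +                          # coord_xxx: zoom without dropping data
  scale_color_brewer(palette = "Set1") +                     # scale_xxx: discrete color palette
  facet_wrap(~ am) +                                         # facet_xxx: one panel per am value
  guides(color = guide_legend(title = "cyl")) +              # guides: legend title
  theme_bw()                                                 # theme: black-and-white theme

# Draw it with print(p), or just type p at the console
```

Each `+` adds or modifies one component, so any line can be removed to see what it contributes.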

ggplot2 Plotting System: Layers

Layers in ggplot2

  • ggplot2 builds a plot by stacking layers on top of one another.
  • The minimal structure of a ggplot2 plot: data + mapping = aes(...) + geom (geom_xxx)
# If ggplot2 is not installed yet, install it with:
# install.packages("ggplot2")
# Load the package
library(ggplot2)

# The layers of a ggplot2 plot

# Data layer only (draws an empty panel)
ggplot(data = mtcars)

# Data layer + mapping layer (axes appear, but nothing is drawn yet)
ggplot(data = mtcars) + aes(x = wt, y = mpg)

# Usually the two are combined through the mapping argument:
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# Data layer + mapping layer + geom
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + geom_point()
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + geom_line()

# Data layer + mapping layer + geom 1 + geom 2
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + geom_point() + geom_line()
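Because `+` simply returns a new plot object with one more layer, you can also store the base plot in a variable and add geoms to it step by step. A minimal sketch; the variable names base, p_point and p_both are arbitrary.

```r
library(ggplot2)

# Data + mapping only; nothing is drawn yet
base <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# Each + returns a new plot object; base itself is left unchanged
p_point <- base + geom_point()
p_both  <- base + geom_point() + geom_line()

# Layers are drawn in the order they were added: points first, then the line on top
p_both
```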

ggplot2 Plotting System: Geoms

Plot types (geoms)

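What we usually call a plot type (scatter plot, line plot, bar chart, ...) is controlled in ggplot2 by the geom (geom_xxx). Below I go through the most commonly used plot types using R's built-in mtcars dataset, with code: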

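# Common plot types

# Scatter plot: color, size, alpha (transparency, 0 to 1) and shape (17 = filled triangle) are set to fixed values below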
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + 
  geom_point()

ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + 
  geom_point(color = "red", size = 5)

ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + 
  geom_point(color = "red", size = 5, alpha = 0.3)

ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + 
  geom_point(color = "red", size = 5, alpha = 0.3, shape = 17)

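# Line plot. Lines have no shape aesthetic. Colors can be given with rgb() or as a hex string "#RRGGBB"; an optional 7th and 8th hex digit ("#RRGGBBAA") set the transparency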
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + 
  geom_line()

ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + 
  geom_line(color = "orange")

ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + 
  geom_line(color = "orange", linetype = 2)

ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + 
  geom_line(color = "orange", linewidth = 2)  # line thickness; use size = 2 in ggplot2 < 3.4.0

ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + 
  geom_line(color = "orange", linewidth = 2, alpha = 0.4)

ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line(color = rgb(0.8, 0.9, 0.2))

ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line(color = "#FF0000")

ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_line(color = "#FF000044")

# Bar chart
ggplot(data = mtcars, mapping = aes(x = cyl)) + 
  geom_bar()

ggplot(data = mtcars, mapping = aes(x = factor(cyl))) + 
  geom_bar()
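# Note the difference between x = cyl and x = factor(cyl) above: numeric cyl gets a continuous axis, factor(cyl) gets three discrete categories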

# color sets the outline of the bars
ggplot(data = mtcars, mapping = aes(x = factor(cyl))) + 
  geom_bar(color = "pink")

# fill sets the inside of the bars
ggplot(data = mtcars, mapping = aes(x = factor(cyl))) + 
  geom_bar(fill = "pink")

# width sets the bar width (default 0.9)
ggplot(data = mtcars, mapping = aes(x = factor(cyl))) + 
  geom_bar(fill = "pink", width = 0.2)

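# Histogram: by default 30 bins are used, and ggplot2 prints a message suggesting you pick bins or binwidth
ggplot(data = mtcars, mapping = aes(x = mpg)) + 
  geom_histogram()

# bins sets the number of bins
ggplot(data = mtcars, mapping = aes(x = mpg)) + 
  geom_histogram(bins = 6)

# color draws the bin outlines, which separates adjacent bars
ggplot(data = mtcars, mapping = aes(x = mpg)) + 
  geom_histogram(bins = 6, color = "white")

ggplot(data = mtcars, mapping = aes(x = mpg)) + 
  geom_histogram(bins = 6, color = "blue", fill = "white")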

# Box plot (a single box summarizing all mpg values)
ggplot(data = mtcars, mapping = aes(y = mpg)) + 
  geom_boxplot()

ggplot(data = mtcars, mapping = aes(y = mpg)) + 
  geom_boxplot(fill = "darkgreen", alpha = 0.4)


# There is no dedicated pie chart geom

# Area charts (covered in a later post)
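Although ggplot2 has no pie geom, a pie chart is just a single stacked bar drawn in polar coordinates, so it can be built from geom_bar() and coord_polar(). A sketch, assuming one slice per cylinder count; the constant x = "" (so there is only one bar) and theme_void() are common choices rather than requirements.

```r
library(ggplot2)

# One stacked bar: x is a constant, fill splits it into cyl groups
pie <- ggplot(data = mtcars, mapping = aes(x = "", fill = factor(cyl))) +
  geom_bar() +                   # the bar height is the count of cars per cyl
  coord_polar(theta = "y") +     # map the count (y) to the angle
  theme_void()                   # drop axes and grid lines
pie
```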

ggplot2 Plotting System: Mapping

A mapping describes how each variable in the dataset corresponds to a graphical element (x axis, y axis, shape, size, color, transparency).

Variables in a dataset come in several types (numeric, nominal, ordinal), and each type should be expressed with a suitable graphical element.
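One way to see this is to map the same cyl column to color three times: as a number, as an unordered factor (nominal) and as an ordered factor (ordinal). A sketch: ggplot2 chooses a continuous gradient with a color bar for the numeric version and a discrete palette for the factor versions. In recent ggplot2 versions ordered factors get a viridis-based ordinal palette by default, but treat that as version-dependent. The data frame name cars_typed is hypothetical.

```r
library(ggplot2)

# The same column treated as three different variable types
cars_typed <- transform(mtcars,
                        cyl_nominal = factor(cyl),                  # nominal
                        cyl_ordinal = factor(cyl, ordered = TRUE))  # ordinal

p_numeric <- ggplot(cars_typed, aes(x = wt, y = mpg, color = cyl)) + geom_point()
p_nominal <- ggplot(cars_typed, aes(x = wt, y = mpg, color = cyl_nominal)) + geom_point()
p_ordinal <- ggplot(cars_typed, aes(x = wt, y = mpg, color = cyl_ordinal)) + geom_point()

# Draw each one and compare the legends:
# numeric -> color bar, factors -> one legend key per level
p_numeric
p_nominal
p_ordinal
```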

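# Mapping variables to graphical elements
# numeric cyl -> continuous color gradient
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg, color = cyl)) + 
  geom_point()

# factor(cyl) -> one distinct color per cylinder count; point size follows wt
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg, color = factor(cyl), size = wt)) + 
  geom_point()

# aes() inside a geom applies to that layer only: the line keeps its default color, the points are colored
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + 
  geom_line() +
  geom_point(aes(color = factor(cyl), size = wt))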


ggplot(data = mtcars, mapping = aes(x = factor(cyl))) +
  geom_bar()

ggplot(data = mtcars, mapping = aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar()

ggplot(data = mtcars, mapping = aes(x = factor(cyl), fill = cyl)) +
  geom_bar() # compare with fill = factor(cyl): numeric cyl gives a continuous fill gradient and a color bar legend
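The geom_line() + geom_point(aes(...)) example in this section relies on a general rule: aesthetics set in ggplot() are inherited by every layer, while aes() inside a single geom only affects that layer. The sketch below checks this with layer_data(), which returns the data ggplot2 computed for one layer.

```r
library(ggplot2)

p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +  # x and y: shared by all layers
  geom_line() +                                               # layer 1: inherits only x and y
  geom_point(aes(color = factor(cyl), size = wt))             # layer 2: adds its own color and size

line_data  <- layer_data(p, 1)
point_data <- layer_data(p, 2)

length(unique(line_data$colour))   # 1: the line keeps a single default color
length(unique(point_data$colour))  # 3: one color per cyl level
```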

Some cases worth a closer look:

## Stacked bar chart: each cyl bar is split and colored by am (0 = automatic, 1 = manual)
ggplot(data = mtcars, mapping = aes(x = factor(cyl), fill = factor(am))) +
  geom_bar()

## position controls how overlapping objects are arranged (every geom accepts it, not only bars); "dodge" places the bars side by side
ggplot(data = mtcars, mapping = aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "dodge")

## "fill" stacks the bars and stretches each one to the same height, so the bars show proportions
ggplot(data = mtcars, mapping = aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "fill")

ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram(bins = 10, color = "white")

## Color the histogram by the three cylinder counts to compare their mpg distributions (the colored bins are stacked)
ggplot(data = mtcars, mapping = aes(x = mpg, fill = factor(cyl))) +
  geom_histogram(bins = 10, color = "white")

## One box plot per cylinder count
ggplot(data = mtcars, mapping = aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

## Fill each box by cylinder count
ggplot(data = mtcars, mapping = aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot()

## alpha makes the fill semi-transparent
ggplot(data = mtcars, mapping = aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot(alpha = 0.5)
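position is not limited to bar charts. For example, position_jitter() nudges overlapping points apart, which helps when many cars share the same cyl value. A sketch; the jitter width of 0.2 and seed = 1 are arbitrary choices (the seed only makes the jitter reproducible), and height = 0 keeps the mpg values unchanged.

```r
library(ggplot2)

# cyl takes only the values 4, 6 and 8, so many points would sit on the same vertical line
p_jitter <- ggplot(data = mtcars, mapping = aes(x = cyl, y = mpg)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1))
p_jitter

# Horizontal offsets from 4/6/8 stay within [-0.2, 0.2]; y is untouched
jittered <- layer_data(p_jitter, 1)
offsets  <- jittered$x - round(jittered$x)
range(offsets)
```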

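Exercises

Now that you have learned the material above, here are two small exercises for you:

Question 1:

ggplot2 ships with a dataset called mpg, which contains information about cars (much like the mtcars dataset).

Requirements:

(1) Using the mpg dataset, draw 10 plots with ggplot2

(2) The plots must include all five plot types geom_point, geom_line, geom_bar, geom_histogram and geom_boxplot (or combinations of them)

(3) Use a variety of aesthetic mappings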
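As a starting point for Question 1, here is one possible plot (a sketch, not a reference answer): a box plot of highway mileage (hwy) for each vehicle class, with fill also mapped to class. The other geoms and aesthetics covered above can be swapped in the same way.

```r
library(ggplot2)

# mpg ships with ggplot2: 234 cars, 11 variables
p_q1 <- ggplot(data = mpg, mapping = aes(x = class, y = hwy, fill = class)) +
  geom_boxplot(alpha = 0.6)   # one semi-transparent box per vehicle class
p_q1
```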

Question 2:

Reproduce the three plots below (also based on the mpg dataset)

[Figure 1]

[Figure 2]

[Figure 3]

Exercise solutions

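If you have questions about Question 1, I can write a separate post walking through it. Below is the code for Question 2; take a look once you have finished, and feel free to ask if anything is unclear!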

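library(ggplot2)
library(tidyverse)  # provides the %>% pipe (and also attaches ggplot2)

Figure 1:
# Scatter plot of city mileage (cty) vs. engine displacement (displ), colored by vehicle class
mpg %>%
  ggplot(aes(x = cty, y = displ, color = class)) +
  geom_point()

Figure 2:
# Number of cars per model year, with one bar per class placed side by side
mpg %>%
  ggplot(aes(x = factor(year), fill = class)) +
  geom_bar(position = "dodge")

Figure 3:
# Histogram of city mileage, with the bins stacked and colored by model year
mpg %>%
  ggplot(aes(x = cty, fill = factor(year))) +
  geom_histogram(bins = 10, color = "black")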
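The %>% pipe in these answers only passes mpg as the first argument of ggplot(), which is its data argument. A sketch showing that Figure 1 can equally be written without a pipe, or with R's native |> pipe (this assumes R 4.1 or later); both versions produce the same layer data.

```r
library(ggplot2)

# Without a pipe: mpg is passed directly as data
p_plain  <- ggplot(mpg, aes(x = cty, y = displ, color = class)) + geom_point()

# Native pipe: |> binds tighter than +, so this is (mpg |> ggplot(...)) + geom_point()
p_native <- mpg |> ggplot(aes(x = cty, y = displ, color = class)) + geom_point()
```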