library(tidyverse) # install.packages("tidyverse")
library(patchwork) # install.packages("patchwork")
── Attaching core tidyverse packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
 dplyr     1.1.4      readr     2.1.5
 forcats   1.0.0      stringr   1.5.1
 ggplot2   3.5.0      tibble    3.2.1
 lubridate 1.9.3      tidyr     1.3.1
 purrr     1.0.2     
── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
 dplyr::filter() masks stats::filter()
 dplyr::lag()    masks stats::lag()
 Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

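Data Visualization: Getting Started with ggplot

Aesthetics

  • When we draw a point on a plot, the point has attributes such as shape, size, color, position, and transparency. These attributes are called aesthetics (sometimes also graphical elements or visual elements).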

  • Common aesthetics for points and lines (geom: point, line): x, y, size, color, shape, linetype, alpha, fill, group

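The ggplot2 package

  • ggplot2 has an elegant plotting grammar; the "gg" in the package name stands for "grammar of graphics".

  • Mapping values to aesthetics

  • Suppose we want the size of a point to represent some variable at that position (rainfall, product sales, and so on): the smaller the value, the smaller the point's radius; the larger the value, the larger the point. Alternatively, larger values can get darker colors and smaller values lighter ones. This is what mapping values to aesthetics means.

  • "Mapping" is a mathematical term; here it can be read as a rule that assigns each data value a corresponding visual value.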
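The idea of mapping a value to a point's size and color can be sketched as follows. This is only an illustration: it assumes the built-in mtcars data set, with hp (horsepower) standing in for a quantity such as rainfall or sales, neither of which appears in the original text.

```r
library(ggplot2)

# mtcars ships with base R; hp is an assumed stand-in for "rainfall" or "sales".
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(size = hp, color = hp))

# layer_data() returns the layer's data after the scales have been applied,
# so the size and colour columns hold the computed visual values.
mapped <- layer_data(p, 1)
head(data.frame(hp = mtcars$hp, size = mapped$size, colour = mapped$colour))
```

Larger hp values come back with larger size values and a different colour, which is the value-to-aesthetic mapping described above.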

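1 The nine components of a ggplot() plot

  • Data (data), a data frame

  • Mapping (mapping)

  • Geometric object (geom)

  • Statistical transformation (stats)

  • Scale (scale)

  • Coordinate system (coord)

  • Facet (facet)

  • Theme (theme)

  • Saving and output (output)

Of these, data, mapping, and geom are required. The syntax template is: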

ggplot(data = <DATA>) + 
   <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

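In addition, a plot may include a statistical transformation of the data (abbreviated stats), is ultimately drawn in a particular coordinate system (abbreviated coord), and can use facets (facet) to produce plots for different subsets of the data.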
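As a rough sketch of how the template is filled in, the example below combines the required parts (data, mapping, geom) with a stat-based geom, a coord, and a facet. It assumes the mpg data set bundled with ggplot2 rather than the data used later in this tutorial, and the axis limits are arbitrary.

```r
library(ggplot2)

# mpg is an example data set that ships with ggplot2 (an assumption of this sketch).
p <- ggplot(data = mpg) +                                # data
  geom_point(mapping = aes(x = displ, y = hwy)) +        # geom + mapping
  geom_smooth(mapping = aes(x = displ, y = hwy),
              method = "lm", formula = y ~ x) +          # geom backed by a stat
  coord_cartesian(ylim = c(10, 45)) +                    # coord
  facet_wrap(~ drv)                                      # facet: one panel per drv value
p
```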

library(tidyverse)
library(janitor)
library(palmerpenguins)

penguins <- penguins %>% 
  janitor::clean_names() %>% 
  drop_na()
Attaching package: ‘janitor’
The following objects are masked from ‘package:stats’:

    chisq.test, fisher.test
penguins %>% 
  head()
A tibble: 6 × 8
species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year
<fct>    <fct>      <dbl>           <dbl>          <int>              <int>        <fct>   <int>
Adelie   Torgersen  39.1            18.7           181                3750         male    2007
Adelie   Torgersen  39.5            17.4           186                3800         female  2007
Adelie   Torgersen  40.3            18.0           195                3250         female  2007
Adelie   Torgersen  36.7            19.3           193                3450         female  2007
Adelie   Torgersen  39.3            20.6           190                3650         male    2007
Adelie   Torgersen  38.9            17.8           181                3625         female  2007
penguins %>% 
  select(species, sex, bill_length_mm, bill_depth_mm) %>% 
  head(4)
A tibble: 4 × 4
species  sex     bill_length_mm  bill_depth_mm
<fct>    <fct>   <dbl>           <dbl>
Adelie   male    39.1            18.7
Adelie   female  39.5            17.4
Adelie   female  40.3            18.0
Adelie   female  36.7            19.3
ggplot(data = penguins)+
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm,
                 size=species, color=species,
                shape=species, alpha=sex))
Warning message:
“Using size for a discrete variable is not advised.”
Warning message:
“Using alpha for a discrete variable is not advised.”
../_images/3e99fe1235246fbc373d533cf7e964700393f180a22516375c76c2e166b7c404.png
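  • ggplot() initializes the plot. It is like laying out a blank sheet of paper before drawing.

  • ggplot(data = penguins) means the plot uses the penguins data frame.

  • + adds a layer.

  • geom_point() draws a scatter plot.

  • aes() defines the mapping between data values and visual properties.

  • Besides position, aes() can also map color (color), size (size), shape (shape), or transparency (alpha).

aes(x = bill_length_mm, y = bill_depth_mm) means that the variable bill_length_mm is mapped to the position along the x axis and the variable bill_depth_mm is mapped to the position along the y axis.

  • ggplot() comes with a set of built-in defaults.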
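The penguins plot above triggers warnings because size and alpha are mapped to discrete variables. A hedged variant that avoids them maps discrete variables to color and shape and a numeric variable to size. The sketch assumes the mpg data set bundled with ggplot2, so the column names differ from the penguins example.

```r
library(ggplot2)

# Assumed example data: ggplot2::mpg.
# class and drv are discrete; cty (city miles per gallon) is numeric.
p <- ggplot(data = mpg) +
  geom_point(
    aes(x = displ, y = hwy,
        color = class,   # discrete variable -> color
        shape = drv,     # discrete variable -> shape
        size  = cty),    # numeric variable  -> size
    alpha = 0.6          # fixed transparency, set outside aes()
  )
p
```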

2 Mapping vs. setting

  • To give every point in the plot one specific color, use a setting instead of a mapping, for example:

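## Mapping ties a variable (or value) in the data to an aesthetic;
## setting fixes the aesthetic to exactly the value you specify
p1 = ggplot(penguins)+
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm,
                 color = "blue", size=5, alpha=0.5) 
             )

p2 = ggplot(penguins)+
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm), 
             color = "blue", size=5, alpha=0.5)

## color is both mapped and set here; the set value is the one that gets drawn
p3 = ggplot(penguins)+
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm,
                 color = "blue", size=5, alpha=0.5), 
             color="blue")

p1 / p2 / p3
## In p1, "blue" is mapped to color as a constant, i.e. a category with a single level,
## so the points get the default color for that level rather than blue (my own understanding)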
../_images/81976a48f941c83b970ba07119c8ec48e71be0622b0d788322306a5316aeb74f.png
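One way to check the difference between mapping and setting is to look at the colour each layer actually draws. The sketch below assumes the mpg data set bundled with ggplot2 and uses layer_data() to read the computed colours.

```r
library(ggplot2)

# Mapped: "blue" is treated as a categorical variable with a single level.
p_mapped <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, color = "blue"))

# Set: "blue" is passed to the geom as a fixed colour.
p_set <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy), color = "blue")

unique(layer_data(p_mapped)$colour)  # a colour picked by the default discrete scale
unique(layer_data(p_set)$colour)     # the literal value "blue"
```

The mapped version also produces a legend with a single entry labelled blue, while the set version has no legend for color.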

3 Geoms

  • geom_point() draws a scatter plot; geom_smooth() draws a smoothed curve.

  • geom_smooth() supports several smoothing methods; the example here uses a linear fit (method = "lm").

ggplot(penguins) +
  geom_smooth(aes(x = bill_length_mm, y = bill_depth_mm),
             method = "lm")
`geom_smooth()` using formula = 'y ~ x'
../_images/700711eb3dceac6967e5764238d21811e1df4e5ba938ecb83c6fdca38b04030a.png
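To compare smoothing methods, the following sketch draws a linear fit and a loess fit on the same scatter plot. It assumes the mpg data set bundled with ggplot2; the formula is written out explicitly so the choice is visible in the code.

```r
library(ggplot2)

# Assumed example data: ggplot2::mpg.
base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

p_lm    <- base + geom_smooth(method = "lm",    formula = y ~ x)  # straight line
p_loess <- base + geom_smooth(method = "loess", formula = y ~ x)  # local regression curve

# The fitted values of the second layer (the smoother) can be inspected directly.
fit_lm <- layer_data(p_lm, 2)
head(fit_lm[, c("x", "y", "ymin", "ymax")])
```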

4 Stacking layers

ggplot(penguins)+
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm))+
  geom_smooth(aes(x = bill_length_mm, y = bill_depth_mm), method="lm")
`geom_smooth()` using formula = 'y ~ x'
../_images/49a3171dce3984e0c1828b444402fcece76f5785e16adc646bcfd41edee1b784.png
## If both layers use the same x and y, the mapping can be declared once in the shared ggplot() call
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm))+
  geom_point()+
  geom_smooth(method="lm")
`geom_smooth()` using formula = 'y ~ x'
../_images/49a3171dce3984e0c1828b444402fcece76f5785e16adc646bcfd41edee1b784.png

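5 Global vs Local

  • The two code snippets below produce the same plot, but they mean different things.

  • When the mapping aes(x = bill_length_mm, y = bill_depth_mm) is written inside ggplot(), it is a global declaration (Global).

    • When geom_point() draws and finds that a mapping it needs is missing (the position, size, or color of the points, and so on), it inherits that mapping from the global declaration in ggplot().

  • When the mapping aes(x = bill_length_mm, y = bill_depth_mm) is written inside the geom, geom_point(), it is a local declaration (Local).

    • When geom_point() draws and finds that the mapping it needs is already declared locally, it uses the local mapping instead of inheriting that aesthetic from the global declaration.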

p1 = ggplot(penguins, aes(x=bill_length_mm, y=bill_depth_mm, 
                          color=species))+
       geom_point()

p2 = ggplot(penguins) +
       geom_point(aes(x = bill_length_mm, y = bill_depth_mm, 
                      color = species))
p1/p2
../_images/002740cb400ba85a79e102d96afb63442d6017487db15d5de1ab1c9057c014c6.png
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm))+
  geom_point(aes(color=species))+
  geom_smooth(method="lm")
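# Here geom_point() and geom_smooth() both inherit the position mappings from the global declaration

ggplot(penguins,aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point(aes(color = sex))
# The local mapping aes(color = ) already exists, so color is not inherited from the global declaration;
# the layer uses its own mapping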
`geom_smooth()` using formula = 'y ~ x'
../_images/da629521d6fd0d50409e333dcbabb3db5693a56797f39e4060860748a2af7e01.png ../_images/e45d3f7dcece8bca64f96043224eb38d3540548339b607eb6212d16e48360f8e.png
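### Layers inherit from the global declaration
# When a layer declares its own mapping, the legend title still comes from the global declaration, while the legend contents follow the local mapping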
p1 = ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
      geom_point()

p2 = ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
      geom_point(aes(color = species))

p3 = ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = sex)) +
      geom_point(aes(color = species))
# The legend title of p3 is inherited from color in the global declaration
p1/p2/p3
../_images/b30de1bf4a10fbe8f33dee1309a6f1929947f64a3bbbb515d9196477a14c6475.png
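That a local mapping replaces a global one for the same aesthetic can be checked by counting the colours the layer actually uses. This is a sketch on the mpg data set bundled with ggplot2 (an assumption; the tutorial itself uses penguins), where drv has three levels and class has seven.

```r
library(ggplot2)

# Global color = drv, local color = class: the layer should follow class.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(aes(color = class))

n_colours <- length(unique(layer_data(p)$colour))
n_colours == length(unique(mpg$class))  # TRUE: one colour per class level
n_colours == length(unique(mpg$drv))    # FALSE: the global color mapping is not used
```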
### Layers do not inherit from one another
p1 = ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
       geom_point() +
       geom_smooth(method = "lm")


p2 = ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
       geom_point(aes(color = species)) +
       geom_smooth(method = "lm")

p3 = ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
       geom_smooth(method = "lm") +
       geom_point(aes(color = species))

p4 = ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
       geom_point() +
       geom_smooth(method = "lm") 

(p1 + p2) / (p3 + p4)
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
../_images/f93d155127a77f85faec59fb95585161980c13e694040d5bec4147215ef9da52.png
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", aes(color = sex)) 
`geom_smooth()` using formula = 'y ~ x'
../_images/76cfa21b493cf1bb3fbd926ed9d6cc736f9ffb14d297f6859da72616d76f6836.png
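The effect of where color is declared can also be seen in the number of fitted lines. In the sketch below (assuming the mpg data set bundled with ggplot2), a global color mapping is inherited by geom_smooth(), which then fits one line per level, while a color mapping local to geom_point() is not passed on to the smoother.

```r
library(ggplot2)

# color declared globally: both layers inherit it.
p_global <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# color declared only in geom_point(): geom_smooth() does not see it.
p_local <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv)) +
  geom_smooth(method = "lm", formula = y ~ x)

length(unique(layer_data(p_global, 2)$group))  # one fitted line per drv level
length(unique(layer_data(p_local, 2)$group))   # a single fitted line
```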

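6 Saving plots

  • ggsave() saves a plot in the format you need, such as ".pdf" or ".png". You can also specify the height and width of the figure; the default units are inches, and "cm" or "mm" can be used instead.

  • To save the current plot, you do not need to assign the result of ggplot(), and you can omit plot = p1 in ggsave(); ggsave() then saves the most recently displayed plot.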

p1 <- penguins %>% 
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_smooth(method = lm) +
  geom_point(aes(color = species)) +
  ggtitle("This is my first plot")
p1

ggsave(plot=p1, filename="my_plot.pdf", width=8, height=5, dpi=330)
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
../_images/d2a521da846cfb458ec1a917b9326c3412055d578b2f1a0cb05ee901b68fe83d.png
penguins %>% 
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_smooth(method = lm) +
  geom_point(aes(color = species)) +
  ggtitle("This is my first plot")

ggsave("my_last_plot.pdf", width = 8, height = 6, dpi = 330)
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
../_images/d2a521da846cfb458ec1a917b9326c3412055d578b2f1a0cb05ee901b68fe83d.png
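A small sketch of saving with units other than inches is shown below. The file name, the temporary directory, and the 16 × 10 cm size are arbitrary choices for illustration, and the plot uses the mpg data set bundled with ggplot2.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Write into a temporary directory so the example leaves no files behind in the project.
out_file <- file.path(tempdir(), "my_plot_cm.pdf")

# The output format is taken from the file extension; width and height are in centimetres.
ggsave(filename = out_file, plot = p, width = 16, height = 10, units = "cm")

file.exists(out_file)
```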
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color=species)) +
  geom_point() +
  geom_smooth(method="lm") +
  geom_smooth(aes(color=species), method="lm") 
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
../_images/0046e7864539a7600ea7845413088559a736fd68f7fde1da279d98901c28f243.png