ggplot2: Geometric Objects#

library(tidyverse)
── Attaching core tidyverse packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
 dplyr     1.1.4      readr     2.1.5
 forcats   1.0.0      stringr   1.5.1
 ggplot2   3.5.0      tibble    3.2.1
 lubridate 1.9.3      tidyr     1.3.1
 purrr     1.0.2     
── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
 dplyr::filter() masks stats::filter()
 dplyr::lag()    masks stats::lag()
 Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

1 The grammar of graphics#

The "grammar of graphics" (the "gg" in "ggplot2" comes from it) describes and builds a plot out of layers. The figure below sketches ggplot2's layer concept.

image.png

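2 Components of a plot#

A statistical graphic is a mapping from data to the aesthetic attributes (aesthetics, abbreviated aes) of geometric objects (abbreviated geom).

1.data: a data frame (data.frame). Note that bare vectors and lists are not supported.

2.aes: maps variables in the data frame to aesthetics. What is an aesthetic? It is anything in the plot the eye can see: the position, shape, size, and color of a point, and so on. What is a mapping? It is a correspondence, like a function b = f(a) in mathematics, where the value of a determines the value of b. In ggplot2, a is a variable in the input data and b is an aesthetic. The aesthetics include: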

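  • x (position along the x axis)

  • y (position along the y axis)

  • color (color of points, lines, and other elements)

  • size (size of points; for the width of lines, ggplot2 3.4.0 and later use linewidth)

  • shape (shape of points)

  • alpha (transparency of points, lines, and other elements)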
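
The difference between mapping an aesthetic to a variable (inside aes()) and setting it to a constant (outside aes()) can be sketched as follows. The toy data frame and its column names are made up for illustration and are not part of the gapminder data used later.

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
toy <- data.frame(
  a   = c(1, 2, 3, 4),
  b   = c(2, 4, 3, 5),
  grp = c("u", "u", "v", "v")
)

# Mapping: color is driven by the data variable grp (inside aes())
p_mapped <- ggplot(toy, aes(x = a, y = b, color = grp)) +
  geom_point()

# Setting: color is a fixed constant (outside aes())
p_fixed <- ggplot(toy, aes(x = a, y = b)) +
  geom_point(color = "blue")

# layer_data() returns the data frame ggplot2 computed for a layer,
# including the colour finally assigned to every point
mapped_colours <- unique(layer_data(p_mapped)$colour)
fixed_colours  <- unique(layer_data(p_fixed)$colour)
```

With the mapping, each value of grp gets its own color and a legend is drawn; with the constant, every point is blue and there is no legend.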

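3.geoms: geometric objects, which determine what kind of plot is drawn. Each geom_*() function draws one kind of shape. The list below is only a selection; the ggplot2 reference documents many more.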

  • geom_bar()

  • geom_density()

  • geom_freqpoly()

  • geom_histogram()

  • geom_violin()

  • geom_boxplot()

  • geom_col()

  • geom_point()

  • geom_smooth()

  • geom_tile()

  • geom_density2d()

  • geom_bin2d()

  • geom_hex()

  • geom_count()

  • geom_text()

  • geom_sf()

4.stats: statistical transformations

5.scales: scales

6.coord: coordinate systems

7.facet: faceting

8.layer: adding layers

9.theme: themes

10.save: saving the plot
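
As a rough sketch of how these components fit together in one call chain, the example below uses a small made-up data frame (toy and its columns are assumptions, not data from this chapter). Both geoms use their default stat, stat = "identity".

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
toy <- data.frame(
  x = c(1, 10, 100, 1, 10, 100),
  y = c(1, 2, 3, 2, 3, 5),
  g = rep(c("a", "b"), each = 3)
)

p <- ggplot(toy, aes(x = x, y = y, color = g)) +  # data + aes
  geom_point() +                                   # geom (first layer)
  geom_line() +                                    # a second layer
  scale_x_log10() +                                # scale
  coord_cartesian() +                              # coord
  facet_wrap(vars(g)) +                            # facet
  theme_minimal()                                  # theme

# save: write the plot to a temporary PDF file
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```

Each `+` adds one component, so a plot can be built up and inspected step by step.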

image.png

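Getting started#

R has several data types, including character, numeric, factor, logical, and date. By default ggplot2 treats character, factor, and logical variables as discrete, numeric variables as continuous, and dates and date-times as date variables: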

  • Discrete variables: character, factor, logical

  • Continuous variables: double, integer

  • Date variables: date, time, date-time
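
One way to see this default behavior is to ask ggplot2 which x scale it picked for columns of different types. This is a minimal sketch on a made-up data frame; it assumes the scale objects carry the class names ScaleDiscrete, ScaleContinuous, and ScaleContinuousDate, as they do in the ggplot2 source.

```r
library(ggplot2)

# Hypothetical toy data with one column per variable type
toy <- data.frame(
  chr  = c("a", "b", "c"),
  num  = c(1.5, 2.5, 3.5),
  date = as.Date(c("2020-01-01", "2020-06-01", "2021-01-01"))
)

p_discrete   <- ggplot(toy, aes(x = chr,  y = num)) + geom_point()
p_continuous <- ggplot(toy, aes(x = num,  y = num)) + geom_point()
p_date       <- ggplot(toy, aes(x = date, y = num)) + geom_point()

# layer_scales() returns the x and y scales used by a layer
is_discrete   <- inherits(layer_scales(p_discrete)$x,   "ScaleDiscrete")
is_continuous <- inherits(layer_scales(p_continuous)$x, "ScaleContinuous")
is_date       <- inherits(layer_scales(p_date)$x,       "ScaleContinuousDate")
```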

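A single plot often involves several variable types at once, for example:

  • one discrete

  • one continuous

  • two discrete

  • two continuous

  • one discrete, one continuous

  • three continuous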

1 Importing the data#

All the data used below can be downloaded from https://github.com/Crazzy-Rabbit/R_for_Data_Science/tree/master/demo_data

gapdata <- read_csv("./demo_data/gapminder.csv")
Rows: 1704 Columns: 6
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): country, continent
dbl (4): year, lifeExp, pop, gdpPercap
 Use `spec()` to retrieve the full column specification for this data.
 Specify the column types or set `show_col_types = FALSE` to quiet this message.
gapdata %>% head()
A tibble: 6 × 6
country      continent  year   lifeExp  pop       gdpPercap
<chr>        <chr>      <dbl>  <dbl>    <dbl>     <dbl>
Afghanistan  Asia       1952   28.801    8425333  779.4453
Afghanistan  Asia       1957   30.332    9240934  820.8530
Afghanistan  Asia       1962   31.997   10267083  853.1007
Afghanistan  Asia       1967   34.020   11537966  836.1971
Afghanistan  Asia       1972   36.088   13079460  739.9811
Afghanistan  Asia       1977   38.438   14880372  786.1134

2 Checking the data#

Check for missing values

gapdata %>% 
  summarise(
    across(everything(), ~sum(is.na(.)))
  )
A tibble: 1 × 6
country  continent  year   lifeExp  pop    gdpPercap
<int>    <int>      <int>  <int>    <int>  <int>
0        0          0      0        0      0

Basic plots#

1 Bar chart#

Often used for one discrete variable

geom_bar() counts the rows for each value of the variable automatically

gapdata %>% 
  ggplot(aes(x = continent)) +
  geom_bar()
../_images/8a13a1994c1ed94539cb0ed20f7dc5bc61fe5cbd2c567014294ba8b0da9108dd.png
gapdata %>% 
  ggplot(aes(x = reorder(continent, continent, length))) +
  geom_bar()
../_images/ef24426b25231e63e1bca3e33ba3b632715cf5d24dfee8074343923ab597f38a.png
gapdata %>% 
  ggplot(aes(x = reorder(continent, continent, length))) +
  geom_bar() +
  coord_flip()
../_images/33ada791cf03602b62bda87fa69d401355b078d18c0f0414aac3600238702aec.png
# geom_bar vs stat_count
library(patchwork)
p = gapdata %>% 
  ggplot(aes(x = continent)) + 
  stat_count()

p1 = gapdata %>% 
  ggplot(aes(x = continent)) +
  geom_bar()

p / p1
../_images/8ee0feca6c02624fb26f2f79282fc06ed84437ea0a2c1e65088d081f7de1e034.png
gapdata %>% count(continent)
A spec_tbl_df: 5 × 2
continent  n
<chr>      <int>
Africa     624
Americas   300
Asia       396
Europe     360
Oceania     24

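geom_bar() performs this row count for us. Applied to the distinct continent/country pairs, it counts the countries in each continent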

gapdata %>% 
  distinct(continent, country) %>% 
  ggplot(aes(x = continent)) +
  geom_bar()
../_images/a15a4a76ebc87b3a49a4083c0cb76cd54d42fe6ab07f5e74ec24dd1cddcaa794.png

You can also compute the counts first and then plot them with geom_col(), but using geom_bar() directly clearly takes less code

gapdata %>% 
  distinct(continent, country) %>% 
  group_by(continent) %>% 
  summarise(num = n()) %>% 
  ggplot(aes(x = continent, y = num)) +
  geom_col()
../_images/d43e9d9bb89cc1ab29602ca6feb83e782fcd0c8fcf1705f50e2fada35074b6a4.png
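
The equivalence of the two routes can be checked on a small made-up data frame (toy is an assumption for illustration): the heights computed by geom_bar() should match the counts passed to geom_col().

```r
library(ggplot2)

# Hypothetical toy data: one row per observation
toy <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Africa", "Africa", "Africa")
)

# Route 1: let geom_bar() count the rows
p_bar <- ggplot(toy, aes(x = continent)) +
  geom_bar()

# Route 2: count first (base R table()), then draw the counts with geom_col()
counts <- as.data.frame(table(continent = toy$continent), responseName = "n")
p_col <- ggplot(counts, aes(x = continent, y = n)) +
  geom_col()

# Bar heights as computed by ggplot2 for each plot
bar_heights <- layer_data(p_bar)$count
col_heights <- layer_data(p_col)$y
```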

2 Histogram#

Often used for one continuous variable

geom_histogram() uses position = "stack" by default

gapdata %>% 
  ggplot(aes(x = lifeExp)) +
  geom_histogram() # corresponding to stat_bin()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
../_images/e04b8785bdec558ece4c00785f139f7402a4653203633e48eabd81b2cd998e7a.png
gapdata %>% 
  ggplot(aes(x = lifeExp)) +
  geom_histogram(binwidth = 1)
../_images/260e89838b988f34a5e66703dd4f50ea9daa3de515cf598c7e038857204e739f.png

With fill mapped to a variable, the default position = "stack" stacks the groups' bars on top of each other

gapdata %>% 
  ggplot(aes(x = lifeExp, fill = continent)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
../_images/056627adb39c5cabe915e2d0c58c6c8c12deb07d0e4b870ba1149e006f2dc1e2.png

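You can also specify position = "identity"

With this setting each group's bars are drawn from the baseline at their actual counts, with no position adjustment, so the groups overlap instead of stacking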

gapdata %>% 
  ggplot(aes(x = lifeExp, fill = continent)) + 
  geom_histogram(position = "identity")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
../_images/1f5f4ba88c4f04c50127e015e39c3b70c32b20643b1e1837eed798e6b5b6e929.png
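
A small sketch of the difference, using made-up data (toy and its columns are assumptions): with position = "identity" every bar starts at zero, while with the default position = "stack" some bars start on top of another group's bar.

```r
library(ggplot2)

# Hypothetical toy data with two overlapping groups
toy <- data.frame(
  value = c(1, 2, 2, 3, 3, 3, 2, 3, 3, 4, 4, 5),
  grp   = rep(c("a", "b"), each = 6)
)

# Default: position = "stack"
p_stack <- ggplot(toy, aes(x = value, fill = grp)) +
  geom_histogram(binwidth = 1)

# No position adjustment; alpha makes the overlapping bars visible
p_identity <- ggplot(toy, aes(x = value, fill = grp)) +
  geom_histogram(binwidth = 1, position = "identity", alpha = 0.5)

d_stack    <- layer_data(p_stack)     # ymin/ymax after stacking
d_identity <- layer_data(p_identity)  # ymin/ymax without adjustment
```

Because bars hide each other under position = "identity", it is usually combined with a transparency setting such as alpha = 0.5.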

3 Frequency polygon#

geom_freqpoly()

gapdata %>% 
  ggplot(aes(x = lifeExp, color = continent)) +
  geom_freqpoly()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
../_images/548124953d5f87549fa5e67ff9065493654ba154d51cff8a8c79e809a097a321.png

4 Density plot#

geom_density()

  • The adjust argument of geom_density() scales the bandwidth: adjust = 1/2 means half of the default bandwidth.

geom_line(stat = "density")

#' smooth histogram = density plot
gapdata %>% 
  ggplot(aes(x = lifeExp)) +
  geom_density()
../_images/097956d47c2e885e7b1b91c643ba84dc8d285a8711e52402d100440bfccc24f7.png
gapdata %>% 
  ggplot(aes(x = lifeExp)) +
  geom_line(stat = "density")
../_images/1516d5d74260182608e252b71f7a7cba15a592c760c28ae0ed42708fd75c9821.png

adjust scales the bandwidth. Here adjust = 0.2 uses one fifth of the default bandwidth, which gives a much less smooth curve.

gapdata %>% 
  ggplot(aes(x = lifeExp)) +
  geom_density(adjust = 0.2)
../_images/91e0a7265cf71a09269c04bd55fb2c77286ea2751178f6c32d5d1becde5de8d1.png
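
The effect of adjust on the bandwidth can be illustrated with base R's stats::density(), which also takes an adjust argument. This is only a sketch on simulated data, under the assumption that geom_density() treats adjust the same way (as a multiplier on the default bandwidth).

```r
# Hypothetical sample; any numeric vector would do
set.seed(1)
x <- rnorm(200, mean = 60, sd = 10)

d_default <- density(x)                # adjust = 1 (default bandwidth)
d_half    <- density(x, adjust = 1/2)  # half of the default bandwidth

# The bandwidth actually used is stored in the $bw element
bw_default <- d_default$bw
bw_half    <- d_half$bw
```
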
gapdata %>% 
  ggplot(aes(x = lifeExp, color = continent)) +
  geom_density()
../_images/81b7e282edd40c13d52e3daeb19a44c4085bca8206f67615d3a7edcdc8759530.png
gapdata %>% 
  ggplot(aes(x = lifeExp, fill = continent)) +
  geom_density(alpha = 0.2)
../_images/c28640767a5cbddf07f8a5e68d20949f9cbfcaf4947c52b87aa03c5b6b2cd436.png
gapdata %>% 
  filter(continent != "Oceania") %>% 
  ggplot(aes(x = lifeExp, fill = continent)) +
  geom_density(alpha = 0.2)
../_images/8c521b85585687b231c384a7c5339ada1181910efc9f42bfcff4079bc7ce58c0.png

Draw the histogram and the density curve together. Note that y = after_stat(density) maps y to a variable that the stat computes from x, not to a column of the data; after_stat(count) and after_stat(level) work the same way. The older spelling stat(density) has been deprecated since ggplot2 3.4.0.

gapdata %>% 
  filter(continent != "Oceania") %>% 
  ggplot(aes(x = lifeExp, y = after_stat(density))) +
  geom_histogram(aes(fill = continent)) +
  geom_density()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
../_images/6474e799f83bede3f3376c1f20e11d4f409a67f352b103618f3b61da114ed2c2.png
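
To make the density scale concrete, here is a minimal sketch on simulated data (toy is an assumption): on the density scale the bar areas of a histogram add up to 1, which is what makes it comparable with a density curve.

```r
library(ggplot2)

# Hypothetical simulated data
set.seed(42)
toy <- data.frame(value = rnorm(500, mean = 60, sd = 12))

# Put the histogram on the density scale instead of the count scale
p <- ggplot(toy, aes(x = value, y = after_stat(density))) +
  geom_histogram(binwidth = 2)

# Computed bins: density is the bar height, xmin/xmax are the bin edges
bars <- layer_data(p)
total_area <- sum(bars$density * (bars$xmax - bars$xmin))
```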

5 Box plot#

One discrete variable + one continuous variable

gapdata %>% 
  ggplot(aes(x = year, y = lifeExp)) +
  geom_boxplot()
Warning message:
“Continuous x aesthetic
 did you forget `aes(group = ...)`?”
../_images/dbd954bd3cb8c62478038f187893d3aab236a20515fb5805226f285514d9d4fd.png

In the data frame, year is numeric, so it must first be converted to a factor to make it a discrete variable

gapdata %>% 
  ggplot(aes(x = as.factor(year), y = lifeExp)) +
  geom_boxplot()
../_images/9318e6662ac947f0b2aa7b4763356579f4c71a7934be9fbac1bc4fde6c057ff5.png

Alternatively, use group to specify the grouping variable explicitly

gapdata %>% 
  ggplot(aes(x = year, y = lifeExp)) +
  geom_boxplot(aes(group = year))
../_images/bfe5a871d0d1c971c2d0adaa673273815190b152c4adb579df0c11e037b78a08.png
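
Both approaches produce one box per year. The sketch below checks this on a small made-up data frame (toy and its values are assumptions). The difference is the x axis: as.factor() gives evenly spaced discrete positions, while group = year keeps the numeric axis.

```r
library(ggplot2)

# Hypothetical toy data: three years, five values each
toy <- data.frame(
  year  = rep(c(1952, 1957, 1962), each = 5),
  value = c(30, 35, 40, 45, 50,  32, 38, 44, 50, 56,  35, 42, 49, 56, 63)
)

# Discrete x: convert year to a factor
p_factor <- ggplot(toy, aes(x = as.factor(year), y = value)) +
  geom_boxplot()

# Continuous x: keep year numeric, but group explicitly
p_group <- ggplot(toy, aes(x = year, y = value)) +
  geom_boxplot(aes(group = year))

# One row of computed statistics per box
boxes_factor <- layer_data(p_factor)
boxes_group  <- layer_data(p_group)
```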

Violin plot + jittered points + smooth curve

gapdata %>% 
  ggplot(aes(x = year, y = lifeExp))+
  geom_violin(aes(group = year))+
  geom_jitter(alpha = 0.25)+
  geom_smooth(se = TRUE)
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
../_images/3abe65e2089e8930f126eca51932699260b62e70baab874c96dc2df08b5e8e8a.png

6 Jittered scatter plot#

A remedy for overlapping points

geom_jitter()

gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp)) +
  geom_point()
../_images/c097bcd6c442fdb0e3db98afa8aecad1c01219cda89d6f58edbbd458d6281c0c.png
gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp))+
  geom_jitter()
../_images/6feba60266ac97245b3a9d7a38ef40591fd57ecddcd3d2895d82d487f0c13d69.png
gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp)) +
  geom_boxplot()
../_images/2877ae50a9f9c82f2609b7982d74712d4dc87d82a003b93fb6c2700c3a66e84d.png
gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp))+
  geom_boxplot()+
  geom_jitter(alpha = 0.25)
../_images/e65df2094359c23775918917dc851dd7587c3324afd3e9303049b075762eb3e9.png
gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp))+
  geom_jitter()+
  stat_summary(fun = median, colour = "red", geom = "point", size = 5)
../_images/03b60e020f254ad2b8a281b1283931b0e1799bde52c41b461de5141a04eb1040.png
gapdata %>%
  ggplot(aes(x = reorder(continent, lifeExp), y = lifeExp)) +
  geom_jitter() +
  stat_summary(fun = median, colour = "red", geom = "point", size = 5)
../_images/dc9e42ad516ca3fd78309b36bdd77ef8f636ed7ff165ba0f5ea276e1cb7b187a.png

Note that we have now met stat_count / stat_bin / stat_summary
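
What stat_summary() computes can be verified against a manual calculation. The sketch below uses a made-up data frame (toy is an assumption) and compares the medians drawn by stat_summary(fun = median) with those from base R's tapply().

```r
library(ggplot2)

# Hypothetical toy data: three groups, five values each
toy <- data.frame(
  grp   = rep(c("a", "b", "c"), each = 5),
  value = c(1, 2, 3, 4, 100,  10, 20, 30, 40, 50,  5, 5, 6, 7, 7)
)

p <- ggplot(toy, aes(x = grp, y = value)) +
  geom_jitter(width = 0.1, height = 0) +
  stat_summary(fun = median, geom = "point", colour = "red", size = 5)

# Layer 2 is the stat_summary() layer: one summarised y value per group
summary_y <- layer_data(p, i = 2)$y

# The same medians computed by hand
manual_y <- as.numeric(tapply(toy$value, toy$grp, median))
```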

gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp))+
  geom_violin(trim = FALSE, alpha = 0.5) +
  stat_summary(fun = mean,
    fun.max = function(x){mean(x) + sd(x)},
    fun.min = function(x){mean(x) - sd(x)},
    geom = "pointrange")
../_images/e7deb3e8cc1b5108b8e75c774c109dd9ed4f0ef405de65f874e8a614fc1c8c2a.png
gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp))+
  geom_violin(trim = FALSE, alpha = 0.5) +
  stat_summary(fun = mean,
    fun.max = ~mean(.x) + sd(.x),
    fun.min = ~mean(.x) - sd(.x),
    geom = "pointrange")
../_images/e7deb3e8cc1b5108b8e75c774c109dd9ed4f0ef405de65f874e8a614fc1c8c2a.png

7 Ridgeline plot#

Often used for one discrete variable + one continuous variable

ggridges::geom_density_ridges()

gapdata %>% 
  ggplot(aes(x = lifeExp, y = continent, 
             fill = continent))+
  ggridges::geom_density_ridges()
Picking joint bandwidth of 2.23
../_images/f51b7fc195b4608035bf4dbc59020ee98e18611eccd409c1006c84f1b7dd374f.png
gapdata %>% 
  ggplot(aes(x = lifeExp, y = continent,
            fill = continent))+
  ggridges::geom_density_ridges()+
  scale_fill_manual(
    values = c("#003f5c", "#58508d", "#bc5090", "#ff6361", "#ffa600"))
Picking joint bandwidth of 2.23
../_images/fc4546c7f4e46772c60e73a17642aa43cc861989338965aa53c823484b6d6f0c.png
# a palette from the colorspace package
gapdata %>% 
  ggplot(aes(x = lifeExp, y = continent, 
             fill = continent))+
  ggridges::geom_density_ridges()+
  scale_fill_manual(
    values = colorspace::sequential_hcl(5, palette = "Peach"))
Picking joint bandwidth of 2.23
../_images/b1d9607a1852c273b9dd5b58340e462966ddf65b028b98a38a7ad7aca182fab2.png

8 Scatter plot#

Often used for two continuous variables

geom_point()

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point()
../_images/573e425c91b07199b0abe39316acd05ef6378cf6424a70eb1340e677d878a31b.png

A better way to apply a log transformation

  • scale_x_log10()

  • scale_y_log10()

# the plain approach
gapdata %>% 
  ggplot(aes(x = log(gdpPercap), y = lifeExp))+
  geom_point()
../_images/87bf1cdff586644bd96f444966c4cb62f37065afd7225d2b45e998a99b5ceed7.png
# the better approach
gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point()+
  scale_x_log10()
../_images/057ceb5dfaee4c7e6237f5dd4be1a1e8bd545c6f1ddd6c3d5e7dbe41446a1dd7.png
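
The two approaches place the points identically when the same base is used. The sketch below uses log10() rather than the natural log() from the example above so that the positions are directly comparable; toy is a made-up data frame.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(
  gdp  = c(100, 1000, 10000, 100000),
  life = c(40, 55, 70, 80)
)

# Transform the variable by hand: the axis is labelled in log units
p_manual <- ggplot(toy, aes(x = log10(gdp), y = life)) +
  geom_point()

# Transform the scale: the axis keeps labels in the original units
p_scale <- ggplot(toy, aes(x = gdp, y = life)) +
  geom_point() +
  scale_x_log10()

# Positions at which the points are drawn in each plot
x_manual <- layer_data(p_manual)$x
x_scale  <- layer_data(p_scale)$x
```

The point positions agree; what scale_x_log10() adds is axis breaks and labels in the original units of the variable, which are easier to read.
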

Ways to color the points

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point(aes(color = continent))

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp, 
             color = continent))+
  geom_point()
../_images/88924114b773fd8cf5ebdf9e812611c9605b9ec23aaebaf4aa5b73c7d076b97d.png ../_images/88924114b773fd8cf5ebdf9e812611c9605b9ec23aaebaf4aa5b73c7d076b97d.png
gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point(alpha = (1/3), size = 2)
../_images/947483e17d4ba8127f3c63991ffeda826978c187faf8d2b2108de0b8f10eadb9.png
gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point(alpha = 0.3)+
  geom_smooth()
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
../_images/a460500b598568abe3ac593f8e5e7a94f04c5f0248985dc207f5fa8891f1d89f.png
gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point()+
  geom_smooth(lwd = 3, se = FALSE)
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
../_images/96d1da28d714cad17cdb549c4594c365270ab545a0446551a435059aaa1d9b35.png
gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point()+
  geom_smooth(lwd = 3, se = FALSE, method = "lm")
`geom_smooth()` using formula = 'y ~ x'
../_images/ba4edb396d45482fc5995d065cf4d8cc64794fbf95233f5fddf7bf1f74fbd224.png
gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp, 
             color = continent))+
  geom_point()+
  geom_smooth(lwd = 3, se = FALSE, method = "lm")
`geom_smooth()` using formula = 'y ~ x'
../_images/42c8a6966f2692be510f2facb88fb09eb0a4173b6f5ff219c7032940c63a6b70.png
gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp, 
             color = continent))+
  geom_point(alpha = 0.3)+
  geom_smooth(lwd = 1, color = "blue", se = TRUE, method = "lm")
`geom_smooth()` using formula = 'y ~ x'
../_images/72d41e02603ebca0046772311ff57fc7f619c622ac42e28a01b44a4a03f08143.png
jCountries <- c("Canada", "Rwanda", "Cambodia", "Mexico")

gapdata %>% 
  filter(country %in% jCountries) %>% 
  ggplot(aes(x = year, y = lifeExp, color = country))+
  geom_line()+
  geom_point()
../_images/56c9a072b7d09de585320f2e5bed387a4dd8f5bbf00b5ccf870cfb96270698c3.png

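As you can see, the order of the legend entries does not match the order of the lines in the plot.

When mapping color, you can reorder country so that the legend follows the lines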

gapdata %>% 
  filter(country %in% jCountries) %>% 
  ggplot(aes(x = year, y = lifeExp, 
             color = reorder(country, -1 * lifeExp, max)
            ))+
  geom_line()+
  geom_point()
../_images/dcd3fd59aab0d6dadcaa270bfd97ef2b8aeace43bb8a25dae7176ef620828845.png
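
It helps to look at what reorder() does to the factor levels, because those levels determine the legend order. Note that reorder(country, -1 * lifeExp, max) takes the maximum of the negated values, so it sorts by descending minimum lifeExp, not by descending maximum. The sketch below uses illustrative numbers (not the actual gapminder values) chosen so that the two orderings differ.

```r
# Illustrative values only: first and last observation per country
toy <- data.frame(
  country = rep(c("Canada", "Rwanda", "Cambodia", "Mexico"), each = 2),
  lifeExp = c(68.8, 80.7,  40.0, 46.2,  39.4, 59.7,  50.8, 76.2)
)

# Default: alphabetical levels
default_levels <- levels(factor(toy$country))

# max(-lifeExp) equals -min(lifeExp): descending by each country's minimum
levels_by_min <- levels(reorder(toy$country, -1 * toy$lifeExp, max))

# Descending by each country's maximum
levels_by_max <- levels(reorder(toy$country, toy$lifeExp, function(v) -max(v)))
```

If the goal is a legend ordered by where the lines end, sorting by the value in the last year is another option.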

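There is another approach:

use if_else() to add a column that holds the country name only in each country's last year (max(year)), then draw it at that point with geom_label(aes(label = end_label))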

gapdata %>% 
  filter(country %in% jCountries) %>% 
  group_by(country) %>% 
  mutate(end_label = if_else(year == max(year), country, NA_character_)) %>% 
  ggplot(aes(x = year, y = lifeExp, 
            color = country))+
  geom_line()+
  geom_point()+
  geom_label(aes(label = end_label))+
  theme(legend.position = "none")
Warning message:
“Removed 44 rows containing missing values or values outside the scale range (`geom_label()`).”
../_images/898ec368f5bd1cbe4fa2ba4f90a8529d9f4e0ef2eb86cf32a607a1aa5929a392.png

If that feels like too much work, use the gghighlight package

# install.packages("gghighlight")
library(gghighlight)
gapdata %>% 
  filter(country %in% jCountries) %>% 
  ggplot(aes(x = year, y = lifeExp,
             color = country))+
  geom_line()+
  geom_point()+
  gghighlight::gghighlight()
label_key: country
../_images/d0db514bd4e5111505f49d1c2d4c9e8f21c4813811cb11a94864199301f3471f.png

9 Dot plot with segments (lollipop chart)#

geom_point() + geom_segment()

# dot plot
gapdata %>% 
  filter(continent == "Asia" & year == 2007) %>% 
  ggplot(aes(x = lifeExp, y = country))+
  geom_point()
../_images/9876f19cc7f8eae976b3ebf1038ad66fc498667fb22651a47c43d91286764ed7.png
# dot plot with segments
gapdata %>% 
  filter(continent == "Asia" & year == 2007) %>% 
  ggplot(aes(x = lifeExp, y = reorder(country, lifeExp))) +
  geom_point(color = "blue", size = 2) +
  geom_segment(aes(x = 40, xend = lifeExp, 
                   y = reorder(country, lifeExp), yend = reorder(country, lifeExp)),
               color = "lightgrey") +
  labs(x = "Life Expectancy (years)", y = "",
      title = "Life Expectancy by Country",
      subtitle = "GapMinder data for Asia - 2007")+
  theme_minimal()+
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
../_images/3d615c23e5c96f8d0fddadea8595ccc9296a03fb8b41d6c908f87758d1daf85a.png

10 Faceting#

  • There are two faceting functions: facet_grid() and facet_wrap()

1 facet_grid()#

  • create a grid of graphs, by rows and columns

  • use vars() to call on the variables

  • adjust scales with scales = "free"

gapdata %>% 
  ggplot(aes(x = lifeExp)) +
  geom_density()+
  facet_grid(. ~ continent)
../_images/29b7f5c72b8fea877935f26c7e5a9303950231479d9c3c2f6e8cdecbe6f8fd7b.png
gapdata %>% 
  filter(continent != "Oceania") %>% 
  ggplot(aes(x = lifeExp, fill = continent))+
  geom_histogram()+
  facet_grid(continent ~ .)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
../_images/027ff5b3179df56a4f07e4d3fd597f9044ef2ea9f65fca5f64595e53b7461940.png
gapdata %>%   
  filter(continent != "Oceania") %>% 
  ggplot(aes(x = lifeExp, y = after_stat(density)))+
  geom_histogram(aes(fill = continent))+
  geom_density()+
  facet_grid(continent~ .)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
../_images/90e3cbdc8f49c8c077c9ac7196a180178806f8fe7373912bc444425098c7566c.png
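
The examples above use the formula interface. The vars() interface and the scales argument from the bullet list can be sketched as follows, on a made-up data frame (toy is an assumption) whose two groups live on very different y ranges.

```r
library(ggplot2)

# Hypothetical toy data: two groups on very different scales
toy <- data.frame(
  grp   = rep(c("small", "large"), each = 4),
  x     = rep(1:4, times = 2),
  value = c(1, 2, 3, 4,  100, 200, 300, 400)
)

# Shared y axis: the "small" group is squeezed against the bottom
p_fixed <- ggplot(toy, aes(x = x, y = value)) +
  geom_point() +
  facet_grid(rows = vars(grp))

# scales = "free_y": each row of panels gets its own y axis
p_free <- ggplot(toy, aes(x = x, y = value)) +
  geom_point() +
  facet_grid(rows = vars(grp), scales = "free_y")

# The built plot records one layout row per panel
n_panels <- nrow(ggplot_build(p_free)$layout$layout)
```

Free scales make each panel easier to read, but values can no longer be compared across panels by position alone.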

2 facet_wrap()#

  • create small multiples by “wrapping” a series of plots

  • use vars() to call on the variables

  • nrow and ncol arguments for dictating shape of grid

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent))+
  geom_point(show.legend = FALSE)+
  facet_wrap(~continent)
../_images/9cd01126b467a335f0c22128505d284c74be38e559cfa879a072873e1b747e71.png

11 Text annotation#

ggforce::geom_mark_ellipse()

ggrepel::geom_text_repel()

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point()+
  ggforce::geom_mark_ellipse(aes(
    filter = gdpPercap > 70000,
    label = "Rich country",
    description = "What country are they?"
  ))
../_images/200adc3c89eebe2a4c998ef0742b7c64566370fe03d92771508d515229167a52.png
ten_countries <- gapdata %>% 
  distinct(country) %>% 
  pull() %>%
  sample(10)
ten_countries
  1. 'Mexico'
  2. 'Liberia'
  3. 'Myanmar'
  4. 'Guinea'
  5. 'Sao Tome and Principe'
  6. 'Vietnam'
  7. 'Puerto Rico'
  8. 'Algeria'
  9. 'Croatia'
  10. 'Uganda'
library(ggrepel)
gapdata %>% 
  filter(year == 2007) %>% 
  mutate(
    label = ifelse(country %in% ten_countries, as.character(country), "")
  ) %>% 
  ggplot(aes(log(gdpPercap), lifeExp))+
  geom_point(size = 3.5, alpha = 0.9, shape = 21, 
            col = "white", fill = "#0162B2")+
  geom_text_repel(aes(label = label), size = 4.5,
                 point.padding = 0.2, box.padding = 0.3,
                 force = 1, min.segment.length = 0)+
  theme_minimal(14)+
  theme(legend.position = "none",
       panel.grid.minor = element_blank())+
  labs(x = "log(GDP per capita)",
       y = "life expectancy")
../_images/421bcc18a823708aafdb21ced09ecdb4cced075b63482ef33adaa3d340c2618b.png

12 Error bar plot#

geom_errorbar()

avg_gapdata <- gapdata %>% 
  group_by(continent) %>% 
  summarise(mean = mean(lifeExp), sd = sd(lifeExp)
           )
avg_gapdata
A tibble: 5 × 3
continent  mean      sd
<chr>      <dbl>     <dbl>
Africa     48.86533   9.150210
Americas   64.65874   9.345088
Asia       60.06490  11.864532
Europe     71.90369   5.433178
Oceania    74.32621   3.795611
avg_gapdata %>% 
  ggplot(aes(continent, mean, fill = continent))+
  geom_point()+
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
               width = 0.25)
../_images/9010299a2b104fc385504b90a85a94bb240cfccce2ece68af51f1c094a68acd4.png

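13 Ellipse plot#

stat_ellipse(type = "norm", level = 0.95) draws an ellipse at the 0.95 level under the assumption that the data follow a multivariate normal distribution; this is often loosely described as adding a confidence region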

gapdata %>% 
  ggplot(aes(x = log(gdpPercap), y = lifeExp))+
  geom_point()+
  stat_ellipse(type = "norm", level = 0.95)
../_images/7ba97b1fb8cd4104dbfba0584ee61592141698cc6fa25cd37de1640ae369fc0f.png

14 2D density plot#

Like geom_density() in one dimension, geom_density_2d(), geom_bin2d(), and geom_hex() describe the density of points over the two-dimensional region spanned by two variables

# geom_bin2d()
gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_bin2d()
../_images/3e2564426d2ab89a95057bb3cbbefc32231be9ad5946b29c3bb784557af63a0e.png
# geom_density2d()
gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_density2d()
../_images/ecb9d98ed3698e80fed15cda4c5ecea36f44720444868b3a2aa01442bb994894.png

15 Tile plot (heatmap)#

geom_tile(), geom_contour(), and geom_raster() are often used for three variables

gapdata %>% 
  group_by(continent, year) %>% 
  summarise(mean_lifeExp = mean(lifeExp)) %>% 
  ggplot(aes(x = year, y = continent, fill = mean_lifeExp))+
  geom_tile()+
  scale_fill_viridis_c()
`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.
../_images/0f2673c7a7d6ff75d6d77bee88f6e87d9d770556b82d19bd5fe57e9df725f830.png

In fact, there is a better way to present this

gapdata %>% 
  group_by(continent, year) %>% 
  summarise(mean_lifeExp = mean(lifeExp)) %>% 
  ggplot(aes(x = year, y = continent, 
             size = mean_lifeExp, color = mean_lifeExp))+
  geom_point()+
  scale_color_viridis_c()+
  theme_minimal(15)
`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.
../_images/05b1d2a00e98ea840cf05c85dd8c7ef5cec7efd08b9e1e4dadd1d491f9661dd0.png

Put the values inside the points

geom_text()

gapdata %>% 
  group_by(continent, year) %>% 
  summarise(mean_lifeExp = mean(lifeExp)) %>% 
  ggplot(aes(x = year, y = continent, size = mean_lifeExp))+
  geom_point(shape = 21, color = "red", fill = "white")+
  scale_size_continuous(range = c(7, 15))+
  geom_text(aes(label = round(mean_lifeExp, 2)), size = 3, color = "black")+
  theme_minimal()+
  theme(legend.position = "none")
`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.
../_images/1b885a1a9fd2ef483208c3c0864396ec9626fdf73f28cac6aef5d1bcd0375980.png
library(tidyverse)
tbl <-
  tibble(
    x = rep(c(1, 2, 3), times = 2),
    y = 1:6,
    group = rep(c("group1", "group2"), each = 3)
  )
ggplot(tbl, aes(x, y)) + geom_line()
ggplot(tbl, aes(x, y, group = group)) + geom_line()
ggplot(tbl, aes(x, y, fill = group)) + geom_line()
ggplot(tbl, aes(x, y, color = group)) + geom_line()
../_images/035522dc8166beb1d8489e9fb0e5b786d7f823cd4077203281c5b1485a4dd666.png ../_images/29e26a6eba3cba246d79b34af62fc9a9dedc861d13e447f94e1477f100c2ad6a.png ../_images/29e26a6eba3cba246d79b34af62fc9a9dedc861d13e447f94e1477f100c2ad6a.png ../_images/8d5e4db59bfada691c41a8c3e969f5b20c576a10befd6b5f8104294c3b5834e8.png