`ggplot2`: geometric objects#

library(tidyverse)

── Attaching core tidyverse packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     

── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

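1 The grammar of graphics#

The "grammar of graphics" (the source of the "gg" in "ggplot2") describes and builds a plot out of layers. The figure below sketches the layer concept in ggplot2.

2 Components of a plot#

A statistical graphic is a mapping from data to the aesthetic attributes (aes) of geometric objects (geoms).

1.data: a data frame (data.frame). Note that plain vectors and lists are not supported.

2.aes: the mapping from variables in the data frame to aesthetic attributes. An aesthetic attribute is anything the eye can see in the plot: the position, shape, size, color and so on of a point. A mapping is a correspondence, much like a function b = f(a) in mathematics, where the value of a determines the value of b. In ggplot2, a is the data variable we supply and b is the aesthetic attribute. The aesthetic attributes include:

x (position along the x axis)
y (position along the y axis)
color (color of points, lines and other elements)
size (size of points, lines and other elements)
shape (shape of points)
alpha (transparency of points, lines and other elements)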
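
The difference between mapping an aesthetic to a variable and setting it to a fixed value can be sketched as follows. This is a minimal illustration on a made-up data frame (`toy` and its columns are assumptions, not part of the gapminder data used later); `layer_data()` is used only to inspect the colours ggplot2 ends up drawing.

```r
library(ggplot2)

# Toy data (hypothetical, not the gapminder file used in this chapter)
toy <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(2, 4, 3, 5),
  g = c("a", "a", "b", "b")
)

# Mapping: colour is driven by the variable g, so it goes inside aes()
p_mapped <- ggplot(toy, aes(x = x, y = y, color = g)) +
  geom_point(size = 3)

# Setting: every point gets the same fixed colour, so it goes outside aes()
p_fixed <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(color = "blue", size = 3)

# layer_data() returns the data after the aesthetics have been resolved
mapped_colours <- unique(layer_data(p_mapped)$colour)
fixed_colours  <- unique(layer_data(p_fixed)$colour)
```

In the mapped plot there is one colour per level of `g` plus a legend; in the fixed plot there is a single colour and no legend.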

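3.geoms: geometric objects, which determine what kind of plot we draw; each geom_*** function produces one kind of shape. For more geoms, see the ggplot2 function reference.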

geom_bar()
geom_density()
geom_freqpoly()
geom_histogram()
geom_violin()
geom_boxplot()
geom_col()
geom_point()
geom_smooth()
geom_tile()
geom_density2d()
geom_bin2d()
geom_hex()
geom_count()
geom_text()
geom_sf()

4.stats: statistical transformations

5.scales: scales

6.coord: coordinate systems

7.facet: faceting

8.layer: adding layers

9.theme: themes

10.save: saving plots

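Getting started#

R has several data types: character, numeric, factor, logical, date and so on. By default ggplot2 treats character, factor and logical variables as discrete, numeric variables as continuous, and dates and date-times as date variables:

Discrete variables: character, factor, logical
Continuous variables: double, integer
Date variables: date, time, date-time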
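
One way to check this classification is ggplot2's `scale_type()` helper, which reports the default scale family for a vector. The snippet below is a small sketch; the example vectors are made up for illustration.

```r
library(ggplot2)

# scale_type() reports which default scale ggplot2 would pick for a vector
type_chr  <- scale_type(c("a", "b"))            # character
type_fct  <- scale_type(factor(c("a", "b")))    # factor
type_lgl  <- scale_type(c(TRUE, FALSE))         # logical
type_dbl  <- scale_type(c(1.5, 2.5))            # double
type_int  <- scale_type(1:3)                    # integer
type_date <- scale_type(as.Date("2007-01-01"))  # date
```

This matters in practice: a numeric `year` column is treated as continuous until it is converted with `factor()`.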

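When presenting data we may combine several types of variables at once, for example

one discrete
one continuous
two discrete
two continuous
one discrete, one continuous
three continuous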

1 Importing the data#

All data used below can be downloaded from https://github.com/Crazzy-Rabbit/R_for_Data_Science/tree/master/demo_data

gapdata <- read_csv("./demo_data/gapminder.csv")

Rows: 1704 Columns: 6

── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): country, continent
dbl (4): year, lifeExp, pop, gdpPercap

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

gapdata %>% head()

A tibble: 6 × 6
country	continent	year	lifeExp	pop	gdpPercap
<chr>	<chr>	<dbl>	<dbl>	<dbl>	<dbl>
Afghanistan	Asia	1952	28.801	8425333	779.4453
Afghanistan	Asia	1957	30.332	9240934	820.8530
Afghanistan	Asia	1962	31.997	10267083	853.1007
Afghanistan	Asia	1967	34.020	11537966	836.1971
Afghanistan	Asia	1972	36.088	13079460	739.9811
Afghanistan	Asia	1977	38.438	14880372	786.1134

2 Checking the data#

Check for missing values

gapdata %>% 
  summarise(
    across(everything(), ~sum(is.na(.)))
  )

A tibble: 1 × 6
country	continent	year	lifeExp	pop	gdpPercap
<int>	<int>	<int>	<int>	<int>	<int>
0	0	0	0	0	0

Basic plots#

1 Bar charts#

Typically used for one discrete variable

geom_bar() counts the rows for each value of the variable automatically

gapdata %>% 
  ggplot(aes(x = continent)) +
  geom_bar()

../_images/8a13a1994c1ed94539cb0ed20f7dc5bc61fe5cbd2c567014294ba8b0da9108dd.png

gapdata %>% 
  ggplot(aes(x = reorder(continent, continent, length))) +
  geom_bar()

../_images/ef24426b25231e63e1bca3e33ba3b632715cf5d24dfee8074343923ab597f38a.png

gapdata %>% 
  ggplot(aes(x = reorder(continent, continent, length))) +
  geom_bar() +
  coord_flip()

../_images/33ada791cf03602b62bda87fa69d401355b078d18c0f0414aac3600238702aec.png

# geom_bar vs stat_count
library(patchwork)
p = gapdata %>% 
  ggplot(aes(x = continent)) + 
  stat_count()

p1 = gapdata %>% 
  ggplot(aes(x = continent)) +
  geom_bar()

p / p1

../_images/8ee0feca6c02624fb26f2f79282fc06ed84437ea0a2c1e65088d081f7de1e034.png

gapdata %>% count(continent)

A spec_tbl_df: 5 × 2
continent	n
<chr>	<int>
Africa	624
Americas	300
Asia	396
Europe	360
Oceania	24

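geom_bar() counts the rows in each group automatically. After distinct(), each row is one country, so the bars show the number of countries per continent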

gapdata %>% 
  distinct(continent, country) %>% 
  ggplot(aes(x = continent)) +
  geom_bar()

../_images/a15a4a76ebc87b3a49a4083c0cb76cd54d42fe6ab07f5e74ec24dd1cddcaa794.png

You can also compute the counts first and then plot them with geom_col(), although geom_bar() clearly needs less code

gapdata %>% 
  distinct(continent, country) %>% 
  group_by(continent) %>% 
  summarise(num = n()) %>% 
  ggplot(aes(x = continent, y = num)) +
  geom_col()

../_images/d43e9d9bb89cc1ab29602ca6feb83e782fcd0c8fcf1705f50e2fada35074b6a4.png
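
The equivalence between the two routes can be checked numerically. The sketch below uses a tiny made-up data frame (an assumption for illustration) and compares the counts computed inside `geom_bar()` with counts tabulated by hand for `geom_col()`.

```r
library(ggplot2)

# Toy data (hypothetical)
toy <- data.frame(
  continent = c("Asia", "Asia", "Africa", "Europe", "Asia", "Africa")
)

# Route 1: geom_bar() lets stat_count() do the counting
p_bar <- ggplot(toy, aes(x = continent)) + geom_bar()
bar_counts <- layer_data(p_bar)$count

# Route 2: count by hand, then draw the precomputed heights with geom_col().
# table() sorts the names alphabetically, matching the default axis order.
manual <- as.data.frame(table(continent = toy$continent))
p_col <- ggplot(manual, aes(x = continent, y = Freq)) + geom_col()
```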

2 Histograms#

Typically used for one continuous variable

geom_histogram() uses position = "stack" by default

gapdata %>% 
  ggplot(aes(x = lifeExp)) +
  geom_histogram() # corresponding to stat_bin()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

../_images/e04b8785bdec558ece4c00785f139f7402a4653203633e48eabd81b2cd998e7a.png

gapdata %>% 
  ggplot(aes(x = lifeExp)) +
  geom_histogram(binwidth = 1)

../_images/260e89838b988f34a5e66703dd4f50ea9daa3de515cf598c7e038857204e739f.png

geom_histogram() uses position = "stack" by default, so the bars of the different groups are stacked on top of each other

gapdata %>% 
  ggplot(aes(x = lifeExp, fill = continent)) +
  geom_histogram()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

../_images/056627adb39c5cabe915e2d0c58c6c8c12deb07d0e4b870ba1149e006f2dc1e2.png

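You can also specify position = "identity"

With this setting each group's bars are drawn from the baseline at their actual counts, without stacking or any other adjustment, so the groups overlap one another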

gapdata %>% 
  ggplot(aes(x = lifeExp, fill = continent)) + 
  geom_histogram(position = "identity")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

../_images/1f5f4ba88c4f04c50127e015e39c3b70c32b20643b1e1837eed798e6b5b6e929.png
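
The effect of the two position settings can be seen in the bar coordinates themselves. This is a small sketch on made-up data (the `toy` data frame is an assumption): with stacking, some bars start above zero; with "identity", every bar starts at zero.

```r
library(ggplot2)

# Toy data (hypothetical): two groups whose values fall into the same bins
toy <- data.frame(
  value = c(1, 1, 2, 1, 2, 2),
  g     = c("a", "a", "a", "b", "b", "b")
)

base <- ggplot(toy, aes(x = value, fill = g))

# Default: position = "stack"
p_stack <- base + geom_histogram(binwidth = 1)

# Overlapping bars; alpha keeps the group behind visible
p_identity <- base +
  geom_histogram(binwidth = 1, position = "identity", alpha = 0.5)

# ymin is the lower edge of each bar after the position adjustment
stack_ymin    <- layer_data(p_stack)$ymin
identity_ymin <- layer_data(p_identity)$ymin
```

Because overlapping bars hide each other, position = "identity" is usually combined with a transparency setting such as `alpha`.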

3 Frequency polygons#

geom_freqpoly()

gapdata %>% 
  ggplot(aes(x = lifeExp, color = continent)) +
  geom_freqpoly()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

../_images/548124953d5f87549fa5e67ff9065493654ba154d51cff8a8c79e809a097a321.png

4 Density plots#

geom_density()

In geom_density(), adjust scales the bandwidth: adjust = 1/2 means using half of the default bandwidth.

geom_line(stat = "density")

# a density plot is a smoothed histogram
gapdata %>% 
  ggplot(aes(x = lifeExp)) +
  geom_density()

../_images/097956d47c2e885e7b1b91c643ba84dc8d285a8711e52402d100440bfccc24f7.png

gapdata %>% 
  ggplot(aes(x = lifeExp)) +
  geom_line(stat = "density")

../_images/1516d5d74260182608e252b71f7a7cba15a592c760c28ae0ed42708fd75c9821.png

adjust scales the bandwidth: adjust = 1/2 means using half of the default bandwidth. A smaller value gives a wigglier curve.

gapdata %>% 
  ggplot(aes(x = lifeExp)) +
  geom_density(adjust = 0.2)

../_images/91e0a7265cf71a09269c04bd55fb2c77286ea2751178f6c32d5d1becde5de8d1.png
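
What adjust does to the bandwidth can be checked with base R's `density()`, the kernel density estimator that ggplot2's density stat builds on. This is a sketch on simulated data (the vector `x` is an assumption, not the lifeExp column).

```r
# Base R only
set.seed(1)
x <- rnorm(200)

d_default <- density(x)                # default bandwidth rule
d_half    <- density(x, adjust = 0.5)  # half of the default bandwidth

# $bw is the bandwidth that was actually used
bw_ratio <- d_half$bw / d_default$bw
```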

gapdata %>% 
  ggplot(aes(x = lifeExp, color = continent)) +
  geom_density()

../_images/81b7e282edd40c13d52e3daeb19a44c4085bca8206f67615d3a7edcdc8759530.png

gapdata %>% 
  ggplot(aes(x = lifeExp, fill = continent)) +
  geom_density(alpha = 0.2)

../_images/c28640767a5cbddf07f8a5e68d20949f9cbfcaf4947c52b87aa03c5b6b2cd436.png

gapdata %>% 
  filter(continent != "Oceania") %>% 
  ggplot(aes(x = lifeExp, fill = continent)) +
  geom_density(alpha = 0.2)

../_images/8c521b85585687b231c384a7c5339ada1181910efc9f42bfcff4079bc7ce58c0.png

A histogram and a density curve can be drawn together. Note that y = after_stat(density) means y is a new variable computed from x by the stat. This is a fixed idiom; after_stat(count) and after_stat(level) work the same way. The older stat(density) form has been deprecated since ggplot2 3.4.0

gapdata %>% 
  filter(continent != "Oceania") %>% 
  ggplot(aes(x = lifeExp, y = after_stat(density))) +
  geom_histogram(aes(fill = continent)) +
  geom_density()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

../_images/6474e799f83bede3f3376c1f20e11d4f409a67f352b103618f3b61da114ed2c2.png
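
Putting the histogram on the density scale is what makes the two layers comparable: the bar areas then sum to one, just like the area under the density curve. The sketch below checks this on simulated data (the `toy` data frame and the bin width are assumptions).

```r
library(ggplot2)

set.seed(1)
toy <- data.frame(value = rnorm(500, mean = 60, sd = 10))

p <- ggplot(toy, aes(x = value, y = after_stat(density))) +
  geom_histogram(binwidth = 2, fill = "grey70") +
  geom_density()

# Layer 1 is the histogram; density times bin width is the area of each bar
bars <- layer_data(p, 1)
total_area <- sum(bars$density * (bars$xmax - bars$xmin))
```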

5 Box plots#

One discrete variable + one continuous variable

gapdata %>% 
  ggplot(aes(x = year, y = lifeExp)) +
  geom_boxplot()

Warning message:
“Continuous x aesthetic
ℹ did you forget `aes(group = ...)`?”

../_images/dbd954bd3cb8c62478038f187893d3aab236a20515fb5805226f285514d9d4fd.png

The year variable in the data frame is numeric, so it has to be converted to a factor first to make it discrete

gapdata %>% 
  ggplot(aes(x = as.factor(year), y = lifeExp)) +
  geom_boxplot()

../_images/9318e6662ac947f0b2aa7b4763356579f4c71a7934be9fbac1bc4fde6c057ff5.png

Alternatively, specify the grouping variable explicitly with group

gapdata %>% 
  ggplot(aes(x = year, y = lifeExp)) +
  geom_boxplot(aes(group = year))

../_images/bfe5a871d0d1c971c2d0adaa673273815190b152c4adb579df0c11e037b78a08.png
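
Both fixes produce one box per year; they differ only in the x axis, which is discrete with factor() and stays continuous with group. A minimal sketch on made-up data (the `toy` data frame is an assumption):

```r
library(ggplot2)

# Toy data (hypothetical): three years, four observations each
toy <- data.frame(
  year  = rep(c(1952, 1957, 1962), each = 4),
  value = c(30, 35, 40, 45, 32, 38, 44, 50, 36, 42, 48, 54)
)

# Option 1: make x discrete
p_factor <- ggplot(toy, aes(x = factor(year), y = value)) +
  geom_boxplot()

# Option 2: keep x continuous and state the grouping explicitly
p_group <- ggplot(toy, aes(x = year, y = value)) +
  geom_boxplot(aes(group = year))

# One row of computed box statistics per box
n_factor <- nrow(layer_data(p_factor))
n_group  <- nrow(layer_data(p_group))
```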

Violin plot + jittered points + smooth curve

gapdata %>% 
  ggplot(aes(x = year, y = lifeExp))+
  geom_violin(aes(group = year))+
  geom_jitter(alpha = 0.25)+
  geom_smooth(se = TRUE)

`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

../_images/3abe65e2089e8930f126eca51932699260b62e70baab874c96dc2df08b5e8e8a.png

6 Jittered scatter plots#

A way to deal with overlapping points

geom_jitter()

gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp)) +
  geom_point()

../_images/c097bcd6c442fdb0e3db98afa8aecad1c01219cda89d6f58edbbd458d6281c0c.png

gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp))+
  geom_jitter()

../_images/6feba60266ac97245b3a9d7a38ef40591fd57ecddcd3d2895d82d487f0c13d69.png
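
By default geom_jitter() adds random noise in both directions. When the y values are real measurements it is often preferable to jitter only horizontally and to fix the random seed so that the plot is reproducible. The sketch below does this with `position_jitter()` on made-up data (the `toy` data frame, the width and the seed are assumptions).

```r
library(ggplot2)

# Toy data (hypothetical): many points sharing the same x and y values
toy <- data.frame(g = rep("a", 20), value = rep(c(50, 60), each = 10))

p_jitter <- ggplot(toy, aes(x = g, y = value)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 42))

# The single discrete level "a" sits at position 1 on the x axis
jittered <- layer_data(p_jitter)
x_shift  <- as.numeric(jittered$x) - 1
```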

gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp)) +
  geom_boxplot()

../_images/2877ae50a9f9c82f2609b7982d74712d4dc87d82a003b93fb6c2700c3a66e84d.png

gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp))+
  geom_boxplot()+
  geom_jitter(alpha = 0.25)

../_images/e65df2094359c23775918917dc851dd7587c3324afd3e9303049b075762eb3e9.png

gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp))+
  geom_jitter()+
  stat_summary(fun = median, colour = "red", geom = "point", size = 5)

../_images/03b60e020f254ad2b8a281b1283931b0e1799bde52c41b461de5141a04eb1040.png

gapdata %>%
  ggplot(aes(x = reorder(continent, lifeExp), y = lifeExp)) +
  geom_jitter() +
  stat_summary(fun = median, colour = "red", geom = "point", size = 5)

../_images/dc9e42ad516ca3fd78309b36bdd77ef8f636ed7ff165ba0f5ea276e1cb7b187a.png

Note that we have now met stat_count / stat_bin / stat_summary

gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp))+
  geom_violin(trim = FALSE, alpha = 0.5) +
  stat_summary(fun = mean,
    fun.max = function(x){mean(x) + sd(x)},
    fun.min = function(x){mean(x) - sd(x)},
    geom = "pointrange")

../_images/e7deb3e8cc1b5108b8e75c774c109dd9ed4f0ef405de65f874e8a614fc1c8c2a.png

gapdata %>% 
  ggplot(aes(x = continent, y = lifeExp))+
  geom_violin(trim = FALSE, alpha = 0.5) +
  stat_summary(fun = mean,
    fun.max = ~mean(.x) + sd(.x),
    fun.min = ~mean(.x) - sd(.x),
    geom = "pointrange")
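
What stat_summary() computes for a pointrange can be inspected directly. The sketch below uses the current argument names fun, fun.min and fun.max (the replacements named in the ggplot2 deprecation messages) on a made-up data frame, and reads back the summary values.

```r
library(ggplot2)

# Toy data (hypothetical)
toy <- data.frame(
  g     = rep(c("a", "b"), each = 4),
  value = c(1, 2, 3, 4, 10, 12, 14, 16)
)

p <- ggplot(toy, aes(x = g, y = value)) +
  stat_summary(
    fun     = mean,                           # point: group mean
    fun.min = function(x) mean(x) - sd(x),    # lower end: mean - sd
    fun.max = function(x) mean(x) + sd(x),    # upper end: mean + sd
    geom    = "pointrange"
  )

# One row per group, with y, ymin and ymax filled in by the stat
summ <- layer_data(p)
```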

7 Ridgeline plots#

Typically used for one discrete variable + one continuous variable

ggridges::geom_density_ridges()

gapdata %>% 
  ggplot(aes(x = lifeExp, y = continent, 
             fill = continent))+
  ggridges::geom_density_ridges()

Picking joint bandwidth of 2.23

../_images/f51b7fc195b4608035bf4dbc59020ee98e18611eccd409c1006c84f1b7dd374f.png

gapdata %>% 
  ggplot(aes(x = lifeExp, y = continent,
            fill = continent))+
  ggridges::geom_density_ridges()+
  scale_fill_manual(
    values = c("#003f5c", "#58508d", "#bc5090", "#ff6361", "#ffa600"))

Picking joint bandwidth of 2.23

../_images/fc4546c7f4e46772c60e73a17642aa43cc861989338965aa53c823484b6d6f0c.png

# colorspace palette
gapdata %>% 
  ggplot(aes(x = lifeExp, y = continent, 
             fill = continent))+
  ggridges::geom_density_ridges()+
  scale_fill_manual(
    values = colorspace::sequential_hcl(5, palette = "Peach"))

Picking joint bandwidth of 2.23

../_images/b1d9607a1852c273b9dd5b58340e462966ddf65b028b98a38a7ad7aca182fab2.png

8 Scatter plots#

Typically used for two continuous variables

geom_point()

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point()

../_images/573e425c91b07199b0abe39316acd05ef6378cf6424a70eb1340e677d878a31b.png

A better way to apply a log transformation

scale_x_log10()
scale_y_log10()

# the plain way
gapdata %>% 
  ggplot(aes(x = log(gdpPercap), y = lifeExp))+
  geom_point()

../_images/87bf1cdff586644bd96f444966c4cb62f37065afd7225d2b45e998a99b5ceed7.png

# a better way
gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point()+
  scale_x_log10()

../_images/057ceb5dfaee4c7e6237f5dd4be1a1e8bd545c6f1ddd6c3d5e7dbe41446a1dd7.png
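
The two approaches differ in more than appearance. log() is the natural logarithm and changes the data, so the axis shows log values; scale_x_log10() transforms with base 10 at the scale level, so the axis labels stay in the original units. A small sketch on made-up data (the `toy` data frame is an assumption):

```r
library(ggplot2)

# Toy data (hypothetical)
toy <- data.frame(gdp = c(10, 100, 1000), life = c(40, 60, 80))

# Transform the scale: data stay as they are, axis is labelled 10, 100, 1000
p_scale <- ggplot(toy, aes(x = gdp, y = life)) +
  geom_point() +
  scale_x_log10()

# Transform the variable: the axis shows natural-log values instead
p_manual <- ggplot(toy, aes(x = log(gdp), y = life)) +
  geom_point()

# Positions actually used for drawing
x_scale  <- layer_data(p_scale)$x
x_manual <- layer_data(p_manual)$x
```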

Ways to apply color

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point(aes(color = continent))

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp, 
             color = continent))+
  geom_point()

../_images/88924114b773fd8cf5ebdf9e812611c9605b9ec23aaebaf4aa5b73c7d076b97d.png

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point(alpha = (1/3), size = 2)

../_images/947483e17d4ba8127f3c63991ffeda826978c187faf8d2b2108de0b8f10eadb9.png

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point(alpha = 0.3)+
  geom_smooth()

`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

../_images/a460500b598568abe3ac593f8e5e7a94f04c5f0248985dc207f5fa8891f1d89f.png

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point()+
  geom_smooth(lwd = 3, se = FALSE)

`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

../_images/96d1da28d714cad17cdb549c4594c365270ab545a0446551a435059aaa1d9b35.png

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point()+
  geom_smooth(lwd = 3, se = FALSE, method = "lm")

`geom_smooth()` using formula = 'y ~ x'

../_images/ba4edb396d45482fc5995d065cf4d8cc64794fbf95233f5fddf7bf1f74fbd224.png

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp, 
             color = continent))+
  geom_point()+
  geom_smooth(lwd = 3, se = FALSE, method = "lm")

`geom_smooth()` using formula = 'y ~ x'

../_images/42c8a6966f2692be510f2facb88fb09eb0a4173b6f5ff219c7032940c63a6b70.png

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp, 
             color = continent))+
  geom_point(alpha = 0.3)+
  geom_smooth(lwd = 1, color = "blue", se = TRUE, method = "lm")

`geom_smooth()` using formula = 'y ~ x'

../_images/72d41e02603ebca0046772311ff57fc7f619c622ac42e28a01b44a4a03f08143.png

jCountries <- c("Canada", "Rwanda", "Cambodia", "Mexico")

gapdata %>% 
  filter(country %in% jCountries) %>% 
  ggplot(aes(x = year, y = lifeExp, color = country))+
  geom_line()+
  geom_point()

../_images/56c9a072b7d09de585320f2e5bed387a4dd8f5bbf00b5ccf870cfb96270698c3.png

As you can see, the order of the legend does not quite match the order of the lines in the plot.

When setting color, you can reorder country to fix this

gapdata %>% 
  filter(country %in% jCountries) %>% 
  ggplot(aes(x = year, y = lifeExp, 
             color = reorder(country, -1 * lifeExp, max)
            ))+
  geom_line()+
  geom_point()

../_images/dcd3fd59aab0d6dadcaa270bfd97ef2b8aeace43bb8a25dae7176ef620828845.png
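
One detail of this reorder() call is worth spelling out: the summary function is applied to the negated values, and max(-x) equals -min(x), so the levels are sorted by decreasing minimum rather than decreasing maximum. The two orderings can differ in general. The sketch below shows this in base R on made-up values (the `toy` data frame is an assumption), together with a variant that negates the summary instead.

```r
# Base R only; the country names and values are made up for illustration
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 2),
  lifeExp = c(50, 60, 30, 80, 40, 45)
)

# max(-x) equals -min(x): levels end up sorted by decreasing minimum
by_neg_values <- levels(with(toy, reorder(country, -1 * lifeExp, max)))

# Negating the summary instead sorts levels by decreasing maximum
by_neg_max <- levels(with(toy, reorder(country, lifeExp, function(x) -max(x))))
```

Which of the two is appropriate depends on what the legend should follow; for time series the value at the last year is another common choice.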

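There is also the following approach

Use if_else() to add a label column, then use geom_label(aes(label = end_label)) to place each country's name at its last point, the one for the latest year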

gapdata %>% 
  filter(country %in% jCountries) %>% 
  group_by(country) %>% 
  mutate(end_label = if_else(year == max(year), country, NA_character_)) %>% 
  ggplot(aes(x = year, y = lifeExp, 
            color = country))+
  geom_line()+
  geom_point()+
  geom_label(aes(label = end_label))+
  theme(legend.position = "none")

Warning message:
“Removed 44 rows containing missing values or values outside the scale range (`geom_label()`).”

../_images/898ec368f5bd1cbe4fa2ba4f90a8529d9f4e0ef2eb86cf32a607a1aa5929a392.png

If that feels like too much work, you can use the gghighlight package

# install.packages("gghighlight")
library(gghighlight)
gapdata %>% 
  filter(country %in% jCountries) %>% 
  ggplot(aes(x = year, y = lifeExp,
             color = country))+
  geom_line()+
  geom_point()+
  gghighlight::gghighlight()

label_key: country

../_images/d0db514bd4e5111505f49d1c2d4c9e8f21c4813811cb11a94864199301f3471f.png

9 Dot-and-line (lollipop) charts#

geom_point() + geom_segment()

# dot plot
gapdata %>% 
  filter(continent == "Asia" & year == 2007) %>% 
  ggplot(aes(x = lifeExp, y = country))+
  geom_point()

../_images/9876f19cc7f8eae976b3ebf1038ad66fc498667fb22651a47c43d91286764ed7.png

# dot plot with line segments
gapdata %>% 
  filter(continent == "Asia" & year == 2007) %>% 
  ggplot(aes(x = lifeExp, y = reorder(country, lifeExp))) +
  geom_point(color = "blue", size = 2) +
  geom_segment(aes(x = 40, xend = lifeExp,
                   y = reorder(country, lifeExp),
                   yend = reorder(country, lifeExp)),
               color = "lightgrey") +
  labs(x = "Life Expectancy (years)", y = "",
      title = "Life Expectancy by Country",
      subtitle = "GapMinder data for Asia - 2007")+
  theme_minimal()+
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

../_images/3d615c23e5c96f8d0fddadea8595ccc9296a03fb8b41d6c908f87758d1daf85a.png

10 Faceting#

There are two faceting functions: facet_grid() and facet_wrap()
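
As a rough illustration of the difference: facet_wrap() lays out the panels for one variable in a ribbon that wraps onto new rows, while facet_grid() builds a rows-by-columns grid from two variables. The sketch below uses a made-up data frame (`toy` and its columns are assumptions, not the gapminder data).

```r
library(ggplot2)

# Toy data (hypothetical): every combination of two grouping variables
toy <- expand.grid(
  x = 1:3,
  continent = c("Africa", "Asia"),
  period = c("early", "late"),
  stringsAsFactors = FALSE
)
toy$y <- seq_len(nrow(toy))

base <- ggplot(toy, aes(x = x, y = y)) + geom_point()

# One panel per continent
p_wrap <- base + facet_wrap(vars(continent))

# One panel per (period, continent) combination
p_grid <- base + facet_grid(rows = vars(period), cols = vars(continent))

# PANEL identifies the panel each point is drawn in
n_wrap <- length(unique(layer_data(p_wrap)$PANEL))
n_grid <- length(unique(layer_data(p_grid)$PANEL))
```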

11 Text annotation#

ggforce::geom_mark_ellipse()

ggrepel::geom_text_repel()

gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_point()+
  ggforce::geom_mark_ellipse(aes(
    filter = gdpPercap > 70000,
    label = "Rich country",
    description = "What country are they?"
  ))

../_images/200adc3c89eebe2a4c998ef0742b7c64566370fe03d92771508d515229167a52.png

ten_countries <- gapdata %>% 
  distinct(country) %>% 
  pull() %>%
  sample(10)
ten_countries

'Mexico'
'Liberia'
'Myanmar'
'Guinea'
'Sao Tome and Principe'
'Vietnam'
'Puerto Rico'
'Algeria'
'Croatia'
'Uganda'

library(ggrepel)
gapdata %>% 
  filter(year == 2007) %>% 
  mutate(
    label = ifelse(country %in% ten_countries, as.character(country), "")
  ) %>% 
  ggplot(aes(log(gdpPercap), lifeExp))+
  geom_point(size = 3.5, alpha = 0.9, shape = 21, 
            col = "white", fill = "#0162B2")+
  geom_text_repel(aes(label = label), size = 4.5,
                 point.padding = 0.2, box.padding = 0.3,
                 force = 1, min.segment.length = 0)+
  theme_minimal(14)+
  theme(legend.position = "none",
       panel.grid.minor = element_blank())+
  labs(x = "log(GDP per capita)",
       y = "life expectancy")

../_images/421bcc18a823708aafdb21ced09ecdb4cced075b63482ef33adaa3d340c2618b.png

12 Error bars#

geom_errorbar()

avg_gapdata <- gapdata %>% 
  group_by(continent) %>% 
  summarise(mean = mean(lifeExp), sd = sd(lifeExp)
           )
avg_gapdata

A tibble: 5 × 3
continent	mean	sd
<chr>	<dbl>	<dbl>
Africa	48.86533	9.150210
Americas	64.65874	9.345088
Asia	60.06490	11.864532
Europe	71.90369	5.433178
Oceania	74.32621	3.795611

avg_gapdata %>% 
  ggplot(aes(continent, mean, color = continent))+
  geom_point()+
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
               width = 0.25)

../_images/9010299a2b104fc385504b90a85a94bb240cfccce2ece68af51f1c094a68acd4.png

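13 Ellipses#

stat_ellipse(type = "norm", level = 0.95) draws a 95% ellipse around the points under the assumption of a bivariate normal distribution, in effect adding a confidence region to the scatter plot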

gapdata %>% 
  ggplot(aes(x = log(gdpPercap), y = lifeExp))+
  geom_point()+
  stat_ellipse(type = "norm", level = 0.95)

../_images/7ba97b1fb8cd4104dbfba0584ee61592141698cc6fa25cd37de1640ae369fc0f.png

14 2D density plots#

As with geom_density() in one dimension, geom_density_2d(), geom_bin2d() and geom_hex() are commonly used to show the density of two variables over a two-dimensional region

#geom_bin2d()
gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_bin2d()

../_images/3e2564426d2ab89a95057bb3cbbefc32231be9ad5946b29c3bb784557af63a0e.png

# geom_density2d()
gapdata %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp))+
  geom_density2d()

../_images/ecb9d98ed3698e80fed15cda4c5ecea36f44720444868b3a2aa01442bb994894.png

15 Tile plots#

geom_tile(), geom_contour() and geom_raster() are commonly used for three variables

gapdata %>% 
  group_by(continent, year) %>% 
  summarise(mean_lifeExp = mean(lifeExp)) %>% 
  ggplot(aes(x = year, y = continent, fill = mean_lifeExp))+
  geom_tile()+
  scale_fill_viridis_c()

`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.

../_images/0f2673c7a7d6ff75d6d77bee88f6e87d9d770556b82d19bd5fe57e9df725f830.png

In fact, there is a better way to present this

gapdata %>% 
  group_by(continent, year) %>% 
  summarise(mean_lifeExp = mean(lifeExp)) %>% 
  ggplot(aes(x = year, y = continent, 
             size = mean_lifeExp, color = mean_lifeExp))+
  geom_point()+
  scale_color_viridis_c()+
  theme_minimal(15)

`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.

../_images/05b1d2a00e98ea840cf05c85dd8c7ef5cec7efd08b9e1e4dadd1d491f9661dd0.png

Put the values inside the points

geom_text()

gapdata %>% 
  group_by(continent, year) %>% 
  summarise(mean_lifeExp = mean(lifeExp)) %>% 
  ggplot(aes(x = year, y = continent, size = mean_lifeExp))+
  geom_point(shape = 21, color = "red", fill = "white")+
  scale_size_continuous(range = c(7, 15))+
  geom_text(aes(label = round(mean_lifeExp, 2)), size = 3, color = "black")+
  theme_minimal()+
  theme(legend.position = "none")

`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.

../_images/1b885a1a9fd2ef483208c3c0864396ec9626fdf73f28cac6aef5d1bcd0375980.png

library(tidyverse)
tbl <-
  tibble(
    x = rep(c(1, 2, 3), times = 2),
    y = 1:6,
    group = rep(c("group1", "group2"), each = 3)
  )
ggplot(tbl, aes(x, y)) + geom_line()
ggplot(tbl, aes(x, y, group = group)) + geom_line()
ggplot(tbl, aes(x, y, fill = group)) + geom_line()
ggplot(tbl, aes(x, y, color = group)) + geom_line()

../_images/035522dc8166beb1d8489e9fb0e5b786d7f823cd4077203281c5b1485a4dd666.png

../_images/29e26a6eba3cba246d79b34af62fc9a9dedc861d13e447f94e1477f100c2ad6a.png

../_images/8d5e4db59bfada691c41a8c3e969f5b20c576a10befd6b5f8104294c3b5834e8.png
