ggplot2: Geometric Objects
library(tidyverse)
── Attaching core tidyverse packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.0 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
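1 The Grammar of Graphics
The grammar of graphics (the "gg" in "ggplot2" comes from it) describes and builds a plot as a stack of layers. The figure below is a schematic of the layer concept in ggplot2.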
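A minimal sketch of the layer idea, using a small made-up data frame rather than the gapminder data: each `+ geom_*()` call appends one layer to the plot object. Reading the layers back through `p$layers` reflects how ggplot2 3.5 stores them and is shown only for illustration.

```r
library(ggplot2)

# A tiny made-up data frame, used only for illustration
df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 6))

# Each `+ geom_*()` call appends one layer to the plot object
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +   # layer 1: points
  geom_line()      # layer 2: a line through the same data

length(p$layers)   # number of layers stacked on this plot
```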
2 Plot Components
A statistical graphic is a mapping from data to the aesthetic attributes (aes) of geometric objects (geom).
1. data: a data frame (data.frame); vectors and lists are not supported.
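2. aes: maps variables in the data frame to aesthetic attributes. Aesthetic attributes are the things the eye can see in a plot: the position, shape, size, color and so on of the points. A mapping is a correspondence: in the mathematical function b = f(a), the value of a determines the value of b. In ggplot2 syntax, a is the data variable we supply and b is the aesthetic attribute. Aesthetic attributes include:
x (position along the x axis)
y (position along the y axis)
color (color of points, lines and other elements)
size (size of points and other elements; since ggplot2 3.4.0 the width of lines is set with linewidth)
shape (shape of points)
alpha (transparency of points, lines and other elements)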
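3. geoms: geometric objects, which decide what kind of plot we draw; each geom_*() function draws one kind of shape. Commonly used ones are listed below; the ggplot2 reference documents many more.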
geom_bar()
geom_density()
geom_freqpoly()
geom_histogram()
geom_violin()
geom_boxplot()
geom_col()
geom_point()
geom_smooth()
geom_tile()
geom_density2d()
geom_bin2d()
geom_hex()
geom_count()
geom_text()
geom_sf()
4. stats: statistical transformations
5. scales: scales
6. coord: coordinate systems
7. facet: faceting
8. layer: adding layers
9. theme: theme and style
10. save: saving the figure (ggsave())
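The components above can be combined in a single call. The sketch below is illustrative only: the data frame `df`, its columns and the colors are made up, and the plot is saved to a temporary PDF file so the example runs without a display.

```r
library(ggplot2)

# 1. data: a small made-up data frame
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.0, 3.5, 3.0, 5.5, 4.0, 6.5),
  g = rep(c("A", "B"), each = 3)
)

p <- ggplot(df, aes(x = x, y = y, color = g)) +              # 2. aes: map x, y and color
  geom_point(size = 3) +                                     # 3. geom: points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # 4. stat: stat_smooth fits a line
  scale_color_manual(values = c(A = "steelblue", B = "tomato")) +  # 5. scale
  coord_cartesian(ylim = c(0, 8)) +                          # 6. coord: zoom the y axis
  facet_wrap(~ g) +                                          # 7. facet: one panel per group
  theme_minimal()                                            # 9. theme
# 8. layer: each `+ geom_*()` above added one layer

# 10. save: write the plot to a temporary PDF file
out <- tempfile(fileext = ".pdf")
ggsave(out, plot = p, width = 6, height = 3)
```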
Getting Started
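R has several data types: character, numeric, factor, logical, date and so on. By default ggplot2 treats character, factor and logical data as discrete variables, numeric data as continuous variables, and dates and times as date variables:
Discrete variables: character, factor, logical
Continuous variables: double, integer
Date variables: date, time, date-time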
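To see which kind of scale ggplot2 picks for each type, one can map a column to x and inspect the default x scale with layer_scales(). The data frame below is made up, and the scale class names (ScaleDiscretePosition and so on) are ggplot2 internals as of version 3.5, so treat this as a sketch for exploration rather than a stable API.

```r
library(ggplot2)

# Made-up columns, one per data type
df <- data.frame(
  chr  = c("a", "b", "c"),                                    # character
  fct  = factor(c("low", "mid", "high")),                     # factor
  lgl  = c(TRUE, FALSE, TRUE),                                # logical
  dbl  = c(1.5, 2.5, 3.5),                                    # double
  int  = 1:3,                                                 # integer
  date = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01")) # Date
)

# Map one column to x and report the class of the default x scale
x_scale_class <- function(col) {
  p <- ggplot(df, aes(x = .data[[col]], y = 1)) + geom_point()
  class(layer_scales(p)$x)[1]
}

res <- sapply(names(df), x_scale_class)
res
```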
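When presenting data we often combine several types of variables, for example:
one discrete
one continuous
two discrete
two continuous
one discrete, one continuous
three continuous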
1 Importing the Data
All data used below can be downloaded from https://github.com/Crazzy-Rabbit/R_for_Data_Science/tree/master/demo_data
gapdata <- read_csv("./demo_data/gapminder.csv")
Rows: 1704 Columns: 6
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): country, continent
dbl (4): year, lifeExp, pop, gdpPercap
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
gapdata %>% head()
country | continent | year | lifeExp | pop | gdpPercap |
---|---|---|---|---|---|
<chr> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> |
Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 |
Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 |
Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 |
Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 |
Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811 |
Afghanistan | Asia | 1977 | 38.438 | 14880372 | 786.1134 |
2 Checking the Data
Are there any missing values?
gapdata %>%
summarise(
across(everything(), ~sum(is.na(.)))
)
country | continent | year | lifeExp | pop | gdpPercap |
---|---|---|---|---|---|
<int> | <int> | <int> | <int> | <int> | <int> |
0 | 0 | 0 | 0 | 0 | 0 |
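The same check can be done in base R with is.na() and colSums(). A sketch on a small made-up data frame that contains a few NA values:

```r
# A small made-up data frame with some missing values (illustration only)
toy <- data.frame(
  country = c("A", "B", NA, "D"),
  lifeExp = c(50.1, NA, 61.3, NA),
  pop     = c(100, 200, 300, 400)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
na_counts   # country = 1, lifeExp = 2, pop = 0
```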
Basic Plots
1 Bar Chart
Commonly used for one discrete variable.
geom_bar() automatically counts the rows for each value of the mapped variable.
gapdata %>%
ggplot(aes(x = continent)) +
geom_bar()
data:image/s3,"s3://crabby-images/3a691/3a6910d5640997a1b92a07086400d9d6e1d349aa" alt="../_images/8a13a1994c1ed94539cb0ed20f7dc5bc61fe5cbd2c567014294ba8b0da9108dd.png"
# reorder(continent, continent, length): sort the bars by the number of rows per continent
gapdata %>%
ggplot(aes(x = reorder(continent, continent, length))) +
geom_bar()
data:image/s3,"s3://crabby-images/cac5c/cac5cd4548db7c42717583c0d20f21af1316c0d0" alt="../_images/ef24426b25231e63e1bca3e33ba3b632715cf5d24dfee8074343923ab597f38a.png"
gapdata %>%
ggplot(aes(x = reorder(continent, continent, length))) +
geom_bar() +
coord_flip()
data:image/s3,"s3://crabby-images/acc29/acc293cce9d12cb293f2b16f34e8877239519569" alt="../_images/33ada791cf03602b62bda87fa69d401355b078d18c0f0414aac3600238702aec.png"
# geom_bar vs stat_count
library(patchwork)
p = gapdata %>%
ggplot(aes(x = continent)) +
stat_count()
p1 = gapdata %>%
ggplot(aes(x = continent)) +
geom_bar()
p / p1
data:image/s3,"s3://crabby-images/8d7cf/8d7cf46bb987fa35082e2d97bf96d94530149784" alt="../_images/8ee0feca6c02624fb26f2f79282fc06ed84437ea0a2c1e65088d081f7de1e034.png"
gapdata %>% count(continent)
continent | n |
---|---|
<chr> | <int> |
Africa | 624 |
Americas | 300 |
Asia | 396 |
Europe | 360 |
Oceania | 24 |
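geom_bar() computed exactly these counts automatically. To count countries rather than rows per continent, keep one row per country first: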
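To check what geom_bar() computed, layer_data() returns a layer's data after its stat has run; stat_count() adds a count column. A sketch with made-up data:

```r
library(ggplot2)

# Made-up categorical data
df <- data.frame(continent = c("Asia", "Asia", "Europe", "Africa", "Africa", "Africa"))

p <- ggplot(df, aes(x = continent)) + geom_bar()

# x is the position of each bar (1, 2, 3 for Africa, Asia, Europe); count is its height
counts <- layer_data(p)[, c("x", "count")]
counts

# The same numbers as a plain frequency table
table(df$continent)
```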
gapdata %>%
distinct(continent, country) %>%
ggplot(aes(x = continent)) +
geom_bar()
data:image/s3,"s3://crabby-images/dbd61/dbd6165a64425e8cf130f9e6c11c3c634584b109" alt="../_images/a15a4a76ebc87b3a49a4083c0cb76cd54d42fe6ab07f5e74ec24dd1cddcaa794.png"
You can also do the counting yourself and then plot with geom_col(), but calling geom_bar() directly clearly takes less code:
gapdata %>%
distinct(continent, country) %>%
group_by(continent) %>%
summarise(num = n()) %>%
ggplot(aes(x = continent, y = num)) +
geom_col()
data:image/s3,"s3://crabby-images/c8ebc/c8ebcb52a1c0bec5c2860508cebaad53326adfa0" alt="../_images/d43e9d9bb89cc1ab29602ca6feb83e782fcd0c8fcf1705f50e2fada35074b6a4.png"
2 Histogram
Commonly used for one continuous variable.
geom_histogram() bins the variable with stat_bin(); its default position is "stack".
gapdata %>%
ggplot(aes(x = lifeExp)) +
geom_histogram() # corresponding to stat_bin()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
data:image/s3,"s3://crabby-images/fdfd9/fdfd9cf1d588927afb0586a4e0385f4b502e01d2" alt="../_images/e04b8785bdec558ece4c00785f139f7402a4653203633e48eabd81b2cd998e7a.png"
gapdata %>%
ggplot(aes(x = lifeExp)) +
geom_histogram(binwidth = 1)
data:image/s3,"s3://crabby-images/cf1c2/cf1c22b9e1fd8fd33b28fa5fb57c14597fc8f3f0" alt="../_images/260e89838b988f34a5e66703dd4f50ea9daa3de515cf598c7e038857204e739f.png"
With fill = continent, the default position = "stack" stacks the bars of each continent on top of one another:
gapdata %>%
ggplot(aes(x = lifeExp, fill = continent)) +
geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
data:image/s3,"s3://crabby-images/b19a2/b19a24bdb8d4f5065e354ef9ff7eec66a80ef289" alt="../_images/056627adb39c5cabe915e2d0c58c6c8c12deb07d0e4b870ba1149e006f2dc1e2.png"
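Alternatively, set position = "identity": each continent's bars then start at zero with their own counts and overlap instead of being stacked, so bars drawn later can hide earlier ones.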
gapdata %>%
ggplot(aes(x = lifeExp, fill = continent)) +
geom_histogram(position = "identity")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
data:image/s3,"s3://crabby-images/cf5fc/cf5fc8e62fba9aec2b9318287e43fc3d277e898f" alt="../_images/1f5f4ba88c4f04c50127e015e39c3b70c32b20643b1e1837eed798e6b5b6e929.png"
3 Frequency Polygon
geom_freqpoly()
gapdata %>%
ggplot(aes(x = lifeExp, color = continent)) +
geom_freqpoly()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
data:image/s3,"s3://crabby-images/6e762/6e762b5b499d7269eb9136fb4bf9aba0b1f9cd47" alt="../_images/548124953d5f87549fa5e67ff9065493654ba154d51cff8a8c79e809a097a321.png"
4 Density Plot
A density plot is essentially a smoothed histogram. It can be drawn with geom_density(), or with geom_line(stat = "density"), which draws the same estimated curve as a plain line.
gapdata %>%
ggplot(aes(x = lifeExp)) +
geom_density()
data:image/s3,"s3://crabby-images/55ac4/55ac4fdf8849f252cdd8d563bafe8ce760b5c1d9" alt="../_images/097956d47c2e885e7b1b91c643ba84dc8d285a8711e52402d100440bfccc24f7.png"
gapdata %>%
ggplot(aes(x = lifeExp)) +
geom_line(stat = "density")
data:image/s3,"s3://crabby-images/7fa38/7fa38ef3fdcf8f6933014e996e03b8b453f9aa2a" alt="../_images/1516d5d74260182608e252b71f7a7cba15a592c760c28ae0ed42708fd75c9821.png"
The adjust argument scales the bandwidth: adjust = 1/2 means use half of the default bandwidth, and adjust = 0.2 below uses a fifth of it, which makes the curve wigglier.
gapdata %>%
ggplot(aes(x = lifeExp)) +
geom_density(adjust = 0.2)
data:image/s3,"s3://crabby-images/72e5b/72e5be1d98381d01633f5eb7d8bd982c3a0f8d15" alt="../_images/91e0a7265cf71a09269c04bd55fb2c77286ea2751178f6c32d5d1becde5de8d1.png"
gapdata %>%
ggplot(aes(x = lifeExp, color = continent)) +
geom_density()
data:image/s3,"s3://crabby-images/c8b9a/c8b9a82a0da7e5199ad1c40f54c2cb9914924691" alt="../_images/81b7e282edd40c13d52e3daeb19a44c4085bca8206f67615d3a7edcdc8759530.png"
gapdata %>%
ggplot(aes(x = lifeExp, fill = continent)) +
geom_density(alpha = 0.2)
data:image/s3,"s3://crabby-images/a9c95/a9c95ca3203c9397ae9e958de0f66e825d464134" alt="../_images/c28640767a5cbddf07f8a5e68d20949f9cbfcaf4947c52b87aa03c5b6b2cd436.png"
gapdata %>%
filter(continent != "Oceania") %>%
ggplot(aes(x = lifeExp, fill = continent)) +
geom_density(alpha = 0.2)
data:image/s3,"s3://crabby-images/24053/240535620b6c08a9811ceedda2358ffdf331d0a7" alt="../_images/8c521b85585687b231c384a7c5339ada1181910efc9f42bfcff4079bc7ce58c0.png"
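To draw a histogram and a density curve together, put both on the density scale with y = after_stat(density). after_stat() tells ggplot2 that y is a variable computed by the stat from x rather than a column of the data; similar forms are after_stat(count) and after_stat(level). The older stat(density) form was deprecated in ggplot2 3.4.0.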
gapdata %>%
filter(continent != "Oceania") %>%
ggplot(aes(x = lifeExp, y = after_stat(density))) +
geom_histogram(aes(fill = continent)) +
geom_density()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
data:image/s3,"s3://crabby-images/e897f/e897fd08494ad7f39e8538e0774d698e419b4600" alt="../_images/6474e799f83bede3f3376c1f20e11d4f409a67f352b103618f3b61da114ed2c2.png"
5 Box Plot
One discrete variable + one continuous variable
gapdata %>%
ggplot(aes(x = year, y = lifeExp)) +
geom_boxplot()
Warning message:
“Continuous x aesthetic
ℹ did you forget `aes(group = ...)`?”
data:image/s3,"s3://crabby-images/de41b/de41b8252332e2afecd0c4ea6fc09efcb921d9b2" alt="../_images/dbd954bd3cb8c62478038f187893d3aab236a20515fb5805226f285514d9d4fd.png"
The year column in the data frame is numeric, so it must first be converted to a factor, i.e. a discrete variable:
gapdata %>%
ggplot(aes(x = as.factor(year), y = lifeExp)) +
geom_boxplot()
data:image/s3,"s3://crabby-images/37cb9/37cb9aa2692282884ee5bfc6be59cffec1ad4aba" alt="../_images/9318e6662ac947f0b2aa7b4763356579f4c71a7934be9fbac1bc4fde6c057ff5.png"
Alternatively, use group to specify the grouping variable explicitly:
gapdata %>%
ggplot(aes(x = year, y = lifeExp)) +
geom_boxplot(aes(group = year))
data:image/s3,"s3://crabby-images/b30b6/b30b66154289bdcdad033abc616e87204996e2ed" alt="../_images/bfe5a871d0d1c971c2d0adaa673273815190b152c4adb579df0c11e037b78a08.png"
Violin plot + jittered points + smoothed curve
gapdata %>%
ggplot(aes(x = year, y = lifeExp))+
geom_violin(aes(group = year))+
geom_jitter(alpha = 0.25)+
geom_smooth(se = TRUE)
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
data:image/s3,"s3://crabby-images/ebb23/ebb23763cdd0c9bc8a5fb73be11e488be4c6824b" alt="../_images/3abe65e2089e8930f126eca51932699260b62e70baab874c96dc2df08b5e8e8a.png"
6 Jittered Scatter Plot
A remedy for overlapping points:
geom_jitter()
gapdata %>%
ggplot(aes(x = continent, y = lifeExp)) +
geom_point()
data:image/s3,"s3://crabby-images/a38fe/a38fe68a2fc0fe317ccac9eda88d197634716a41" alt="../_images/c097bcd6c442fdb0e3db98afa8aecad1c01219cda89d6f58edbbd458d6281c0c.png"
gapdata %>%
ggplot(aes(x = continent, y = lifeExp))+
geom_jitter()
data:image/s3,"s3://crabby-images/b8a4e/b8a4e10e9f0adac15f7e40a952487a180960e15b" alt="../_images/6feba60266ac97245b3a9d7a38ef40591fd57ecddcd3d2895d82d487f0c13d69.png"
gapdata %>%
ggplot(aes(x = continent, y = lifeExp)) +
geom_boxplot()
data:image/s3,"s3://crabby-images/fd31a/fd31a6580afa6eab785a329f18d27ed4017f897d" alt="../_images/2877ae50a9f9c82f2609b7982d74712d4dc87d82a003b93fb6c2700c3a66e84d.png"
gapdata %>%
ggplot(aes(x = continent, y = lifeExp))+
geom_boxplot()+
geom_jitter(alpha = 0.25)
data:image/s3,"s3://crabby-images/14e91/14e91d79b55f9097c36e5b191e8dd3b6f207686d" alt="../_images/e65df2094359c23775918917dc851dd7587c3324afd3e9303049b075762eb3e9.png"
gapdata %>%
ggplot(aes(x = continent, y = lifeExp))+
geom_jitter()+
stat_summary(fun = median, colour = "red", geom = "point", size = 5)
data:image/s3,"s3://crabby-images/d9284/d9284d18fc3729bb972efc6e168690c6b2cf5b02" alt="../_images/03b60e020f254ad2b8a281b1283931b0e1799bde52c41b461de5141a04eb1040.png"
# reorder() sorts the continents by their mean lifeExp (its default FUN)
gapdata %>%
ggplot(aes(x = reorder(continent, lifeExp), y = lifeExp)) +
geom_jitter() +
stat_summary(fun = median, colour = "red", geom = "point", size = 5)
data:image/s3,"s3://crabby-images/5176d/5176dfd3197796d3dfebaf781c1826ccf79d84f5" alt="../_images/dc9e42ad516ca3fd78309b36bdd77ef8f636ed7ff165ba0f5ea276e1cb7b187a.png"
Note that we have now met stat_count, stat_bin and stat_summary.
gapdata %>%
ggplot(aes(x = continent, y = lifeExp))+
geom_violin(trim = FALSE, alpha = 0.5) +
stat_summary(fun = mean,
fun.max = function(x){mean(x) + sd(x)},
fun.min = function(x){mean(x) - sd(x)},
geom = "pointrange")
data:image/s3,"s3://crabby-images/72430/724303b0ac5236065b8bc90f80289f93b0545723" alt="../_images/e7deb3e8cc1b5108b8e75c774c109dd9ed4f0ef405de65f874e8a614fc1c8c2a.png"
# The same plot, written with purrr-style lambdas (~ .x) instead of function(x)
gapdata %>%
ggplot(aes(x = continent, y = lifeExp))+
geom_violin(trim = FALSE, alpha = 0.5) +
stat_summary(fun = mean,
fun.max = ~mean(.x) + sd(.x),
fun.min = ~mean(.x) - sd(.x),
geom = "pointrange")
data:image/s3,"s3://crabby-images/72430/724303b0ac5236065b8bc90f80289f93b0545723" alt="../_images/e7deb3e8cc1b5108b8e75c774c109dd9ed4f0ef405de65f874e8a614fc1c8c2a.png"
7 Ridgeline Plot
Commonly used for one discrete variable + one continuous variable
ggridges::geom_density_ridges()
gapdata %>%
ggplot(aes(x = lifeExp, y = continent,
fill = continent))+
ggridges::geom_density_ridges()
Picking joint bandwidth of 2.23
data:image/s3,"s3://crabby-images/efd00/efd00e62fa9f97728a508dba0774b1b29d687b3e" alt="../_images/f51b7fc195b4608035bf4dbc59020ee98e18611eccd409c1006c84f1b7dd374f.png"
gapdata %>%
ggplot(aes(x = lifeExp, y = continent,
fill = continent))+
ggridges::geom_density_ridges()+
scale_fill_manual(
values = c("#003f5c", "#58508d", "#bc5090", "#ff6361", "#ffa600"))
Picking joint bandwidth of 2.23
data:image/s3,"s3://crabby-images/49bec/49bec4c66bb2597a6f7216e40768a2397c35b39e" alt="../_images/fc4546c7f4e46772c60e73a17642aa43cc861989338965aa53c823484b6d6f0c.png"
# colorspace palette
gapdata %>%
ggplot(aes(x = lifeExp, y = continent,
fill = continent))+
ggridges::geom_density_ridges()+
scale_fill_manual(
values = colorspace::sequential_hcl(5, palette = "Peach"))
Picking joint bandwidth of 2.23
data:image/s3,"s3://crabby-images/70425/704259f793bbc72bbd1ab42a71ce2f824a2e4e58" alt="../_images/b1d9607a1852c273b9dd5b58340e462966ddf65b028b98a38a7ad7aca182fab2.png"
8 Scatter Plot
Commonly used for two continuous variables
geom_point()
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp))+
geom_point()
data:image/s3,"s3://crabby-images/bed89/bed897c38277b30710cf7c8609b5ac9de6f02595" alt="../_images/573e425c91b07199b0abe39316acd05ef6378cf6424a70eb1340e677d878a31b.png"
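A better way to log-transform an axis is scale_x_log10() or scale_y_log10(): the data are transformed at the scale, so the axis is still labelled in the original units.
# Plain approach: transform the variable inside aes()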
gapdata %>%
ggplot(aes(x = log(gdpPercap), y = lifeExp))+
geom_point()
data:image/s3,"s3://crabby-images/8b82e/8b82e05d930a083c8d8312d7efd4d71420abb48b" alt="../_images/87bf1cdff586644bd96f444966c4cb62f37065afd7225d2b45e998a99b5ceed7.png"
# Better approach: transform the axis scale
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp)) +
geom_point()+
scale_x_log10()
data:image/s3,"s3://crabby-images/ed793/ed793b438a60b6554fc8f4893f77b7988893130b" alt="../_images/057ceb5dfaee4c7e6237f5dd4be1a1e8bd545c6f1ddd6c3d5e7dbe41446a1dd7.png"
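Coloring the points by continent: mapping color inside geom_point() or in the global aes() of ggplot() gives the same plot here (a mapping in ggplot() applies to every layer, one inside geom_point() only to that layer).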
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp))+
geom_point(aes(color = continent))
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp,
color = continent))+
geom_point()
data:image/s3,"s3://crabby-images/53e23/53e2336565e1533d287e7f846ad26ab87581a403" alt="../_images/88924114b773fd8cf5ebdf9e812611c9605b9ec23aaebaf4aa5b73c7d076b97d.png"
data:image/s3,"s3://crabby-images/53e23/53e2336565e1533d287e7f846ad26ab87581a403" alt="../_images/88924114b773fd8cf5ebdf9e812611c9605b9ec23aaebaf4aa5b73c7d076b97d.png"
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp))+
geom_point(alpha = (1/3), size = 2)
data:image/s3,"s3://crabby-images/bb26f/bb26ff635d64a169c79b75c8e24aa568c529ca4f" alt="../_images/947483e17d4ba8127f3c63991ffeda826978c187faf8d2b2108de0b8f10eadb9.png"
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp))+
geom_point(alpha = 0.3)+
geom_smooth()
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
data:image/s3,"s3://crabby-images/39910/399105901d1f48cb7795e4b9505f152fca2f24f7" alt="../_images/a460500b598568abe3ac593f8e5e7a94f04c5f0248985dc207f5fa8891f1d89f.png"
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp))+
geom_point()+
geom_smooth(lwd = 3, se = FALSE)
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
data:image/s3,"s3://crabby-images/d25ea/d25ea4ea5fb83abf06d4bf09ec78443c9539b8b8" alt="../_images/96d1da28d714cad17cdb549c4594c365270ab545a0446551a435059aaa1d9b35.png"
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp))+
geom_point()+
geom_smooth(lwd = 3, se = FALSE, method = "lm")
`geom_smooth()` using formula = 'y ~ x'
data:image/s3,"s3://crabby-images/ad735/ad73525c93cbaf7c5f19ed1fee8dcf33fbc05658" alt="../_images/ba4edb396d45482fc5995d065cf4d8cc64794fbf95233f5fddf7bf1f74fbd224.png"
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp,
color = continent))+
geom_point()+
geom_smooth(lwd = 3, se = FALSE, method = "lm")
`geom_smooth()` using formula = 'y ~ x'
data:image/s3,"s3://crabby-images/e0ba4/e0ba4eb0e15e3fdc1b7b6d95b4db1bd671e32892" alt="../_images/42c8a6966f2692be510f2facb88fb09eb0a4173b6f5ff219c7032940c63a6b70.png"
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp,
color = continent))+
geom_point(alpha = 0.3)+
geom_smooth(lwd = 1, color = "blue", se = TRUE, method = "lm")
`geom_smooth()` using formula = 'y ~ x'
data:image/s3,"s3://crabby-images/5067d/5067d7fc0406649214bdb990ed026b0873945e69" alt="../_images/72d41e02603ebca0046772311ff57fc7f619c622ac42e28a01b44a4a03f08143.png"
jCountries <- c("Canada", "Rwanda", "Cambodia", "Mexico")
gapdata %>%
filter(country %in% jCountries) %>%
ggplot(aes(x = year, y = lifeExp, color = country))+
geom_line()+
geom_point()
data:image/s3,"s3://crabby-images/a1d49/a1d494989e414dff77a7ee83d64d49761d4b62fc" alt="../_images/56c9a072b7d09de585320f2e5bed387a4dd8f5bbf00b5ccf870cfb96270698c3.png"
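The legend is ordered alphabetically, which does not match the order of the lines in the plot. Reordering country when mapping color fixes this: reorder(country, -1 * lifeExp, max) sorts the countries by max(-lifeExp), i.e. from the highest to the lowest minimum life expectancy, so the legend follows the vertical order of the lines more closely.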
gapdata %>%
filter(country %in% jCountries) %>%
ggplot(aes(x = year, y = lifeExp,
color = reorder(country, -1 * lifeExp, max)
))+
geom_line()+
geom_point()
data:image/s3,"s3://crabby-images/72b0e/72b0ece7cd6068118db84869f3b67be0ca948f66" alt="../_images/dcd3fd59aab0d6dadcaa270bfd97ef2b8aeace43bb8a25dae7176ef620828845.png"
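Another option is to use if_else() to add a column, end_label, that holds the country name only in each country's last year, and then draw it with geom_label(aes(label = end_label)) at the last point of each line, which makes the legend unnecessary. The warning below refers to the NA labels of all other rows, which geom_label() skips.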
gapdata %>%
filter(country %in% jCountries) %>%
group_by(country) %>%
mutate(end_label = if_else(year == max(year), country, NA_character_)) %>%
ggplot(aes(x = year, y = lifeExp,
color = country))+
geom_line()+
geom_point()+
geom_label(aes(label = end_label))+
theme(legend.position = "none")
Warning message:
“Removed 44 rows containing missing values or values outside the scale range (`geom_label()`).”
data:image/s3,"s3://crabby-images/b486c/b486c839e7478a02bbeb4e5221b27970f3ee8813" alt="../_images/898ec368f5bd1cbe4fa2ba4f90a8529d9f4e0ef2eb86cf32a607a1aa5929a392.png"
If that feels like too much work, use the gghighlight package:
# install.packages("gghighlight")
library(gghighlight)
gapdata %>%
filter(country %in% jCountries) %>%
ggplot(aes(x = year, y = lifeExp,
color = country))+
geom_line()+
geom_point()+
gghighlight::gghighlight()
label_key: country
data:image/s3,"s3://crabby-images/76618/76618c6310eca7fe1dc75c66bdb830385013e3fb" alt="../_images/d0db514bd4e5111505f49d1c2d4c9e8f21c4813811cb11a94864199301f3471f.png"
9 Lollipop Chart
geom_point() + geom_segment()
# Dot plot
gapdata %>%
filter(continent == "Asia" & year == 2007) %>%
ggplot(aes(x = lifeExp, y = country))+
geom_point()
data:image/s3,"s3://crabby-images/3e2cc/3e2cc0ed73bc4f59b0df61b3c8562bb1ccb6100a" alt="../_images/9876f19cc7f8eae976b3ebf1038ad66fc498667fb22651a47c43d91286764ed7.png"
# Lollipop chart
gapdata %>%
filter(continent == "Asia" & year == 2007) %>%
ggplot(aes(x = lifeExp, y = reorder(country, lifeExp))) +
geom_point(color = "blue", size = 2) +
geom_segment(aes(x = 40, xend = lifeExp,
y = reorder(country, lifeExp), yend = reorder(country, lifeExp)),
color = "lightgrey") +
labs(x = "Life Expectancy (years)", y = "",
title = "Life Expectancy by Country",
subtitle = "GapMinder data for Asia - 2007") +
theme_minimal() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
data:image/s3,"s3://crabby-images/705b4/705b4b8e3b52d444793d612f8e308751bf2a15a1" alt="../_images/3d615c23e5c96f8d0fddadea8595ccc9296a03fb8b41d6c908f87758d1daf85a.png"
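10 Facets
There are two faceting functions: facet_grid() and facet_wrap().
1 facet_grid()
facet_grid() creates a grid of plots arranged by rows and columns. Refer to the faceting variables with vars() (the examples below use the equivalent formula syntax, such as . ~ continent), and let the axis scales vary across panels with scales = "free".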
gapdata %>%
ggplot(aes(x = lifeExp)) +
geom_density()+
facet_grid(. ~ continent)
data:image/s3,"s3://crabby-images/32d08/32d085ae9ace1c03c760bd1b57ead2830e14d7c0" alt="../_images/29b7f5c72b8fea877935f26c7e5a9303950231479d9c3c2f6e8cdecbe6f8fd7b.png"
gapdata %>%
filter(continent != "Oceania") %>%
ggplot(aes(x = lifeExp, fill = continent))+
geom_histogram()+
facet_grid(continent ~ .)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
data:image/s3,"s3://crabby-images/de0bf/de0bf643ab024c87443e8a8affa11c385ef7451f" alt="../_images/027ff5b3179df56a4f07e4d3fd597f9044ef2ea9f65fca5f64595e53b7461940.png"
gapdata %>%
filter(continent != "Oceania") %>%
ggplot(aes(x = lifeExp, y = after_stat(density)))+
geom_histogram(aes(fill = continent))+
geom_density()+
facet_grid(continent~ .)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
data:image/s3,"s3://crabby-images/3f689/3f6898fd02926b3895068f0bd2b8d264063fac63" alt="../_images/90e3cbdc8f49c8c077c9ac7196a180178806f8fe7373912bc444425098c7566c.png"
2 facet_wrap()
facet_wrap() creates small multiples by "wrapping" a series of plots into a grid. Variables are again given with vars() (or a formula such as ~continent), and the nrow and ncol arguments set the shape of the grid.
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp, color = continent))+
geom_point(show.legend = FALSE)+
facet_wrap(~continent)
data:image/s3,"s3://crabby-images/f2458/f24585dfc2f0beb229ecae204a24012b5147f476" alt="../_images/9cd01126b467a335f0c22128505d284c74be38e559cfa879a072873e1b747e71.png"
11 Text Annotation
ggforce::geom_mark_ellipse()
ggrepel::geom_text_repel()
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp))+
geom_point()+
ggforce::geom_mark_ellipse(aes(
filter = gdpPercap > 70000,
label = "Rich country",
description = "What country are they?"
))
data:image/s3,"s3://crabby-images/05f6e/05f6e4cfa6a8b796993d886bde45995ce1ba705a" alt="../_images/200adc3c89eebe2a4c998ef0742b7c64566370fe03d92771508d515229167a52.png"
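# sample() picks 10 countries at random, so the list below changes between runs
# (call set.seed() first for a reproducible choice)
ten_countries <- gapdata %>%
distinct(country) %>%
pull() %>%
sample(10)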
ten_countries
- 'Mexico'
- 'Liberia'
- 'Myanmar'
- 'Guinea'
- 'Sao Tome and Principe'
- 'Vietnam'
- 'Puerto Rico'
- 'Algeria'
- 'Croatia'
- 'Uganda'
library(ggrepel)
gapdata %>%
filter(year == 2007) %>%
mutate(
label = ifelse(country %in% ten_countries, as.character(country), "")
) %>%
ggplot(aes(log(gdpPercap), lifeExp))+
geom_point(size = 3.5, alpha = 0.9, shape = 21,
col = "white", fill = "#0162B2")+
geom_text_repel(aes(label = label), size = 4.5,
point.padding = 0.2, box.padding = 0.3,
force = 1, min.segment.length = 0)+
theme_minimal(14)+
theme(legend.position = "none",
panel.grid.minor = element_blank())+
labs(x = "log(GDP per capita)",
y = "life expectancy")
data:image/s3,"s3://crabby-images/0d23a/0d23aaa6db854a594344239a202b74cd5e979ba0" alt="../_images/421bcc18a823708aafdb21ced09ecdb4cced075b63482ef33adaa3d340c2618b.png"
12 Error Bars
geom_errorbar()
avg_gapdata <- gapdata %>%
group_by(continent) %>%
summarise(mean = mean(lifeExp), sd = sd(lifeExp))
avg_gapdata
continent | mean | sd |
---|---|---|
<chr> | <dbl> | <dbl> |
Africa | 48.86533 | 9.150210 |
Americas | 64.65874 | 9.345088 |
Asia | 60.06490 | 11.864532 |
Europe | 71.90369 | 5.433178 |
Oceania | 74.32621 | 3.795611 |
avg_gapdata %>%
ggplot(aes(continent, mean, fill = continent))+
geom_point()+
geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
width = 0.25)
data:image/s3,"s3://crabby-images/09920/099209551ef5e0e4f6ff4b8aa629c76baab8e052" alt="../_images/9010299a2b104fc385504b90a85a94bb240cfccce2ece68af51f1c094a68acd4.png"
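13 Ellipses
stat_ellipse(type = "norm", level = 0.95) adds a 95% data ellipse computed under a bivariate normal assumption (the default type = "t" assumes a multivariate t distribution). The ellipse describes the spread of the points rather than a confidence interval for their mean.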
gapdata %>%
ggplot(aes(x = log(gdpPercap), y = lifeExp))+
geom_point()+
stat_ellipse(type = "norm", level = 0.95)
data:image/s3,"s3://crabby-images/246f2/246f22387357be7f13de54776b91d3a46b566cb8" alt="../_images/7ba97b1fb8cd4104dbfba0584ee61592141698cc6fa25cd37de1640ae369fc0f.png"
14 2D Density Plots
Like geom_density() in one dimension, geom_density_2d(), geom_bin2d() and geom_hex() describe how densely the points fill the two-dimensional region spanned by two variables.
# geom_bin2d()
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp))+
geom_bin2d()
data:image/s3,"s3://crabby-images/674cf/674cf266d68f90b55d124c6aa063ce33fa73518f" alt="../_images/3e2564426d2ab89a95057bb3cbbefc32231be9ad5946b29c3bb784557af63a0e.png"
# geom_density2d()
gapdata %>%
ggplot(aes(x = gdpPercap, y = lifeExp))+
geom_density2d()
data:image/s3,"s3://crabby-images/5c04d/5c04d744bb800a2ff7976d60b70ecadbc5095d3c" alt="../_images/ecb9d98ed3698e80fed15cda4c5ecea36f44720444868b3a2aa01442bb994894.png"
15 Tile Plots (Heatmaps)
geom_tile(), geom_contour() and geom_raster() are commonly used for three variables: two positions plus a value.
gapdata %>%
group_by(continent, year) %>%
summarise(mean_lifeExp = mean(lifeExp)) %>%
ggplot(aes(x = year, y = continent, fill = mean_lifeExp))+
geom_tile()+
scale_fill_viridis_c()
`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.
data:image/s3,"s3://crabby-images/8d888/8d8883aa87e65ad39333d40973f15d3e4a7f7b5d" alt="../_images/0f2673c7a7d6ff75d6d77bee88f6e87d9d770556b82d19bd5fe57e9df725f830.png"
In fact, there are better ways to present this:
gapdata %>%
group_by(continent, year) %>%
summarise(mean_lifeExp = mean(lifeExp)) %>%
ggplot(aes(x = year, y = continent,
size = mean_lifeExp, color = mean_lifeExp))+
geom_point()+
scale_color_viridis_c()+
theme_minimal(15)
`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.
data:image/s3,"s3://crabby-images/d8c99/d8c99adabe478cab7dfa701678a10913b9b4beb3" alt="../_images/05b1d2a00e98ea840cf05c85dd8c7ef5cec7efd08b9e1e4dadd1d491f9661dd0.png"
Put the values inside the points with geom_text():
gapdata %>%
group_by(continent, year) %>%
summarise(mean_lifeExp = mean(lifeExp)) %>%
ggplot(aes(x = year, y = continent, size = mean_lifeExp))+
geom_point(shape = 21, color = "red", fill = "white")+
scale_size_continuous(range = c(7, 15))+
geom_text(aes(label = round(mean_lifeExp, 2)), size = 3, color = "black")+
theme_minimal()+
theme(legend.position = "none")
`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.
data:image/s3,"s3://crabby-images/79626/79626ae237e3bc8b31b2e93c051593c3b958fdb6" alt="../_images/1b885a1a9fd2ef483208c3c0864396ec9626fdf73f28cac6aef5d1bcd0375980.png"
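Finally, the group aesthetic decides which rows are joined into one line. Without it, geom_line() connects all six points of tbl in x order as a single line; mapping group, or any discrete aesthetic such as fill or color, splits the data into one line per group. fill has no visible effect on a line, so the second and third plots look identical, while color also colors the two lines.
library(tidyverse)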
tbl <-
tibble(
x = rep(c(1, 2, 3), times = 2),
y = 1:6,
group = rep(c("group1", "group2"), each = 3)
)
ggplot(tbl, aes(x, y)) + geom_line()
ggplot(tbl, aes(x, y, group = group)) + geom_line()
ggplot(tbl, aes(x, y, fill = group)) + geom_line()
ggplot(tbl, aes(x, y, color = group)) + geom_line()
data:image/s3,"s3://crabby-images/0ab64/0ab644c85ce00bf932e451fec67035c7c8e90cf0" alt="../_images/035522dc8166beb1d8489e9fb0e5b786d7f823cd4077203281c5b1485a4dd666.png"
data:image/s3,"s3://crabby-images/dc9c4/dc9c4f6d5a824ea63bcf16c9d80a236ae29603ff" alt="../_images/29e26a6eba3cba246d79b34af62fc9a9dedc861d13e447f94e1477f100c2ad6a.png"
data:image/s3,"s3://crabby-images/dc9c4/dc9c4f6d5a824ea63bcf16c9d80a236ae29603ff" alt="../_images/29e26a6eba3cba246d79b34af62fc9a9dedc861d13e447f94e1477f100c2ad6a.png"
data:image/s3,"s3://crabby-images/f452f/f452f4a8ba5c1f07e28765cc51ef8b09d4361b4e" alt="../_images/8d5e4db59bfada691c41a8c3e969f5b20c576a10befd6b5f8104294c3b5834e8.png"