{ "cells": [ { "cell_type": "markdown", "id": "b9e93184", "metadata": {}, "source": [ "# 语法函数1-基础介绍\n", "## 对象\n", "- 一切都是对象\n", " - R中存储的数据成为对象object,R语言数据处理实际上就是不断的创建和操控这些对象。\n", "- 对象的创建和使用\n", " - 创建一个R对象,首先确定一个名称,然后使用赋值操作符 <-,将数据赋值给它" ] }, { "cell_type": "markdown", "id": "0cca4ee2", "metadata": {}, "source": [ "#### 对象属性\n", "所有R对象都有其属性,其中最重要的两个属性是类型和长度" ] }, { "cell_type": "code", "execution_count": 5, "id": "bc2b28fa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "'double'" ], "text/latex": [ "'double'" ], "text/markdown": [ "'double'" ], "text/plain": [ "[1] \"double\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "1" ], "text/latex": [ "1" ], "text/markdown": [ "1" ], "text/plain": [ "[1] 1" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x <- 123\n", "typeof(x) # 类型\n", "length(x) # 长度" ] }, { "cell_type": "markdown", "id": "f5ba2418", "metadata": {}, "source": [ "## 向量\n", "- 向量就像冰糖葫芦\n", " - 可以用 c() 函数实现类似结构" ] }, { "cell_type": "code", "execution_count": 6, "id": "0ba0b1ae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "'double'" ], "text/latex": [ "'double'" ], "text/markdown": [ "'double'" ], "text/plain": [ "[1] \"double\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "5" ], "text/latex": [ "5" ], "text/markdown": [ "5" ], "text/plain": [ "[1] 5" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x <- c(3,4,5,6,7)\n", "typeof(x)\n", "length(x)\n", "\n", "# 这里的c就是 combine 或 concatenate 的意思\n", "# 它要求元素之间用英文的逗号分隔\n", "# 且元素的数据类型是统一的,比如这里都是数值\n", "# c() 函数把一组数据聚合到了一起,就构成了一个向量。" ] }, { "cell_type": "markdown", "id": "3813f644", "metadata": {}, "source": [ "- 聚合成新向量\n", " - c() 函数还可以把两个向量聚合成一个新的向量。" ] }, { "cell_type": "code", "execution_count": 7, "id": "aa38782d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1\n", "\\item 2\n", "\\item 3\n", "\\item 4\n", "\\item 5\n", "\\item 6\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1\n", "2. 2\n", "3. 3\n", "4. 4\n", "5. 5\n", "6. 6\n", "\n", "\n" ], "text/plain": [ "[1] 1 2 3 4 5 6" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "low <- c(1,2,3)\n", "high <- c(4,5,6)\n", "sequence <- c(low, high)\n", "sequence" ] }, { "cell_type": "markdown", "id": "3eaf00a1", "metadata": {}, "source": [ "- 命名向量\n", " - 相比与向量c(5, 6, 7, 8), 每个元素可以有自己的名字\n", " - 个人评价:类似于`python`中的字典——键值对" ] }, { "cell_type": "code", "execution_count": 8, "id": "8ac549fb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
a
5
b
6
c
7
d
8
\n" ], "text/latex": [ "\\begin{description*}\n", "\\item[a] 5\n", "\\item[b] 6\n", "\\item[c] 7\n", "\\item[d] 8\n", "\\end{description*}\n" ], "text/markdown": [ "a\n", ": 5b\n", ": 6c\n", ": 7d\n", ": 8\n", "\n" ], "text/plain": [ "a b c d \n", "5 6 7 8 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x <- c('a'=5, 'b'=6, 'c'=7, 'd'=8)\n", "x" ] }, { "cell_type": "code", "execution_count": 9, "id": "41213992", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
a
5
b
6
c
7
d
8
\n" ], "text/latex": [ "\\begin{description*}\n", "\\item[a] 5\n", "\\item[b] 6\n", "\\item[c] 7\n", "\\item[d] 8\n", "\\end{description*}\n" ], "text/markdown": [ "a\n", ": 5b\n", ": 6c\n", ": 7d\n", ": 8\n", "\n" ], "text/plain": [ "a b c d \n", "5 6 7 8 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# 或者\n", "x <- c(5,6,7,8)\n", "names(x) <- c('a','b','c','d')\n", "x" ] }, { "cell_type": "markdown", "id": "3c5b524d", "metadata": {}, "source": [ "### 数值型向量\n", "- 向量的元素都是数值类型,因此也叫数值型向量。\n", " - 数值型的向量,有 integer 和 double 两种" ] }, { "cell_type": "code", "execution_count": 37, "id": "65bd341c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "'double'" ], "text/latex": [ "'double'" ], "text/markdown": [ "'double'" ], "text/plain": [ "[1] \"double\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "'numeric'" ], "text/latex": [ "'numeric'" ], "text/markdown": [ "'numeric'" ], "text/plain": [ "[1] \"numeric\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "'numeric'" ], "text/latex": [ "'numeric'" ], "text/markdown": [ "'numeric'" ], "text/plain": [ "[1] \"numeric\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x <- c(1,5,2,3)\n", "typeof(x)\n", "x <- c(1.5,-0.5,2,3)\n", "class(x)\n", "x <- c(3e+06, 1.23e2)\n", "class(x)" ] }, { "cell_type": "markdown", "id": "0d206a5b", "metadata": {}, "source": [ " - seq() 函数可以生成等差数列,\n", " - from 参数指定数列的起始值,to 参数指定数列的终止值,by 参数指定数值的间距" ] }, { "cell_type": "code", "execution_count": 15, "id": "06cec7b8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 0
  2. 0.5
  3. 1
  4. 1.5
  5. 2
  6. 2.5
  7. 3
  8. 3.5
  9. 4
  10. 4.5
  11. 5
  12. 5.5
  13. 6
  14. 6.5
  15. 7
  16. 7.5
  17. 8
  18. 8.5
  19. 9
  20. 9.5
  21. 10
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 0\n", "\\item 0.5\n", "\\item 1\n", "\\item 1.5\n", "\\item 2\n", "\\item 2.5\n", "\\item 3\n", "\\item 3.5\n", "\\item 4\n", "\\item 4.5\n", "\\item 5\n", "\\item 5.5\n", "\\item 6\n", "\\item 6.5\n", "\\item 7\n", "\\item 7.5\n", "\\item 8\n", "\\item 8.5\n", "\\item 9\n", "\\item 9.5\n", "\\item 10\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 0\n", "2. 0.5\n", "3. 1\n", "4. 1.5\n", "5. 2\n", "6. 2.5\n", "7. 3\n", "8. 3.5\n", "9. 4\n", "10. 4.5\n", "11. 5\n", "12. 5.5\n", "13. 6\n", "14. 6.5\n", "15. 7\n", "16. 7.5\n", "17. 8\n", "18. 8.5\n", "19. 9\n", "20. 9.5\n", "21. 10\n", "\n", "\n" ], "text/plain": [ " [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0\n", "[16] 7.5 8.0 8.5 9.0 9.5 10.0" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "s1 <- seq(from=0, to=10, by=0.5)\n", "s1" ] }, { "cell_type": "markdown", "id": "76c490fe", "metadata": {}, "source": [ " - rep() 是 repeat(重复)的意思,可以用于产生重复出现的数字序列:\n", " - x 用于重复的向量,times 参数可以指定要生成的个数,each 参数可以指定每个元素重复的次数" ] }, { "cell_type": "code", "execution_count": 16, "id": "8d1d09e8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 0
  2. 1
  3. 0
  4. 1
  5. 0
  6. 1
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 0\n", "\\item 1\n", "\\item 0\n", "\\item 1\n", "\\item 0\n", "\\item 1\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 0\n", "2. 1\n", "3. 0\n", "4. 1\n", "5. 0\n", "6. 1\n", "\n", "\n" ], "text/plain": [ "[1] 0 1 0 1 0 1" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 0
  2. 0
  3. 0
  4. 1
  5. 1
  6. 1
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 0\n", "\\item 0\n", "\\item 0\n", "\\item 1\n", "\\item 1\n", "\\item 1\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 0\n", "2. 0\n", "3. 0\n", "4. 1\n", "5. 1\n", "6. 1\n", "\n", "\n" ], "text/plain": [ "[1] 0 0 0 1 1 1" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "s2 <- rep(x=c(0,1), times=3)\n", "s2\n", "\n", "s3 <- rep(x=c(0,1), each=3)\n", "s3" ] }, { "cell_type": "markdown", "id": "c8da6aa5", "metadata": {}, "source": [ " - m:n,如果单纯是要生成数值间距为1的数列,用 m:n 更快捷,\n", " - 它产生从 m 到 n 的间距为1的数列" ] }, { "cell_type": "code", "execution_count": 17, "id": "8811a6d9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
  11. 10
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 0\n", "\\item 1\n", "\\item 2\n", "\\item 3\n", "\\item 4\n", "\\item 5\n", "\\item 6\n", "\\item 7\n", "\\item 8\n", "\\item 9\n", "\\item 10\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 0\n", "2. 1\n", "3. 2\n", "4. 3\n", "5. 4\n", "6. 5\n", "7. 6\n", "8. 7\n", "9. 8\n", "10. 9\n", "11. 10\n", "\n", "\n" ], "text/plain": [ " [1] 0 1 2 3 4 5 6 7 8 9 10" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "s4 <- 0:10\n", "s4" ] }, { "cell_type": "markdown", "id": "e23c3729", "metadata": {}, "source": [ "### 字符串向量\n", "- 字符串(String)数据类型,实际上就是文本类型,必须用单引号或者是双引号包含" ] }, { "cell_type": "code", "execution_count": 18, "id": "932b1683", "metadata": {}, "outputs": [], "source": [ "x <- c(\"a\", \"b\", \"c\") \n", "x <- c('Alice', 'Bob', 'Charlie', 'Dave') \n", "x <- c(\"hello\", \"baby\", \"I love you!\") " ] }, { "cell_type": "code", "execution_count": 35, "id": "16ef3749", "metadata": {}, "outputs": [ { "data": { "text/html": [ "'character'" ], "text/latex": [ "'character'" ], "text/markdown": [ "'character'" ], "text/plain": [ "[1] \"character\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "'numeric'" ], "text/latex": [ "'numeric'" ], "text/markdown": [ "'numeric'" ], "text/plain": [ "[1] \"numeric\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## x1是字符串型向量,x2是数值型向量\n", "x1 <- c(\"1\", \"2\", \"3\")\n", "x2 <- c(1, 2, 3)\n", "class(x1)\n", "class(x2)" ] }, { "cell_type": "markdown", "id": "91a33837", "metadata": {}, "source": [ "### 逻辑型向量\n", "- 逻辑型常称为布尔型(Boolean), 它的常量值只有`TRUE`和`FALSE`。\n", "- TRUE和FALSE必须都大写" ] }, { "cell_type": "markdown", "id": "7be37c4b", "metadata": {}, "source": [ "### 因子型向量\n", "- 因子型可以看作是字符串向量的增强版,它是带有层级(Levels)的字符串向量。\n", "- 使用`factor()`函数可以将字符串向量转换成因子型向量,即可用`levels`参数指定顺序" ] }, { "cell_type": "code", "execution_count": 24, "id": "9461f626", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 'spring'
  2. 'summer'
  3. 'autumn'
  4. 'winter'
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'spring'\n", "\\item 'summer'\n", "\\item 'autumn'\n", "\\item 'winter'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'spring'\n", "2. 'summer'\n", "3. 'autumn'\n", "4. 'winter'\n", "\n", "\n" ], "text/plain": [ "[1] \"spring\" \"summer\" \"autumn\" \"winter\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "'character'" ], "text/latex": [ "'character'" ], "text/markdown": [ "'character'" ], "text/plain": [ "[1] \"character\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. spring
  2. summer
  3. autumn
  4. winter
\n", "\n", "
\n", "\t\n", "\t\tLevels:\n", "\t\n", "\t\n", "\t
  1. 'summer'
  2. 'winter'
  3. 'spring'
  4. 'autumn'
\n", "
" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item spring\n", "\\item summer\n", "\\item autumn\n", "\\item winter\n", "\\end{enumerate*}\n", "\n", "\\emph{Levels}: \\begin{enumerate*}\n", "\\item 'summer'\n", "\\item 'winter'\n", "\\item 'spring'\n", "\\item 'autumn'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. spring\n", "2. summer\n", "3. autumn\n", "4. winter\n", "\n", "\n", "\n", "**Levels**: 1. 'summer'\n", "2. 'winter'\n", "3. 'spring'\n", "4. 'autumn'\n", "\n", "\n" ], "text/plain": [ "[1] spring summer autumn winter\n", "Levels: summer winter spring autumn" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "'integer'" ], "text/latex": [ "'integer'" ], "text/markdown": [ "'integer'" ], "text/plain": [ "[1] \"integer\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "four_seasons <- c(\"spring\", \"summer\", \"autumn\", \"winter\")\n", "four_seasons\n", "typeof(four_seasons)\n", "\n", "four_seasons_factor <- factor(four_seasons, \n", " levels = c(\"summer\", \"winter\", \"spring\", \"autumn\"))\n", "four_seasons_factor\n", "typeof(four_seasons_factor)" ] }, { "cell_type": "markdown", "id": "57ca82fa", "metadata": {}, "source": [ "### 强制转换\n", "- 矢量中的元素必须是相同的类型,但如果不一样呢,会发生什么? \n", "- 这个时候R会强制转换成相同的类型。这就涉及数据类型的转换层级\n", " - character > numeric > logical\n", " - double > integer" ] }, { "cell_type": "code", "execution_count": 28, "id": "edb1c1ec", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. '1'
  2. 'foo'
  3. 'TRUE'
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item '1'\n", "\\item 'foo'\n", "\\item 'TRUE'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. '1'\n", "2. 'foo'\n", "3. 'TRUE'\n", "\n", "\n" ], "text/plain": [ "[1] \"1\" \"foo\" \"TRUE\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 7
  2. 1
  3. 0
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 7\n", "\\item 1\n", "\\item 0\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 7\n", "2. 1\n", "3. 0\n", "\n", "\n" ], "text/plain": [ "[1] 7 1 0" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 1
  2. 2
  3. 3.14159265358979
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1\n", "\\item 2\n", "\\item 3.14159265358979\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1\n", "2. 2\n", "3. 3.14159265358979\n", "\n", "\n" ], "text/plain": [ "[1] 1.000000 2.000000 3.141593" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "c(1, \"foo\", TRUE) # 强制转换成字符串类型\n", "c(7, TRUE, FALSE) # 强制转换成数值型\n", "c(1L, 2, pi) # 强制转换成双精度的数值型" ] }, { "cell_type": "markdown", "id": "57a1b078", "metadata": {}, "source": [ "## 数据结构\n", "- 向量\n", "- 矩阵 matrix\n", "- 列表 list\n", "- 数据框 dataframe" ] }, { "cell_type": "markdown", "id": "821210ab", "metadata": {}, "source": [ "### 矩阵\n", "- 矩阵可以存储行(row)和列(column)二维的数据。\n", "- 实际上是向量的另一种表现形式,也就说它的本质还是向量,一维的向量用二维的方式呈现\n", "- 矩阵可以用`matrix()`函数创建,第一个位置的参数是用于创建矩阵的向量\n", "- 默认情况是竖着排,第一列排完再第二列这样,可用`byrow=TRUE`指定横着排,第一行排完排第二行" ] }, { "cell_type": "code", "execution_count": 40, "id": "5c645061", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
A matrix: 2 × 3 of type dbl
235
417
\n" ], "text/latex": [ "A matrix: 2 × 3 of type dbl\n", "\\begin{tabular}{lll}\n", "\t 2 & 3 & 5\\\\\n", "\t 4 & 1 & 7\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 2 × 3 of type dbl\n", "\n", "| 2 | 3 | 5 |\n", "| 4 | 1 | 7 |\n", "\n" ], "text/plain": [ " [,1] [,2] [,3]\n", "[1,] 2 3 5 \n", "[2,] 4 1 7 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
A matrix: 2 × 3 of type dbl
243
157
\n" ], "text/latex": [ "A matrix: 2 × 3 of type dbl\n", "\\begin{tabular}{lll}\n", "\t 2 & 4 & 3\\\\\n", "\t 1 & 5 & 7\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 2 × 3 of type dbl\n", "\n", "| 2 | 4 | 3 |\n", "| 1 | 5 | 7 |\n", "\n" ], "text/plain": [ " [,1] [,2] [,3]\n", "[1,] 2 4 3 \n", "[2,] 1 5 7 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 'matrix'
  2. 'array'
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'matrix'\n", "\\item 'array'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'matrix'\n", "2. 'array'\n", "\n", "\n" ], "text/plain": [ "[1] \"matrix\" \"array\" " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "6" ], "text/latex": [ "6" ], "text/markdown": [ "6" ], "text/plain": [ "[1] 6" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 2
  2. 3
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 2\n", "\\item 3\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 2\n", "2. 3\n", "\n", "\n" ], "text/plain": [ "[1] 2 3" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "m <- matrix(c(2,4,3,1,5,7),\n", " nrow=2,\n", " ncol=3)\n", "m\n", "\n", "m1 <- matrix(c(2,4,3,1,5,7),\n", " nrow=2,\n", " ncol=3,\n", " byrow=TRUE)\n", "m1\n", "class(m1) # 类型\n", "length(m1) # 长度\n", "dim(m1) # 维度" ] }, { "cell_type": "markdown", "id": "07065034", "metadata": {}, "source": [ "### 列表\n", "- 如果我们想要装更多的东西,可以想象有一个小火车,小火车的每节车厢是独立的,因此每节车厢装的东西可以不一样。这种结构装载数据的能力很强大,称之为列表。\n", " - ![image.png](https://bookdown.org/wangminjie/R4DS/images/list_trian.png)\n", "- 我们可以使用`list()`函数创建列表,和python的list不太一样\n", "- `c()`函数创建向量 对比`list()`函数创建列表\n", " - ![image-2.png](https://bookdown.org/wangminjie/R4DS/images/c_vs_list.png)" ] }, { "cell_type": "code", "execution_count": 44, "id": "9bc7be82", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\t
$a
\n", "\t\t
\n", "
  1. 5
  2. 10
\n", "
\n", "\t
$b
\n", "\t\t
\n", "
  1. 'I'
  2. 'love'
  3. 'R'
  4. 'language'
  5. '!'
\n", "
\n", "\t
$c
\n", "\t\t
\n", "
  1. TRUE
  2. TRUE
  3. FALSE
  4. TRUE
\n", "
\n", "
\n" ], "text/latex": [ "\\begin{description}\n", "\\item[\\$a] \\begin{enumerate*}\n", "\\item 5\n", "\\item 10\n", "\\end{enumerate*}\n", "\n", "\\item[\\$b] \\begin{enumerate*}\n", "\\item 'I'\n", "\\item 'love'\n", "\\item 'R'\n", "\\item 'language'\n", "\\item '!'\n", "\\end{enumerate*}\n", "\n", "\\item[\\$c] \\begin{enumerate*}\n", "\\item TRUE\n", "\\item TRUE\n", "\\item FALSE\n", "\\item TRUE\n", "\\end{enumerate*}\n", "\n", "\\end{description}\n" ], "text/markdown": [ "$a\n", ": 1. 5\n", "2. 10\n", "\n", "\n", "\n", "$b\n", ": 1. 'I'\n", "2. 'love'\n", "3. 'R'\n", "4. 'language'\n", "5. '!'\n", "\n", "\n", "\n", "$c\n", ": 1. TRUE\n", "2. TRUE\n", "3. FALSE\n", "4. TRUE\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "$a\n", "[1] 5 10\n", "\n", "$b\n", "[1] \"I\" \"love\" \"R\" \"language\" \"!\" \n", "\n", "$c\n", "[1] TRUE TRUE FALSE TRUE\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 5
  2. 10
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 5\n", "\\item 10\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 5\n", "2. 10\n", "\n", "\n" ], "text/plain": [ "[1] 5 10" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "'love'" ], "text/latex": [ "'love'" ], "text/markdown": [ "'love'" ], "text/plain": [ "[1] \"love\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "list1 <- list(a=c(5, 10),\n", " b=c(\"I\", \"love\", \"R\", \"language\", \"!\"),\n", " c=c(TRUE, TRUE, FALSE, TRUE))\n", "list1\n", "list1$a\n", "list1$`b[2]" ] }, { "cell_type": "markdown", "id": "15913b82", "metadata": {}, "source": [ "### 数据框\n", "- 列表可以想象成一个小火车,如果每节车厢装的都是向量而且等长,那么这种特殊形式的列表就变成了数据框\n", "- 数据框是一种特殊的列表,我们可以使用`data.frame()`函数构建数据框\n", "\n", "#### 属性" ] }, { "cell_type": "code", "execution_count": 48, "id": "00b0411a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 4 × 4
nameagemarriagecolor
<chr><dbl><lgl><chr>
Alice 3 TRUEred
Bob 34FALSEblue
Carl 23 TRUEorange
Dave 25FALSEpurple
\n" ], "text/latex": [ "A data.frame: 4 × 4\n", "\\begin{tabular}{llll}\n", " name & age & marriage & color\\\\\n", " & & & \\\\\n", "\\hline\n", "\t Alice & 3 & TRUE & red \\\\\n", "\t Bob & 34 & FALSE & blue \\\\\n", "\t Carl & 23 & TRUE & orange\\\\\n", "\t Dave & 25 & FALSE & purple\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 4 × 4\n", "\n", "| name <chr> | age <dbl> | marriage <lgl> | color <chr> |\n", "|---|---|---|---|\n", "| Alice | 3 | TRUE | red |\n", "| Bob | 34 | FALSE | blue |\n", "| Carl | 23 | TRUE | orange |\n", "| Dave | 25 | FALSE | purple |\n", "\n" ], "text/plain": [ " name age marriage color \n", "1 Alice 3 TRUE red \n", "2 Bob 34 FALSE blue \n", "3 Carl 23 TRUE orange\n", "4 Dave 25 FALSE purple" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "'data.frame'" ], "text/latex": [ "'data.frame'" ], "text/markdown": [ "'data.frame'" ], "text/plain": [ "[1] \"data.frame\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "4" ], "text/latex": [ "4" ], "text/markdown": [ "4" ], "text/plain": [ "[1] 4" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "4" ], "text/latex": [ "4" ], "text/markdown": [ "4" ], "text/plain": [ "[1] 4" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df <- data.frame(\n", "name = c(\"Alice\", \"Bob\", \"Carl\", \"Dave\"),\n", "age = c(3, 34, 23, 25),\n", "marriage = c(TRUE, FALSE, TRUE, FALSE),\n", "color = c(\"red\", \"blue\", \"orange\", \"purple\")\n", ")\n", "df\n", "class(df) # 类型\n", "nrow(df) # 维度\n", "ncol(df) # 维度\n" ] }, { "cell_type": "markdown", "id": "058465ab", "metadata": {}, "source": [ "![image.png](https://bookdown.org/wangminjie/R4DS/images/data_structure1.png)" ] }, { "cell_type": "markdown", "id": "245c74b5", "metadata": {}, "source": [ "## 运算符及向量运算\n", "- 算数运算符 + - x ➗\n", "- 关系运算符 = > < !=\n", "- 逻辑运算符 & |\n", "- 其他运算符" ] }, { "cell_type": "markdown", "id": "eee623a2", "metadata": {}, "source": [ "### 算数运算符\n", "- `+ - * / `\n", "- 循环补齐\n", " - 做算术运算时,当两个向量长度相等的时候,就一一对应的完成计算;\n", " - 当两个向量长度不相等的时候,短的向量会`循环补齐`,保持与长向量的长度一致后,再做运算\n", " - 需要长向量是短向量的整数倍才能运算" ] }, { "cell_type": "code", "execution_count": 50, "id": "5b2ee082", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 11
  2. 22
  3. 31
  4. 42
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 11\n", "\\item 22\n", "\\item 31\n", "\\item 42\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 11\n", "2. 22\n", "3. 31\n", "4. 42\n", "\n", "\n" ], "text/plain": [ "[1] 11 22 31 42" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x <- c(1,2)\n", "y <- c(10,20,30,40)\n", "x+y" ] }, { "cell_type": "markdown", "id": "372c0534", "metadata": {}, "source": [ "### 关系运算符\n", "- 将第一向量的每个元素与第二向量的相应元素进行比较,比较的结果是布尔值,布尔值是“真” TRUE 或“假” FALSE 中的一个。\n", "- `< > = !=`" ] }, { "cell_type": "code", "execution_count": 52, "id": "fd514a29", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. FALSE
  2. FALSE
  3. FALSE
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item FALSE\n", "\\item FALSE\n", "\\item FALSE\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. FALSE\n", "2. FALSE\n", "3. FALSE\n", "\n", "\n" ], "text/plain": [ "[1] FALSE FALSE FALSE" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# 检查第一个向量的每个元素是否等于第二个向量的相应元素。\n", "a <- c(1, 2, 3)\n", "b <- c(4, 5, 6)\n", "a == b" ] }, { "cell_type": "markdown", "id": "80dc5600", "metadata": {}, "source": [ "### 逻辑运算符\n", "- 一般适用于逻辑类型的向量\n", "- `& | !` ! 为逻辑非运算符。 对于向量的每个元素,给出相反的逻辑值。\n", "- 逻辑运算符`&&`和`||`只考虑向量的第一个元素,给出单个元素的向量作为输出" ] }, { "cell_type": "code", "execution_count": 55, "id": "3b364dc9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. FALSE
  2. TRUE
  3. TRUE
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item FALSE\n", "\\item TRUE\n", "\\item TRUE\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. FALSE\n", "2. TRUE\n", "3. TRUE\n", "\n", "\n" ], "text/plain": [ "[1] FALSE TRUE TRUE" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "Warning message in a && b:\n", "“'length(x) = 3 > 1' in coercion to 'logical(1)'”\n", "Warning message in a && b:\n", "“'length(x) = 3 > 1' in coercion to 'logical(1)'”\n" ] }, { "data": { "text/html": [ "FALSE" ], "text/latex": [ "FALSE" ], "text/markdown": [ "FALSE" ], "text/plain": [ "[1] FALSE" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "Warning message in a || b:\n", "“'length(x) = 3 > 1' in coercion to 'logical(1)'”\n" ] }, { "data": { "text/html": [ "TRUE" ], "text/latex": [ "TRUE" ], "text/markdown": [ "TRUE" ], "text/plain": [ "[1] TRUE" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "a <- c(TRUE, FALSE, FALSE)\n", "!a\n", "\n", "a <- c(TRUE, FALSE, FALSE)\n", "b <- c(FALSE, TRUE, FALSE)\n", "a && b\n", "\n", "a <- c(TRUE, FALSE, FALSE)\n", "b <- c(FALSE, TRUE, FALSE)\n", "a || b" ] }, { "cell_type": "markdown", "id": "cf5b51a1", "metadata": {}, "source": [ "### 其他运算符\n", "- `:`冒号运算符。按顺序创建一个**整数**序列\n", "- `%in%` 此运算符用于判断元素是否属于向量\n", "- `is.na()`判断是否为缺失值,`R` 里缺失值用`NA`表示,`NA`的意思就是 `not available` 或者 `not applicable`." ] }, { "cell_type": "code", "execution_count": 64, "id": "248f1da9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1\n", "\\item 2\n", "\\item 3\n", "\\item 4\n", "\\item 5\n", "\\item 6\n", "\\item 7\n", "\\item 8\n", "\\item 9\n", "\\item 10\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1\n", "2. 2\n", "3. 3\n", "4. 4\n", "5. 5\n", "6. 6\n", "7. 7\n", "8. 8\n", "9. 9\n", "10. 10\n", "\n", "\n" ], "text/plain": [ " [1] 1 2 3 4 5 6 7 8 9 10" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "'integer'" ], "text/latex": [ "'integer'" ], "text/markdown": [ "'integer'" ], "text/plain": [ "[1] \"integer\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. TRUE
  2. TRUE
  3. FALSE
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item TRUE\n", "\\item TRUE\n", "\\item FALSE\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. TRUE\n", "2. TRUE\n", "3. FALSE\n", "\n", "\n" ], "text/plain": [ "[1] TRUE TRUE FALSE" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. FALSE
  2. FALSE
  3. FALSE
  4. TRUE
  5. FALSE
  6. TRUE
  7. FALSE
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item FALSE\n", "\\item FALSE\n", "\\item FALSE\n", "\\item TRUE\n", "\\item FALSE\n", "\\item TRUE\n", "\\item FALSE\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. FALSE\n", "2. FALSE\n", "3. FALSE\n", "4. TRUE\n", "5. FALSE\n", "6. TRUE\n", "7. FALSE\n", "\n", "\n" ], "text/plain": [ "[1] FALSE FALSE FALSE TRUE FALSE TRUE FALSE" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. TRUE
  2. TRUE
  3. FALSE
  4. TRUE
  5. FALSE
  6. TRUE
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item TRUE\n", "\\item TRUE\n", "\\item FALSE\n", "\\item TRUE\n", "\\item FALSE\n", "\\item TRUE\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. TRUE\n", "2. TRUE\n", "3. FALSE\n", "4. TRUE\n", "5. FALSE\n", "6. TRUE\n", "\n", "\n" ], "text/plain": [ "[1] TRUE TRUE FALSE TRUE FALSE TRUE" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# :\n", "a <- 1:10\n", "a\n", "typeof(a)\n", "\n", "# %in%\n", "c(2,3,7) %in% c(1,2,3,4,5)\n", "\n", "# is.na()\n", "is.na(c(1,2,3,NA,4,NA,5))\n", "x <- c(1,2,NA,4,NA,6)\n", "!is.na(x)" ] }, { "cell_type": "markdown", "id": "4e647d0d", "metadata": {}, "source": [ "- `seq()`函数也能产生序列\n", "- 注意到a是整数类型,而b是双精度的数值型" ] }, { "cell_type": "code", "execution_count": 61, "id": "72b18bfa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1\n", "\\item 2\n", "\\item 3\n", "\\item 4\n", "\\item 5\n", "\\item 6\n", "\\item 7\n", "\\item 8\n", "\\item 9\n", "\\item 10\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1\n", "2. 2\n", "3. 3\n", "4. 4\n", "5. 5\n", "6. 6\n", "7. 7\n", "8. 8\n", "9. 9\n", "10. 10\n", "\n", "\n" ], "text/plain": [ " [1] 1 2 3 4 5 6 7 8 9 10" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "'integer'" ], "text/latex": [ "'integer'" ], "text/markdown": [ "'integer'" ], "text/plain": [ "[1] \"integer\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "b <- seq(1,10,1)\n", "b\n", "typeof(b)" ] }, { "cell_type": "markdown", "id": "14d09e16", "metadata": {}, "source": [ "### 特殊值\n", "- R 语言里还有一些特殊的值: `Inf, NaN, NA 和 NULL`\n", "- `Inf`, 是Infinity的简写,表示无限大;`-Inf`表示无限小\n", "- `NaN`, 是Not a Number的简写,表示这个数字没有数学定义\n", "- `NA`, 是Not available的简写,表示缺失状态\n", "- `NULL`, 是No value的意思,表示没有值,或者空值的意思,表示变量实际上没有任何值,或者甚至不存在。" ] }, { "cell_type": "code", "execution_count": 71, "id": "4e078920", "metadata": {}, "outputs": [ { "data": { "text/html": [ "Inf" ], "text/latex": [ "Inf" ], "text/markdown": [ "Inf" ], "text/plain": [ "[1] Inf" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "-Inf" ], "text/latex": [ "-Inf" ], "text/markdown": [ "-Inf" ], "text/plain": [ "[1] -Inf" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "NaN" ], "text/latex": [ "NaN" ], "text/markdown": [ "NaN" ], "text/plain": [ "[1] NaN" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "NULL" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 1
  2. <NA>
  3. 3
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1\n", "\\item \n", "\\item 3\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1\n", "2. <NA>\n", "3. 3\n", "\n", "\n" ], "text/plain": [ "[1] 1 NA 3" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Inf\n", "1/0\n", "-1/0\n", "\n", "# NaN\n", "0/0\n", "\n", "# NULL\n", "c( )\n", "\n", "# NA\n", "c(1,NA,3)" ] }, { "cell_type": "code", "execution_count": 72, "id": "95d2b3bf", "metadata": {}, "outputs": [ { "data": { "text/html": [ "FALSE" ], "text/latex": [ "FALSE" ], "text/markdown": [ "FALSE" ], "text/plain": [ "[1] FALSE" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "a <- 1:10\n", "b <- seq(1,10,1)\n", "\n", "identical(a,b)" ] }, { "cell_type": "markdown", "id": "d76e2f5e", "metadata": {}, "source": [ "## 函数\n", "### 基础函数\n", "- `print()` 打印\n", "- `sqrt()` 求开方\n", "- `sd()` 标准差函数\n", "- `log()` 求自然对数\n", "- `sum()` 求和\n", "- `mean()` 求平均值\n", "- `min()` 最小值\n", "- `max()` 最大值\n", "- `length()` 计算向量个数\n", "- `sort()` 排序\n", "- `nique()` 去重\n", "- `quantile()` 求分位数\n", "- `is.numeric()` 判断是否为数值型\n", "- `is.character()` 判断是否为字符串型\n", "- `as.character()` 转为字符串型" ] }, { "cell_type": "markdown", "id": "7d89b0cd", "metadata": {}, "source": [ "### 向量的函数\n", "- 用在向量上的函数,可以分为向量化函数(vectorized function)和汇总类函数(summary function)\n", "- ![image.png](https://bookdown.org/wangminjie/R4DS/images/vector-function.png)\n", "- 这两类函数在`Tidyverse`框架中,应用非常广泛。\n", "- 当然,也会有例外,比如`unique()`函数,它返回的向量通常不会与输入的向量等长,既不属于向量化函数,也不属于汇总类函数。" ] }, { "cell_type": "code", "execution_count": 76, "id": "cef52a29", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 9
  2. 54
  3. 69
  4. 86
  5. 14
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 9\n", "\\item 54\n", "\\item 69\n", "\\item 86\n", "\\item 14\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 9\n", "2. 54\n", "3. 69\n", "4. 86\n", "5. 14\n", "\n", "\n" ], "text/plain": [ "[1] 9 54 69 86 14" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 2
  2. 3
  3. 7
  4. 8
  5. 9
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 2\n", "\\item 3\n", "\\item 7\n", "\\item 8\n", "\\item 9\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 2\n", "2. 3\n", "3. 7\n", "4. 8\n", "5. 9\n", "\n", "\n" ], "text/plain": [ "[1] 2 3 7 8 9" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. -3.8
  2. 1.2
  3. 2.2
  4. 3.2
  5. -2.8
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item -3.8\n", "\\item 1.2\n", "\\item 2.2\n", "\\item 3.2\n", "\\item -2.8\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. -3.8\n", "2. 1.2\n", "3. 2.2\n", "4. 3.2\n", "5. -2.8\n", "\n", "\n" ], "text/plain": [ "[1] -3.8 1.2 2.2 3.2 -2.8" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. -1.22010646822908
  2. 0.385296779440761
  3. 0.706377428974729
  4. 1.0274580785087
  5. -0.89902581869511
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item -1.22010646822908\n", "\\item 0.385296779440761\n", "\\item 0.706377428974729\n", "\\item 1.0274580785087\n", "\\item -0.89902581869511\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. -1.22010646822908\n", "2. 0.385296779440761\n", "3. 0.706377428974729\n", "4. 1.0274580785087\n", "5. -0.89902581869511\n", "\n", "\n" ], "text/plain": [ "[1] -1.2201065 0.3852968 0.7063774 1.0274581 -0.8990258" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x <- c(2,7,8,9,3)\n", "x^2 + 5\n", "\n", "sort(x)\n", "\n", "x - mean(x)\n", "\n", "# 向量标准化(向量减去其均值之后,除以标准差)\n", "(x -mean(x))/sd(x)" ] }, { "cell_type": "markdown", "id": "9adcce5e", "metadata": {}, "source": [ "### 自定义函数\n", "- **创建**,由`function(...)`创建一个函数\n", "- **参数**,由`(...)`里指定参数,比如`function(x)`中的参数为 `x`\n", "- **函数主体**,一般情况下,在`function(...)`后跟随一对大括号`{ }`,在大括号里声明具体函数功能,在代码最后一行,可以用`return`返回计算后的值。当然,如果函数的目的只是返回最后一行代码计算的值,这个`return`可以省略。\n", "- **函数名**,`function() { }`赋值给新对象,比如这里的`my_std`,相当于给函数取一名字,方便以后使用。\n", "- **函数调用**,现在这个函数名字叫`my_std`,需要用这个函数的时候,就调用它的名字`my_std()`。\n", "\n", "和`python`的`def`创建的函数差不多,创建形式不同,但应用方式一样,元素差不多" ] }, { "cell_type": "code", "execution_count": 79, "id": "3ab4d86e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. -1.22010646822908
  2. 0.385296779440761
  3. 0.706377428974729
  4. 1.0274580785087
  5. -0.89902581869511
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item -1.22010646822908\n", "\\item 0.385296779440761\n", "\\item 0.706377428974729\n", "\\item 1.0274580785087\n", "\\item -0.89902581869511\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. -1.22010646822908\n", "2. 0.385296779440761\n", "3. 0.706377428974729\n", "4. 1.0274580785087\n", "5. -0.89902581869511\n", "\n", "\n" ], "text/plain": [ "[1] -1.2201065 0.3852968 0.7063774 1.0274581 -0.8990258" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "y <- c(2,7,8,9,3)\n", "\n", "my_std <- function(x){\n", " (x - mean(x)) / sd(x)\n", "}\n", "\n", "my_std(y)" ] }, { "attachments": { "image.png": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAasAAABSCAYAAADq+/sIAAAaU0lEQVR4Ae1dvWsqTxf+/YPbLAgiSAotJIUpTGMam622sYmNTSzSpQiBIAQECVyQgBCQW5hCUuQWWrymMI3V8zIfZ3b2S01cc9fcc4ur7s7OzjxnM8+eM+fjP/A/RoARYATyjMBqjG69jNNGB0+r7Ae6uKmj4Dg4v5ln37npcYS266Dg+ngyx/jLZxD47zONuS0jwAgwAj8LgSmuyg4KTgP3y8PO7LntSlJsDQ7AuIcdei56Z7LKhRh4EIwAI/BXEJh2UXUcFGo9vB56AI+eJKtCc4CPQ9/rB/bPZPUDhcpTyikCqwluvTpOhDnIcVCq+RgKy9PsDq2KeusuuDVcjo7nzXvRb6nF3nFx6vWxENAvx6F5FtwiztsjvItz6zcM2w2cagzkOX+grouIbTHwcWawaieYAGe4b5ZREmTjltG6m8ke3sc38OtFdVzgXGniepKM6ftdQ8miPY7cPf5zMWjjwsipjFZvgg+s8NQmmbqoNu/SSW95hws51jae493zkS0IMFltAYhPMwKZILB6hF90cOoN8CrWzVkPZ2Lhcly5qJ51p/iY9+EVxbEmhslrayZDyayTuVh8i/Af53hoKgKuNj2cuy5OfTHPNbCe4FKa2RxcdG/gl8WC3sPzfC1YLbiuOw0N6+PRw4lTw9V0jdduJVEj+d0po1D08bTsoyWxLKPl1VFyK2gPZlJ7WU/amkyTMf3lqXFf3G22Ab72aigUG7gez7HGCr+8oiI510Wh6GG4XOK5U5OyrEbmEkxsjEtJvjVcvwVH+dtuCDBZ7YYTt2IE9kBgJRflUss2/8xwXVMLZaF+gwVo70QcO/z+yR6TMZdKspBjB2jRFxrjWU9pONTQPnfiPYZMYJIEBNGEzHBjXBYDh4cnX2udtR7+UKergSQo5RTxCF+SlcCuhuvQ7YNz/iNdTJ+BDOLnqA2ASQdVQZx2v2TSc1y0R8DHoKkIVWhyqVqadb+R1T9/3QkBJqudYOJGjMAeCEgtqoyrkPIwgKcXWLW4z3HfUORVjSzoe9z5gJdO0C0TMc1wVdHE2+yHyEgMYNjS54px85ckvChZSSLw8EuOXnvRRb31ZBtN6rMuTjWWrX5EJV2R1uUgTkik6VTCRBRBTZJtZJ/JaHuOHue4jRMxBrceIctwZ4RFbJzhZvwrAQEmqwRQ+BAjkCkCqymeRtPwIi7f1sUiXkZ3sv1uH+MOzssVtHOzn7XCy2iEF8ENtBfjOPBjGgMRgoPTrq2aiDkvDUEXbJKbjzEcKzfyQGOJaJtWG9p3Kjhe3C183NZ7V0k4k9YV1cbC8liMB3gOebUrTVlokaFxhy9L/EVaZlT7TGzMB0MIMFmF4OAfjMD3IPBH7IFIbYA0iM33/d0WeyQVdCdir2f7v8VdwzgYqPto7UZrINuOleKsk35TYxJr4iGi2Nh7c5dRHwZL60neM1phqPfCNpECEUBSG6O5uXGtDtiNrOITHxmz42dJh8b62eviY/j3jjBZ/Xsy5xn/dQSsN/PGnfKS++tj+voADCGE9p1Ufx992suJE1mgNdVxG9Jc9FjmNzjX5Joem6TMkYJ84wQQnEsm3y+SldGKXcQIeAuMTFZbANpwmslqAzh8ihE4DALpprGP6QhP06h6cphRZNPrm3EUSXIseNLedsKBwjhHyBtbWpN20kDEXEqZJYR5T+5fifODcdjN/S3wqowRh0UqwglC/Aub9IjMNu9ZqSuD/wOzYwvD4DCAOZ6j4wudD/bveM8qAswOP5msdgCJmzAC+yDwMfK1+7Ted7EW0fACqzwCbddnEWt0XqmgasUR7TOWzK+1THlxBwYiAwfVTmRjzrqO0hzJDA9EXJZ3JGlFkrwi6YoCzS1KHABljDApjuTemvLeUzgERBsfOyE1w21deSOetMUc0rVipSkmu8ir3ixvwJhnIt2PP9MQYLJKQ4aPMwKZIGAtbtqt+qVb1vtV4Y19tfBa5jKRXUHEEa31ol/ppgecZjLWL3RiHBgiDhCiK9vxIro4m30ubQIUcWiuA2PuszQmpRUpDGwiF7cwhBQzpwbEQBqfNFeWu3ixpkmaX/KeGbmt6/0+7xHQLvNyz0/8Nv/0+KKkbM6LL6RRh+UeasI/UhFgskqFhk8wAtkgoBbEIrz+HB/TLs6cIqoyUNaFNxDBqCu8ykwQIsA2MAH+7tbQunvTcT4J2kk2w9urF+PCnUSkg5YmZYuA6W6U5shpYbgWQcXFcKyVIYUKrqZLDEUQbsz1PXCZj2luCMyMp90p3scdibsf8aYkkx4RGg3PfGrSLNU6eF7qcZTLyk291sWLiHuej3EltK+EPTvTj/hC5J3o7BFqyT8SEGCySgCFDzECmSKwekTbStPj9+fAeopbShVEKYHGyVkUVFBsJCg10wF+vTPlpeji/Cbqlh4Eylb9UdhtX95uhd+9Jqoyo4PIanEjF357JIu+Z9Iylepd/A54XDebyOBhEdt0G789MH9Eu2ansUrAl0hzA9G89hpBiiw9jvdR26SCopRRf7Y5apI2GYnZsufM39MRYLJKx+Yoz6ynd+j2xpl4mK3fBuj2dE63o0TjBwyaNIx61EHhB8wtF1OgzCEJZsyMx0cmS2PqzLj/n94dk9UPkvDHRJs6LFPSftNTdv+TxobknPvdgK/eggC5d1/0V/gYeDjbJYJ4S598OowAeR2So0f4bFa/uJ7VvkgyWe2L4M7XU0yHi5OyyggdjwvZubN4Q50YNW67jzf91BHttRXN6fapPrjxlxFQcTkNPCxnuKrl0xT45cnl5ULt3FEod/D7QGMir8VM/+YPNNa8dstk9W2SWeNjPsdrv6k2ZxODGL86GLGQOTjUHxv9oXFsyFfl8/XrRJqlM7eIaqWC1s0U27ZFvn6nf/vKwz7j2puzlkNvziMSO5PVtwuLNKykiPuvDYYyVx+OTPQfW9FLqCn0tTHzVYxAvhCgsh8Jnot7DlRl+GB39T1hBJPVvgh++vqsyep7bOHfY9f/NJh8ASOQIQIz3DciLvR79k51ubopxR/37P6fupzJ6tvFnS1Z0QZ8apxIVvOjGJFIUGVW3XM/jEA+EFjiqdfB8C0Lg+sSz70O7qdZ9JUPdP7mKJisvh39LMkqCHxMTxeT1QQpI0C0LlNW/XM/jAAjwAikI5AbsqJsxCURHS4CBb2+DBo8NUGDvXhQ4HyEq2bFBOwV3Aou2gNEg/PWb4+49uqoFkXmgDJKoux2o4VWJXnfSMQXXTZEPjadZkVucDdwOXgLb3Cv3zBsN3Ba1IGHIkO0W8a5d4PnhPhDJYZtZLXCS9/HeZn6dHFS93CbGDBK6Vu2x4h8TG7g15UXoiilfuYPZELQ17tmEHhZa2/ck6I4kdTUNOnPGZ9hBBgBRmAvBHJDVuvVHItZH62iVXen6GE48E1dHtstW9iCq4Icik08zERo+xLPHV0jqGK5oGqXblFS/N1AFbSNupK+9urqfsUGrsdzRU5WJdKLO6plYBWOM8k3V3i5aWypGLqJrGa41kkzzzpjLEQqlzfCpIjWgO6tJ6LnVtiSvkXZzSvwBzOZSYAcMgquIMQarqYrLPotNe4N0fWUmqbgxSrsGWT5CyPACDACh0AgN2RFkyMNS7z9yzxp8ztcSA2nHFRJNRmbi5F6MnPc1hXZUcJLqrUTN5Mpx4QQWVFp6kj1VrNIC3I0C3VQgM2UMJCTCExzBUNiNDvxmU5WNNZC4y5cBmFEhB3J6EzpWzakioHESuSgs3LVEMlRqXBKOSPmF0sIao3dul+43IPVJvrVqkm0reBf+nnxLLDdPwot/2YE/iUEckxW8ZT/JBiVKy05rsgk1tQBfob8dNJJ6kNoYg+tMrw+2esCoouRzGqCe7+Bi2YHvyzl5ne3IrWwqvcYzn1mEnjWcP0W3FF9SyEra1GPu6AP4AkicRzY5wyJtgbRm5jfUouKOkXY4xM51cQLgey/HEqkajqhL6RhHjB4km7Fn4wAI8AI2Ajkl6ySsjjLkdM+jYNC0iJNb/+Oipcgbzn11q72qi57fTxL06EFhSlJkEF2azOGpNiKZLIyxOMkFYIj54bw2Iw5L1SqwJoTgKRifkaD+yzpGI1st1Ls4ZF87690Lc0yM+sXAG7LmPAz4HzvH+gX7pZfskozbZEL9daFhrQaEexH9YPCf5TVppXzzhBMstNFOrZrLCYDXLebOC+XA6cMOb7dyYqcF7b+0RgzJLALWcXHHRScK2wgufh1AI6IrBLHzwcZAUbgaBE4PrIyC6bYP7KLn22WwfuUCIU84hRxnXi6fMEXyOpj0kNL1iVyUKp5uB5M8Ge5Akxfu5OVMVfqAn2bZ6POfomszH6fg0979Rns869Z7YIft2EEGIHjQeD4yMpyUCg0++G9ogTcXwdtXN5Nwu3W88Bz0NFu32Yhtp0oEjqkQ+O2LlXu4Kwbydn2BbIyxBNx7qDbJX1SPrNEc2jSBeKYqewaNTeu8DIa4cXyw4h1wXtWMUj4ACPACHwPAkdIVpa3XZrL9mqEa7+H5xWgNJYkTUDnuzOaDNW1SXbckOKY3qHt3+EFltt60r5PiKzecN8sq4qvspPkPStYHnlpGs9rv43LgVVlbuSpSqxpJlOs8ORrE6j28jP7VVHs5P23BPzSvFLvl/DQWo4jW02cqaZd9gZMQJYPMQL/FAJHSFb23knYO44kJxdk7QFH5jXbi06102Tl+njSF6p4JGUeDLm0y/MrDFsuStLzbzNZGUKQRKicI4L+UshKEIso3S0W7CQC1Oa7UOE2cgqJEg8BYZn8VMlti5BDhLPCQ9PBNk3VOIF8wvxKQ+FPRoARYAT2QSA3ZCWDguciKFY7QZR9/HqbYzGf4yMhxOZj1MapfBMvo9V9xOtKNKKg3GCviMhKBL92KchXtOs1pNt5QCIKRhMU7BRxIfsFQGbDoodf2kz2MfKCUh8dXZlXtBMlsIsVnRXChXd3B8910Br8T5YIWcx6ONcaRNV/xJ/5HO/G9KYTacoy5z4eJnNpvlzPH3FZcRCvKUWekWkZLHQsWLGFh/kKL90aCsWyMl+6LQyF1/5qJl34C9bc0h4ocgJJ0/zSruPjjAAjwAjsi0BuyCoglbDHntA04gG9etrLKR5EuiNKTZSQ6kj0W6r38Gvg49xKi1Qq19GOpk/S3b5P+zLdkkz7JIijnJzGSaUwKpsMGzLVkt+X+z4fj772DCzjvCOcOAKNKmoOCxPmGn8ee/DrOu2U4+KkkpDqSY41MImmYSTGoVJWiXl4eJgD6+kNWoSZ4+K02duQHooeMXKf32IqpOb8yQgwAoxAhgjkhqwynNM/1RXFkXHW9X9K7AmT5WzhCaAc4SGWY5rQmKzSkDma41zP6mhEdbCBzvHQPEQdpgouR8ZGfbDRc8eEAMuRkEj6ZLJKQuXIjpHbe9yJJKuJkDNKsGeXVc/cz74IcIXbfRHMx/Usx21yYLLahtBRnNdkkuRFmMH4VTxXJBluBv1yF/sj8DFoyT3Tw7yoHPa52n/239XDTIaflBwX1c74IDdlOW6HlclqO0bH0UIHNdtlVDIZuHZ/j3siZtI7d7IPAqtH+KIiwYFeUsTQKPA87AS0z6CP8FqKZxRevBtK6Hx5ZizHnaBjstoJpuNopNz5i5szp39qKsoD8KTZD5cs+VQf3PhQCCxu6jIu7/zGKgWQ+c2+Z08082Fn2eFqjG69jNNGZ2Nx0q/ekuW4G3JMVrvhdDSt1tM7dHs67mvPUYuKyd3eyCpauWeHfHmGCFCAd1qMXXa3ovi6UEB6dt3/4z2xHHd9AJisdkWK2zECeUKA0nOFMpEcaICUZusQJrADDfloumU57iwqJqudoeKGx4zAot/SiYddnHrarLkc49arg4K/C24R522tSa7fMBQB57JKtQN5zh9sNofOR7i2+iuVm7ifzvHS7+F6lGyqWwzauKi4Ks2WW0arJ5Iur/DUpnG5CJWy0UKg1Ffb4+vmoXmIMV1PVsBqhMu6rkAg7ntn5ZyMCprK8qSl9Yq2/8bfh5TrYuDjTMu/VGsnmADJ8UI8HwGG7+Mb+IStzEajMU/AheWYAErKISarFGD48A9CQFZCFnt5Io5FZUipNj2cuy5O/YFK1bWe4FKXe7no3sAvC5Lo4Xku0ngtg+u600RgRF7JqtiALzbx8KauGVKuR5leyzM5KKkDGXJQbOBapgEj12UHJdeFSH81XC5ldYCS46AauS9lfNmc+krvOTb0PMRGflHM30XJFem7BnhfjtGtCbLclJmE0npRjTiawV/+PKBcVZ7QGq6ma5jq4xHNUuYALfp4WvbRkjIuo+XVUXIraA9mKlXahKozNDFMCFljOe7+DDFZ7Y4VtzxSBOSiUr+RWhEtDiLlVdTDzT4X9X6kWDaVEDgCxLit80QGOSllCzuRcLSczaSDqlPDla3QkLnNcdEeAZSdRIw1rEFR6qsNqcgAyHnXuni1hmvmKBM4B+m6BIFdpnplW/cbWZ395a+Hk+sYl0UH5Ljy5GvNt9bDH5rzaiAJSrWxU6lFngErzVo8JZqF64bSfD9djgTptk8mq20I8fkjR0DFCilimuGqonNPRskDwLClzxXbeI7MWi4Y4u05tkekFjZJKH50JR/Ak2/cKcQYeVM3b/COLmlDJOjWcW2TGkjTidYkswetFtCwU4Q1f505/7mtMv2L/Jk2qdk9ie+EzWHiuaJ32+X3AeUqXxqorJD2hnQC8pKjk22oFl5XJ9VOqAJhvbDEyYrluIukqQ2TFSHBnz8UAauoJO29iOTIUV4xBODgtBtiBmkGvG8kkxy5HQszWncSgVBqT8rsFtVaFuMBnkPbWLpMiyC3BCIN90xv8tG3eLvVHM+DcXiPzVo4P0s6pJFFtVH7jt/7/YBynY8xHCvhBNptxOvSakP7TgUnbuoNip0mPB9G6/qX5bj7U8NktTtW3PLYETBmtiYeovsHplJ0gjnMWuTDe0SBGaeQ4HxgL2K/tmKny7kkmCfjl+5CVvGrYIJbNy2OCdeBipjGNcTk1sDiTpXgiVYY2PV3Kf42kXYrIHO50q0sM+mGFwgi8qSXDKORJzwf2ImsaCzW5zfK0brrX//KZPXXRcAD+C4EzMIRM+UFmRoKTpzIgrfrOm5tbcgisfhCZWlKukrzxnlu0MLi132NrMz8ExfO+F3sI7Qg50ezCkZn5pWVXKlrq8p12JxKDcSnTkmV+JIRnEsmX5ajjeS270xW2xDi8z8EgTdc15QpL+ysoKb35Gkzn72JLk9Zb9faSQOrKZ5GU3wYbUxoHG8RnGg/IjArxk1/wSWBFtbCMDgMIMGcZxbITXtWoU7CpszWIHzSMmmFTwS/8rdnRWM7gFx114GJV+9fCblHTatUrTvJQcV6AREOM+Jf+BkgMmM5asg3fjBZbYSHT/4YBCwtKL7RTYuGg1huRes68g6TGR0Ecdl7YFFvLmuhkvtVuh917xlu68rD7KQtNrrStTCl1UXdnoMFOj4XLbHZDc5ljFARl/IW5F7tIGzK1GQccfYIyz0wd6beL3zB9/2y5BMf2xflKkdPmSUckFYkyUt6UQbTo9yJBSf6kgFQ5o8CXSOfF+XpqXpgOQZIbv/GZLUdI27xExAYt3VF58hGuZjbJtIx+yHaBKiTjiqzULCgeSFlJVjcjVlR9qNJxxCZg4LwytNu0HI/R3vpKcjVYhsjUACkCYaJJxCUMY3pStuBKTPi7i7HkrT5H/QF43zy+b0uu5eDfD+IXAFYGpPSirQsIvFuhpBipt7gGSBNXsqk3MWLBQTL0QJjy1cmqy0A8emfgYBxC6+E447k7AYtlUEiYb8KlA5HvDmv48XxhCOBJJmGiuNaz8e4ElpTuaxjr+q4fVML10lbBzLphbBU6+B5uYQMHqb2tS5e1oDpJ2EfRoyZzIa0EEal9KdXk8G/Z50x3pcDGQxcLSs39bPuFGussRh3pfa1dR+KyPwLe13RcWX9+1ByDV4gKriaahnFQhqCUID4C0VgPj7tTvE+7uDMKcKPFLNkOe7+RDBZ7Y4VtzxiBH7LeCIX5zdRt/Qg+Lbqj2TWgfA0V/jda6IqTWoiq8WNJJOgzRp/rLQ8Iu3ORUelbFr0PX2dg2gc02uvYdI8lepd/F4B76O2Se9D6Z3+iGQYSf+IRFPIDJjhuqHTKTkuzrsijdMST+2a1jAdlIp1tAdvSLuFuS1plxtNhab1t345nFwBIT9Kt0UyCk9uIoOHC24dt/HHCpg/oi2zg4h0TDVcjpbhy8UvlmMck5QjTFYpwPBhRiDfCJAJMsGsmfHAydSV7hGX8Q3/qe5YjruKm8lqV6S4HSOQMwTIW40cPw4zPK5ndRhcg15ZjgEWm74xWW1Ch88xAnlGgCvM5lk6u4+N5bgTVkxWO8HEjRiBfCJArtOfTZ+022y063ckGe5u13KrzyDActyOFpPVdoy4BSOQYwSotEg888a+g1bu7zl0V993Yrm8nuW4TSxMVtsQ4vOMQO4RmOG+UUzICP/1gVM9p64o1Mj/vgkBluMmoJmsNqHD5xiBo0FgiadeB0NZ+HHfQS/x3OvgfrrVqX3fG/H1MQRYjjFI9AEmqzRk+DgjwAgwAoxAbhBgssqNKHggjAAjwAgwAmkIMFmlIcPHGQFGgBFgBHKDAJNVbkTBA2EEGAFGgBFIQ4DJKg0ZPs4IMAKMACOQGwSYrHIjCh4II8AIMAKMQBoCTFZpyPBxRoARYAQYgdwgwGSVG1HwQBgBRoARYATSEGCySkOGjzMCjAAjwAjkBgEmq9yIggfCCDACjAAjkIYAk1UaMnycEWAEGAFGIDcIMFnlRhQ8EEaAEWAEGIE0BJis0pDh44wAI8AIMAK5QYDJKjei4IEwAowAI8AIpCHAZJWGDB9nBBgBRoARyA0CTFa5EQUPhBFgBBgBRiANASarNGT4OCPACDACjEBuEGCyyo0oeCCMACPACDACaQgwWaUhw8cZAUaAEWAEcoMAk1VuRMEDYQQYAUaAEUhD4P+blA6NMIzX6AAAAABJRU5ErkJggg==" } }, "cell_type": "markdown", "id": "a3c1726a", "metadata": {}, "source": [ "- ![image.png](attachment:image.png)" ] }, { "cell_type": "code", "execution_count": 81, "id": "acb03440", "metadata": {}, "outputs": [], "source": [ "rescale <- function(x){\n", " (x - min(x)) / (max(x) - min(x))\n", "}" ] }, { "cell_type": "markdown", "id": "9a39ced6", "metadata": {}, "source": [ "### 使用宏包的函数\n", "#### 安装宏包与使用宏包\n", "- 安装宏包 `install.packages(\"dplyr\")`,也就是说宏包就是别人已经写好了的函数的集合\n", "- 加载 `library(\"dplyr\")`,和python中的`import pandas`一样的效果‘pandas’为包\n", "#### 指定函数的所属宏包\n", "- 其它宏包可能也有`select()`函数,比如`MASS`和`skimr`,如果同时加载了`dplyr`,`MASS`和`skimr`三个宏包,在程序中使用`select()`函数,就会造成混淆和报错。\n", "- 这个时候就需要给每个函数指定是来源哪个宏包,具体方法就是在宏包和函数之间添加::,\n", "- 比如`dplyr::select()`,`skimr::select()` 或者`MASS::select()`。" ] }, { "cell_type": "markdown", "id": "4b4ac114", "metadata": {}, "source": [ "### 获取帮助" ] }, { "cell_type": "code", "execution_count": 84, "id": "fd83004a", "metadata": {}, "outputs": [], "source": [ "?sqrt()\n", "?scale()" ] }, { "cell_type": "code", "execution_count": 97, "id": "5b1903f6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "10" ], "text/latex": [ "10" ], "text/markdown": [ "10" ], "text/plain": [ "[1] 10" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "10" ], "text/latex": [ "10" ], "text/markdown": [ "10" ], "text/plain": [ "[1] 10" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "get_var <- function(x){\n", " n <- length(x)\n", " y <- mean(x)\n", " var2 <- sum((x - y)^2) / (n - 1)\n", " return(var2)\n", "}\n", "\n", "var(c(2,4,6,8,10))\n", "get_var(c(2,4,6,8,10))" ] }, { "cell_type": "code", "execution_count": 91, "id": "c13cd5f6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "0.00238751147842057" ], "text/latex": [ "0.00238751147842057" ], "text/markdown": [ "0.00238751147842057" ], "text/plain": [ "[1] 0.002387511" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "get_bmi <- function(height, weight){\n", " weight / height^2\n", "}\n", "get_bmi(165, 65)" ] }, { "cell_type": "code", "execution_count": 108, "id": "64c10e1f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "'NA'" ], "text/latex": [ "'NA'" ], "text/markdown": [ "'NA'" ], "text/plain": [ "[1] \"NA\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "mean_above_threshod <- function(vector, threshod){\n", " upthreshod <- vector[vector > threshod]\n", " if(length(upthreshod) == 0){\n", " return(\"NA\")\n", " }else{return(mean(upthreshod))\n", " }\n", "}\n", "\n", "vector <- c(2,3,4,5,6)\n", "threshod <- 6\n", "mean_above_threshod(vector, threshod)" ] }, { "cell_type": "markdown", "id": "dc7be121", "metadata": {}, "source": [ "## 函数应用\n", "### 灵活的语法\n", "- R语言中,完成一件事情往往有很多种方法\n", "\n", "### 多个参数\n", "\n", "### 条件语句\n", "- `if-else`语句\n", " - `if(condition){\n", " Do something\n", "} else{\n", " Alternative something\n", "}`\n", "\n", "### 返回多个结果\n", "- 如果要返回多个统计结果,可以把结果先放在`list`或者`data.frame`中,然后再返回\n", "\n", "### 函数默认值\n", "- `function(multi = 10)`这样的表达式" ] }, { "cell_type": "code", "execution_count": 109, "id": "0ea283eb", "metadata": {}, "outputs": [], "source": [ "### 灵活的语法\n", "mysquare <- function(x){\n", " y <- x^2\n", " return(y)\n", "}\n", "\n", "mysquare2 <- function(x){\n", " return(x^2)\n", "}\n", "\n", "mysquare3 <- function(x){ return(x^2) }\n", "\n", "mysquare4 <- function(x) return(x^2)\n", "\n", "mysquare5 <- function(x){\n", " x^2\n", "}\n", "\n", "mysquare6 <- function(x) x^2" ] }, { "cell_type": "code", "execution_count": 111, "id": "64b1d30a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "3" ], "text/latex": [ "3" ], "text/markdown": [ "3" ], "text/plain": [ "[1] 3" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "3" ], "text/latex": [ "3" ], "text/markdown": [ "3" ], "text/plain": [ "[1] 3" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "### 多个参数\n", "sum_two <- function(num1, num2){\n", " sum <- num1 + num2\n", " return(sum)\n", "}\n", "sum_two(num1 = 1, num2 = 2)\n", "sum_two(1, 2)" ] }, { "cell_type": "code", "execution_count": 114, "id": "71ba20ff", "metadata": {}, "outputs": [ { "data": { "text/html": [ "'Your input is not numeric.'" ], "text/latex": [ "'Your input is not numeric.'" ], "text/markdown": [ "'Your input is not numeric.'" ], "text/plain": [ "[1] \"Your input is not numeric.\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "9" ], "text/latex": [ "9" ], "text/markdown": [ "9" ], "text/plain": [ "[1] 9" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "### 条件语句\n", "square_if <- function(num){\n", " if (is.numeric(num)){\n", " num^2\n", " } else{\n", " \"Your input is not numeric.\"\n", " }\n", "}\n", "\n", "square_if(\"a\")\n", "square_if(3)" ] }, { "cell_type": "code", "execution_count": 119, "id": "dca917c0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "3" ], "text/latex": [ "3" ], "text/markdown": [ "3" ], "text/plain": [ "[1] 3" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Your input is not numeric.\"\n" ] } ], "source": [ "sum_two <- function(num1, num2){\n", " if(is.numeric(num1) & is.numeric(num2)){\n", " sum <- num1 + num2\n", " return(sum)\n", " } else{\n", " print(\"Your input is not numeric.\")\n", " } \n", "}\n", "sum_two(1, 2)\n", "sum_two(1,\"x\")" ] }, { "cell_type": "code", "execution_count": 124, "id": "710f870f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Zero\"\n" ] } ], "source": [ "### 多个条件\n", "check_number <- function(x){\n", " if (x < 0){\n", " print(\"Negative number\")\n", " } else if (x > 0){\n", " print(\"Positive number\")\n", " } else{\n", " print(\"Zero\")\n", " }\n", "}\n", "\n", "x <- 0\n", "check_number(x)" ] }, { "cell_type": "code", "execution_count": 126, "id": "f294a534", "metadata": {}, "outputs": [], "source": [ "### 返回多个结果\n", "mystat <- function(x){\n", " meanval <- mean(x)\n", " sdval <- sd(x)\n", " list(sd=sdval, mean=meanval)\n", "}\n", "\n", "mystat2 <- function(x){\n", " meanval <- mean(x)\n", " sdval <- sd(x)\n", " data.frame(\n", " sd=sdval,\n", " mean=meanval\n", " )\n", "}" ] }, { "cell_type": "markdown", "id": "11793a30", "metadata": {}, "source": [ "## 子集选取\n", "- 对象就是我们在计算机里新建了存储空间,好比一个盒子, 我们可以往盒子里面装东西(赋值),可以查看里面的内容或者对里面的内容做计算(函数),也可以从盒子里取出**部分东西**(子集选取)\n", "- **子集选取**,就是从盒子里取东西出来\n", "\n", "\n", "### 向量\n", "- 对于原子型向量,我们有至少四种选取子集的方法\n", " - 正整数: 指定向量元素中的位置 `x[1]`\n", " - 负整数:删除指定位置的元素 `x[-2]`\n", " - 逻辑向量:将`TRUE`对应位置的元素提取出来`x[x > 3]`\n", " - 我们可以用命名向量,返回其对应位置的向量\n", "### 列表\n", "- 对列表取子集,和向量的方法一样。向量的子集仍然是向量,使用`[`提取列表的子集,总是返回列表\n", "- 如果想提取列表某个元素的值,需要使用`[[`\n", "- 取出`one`位置上的元素,需要写`[[\"one\"]]`,太麻烦了,所以用`$`来简写\n", "- `x$y` 是 `x[[\"y\"]]`的简写" ] }, { "cell_type": "code", "execution_count": 137, "id": "da5d4409", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\t
$one
\n", "\t\t
\n", "
  1. 'a'
  2. 'b'
  3. 'c'
\n", "
\n", "\t
$two
\n", "\t\t
\n", "
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
\n", "
\n", "\t
$three
\n", "\t\t
\n", "
  1. TRUE
  2. FALSE
\n", "
\n", "
\n" ], "text/latex": [ "\\begin{description}\n", "\\item[\\$one] \\begin{enumerate*}\n", "\\item 'a'\n", "\\item 'b'\n", "\\item 'c'\n", "\\end{enumerate*}\n", "\n", "\\item[\\$two] \\begin{enumerate*}\n", "\\item 1\n", "\\item 2\n", "\\item 3\n", "\\item 4\n", "\\item 5\n", "\\end{enumerate*}\n", "\n", "\\item[\\$three] \\begin{enumerate*}\n", "\\item TRUE\n", "\\item FALSE\n", "\\end{enumerate*}\n", "\n", "\\end{description}\n" ], "text/markdown": [ "$one\n", ": 1. 'a'\n", "2. 'b'\n", "3. 'c'\n", "\n", "\n", "\n", "$two\n", ": 1. 1\n", "2. 2\n", "3. 3\n", "4. 4\n", "5. 5\n", "\n", "\n", "\n", "$three\n", ": 1. TRUE\n", "2. FALSE\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "$one\n", "[1] \"a\" \"b\" \"c\"\n", "\n", "$two\n", "[1] 1 2 3 4 5\n", "\n", "$three\n", "[1] TRUE FALSE\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "$one = \n", "
  1. 'a'
  2. 'b'
  3. 'c'
\n" ], "text/latex": [ "\\textbf{\\$one} = \\begin{enumerate*}\n", "\\item 'a'\n", "\\item 'b'\n", "\\item 'c'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "**$one** = 1. 'a'\n", "2. 'b'\n", "3. 'c'\n", "\n", "\n" ], "text/plain": [ "$one\n", "[1] \"a\" \"b\" \"c\"\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "$one = \n", "
  1. 'a'
  2. 'b'
  3. 'c'
\n" ], "text/latex": [ "\\textbf{\\$one} = \\begin{enumerate*}\n", "\\item 'a'\n", "\\item 'b'\n", "\\item 'c'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "**$one** = 1. 'a'\n", "2. 'b'\n", "3. 'c'\n", "\n", "\n" ], "text/plain": [ "$one\n", "[1] \"a\" \"b\" \"c\"\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 'a'
  2. 'b'
  3. 'c'
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'a'\n", "\\item 'b'\n", "\\item 'c'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'a'\n", "2. 'b'\n", "3. 'c'\n", "\n", "\n" ], "text/plain": [ "[1] \"a\" \"b\" \"c\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 'a'
  2. 'b'
  3. 'c'
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'a'\n", "\\item 'b'\n", "\\item 'c'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'a'\n", "2. 'b'\n", "3. 'c'\n", "\n", "\n" ], "text/plain": [ "[1] \"a\" \"b\" \"c\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "l <- list(\n", "\"one\" = c(\"a\", \"b\", \"c\"),\n", "\"two\" = c(1:5),\n", "\"three\" = c(TRUE, FALSE)\n", ")\n", "l \n", "l[1] # 使用位置索引\n", "l[\"one\"] # 使用元素名\n", "l[[1]] # 提取列表某个元素的值\n", "l$one" ] }, { "cell_type": "markdown", "id": "a089a0cf", "metadata": {}, "source": [ "### 矩阵\n", "- `x[a,b]`用于选取,`a`为行方向,`b`为列方向\n", "- `x[a, ]`,表示列方向选取所有列,反之\n", "- `x[, ]` `x[]`,表示取整个矩阵" ] }, { "cell_type": "code", "execution_count": 144, "id": "2a691774", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
A matrix: 3 × 3 of type int
123
456
789
\n" ], "text/latex": [ "A matrix: 3 × 3 of type int\n", "\\begin{tabular}{lll}\n", "\t 1 & 2 & 3\\\\\n", "\t 4 & 5 & 6\\\\\n", "\t 7 & 8 & 9\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 3 × 3 of type int\n", "\n", "| 1 | 2 | 3 |\n", "| 4 | 5 | 6 |\n", "| 7 | 8 | 9 |\n", "\n" ], "text/plain": [ " [,1] [,2] [,3]\n", "[1,] 1 2 3 \n", "[2,] 4 5 6 \n", "[3,] 7 8 9 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
A matrix: 2 × 2 of type int
23
56
\n" ], "text/latex": [ "A matrix: 2 × 2 of type int\n", "\\begin{tabular}{ll}\n", "\t 2 & 3\\\\\n", "\t 5 & 6\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 2 × 2 of type int\n", "\n", "| 2 | 3 |\n", "| 5 | 6 |\n", "\n" ], "text/plain": [ " [,1] [,2]\n", "[1,] 2 3 \n", "[2,] 5 6 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 1
  2. 2
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1\n", "\\item 2\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1\n", "2. 2\n", "\n", "\n" ], "text/plain": [ "[1] 1 2" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
A matrix: 2 × 3 of type int
123
456
\n" ], "text/latex": [ "A matrix: 2 × 3 of type int\n", "\\begin{tabular}{lll}\n", "\t 1 & 2 & 3\\\\\n", "\t 4 & 5 & 6\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 2 × 3 of type int\n", "\n", "| 1 | 2 | 3 |\n", "| 4 | 5 | 6 |\n", "\n" ], "text/plain": [ " [,1] [,2] [,3]\n", "[1,] 1 2 3 \n", "[2,] 4 5 6 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
A matrix: 3 × 3 of type int
123
456
789
\n" ], "text/latex": [ "A matrix: 3 × 3 of type int\n", "\\begin{tabular}{lll}\n", "\t 1 & 2 & 3\\\\\n", "\t 4 & 5 & 6\\\\\n", "\t 7 & 8 & 9\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 3 × 3 of type int\n", "\n", "| 1 | 2 | 3 |\n", "| 4 | 5 | 6 |\n", "| 7 | 8 | 9 |\n", "\n" ], "text/plain": [ " [,1] [,2] [,3]\n", "[1,] 1 2 3 \n", "[2,] 4 5 6 \n", "[3,] 7 8 9 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "a <- matrix(1:9, nrow=3, byrow=TRUE)\n", "a\n", "\n", "a[1:2, 2:3] # 我们取第1行到第2行的2-3列\n", "a[1, 1:2] # 默认情况下, [ 会将获取的数据,以尽可能低的维度形式呈现\n", "a[1:2, ] # 列方向,选取所有列\n", "a[]" ] }, { "cell_type": "markdown", "id": "04cf8106", "metadata": {}, "source": [ "### 数据框\n", "- 数据框具有`list`和`matrix`的双重属性,因此\n", " - 当选取数据框的某几列的时候,可以和list一样,指定元素位置索引,比如`df[1:2]`选取前两列\n", " - 也可以像矩阵一样,按照行和列的标识选取,比如`df[1:3, ]`选取前三行的所有列\n", " - 当遇到单行或单列的时候,也和矩阵一样,数据会降维\n", " - 如果想避免降维,需要多写一句话`df[, \"x\", drop = FALSE]`" ] }, { "cell_type": "code", "execution_count": 16, "id": "009a6add", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 4 × 3
xyz
<int><int><chr>
14a
23b
32c
41d
\n" ], "text/latex": [ "A data.frame: 4 × 3\n", "\\begin{tabular}{lll}\n", " x & y & z\\\\\n", " & & \\\\\n", "\\hline\n", "\t 1 & 4 & a\\\\\n", "\t 2 & 3 & b\\\\\n", "\t 3 & 2 & c\\\\\n", "\t 4 & 1 & d\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 4 × 3\n", "\n", "| x <int> | y <int> | z <chr> |\n", "|---|---|---|\n", "| 1 | 4 | a |\n", "| 2 | 3 | b |\n", "| 3 | 2 | c |\n", "| 4 | 1 | d |\n", "\n" ], "text/plain": [ " x y z\n", "1 1 4 a\n", "2 2 3 b\n", "3 3 2 c\n", "4 4 1 d" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 4 × 1
x
<int>
1
2
3
4
\n" ], "text/latex": [ "A data.frame: 4 × 1\n", "\\begin{tabular}{l}\n", " x\\\\\n", " \\\\\n", "\\hline\n", "\t 1\\\\\n", "\t 2\\\\\n", "\t 3\\\\\n", "\t 4\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 4 × 1\n", "\n", "| x <int> |\n", "|---|\n", "| 1 |\n", "| 2 |\n", "| 3 |\n", "| 4 |\n", "\n" ], "text/plain": [ " x\n", "1 1\n", "2 2\n", "3 3\n", "4 4" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df <- data.frame(\n", "x=1:4,\n", "y=4:1,\n", "z=c(\"a\", \"b\", \"c\", \"d\")\n", ")\n", "df\n", "df[\"x\"]" ] }, { "cell_type": "code", "execution_count": 2, "id": "8a5f279d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 4 × 2
xy
<int><int>
14
23
32
41
\n" ], "text/latex": [ "A data.frame: 4 × 2\n", "\\begin{tabular}{ll}\n", " x & y\\\\\n", " & \\\\\n", "\\hline\n", "\t 1 & 4\\\\\n", "\t 2 & 3\\\\\n", "\t 3 & 2\\\\\n", "\t 4 & 1\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 4 × 2\n", "\n", "| x <int> | y <int> |\n", "|---|---|\n", "| 1 | 4 |\n", "| 2 | 3 |\n", "| 3 | 2 |\n", "| 4 | 1 |\n", "\n" ], "text/plain": [ " x y\n", "1 1 4\n", "2 2 3\n", "3 3 2\n", "4 4 1" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 1
  2. 2
  3. 3
  4. 4
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1\n", "\\item 2\n", "\\item 3\n", "\\item 4\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1\n", "2. 2\n", "3. 3\n", "4. 4\n", "\n", "\n" ], "text/plain": [ "[1] 1 2 3 4" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 1
  2. 2
  3. 3
  4. 4
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1\n", "\\item 2\n", "\\item 3\n", "\\item 4\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1\n", "2. 2\n", "3. 3\n", "4. 4\n", "\n", "\n" ], "text/plain": [ "[1] 1 2 3 4" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# like a list\n", "df[1:2]\n", "df[[\"x\"]]\n", "df$x" ] }, { "cell_type": "code", "execution_count": 9, "id": "5f94f1b8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 4 × 2
xz
<int><chr>
1a
2b
3c
4d
\n" ], "text/latex": [ "A data.frame: 4 × 2\n", "\\begin{tabular}{ll}\n", " x & z\\\\\n", " & \\\\\n", "\\hline\n", "\t 1 & a\\\\\n", "\t 2 & b\\\\\n", "\t 3 & c\\\\\n", "\t 4 & d\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 4 × 2\n", "\n", "| x <int> | z <chr> |\n", "|---|---|\n", "| 1 | a |\n", "| 2 | b |\n", "| 3 | c |\n", "| 4 | d |\n", "\n" ], "text/plain": [ " x z\n", "1 1 a\n", "2 2 b\n", "3 3 c\n", "4 4 d" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 4 × 3
xyz
<int><int><chr>
14a
23b
32c
41d
\n" ], "text/latex": [ "A data.frame: 4 × 3\n", "\\begin{tabular}{lll}\n", " x & y & z\\\\\n", " & & \\\\\n", "\\hline\n", "\t 1 & 4 & a\\\\\n", "\t 2 & 3 & b\\\\\n", "\t 3 & 2 & c\\\\\n", "\t 4 & 1 & d\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 4 × 3\n", "\n", "| x <int> | y <int> | z <chr> |\n", "|---|---|---|\n", "| 1 | 4 | a |\n", "| 2 | 3 | b |\n", "| 3 | 2 | c |\n", "| 4 | 1 | d |\n", "\n" ], "text/plain": [ " x y z\n", "1 1 4 a\n", "2 2 3 b\n", "3 3 2 c\n", "4 4 1 d" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 4 × 1
x
<int>
1
2
3
4
\n" ], "text/latex": [ "A data.frame: 4 × 1\n", "\\begin{tabular}{l}\n", " x\\\\\n", " \\\\\n", "\\hline\n", "\t 1\\\\\n", "\t 2\\\\\n", "\t 3\\\\\n", "\t 4\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 4 × 1\n", "\n", "| x <int> |\n", "|---|\n", "| 1 |\n", "| 2 |\n", "| 3 |\n", "| 4 |\n", "\n" ], "text/plain": [ " x\n", "1 1\n", "2 2\n", "3 3\n", "4 4" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# like a matrix\n", "df[,c(\"x\",\"z\")]\n", "df[,1:3]\n", "df[, \"x\", drop=FALSE]" ] }, { "cell_type": "markdown", "id": "195f5f78", "metadata": {}, "source": [ "### 增强型数据框\n", "- `tibble`是增强型的`data.frame`,\n", "- 选取`tibble`的行或者列,即使遇到单行或者单列的时候,数据也不会降维,总是返回`tibble`,即仍然是数据框的形式。" ] }, { "cell_type": "code", "execution_count": 10, "id": "26c87dd8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 4 × 3
xyz
<int><int><chr>
14a
23b
32c
41d
\n" ], "text/latex": [ "A tibble: 4 × 3\n", "\\begin{tabular}{lll}\n", " x & y & z\\\\\n", " & & \\\\\n", "\\hline\n", "\t 1 & 4 & a\\\\\n", "\t 2 & 3 & b\\\\\n", "\t 3 & 2 & c\\\\\n", "\t 4 & 1 & d\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 4 × 3\n", "\n", "| x <int> | y <int> | z <chr> |\n", "|---|---|---|\n", "| 1 | 4 | a |\n", "| 2 | 3 | b |\n", "| 3 | 2 | c |\n", "| 4 | 1 | d |\n", "\n" ], "text/plain": [ " x y z\n", "1 1 4 a\n", "2 2 3 b\n", "3 3 2 c\n", "4 4 1 d" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tb <- tibble::tibble(\n", "x=1:4,\n", "y=4:1,\n", "z=c(\"a\", \"b\", \"c\", \"d\"))\n", "tb" ] }, { "cell_type": "code", "execution_count": 13, "id": "d5458ba9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 4 × 1
x
<int>
1
2
3
4
\n" ], "text/latex": [ "A tibble: 4 × 1\n", "\\begin{tabular}{l}\n", " x\\\\\n", " \\\\\n", "\\hline\n", "\t 1\\\\\n", "\t 2\\\\\n", "\t 3\\\\\n", "\t 4\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 4 × 1\n", "\n", "| x <int> |\n", "|---|\n", "| 1 |\n", "| 2 |\n", "| 3 |\n", "| 4 |\n", "\n" ], "text/plain": [ " x\n", "1 1\n", "2 2\n", "3 3\n", "4 4" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 4 × 1
x
<int>
1
2
3
4
\n" ], "text/latex": [ "A tibble: 4 × 1\n", "\\begin{tabular}{l}\n", " x\\\\\n", " \\\\\n", "\\hline\n", "\t 1\\\\\n", "\t 2\\\\\n", "\t 3\\\\\n", "\t 4\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 4 × 1\n", "\n", "| x <int> |\n", "|---|\n", "| 1 |\n", "| 2 |\n", "| 3 |\n", "| 4 |\n", "\n" ], "text/plain": [ " x\n", "1 1\n", "2 2\n", "3 3\n", "4 4" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tb[\"x\"]\n", "tb[,\"x\"]" ] }, { "cell_type": "code", "execution_count": 19, "id": "2620c5cd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
A matrix: 3 × 3 of type int
147
258
369
\n" ], "text/latex": [ "A matrix: 3 × 3 of type int\n", "\\begin{tabular}{lll}\n", "\t 1 & 4 & 7\\\\\n", "\t 2 & 5 & 8\\\\\n", "\t 3 & 6 & 9\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 3 × 3 of type int\n", "\n", "| 1 | 4 | 7 |\n", "| 2 | 5 | 8 |\n", "| 3 | 6 | 9 |\n", "\n" ], "text/plain": [ " [,1] [,2] [,3]\n", "[1,] 1 4 7 \n", "[2,] 2 5 8 \n", "[3,] 3 6 9 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "7" ], "text/latex": [ "7" ], "text/markdown": [ "7" ], "text/plain": [ "[1] 7" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "mt <- matrix(1:9, nrow=3)\n", "mt\n", "mt[1,3]" ] }, { "cell_type": "code", "execution_count": 29, "id": "dbebc12a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 3
  2. 5
  3. 0
  4. 2
  5. 0
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 3\n", "\\item 5\n", "\\item 0\n", "\\item 2\n", "\\item 0\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 3\n", "2. 5\n", "3. 0\n", "4. 2\n", "5. 0\n", "\n", "\n" ], "text/plain": [ "[1] 3 5 0 2 0" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x <- c(3, 5, NA, 2, NA)\n", "x_missing <- is.na(x)\n", "x[x_missing] <- 0\n", "x" ] }, { "cell_type": "markdown", "id": "a0004a37", "metadata": {}, "source": [ "## 可重复性文档\n", "### 什么是`Rmarkdown`\n", "- ![imag](image/rmarkdown.png)\n", "### `markdown` 基本语法\n", "- 如果想制作简单表格,\n", " - 列与列之间用 `|` 分隔,表格的首行下面添加`--------`\n", "- 如果想添加网页链接,可以用方括号和圆括号\n", " - `[Download R](http://www.r-project.org/)`" ] }, { "cell_type": "code", "execution_count": null, "id": "96140bd6", "metadata": {}, "outputs": [], "source": [ "# This is a title\n", "\n", "# 第一章 (注意 \"#\" 与 \"第一章\"之间有空格)\n", "## 第一节 (同上,\"##\" 与 \"第一节\"之间有空格)\n", "\n", "This is a sentence.\n", "\n", "Now a list begins:\n", " \n", "- no importance\n", "- again\n", "- repeat\n", " \n", "A numbered list:\n", " \n", "1. first\n", "2. second\n", "\n", "__bold__, _italic_, ~~strike through~~" ] }, { "cell_type": "markdown", "id": "896e8afd", "metadata": {}, "source": [ "Table Header | Second Header\n", "------------- | -------------\n", "Cell 1, 1 | Cell 2, 1\n", "Cell 1, 2 | Cell 2, 2" ] }, { "cell_type": "markdown", "id": "bdf6725d", "metadata": {}, "source": [ "### R Code chunks\n", "- 在`Rmd`文档中写`R`代码,需要插入代码块(`Code chunks`),具体以 ` ```{r} ` 开始,以 ` ``` `结尾。可以用快捷方式`Ctrl + Alt + I` (OS X: `Cmd + Option + I`)创建代码块。" ] }, { "cell_type": "markdown", "id": "b259931d", "metadata": {}, "source": [ "### 生成`html`文档\n", "- 希望`html`文档有章节号、目录或者更好显示表格,可以修改`yaml`头文件(用下面的内容替换`Rmarkdown`的头文件)" ] }, { "cell_type": "code", "execution_count": null, "id": "3563059f", "metadata": {}, "outputs": [], "source": [ "---\n", "title: Habits\n", "author: John Doe\n", "date: \"2023-07-18\"\n", "output: \n", " html_document:\n", " df_print: paged\n", " toc: yes\n", " number_sections: yes\n", "---" ] }, { "cell_type": "markdown", "id": "86f96673", "metadata": {}, "source": [ "### 生成`word`文档\n", "### 生成`pdf`文档" ] }, { "cell_type": "markdown", "id": "a1cc5b23", "metadata": {}, "source": [ "## 读取数据\n", "- 在学习R语言过程中,除了使用内置的数据集外,我们更多的需要导入外部数据\n", "### 数据科学中的文件管理\n", "- 把项目所需的文件(代码、数据、图片等),放在一个文件夹里\n", "- 项目文件结构\n", " - ![image.png](image/Rproject.png)" ] }, { "cell_type": "markdown", "id": "a7bc587b", "metadata": {}, "source": [ "- 读取文件\n", " - 事实上,R语言提供了很多读取数据的函数。下表列出了常见文件格式的读取方法\n", "\n", "文件格式 | R函数\n", "----------- | ------------\n", "txt\t| read.table()\n", ".csv\t| read.csv() and readr::read_csv()\n", ".xls and .xlsx\t| readxl::read_excel() and openxlsx::read.xlsx()\n", ".sav(SPSS files)\t| haven::read_sav() and foreign::read.spss()\n", ".Rdata or rda\t| load()\n", ".rds\t| readRDS() and readr::read_rds()\n", ".dta\t| haven::read_dta() and haven::read_stata()\n", ".sas7bdat(SAS files)\t| haven::read_sas()\n", "Internet\t| download.file()" ] }, { "cell_type": "markdown", "id": "9001bfb3", "metadata": {}, "source": [ "### `here`宏包\n", "- 强大的`here`宏包,`here()`会告诉我们当前所在的目录\n", "- 指向某个文件的路径信息\n", "- `here`宏包的好处还在于,在不同的电脑和文件结构下,代码都能运行" ] }, { "cell_type": "code", "execution_count": 32, "id": "c7b969de", "metadata": {}, "outputs": [ { "data": { "text/html": [ "'/public/home/sll'" ], "text/latex": [ "'/public/home/sll'" ], "text/markdown": [ "'/public/home/sll'" ], "text/plain": [ "[1] \"/public/home/sll\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "here::here()" ] }, { "cell_type": "code", "execution_count": 36, "id": "f3cfd0b9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "'/public/home/sll/software/smoove'" ], "text/latex": [ "'/public/home/sll/software/smoove'" ], "text/markdown": [ "'/public/home/sll/software/smoove'" ], "text/plain": [ "[1] \"/public/home/sll/software/smoove\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "here::here(\"software\", \"smoove\")\n", "d <- read.table(file= \"./data/txt_file.txt\", header = TRUE)" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.2.0" } }, "nbformat": 4, "nbformat_minor": 5 }