9.10 The Normal Distribution

The normal distribution, also known as the Gaussian distribution, is one of the most important distributions in statistics (we will see why below). It is known for its iconic bell shape.

Random Samples

We can sample from the normal distribution using the rnorm() function. It takes three arguments:

  • n: how many data points we want to sample

  • mean: the population (theoretical) mean

  • sd: the population (theoretical) standard deviation

rnorm(n = 5, mean = 5, sd = 2)
## [1] 4.564656 7.496730 6.937214 5.514461 5.985538
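
As a rough check on what mean and sd control, the sketch below draws a much larger sample and compares its sample mean and standard deviation with the values passed to rnorm(). The seed and the sample size are arbitrary choices made for illustration; they do not come from the text above.

```r
# Fix the random seed so the draws are reproducible (the seed value is arbitrary)
set.seed(123)

# Draw a large sample from a normal distribution with mean 5 and sd 2
x <- rnorm(n = 10000, mean = 5, sd = 2)

# The sample statistics should land close to the population values
mean(x)  # close to 5
sd(x)    # close to 2
```

With only 5 draws, as in the example above, the sample mean and standard deviation can be quite far from 5 and 2; the agreement improves as n grows.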

There is a special normal distribution called the standard normal, which is simply a normal distribution with mean 0 and standard deviation 1. These are the default values of the mean and sd arguments in R's normal distribution functions (such as rnorm()), so we can sample from the standard normal very easily.

rnorm(n = 5)
## [1]  2.2130967  1.7633171  0.2901826 -1.3893091  2.1754976

The normal distribution is a continuous distribution, so we shouldn’t visualize samples from it with a standard bar plot. (Why not?) Instead, a histogram will be more suitable.

samp <- rnorm(1000)
hist(samp, freq = FALSE, 
     main = "Histogram of Normal data")
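
The freq = FALSE argument puts the histogram on the density scale, so the bar areas sum to 1 and the plot can be compared directly with a density curve. A minimal sketch of this, assuming a freshly drawn sample (the seed is arbitrary) and overlaying the standard normal density computed by dnorm():

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
samp <- rnorm(1000)

# hist() draws the plot and also returns the bin breaks and bar heights
h <- hist(samp, freq = FALSE,
          main = "Histogram of Normal data")

# On the density scale, height * width summed over all bars equals 1
total_area <- sum(h$density * diff(h$breaks))
total_area

# Overlay the theoretical standard normal density for comparison
curve(dnorm(x), add = TRUE, col = "red")
```

With freq = TRUE the bar heights would be raw counts, and the density curve would be on a different scale from the bars.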

We can also plot an approximation to the continuous density based on our sample.

plot(density(samp), 
     xlab = "x", ylab = "Density",
     main = "Approximate Distribution")

This approximation, known as a kernel density estimate, is obtained by choosing a “kernel” together with an appropriately sized bandwidth (similar to choosing the width of the bars in a histogram). We won’t go into the details, but different kernels and bandwidths give different density estimates, so try a few choices and compare the results (see ?density for how to change the default values).
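
To make the effect of these choices concrete, here is a small sketch that overlays several estimates of the same sample. It uses the adjust and kernel arguments of density(); the particular values (0.25, 4, and the rectangular kernel) and the seed are arbitrary picks for illustration.

```r
set.seed(1)                      # arbitrary seed
samp <- rnorm(1000)

d_default <- density(samp)                          # Gaussian kernel, default bandwidth
d_narrow  <- density(samp, adjust = 0.25)           # bandwidth scaled by 0.25: wigglier
d_wide    <- density(samp, adjust = 4)              # bandwidth scaled by 4: smoother, flatter
d_rect    <- density(samp, kernel = "rectangular")  # different kernel, default bandwidth

# Plot the narrow-bandwidth estimate first, since it has the tallest peaks
plot(d_narrow, col = "grey50", xlab = "x", ylab = "Density",
     main = "Effect of kernel and bandwidth")
lines(d_default, col = "black", lwd = 2)
lines(d_wide, col = "blue")
lines(d_rect, col = "red")
legend("topright",
       legend = c("default", "adjust = 0.25", "adjust = 4", "rectangular"),
       col = c("black", "grey50", "blue", "red"), lty = 1)

# The bandwidth actually used is stored in each result
c(default = d_default$bw, narrow = d_narrow$bw, wide = d_wide$bw)
```

A very small bandwidth follows the noise in the sample, while a very large one flattens the bell shape; the default usually sits somewhere in between.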

Density Functions

As the number of draws from the normal distribution grows, the estimated density approaches a perfectly smooth bell-shaped curve, the normal density itself. It looks like this:

curve(dnorm(x), 
      xlim = c(-3, 3),
      main = "The Standard Normal Distribution", ylab = "Density")

Here dnorm() is the density function of the normal distribution. The curve() function is used to plot a smooth curve. It takes a function f as input (here the normal PDF), evaluates f(x) on a grid of equally spaced x values over the plotting range (101 points by default), and then plots these (x, f(x)) pairs and connects them with line segments.
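
The same picture can be built by hand, which may help clarify what curve() does. The sketch below assumes a grid of 101 points on [-3, 3] (101 is the default number of points curve() uses) and also checks dnorm() against the closed-form standard normal density.

```r
# A grid of 101 equally spaced x values over the plotting range
x <- seq(-3, 3, length.out = 101)

# Evaluate the standard normal density at each grid point
y <- dnorm(x)

# Plot the (x, f(x)) pairs joined by line segments
plot(x, y, type = "l",
     main = "The Standard Normal Distribution", ylab = "Density")

# dnorm() agrees with the formula exp(-x^2 / 2) / sqrt(2 * pi)
all.equal(y, exp(-x^2 / 2) / sqrt(2 * pi))
```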

This distribution is always bell-shaped (it is sometimes called the normal curve). However, the location and spread are determined by two parameters: the mean parameter (μ) specifies the center of the distribution, and the sd parameter (σ) controls the spread (a small σ gives a tall, narrow curve; a large σ gives a wide, flat one).

curve(dnorm(x, mean = 2, sd = 0.5), 
      xlim = c(-4, 4), col = "red",
      main = "The Normal Distribution", ylab = "Density")
curve(dnorm(x, mean = -1, sd = 1),
      add = TRUE, 
      col = "blue")
text(x = c(-1, 2), y = c(0.2, 0.4),         # adds some text to the plot
     labels = c("N(-1, 1.0)", "N(2, 0.5)"),
     col = c("blue", "red"))
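
The two curves can also be compared numerically. A short sketch, using the same parameter values as the plot above: the peak of a normal density sits at x = mean and has height 1 / (sd * sqrt(2 * pi)), and any normal density is a shifted and rescaled copy of the standard normal density.

```r
# Peak heights: a smaller sd gives a taller, narrower curve
dnorm(2, mean = 2, sd = 0.5)     # red curve, about 0.798
dnorm(-1, mean = -1, sd = 1)     # blue curve, about 0.399

# Location-scale relation between N(mean, sd) and the standard normal:
# f(x; mean, sd) = f((x - mean) / sd; 0, 1) / sd
x   <- seq(-4, 4, by = 0.5)
lhs <- dnorm(x, mean = 2, sd = 0.5)
rhs <- dnorm((x - 2) / 0.5) / 0.5
all.equal(lhs, rhs)
```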

Cumulative Distribution Functions

The (standard) normal CDF, computed by pnorm(), gives the probability P(X ≤ x). It looks like this:

curve(pnorm(x), xlim = c(-3, 3),
      main = "The Standard Normal CDF", ylab = "Probability")
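
A few numerical values may make the CDF more tangible. The sketch below uses pnorm() to compute probabilities for the standard normal, checks one of them against the area under the density using integrate(), and uses qnorm() (the quantile function, i.e. the inverse of the CDF) to go back from a probability to a point on the x axis.

```r
# Half of the probability lies below the mean
pnorm(0)                      # 0.5

# Probability of falling within 1, 2 and 3 standard deviations of the mean
pnorm(1) - pnorm(-1)          # about 0.683
pnorm(2) - pnorm(-2)          # about 0.954
pnorm(3) - pnorm(-3)          # about 0.997

# The CDF is the area under the density curve to the left of a point
area <- integrate(dnorm, lower = -Inf, upper = 1)$value
c(integrated = area, pnorm = pnorm(1))

# qnorm() inverts pnorm()
qnorm(0.975)                  # about 1.96
pnorm(qnorm(0.975))           # 0.975
```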

Normality

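We can also check whether our data looks normal using a Q-Q (quantile-quantile) plot, which plots the sample quantiles against the corresponding theoretical normal quantiles.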

qqnorm(samp, pch = 16, 
       col = rgb(0, 0, 0, alpha = 0.5)) # semi-transparent black (appears grey)
qqline(samp, 
       col = "red")

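Here our data closely follows the reference line, which is consistent with the sample being approximately normally distributed. A Q-Q plot is a visual check rather than a formal test, so it can suggest normality but cannot prove it.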
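
To see what the Q-Q plot is doing, the sketch below rebuilds it by hand from sorted sample values and theoretical quantiles, and then draws a Q-Q plot for a deliberately non-normal sample for contrast. The exponential sample, the seed, and the variable names are illustrative assumptions, not part of the example above.

```r
set.seed(1)                      # arbitrary seed
samp <- rnorm(1000)
n <- length(samp)

# Theoretical standard normal quantiles at n evenly spread probabilities
theo <- qnorm(ppoints(n))

# Manual Q-Q plot: sorted sample values against theoretical quantiles
plot(theo, sort(samp), pch = 16,
     col = rgb(0, 0, 0, alpha = 0.5),
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Manual Normal Q-Q Plot")

# qqnorm() computes the same coordinates (returned in the order of the data)
q <- qqnorm(samp, plot.it = FALSE)
all.equal(sort(q$x), theo)

# For contrast: a right-skewed sample bends away from the reference line
skewed <- rexp(1000)
qqnorm(skewed, pch = 16,
       col = rgb(0, 0, 0, alpha = 0.5),
       main = "Q-Q Plot of Exponential Data")
qqline(skewed, col = "red")
```

In the second plot the points curve upward at the right end, which is the typical pattern for right-skewed data.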