R Language Study Notes (3)

hxy    2019-06-15 19:23

Before learning R, here is a quick introduction to it:

“A free software environment for statistical computing and graphics” (from the official R website)

R currently supports four graphics systems: base graphics, grid graphics, lattice, and ggplot2 (created in 2005).

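CRAN is short for the Comprehensive R Archive Network. Its master site is in Austria; use it if you need the very latest packages. Mirrors in China include Hong Kong, Guangdong, Shanghai and Lanzhou. If installing a package fails, readers who can reach the master site directly (for example, through a VPN) can usually just install from there instead.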
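You can also choose the mirror from inside R. The sketch below uses the Lanzhou University mirror because its URL appears in the install log later in this post. That choice is only an example; any other CRAN mirror URL works the same way. `options(repos = ...)` only affects the current session.

```r
# Point install.packages() at a specific CRAN mirror for this session.
# The Lanzhou mirror URL is an example; substitute any CRAN mirror.
options(repos = c(CRAN = "https://mirror.lzu.edu.cn/CRAN/"))

# Check which repository R will now use
getOption("repos")

# Alternatively, pick a mirror interactively from a menu:
# chooseCRANmirror()

# Then install as usual (needs network access):
# install.packages("ggplot2")
```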

With that background, let's look at reading data.

# Read the file: the first line is data (no header row), fields are separated by spaces
data <- read.csv("datafile.csv", header = FALSE, sep = " ")

Because the file has no header row, the columns can be named by hand:

# Assign column names manually
names(data) <- c("c1", "c2", "c3")
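You may not have a `datafile.csv` to hand. The sketch below runs both steps end to end by first writing a small space-separated file to a temporary path. The file contents are made up for illustration. Without a header, `read.csv()` names the columns `V1`, `V2`, `V3`, and `names()` then replaces them.

```r
# Write a tiny space-separated file with no header row (made-up contents)
tmp <- tempfile(fileext = ".csv")
writeLines(c("1 2 3", "4 5 6"), tmp)

# header = FALSE: the first line is data; sep = " ": fields split on spaces
data <- read.csv(tmp, header = FALSE, sep = " ")
names(data)              # default names: "V1" "V2" "V3"

# Replace the default names with our own
names(data) <- c("c1", "c2", "c3")
str(data)                # 2 obs. of 3 variables: c1, c2, c3
```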

Which raises a question: vectors are written with c(). Why c? It is short for “concatenate”, meaning the individual elements are joined end to end.
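A quick illustration of what "concatenate" means in practice. `c()` builds a vector from individual values. Because it concatenates, it also joins existing vectors end to end. When the inputs have mixed types, they are coerced to a common type.

```r
x <- c(1, 2, 3)          # a numeric vector of three elements
y <- c(x, 4, 5)          # concatenate an existing vector with more values
y                        # 1 2 3 4 5
length(y)                # 5

# Mixed types are coerced to the most general one (here: character)
z <- c(1, "a", TRUE)
class(z)                 # "character"
```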
---
Next, a few plotting examples:

Example 1: Scatter plot in R

# Example 1: scatter plot
library(ggplot2)
qplot(mtcars$wt, mtcars$mpg)

Some readers may wonder: where does the mtcars data come from, and how can two lines of code produce a plot?

Answer: mtcars is one of R's built-in datasets; type mtcars at the R console to view it. The $ operator extracts the column whose name follows it.

> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
>
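A few more ways to poke at a built-in data frame like mtcars, as a small base-R sketch. `$`, `[[ ]]` and `[ , ]` all return the same column. The last line draws the same scatter plot with base graphics, so no extra package is needed. As an aside, recent ggplot2 releases deprecate `qplot()`. The ggplot2 form of Example 1 is `ggplot(mtcars, aes(wt, mpg)) + geom_point()`.

```r
# Size and first rows of the built-in dataset
dim(mtcars)              # 32 rows (cars), 11 columns (variables)
head(mtcars, 3)

# Three equivalent ways to extract the wt column as a numeric vector
w1 <- mtcars$wt
w2 <- mtcars[["wt"]]
w3 <- mtcars[, "wt"]
identical(w1, w2) && identical(w2, w3)   # TRUE

# The same scatter plot with base graphics
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")
```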

Which raises another question: how do we find out which built-in datasets exist?

Use the data() function. It lists the built-in datasets plus those in the packages currently loaded:

Data sets in package 'datasets':

AirPassengers           Monthly Airline Passenger Numbers 1949-1960
BJsales                 Sales Data with Leading Indicator
BJsales.lead (BJsales)
                        Sales Data with Leading Indicator
BOD                     Biochemical Oxygen Demand
CO2                     Carbon Dioxide Uptake in Grass Plants
ChickWeight             Weight versus age of chicks on different diets
DNase                   Elisa assay of DNase
EuStockMarkets          Daily Closing Prices of Major European Stock
                        Indices, 1991-1998
Formaldehyde            Determination of Formaldehyde
HairEyeColor            Hair and Eye Color of Statistics Students
Harman23.cor            Harman Example 2.3
Harman74.cor            Harman Example 7.4
Indometh                Pharmacokinetics of Indomethacin
InsectSprays            Effectiveness of Insect Sprays
JohnsonJohnson          Quarterly Earnings per Johnson & Johnson Share
LakeHuron               Level of Lake Huron 1875-1972
LifeCycleSavings        Intercountry Life-Cycle Savings Data
Loblolly                Growth of Loblolly pine trees
Nile                    Flow of the River Nile
Orange                  Growth of Orange Trees
OrchardSprays           Potency of Orchard Sprays
PlantGrowth             Results from an Experiment on Plant Growth
Puromycin               Reaction Velocity of an Enzymatic Reaction
Seatbelts               Road Casualties in Great Britain 1969-84
Theoph                  Pharmacokinetics of Theophylline
Titanic                 Survival of passengers on the Titanic
ToothGrowth             The Effect of Vitamin C on Tooth Growth in
                        Guinea Pigs
UCBAdmissions           Student Admissions at UC Berkeley
UKDriverDeaths          Road Casualties in Great Britain 1969-84
UKgas                   UK Quarterly Gas Consumption
USAccDeaths             Accidental Deaths in the US 1973-1978
USArrests               Violent Crime Rates by US State
USJudgeRatings          Lawyers' Ratings of State Judges in the US
                        Superior Court
USPersonalExpenditure   Personal Expenditure Data
UScitiesD               Distances Between European Cities and Between
                        US Cities
VADeaths                Death Rates in Virginia (1940)
WWWusage                Internet Usage per Minute
WorldPhones             The World's Telephones
ability.cov             Ability and Intelligence Tests
airmiles                Passenger Miles on Commercial US Airlines,
                        1937-1960
airquality              New York Air Quality Measurements
anscombe                Anscombe's Quartet of 'Identical' Simple Linear
                        Regressions
attenu                  The Joyner-Boore Attenuation Data
attitude                The Chatterjee-Price Attitude Data
austres                 Quarterly Time Series of the Number of
                        Australian Residents
beaver1 (beavers)       Body Temperature Series of Two Beavers
beaver2 (beavers)       Body Temperature Series of Two Beavers
cars                    Speed and Stopping Distances of Cars
chickwts                Chicken Weights by Feed Type
co2                     Mauna Loa Atmospheric CO2 Concentration
crimtab                 Student's 3000 Criminals Data
discoveries             Yearly Numbers of Important Discoveries
esoph                   Smoking, Alcohol and (O)esophageal Cancer
euro                    Conversion Rates of Euro Currencies
euro.cross (euro)       Conversion Rates of Euro Currencies
eurodist                Distances Between European Cities and Between
                        US Cities
faithful                Old Faithful Geyser Data
fdeaths (UKLungDeaths)
                        Monthly Deaths from Lung Diseases in the UK
freeny                  Freeny's Revenue Data
freeny.x (freeny)       Freeny's Revenue Data
freeny.y (freeny)       Freeny's Revenue Data
infert                  Infertility after Spontaneous and Induced
                        Abortion
iris                    Edgar Anderson's Iris Data
iris3                   Edgar Anderson's Iris Data
islands                 Areas of the World's Major Landmasses
ldeaths (UKLungDeaths)
                        Monthly Deaths from Lung Diseases in the UK
lh                      Luteinizing Hormone in Blood Samples
longley                 Longley's Economic Regression Data
lynx                    Annual Canadian Lynx trappings 1821-1934
mdeaths (UKLungDeaths)
                        Monthly Deaths from Lung Diseases in the UK
morley                  Michelson Speed of Light Data
mtcars                  Motor Trend Car Road Tests
nhtemp                  Average Yearly Temperatures in New Haven
nottem                  Average Monthly Temperatures at Nottingham,
                        1920-1939
npk                     Classical N, P, K Factorial Experiment
occupationalStatus      Occupational Status of Fathers and their Sons
precip                  Annual Precipitation in US Cities
presidents              Quarterly Approval Ratings of US Presidents
pressure                Vapor Pressure of Mercury as a Function of
                        Temperature
quakes                  Locations of Earthquakes off Fiji
randu                   Random Numbers from Congruential Generator
                        RANDU
rivers                  Lengths of Major North American Rivers
rock                    Measurements on Petroleum Rock Samples
sleep                   Student's Sleep Data
stack.loss (stackloss)
                        Brownlee's Stack Loss Plant Data
stack.x (stackloss)     Brownlee's Stack Loss Plant Data
stackloss               Brownlee's Stack Loss Plant Data
state.abb (state)       US State Facts and Figures
state.area (state)      US State Facts and Figures
state.center (state)    US State Facts and Figures
state.division (state)
                        US State Facts and Figures
state.name (state)      US State Facts and Figures
state.region (state)    US State Facts and Figures
state.x77 (state)       US State Facts and Figures
sunspot.month           Monthly Sunspot Data, from 1749 to "Present"
sunspot.year            Yearly Sunspot Data, 1700-1988
sunspots                Monthly Sunspot Numbers, 1749-1983
swiss                   Swiss Fertility and Socioeconomic Indicators
                        (1888) Data
treering                Yearly Treering Data, -6000-1979
trees                   Diameter, Height and Volume for Black Cherry
                        Trees
uspop                   Populations Recorded by the US Census
volcano                 Topographic Information on Auckland's Maunga
                        Whau Volcano
warpbreaks              The Number of Breaks in Yarn during Weaving
women                   Average Heights and Weights for American Women

Data sets in package 'forecast':

gas                     Australian monthly gas production
gold                    Daily morning gold prices
taylor                  Half-hourly electricity demand
wineind                 Australian total wine sales
woolyrnq                Quarterly production of woollen yarn in
                        Australia

Data sets in package 'ggplot2':

diamonds                Prices of 50,000 round cut diamonds
economics               US economic time series
economics_long          US economic time series
faithfuld               2d density estimate of Old Faithful data
luv_colours             'colors()' in Luv space
midwest                 Midwest demographics
mpg                     Fuel economy data from 1999 and 2008 for 38
                        popular models of car
msleep                  An updated and expanded version of the mammals
                        sleep dataset
presidential            Terms of 11 presidents from Eisenhower to Obama
seals                   Vector field of seal movements
txhousing               Housing sales in TX

Data sets in package 'plyr':

baseball                Yearly batting records for all major league
                        baseball players
ozone                   Monthly ozone measurements over Central
                        America.


Use 'data(package = .packages(all.available = TRUE))'
to list the data sets in all *available* packages.
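`data()` can also be narrowed to one package or used to load a dataset explicitly. The sketch below uses only the base `datasets` package. The `results` component of the returned object is a matrix with columns such as `Item` and `Title`.

```r
# List the datasets of one specific package
ds <- data(package = "datasets")
head(ds$results[, c("Item", "Title")])

# Load a dataset into the global environment explicitly, then inspect it
data("women")
str(women)               # 15 obs. of 2 variables: height, weight

# Help pages describe each dataset's columns and units, e.g. ?women
```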

 

Example 2: Line chart

# Example 2: line chart
plot(pressure$temperature, pressure$pressure, type = "l")
# Add the data points
points(pressure$temperature, pressure$pressure)
# Add a second line (pressure halved) in red
lines(pressure$temperature, pressure$pressure / 2, col = "red")
# Add its data points
points(pressure$temperature, pressure$pressure / 2, col = "red")
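The same chart can be written more compactly with `type = "b"`, which draws lines and points in one call. A `legend()` then labels the two series. This is a sketch of an alternative, not a change to the example above. The axis units follow the `?pressure` help page (degrees Celsius, mm of mercury).

```r
# type = "b" draws both lines and points, replacing plot() + points()
plot(pressure$temperature, pressure$pressure, type = "b",
     xlab = "Temperature (deg C)", ylab = "Pressure (mm Hg)")

# Second series: half the pressure, also with lines and points
half <- pressure$pressure / 2
lines(pressure$temperature, half, type = "b", col = "red")

# Label the two series
legend("topleft", legend = c("pressure", "pressure / 2"),
       col = c("black", "red"), lty = 1, pch = 1)
```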

 

Example 3: Bar chart

  1. Basic version (fill = Region colors the bars by group)

The uspopchange data comes from the gcookbook package. To avoid an error when assigning upc, first install gcookbook from the console:

> install.packages("gcookbook")
trying URL 'https://mirror.lzu.edu.cn/CRAN/bin/windows/contrib/3.6/gcookbook_2.0.zip'
Content type 'application/zip' length 4012802 bytes (3.8 MB)
downloaded 3.8 MB

package ‘gcookbook’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\Administrator\AppData\Local\Temp\RtmpYnphvC\downloaded_packages
> # bar chart
> library(gcookbook)
> upc <- subset(uspopchange, rank(Change)>40)
> upc
            State Abb Region Change
3         Arizona  AZ   West   24.6
6        Colorado  CO   West   16.9
10        Florida  FL  South   17.6
11        Georgia  GA  South   18.3
13          Idaho  ID   West   21.1
29         Nevada  NV   West   35.1
34 North Carolina  NC  South   18.5
41 South Carolina  SC  South   15.3
44          Texas  TX  South   20.6
45           Utah  UT   West   23.8
> ggplot(upc, aes(x=reorder(Abb, Change), y=Change, fill=Region))+
+ geom_bar(stat="identity", colour="black") + 
+ scale_fill_manual(values=c("#5DADE2","#76D7C4")) +
+ xlab("State")
> 
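To avoid re-running `install.packages()` every time, a common pattern is to install only when the package is missing. The helper below, `ensure_package()`, is a hypothetical name of my own and not part of any package. It relies only on base functions: `requireNamespace()`, `install.packages()` and `library(..., character.only = TRUE)`.

```r
# Hypothetical helper: install a package only if it is missing, then attach it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # uses the repository set in options("repos")
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Example with a package that ships with R, so nothing is downloaded
ensure_package("stats")

# For this post's examples you might call:
# ensure_package("gcookbook")
```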

To color the bars by group, the code is:

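# Load the plotting package and the package that provides the data
library(ggplot2)
library(gcookbook)
# Filter the dataset: keep the states whose Change ranks above 40th (the fastest-growing ones)
upc <- subset(uspopchange, rank(Change) > 40)
# Print the data
upc
# Draw the bar chart: x is the state abbreviation, y is the change, bars filled by region (South, West)
ggplot(upc, aes(x = Abb, y = Change, fill = Region)) + geom_bar(stat = "identity")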

  2. Custom version

If the default colors are not ideal, we can pick our own colors manually; the code is below.

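# Bar chart
library(ggplot2)
library(gcookbook)
upc <- subset(uspopchange, rank(Change) > 40)
upc
# reorder() sorts the bars by Change, colour = "black" outlines them,
# scale_fill_manual() sets our own fill colors, xlab() renames the x axis
ggplot(upc, aes(x = reorder(Abb, Change), y = Change, fill = Region)) +
	geom_bar(stat = "identity", colour = "black") +
	scale_fill_manual(values = c("#5DADE2", "#76D7C4")) +
	xlab("State")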

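Besides sorting the bars and outlining them in black, we changed the x-axis label so the chart reads more clearly. This time the chart looks like this: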
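Two base functions do the data work in this example. `subset(..., rank(Change) > 40)` keeps only the rows whose `Change` ranks above 40th. `reorder(Abb, Change)` turns `Abb` into a factor whose levels are sorted by `Change`, which is why ggplot2 draws the bars in ascending order. The sketch below shows both on a small made-up data frame. The values are invented and are not the real uspopchange data.

```r
# Toy data frame (invented values, NOT the real uspopchange data)
df <- data.frame(
  Abb    = c("AA", "BB", "CC", "DD", "EE"),
  Region = c("West", "South", "West", "South", "West"),
  Change = c(5.0, 12.5, 30.1, 8.2, 20.4),
  stringsAsFactors = FALSE
)

# Keep rows whose Change ranks above 2nd, i.e. the 3 largest of 5
top <- subset(df, rank(Change) > 2)
top$Abb                  # rows BB, CC, EE (original row order is kept)

# reorder() returns a factor whose levels are sorted by Change (ascending)
f <- reorder(top$Abb, top$Change)
levels(f)                # BB, EE, CC
```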

Example 4: Adding a regression fit line to a scatter plot

# Add a regression fit line to a scatter plot
library(ggplot2)
library(gcookbook)
# List the data
heightweight[, c("ageYear", "heightIn")]
# Basic scatter plot
ggplot(heightweight, aes(x = ageYear, y = heightIn)) + geom_point()
sp <- ggplot(heightweight, aes(x = ageYear, y = heightIn))
# Linear model (lm) fit line; the shaded band is a 95% confidence region by default
sp + geom_point() + stat_smooth(method = lm)
# 99% confidence region
sp + geom_point() + stat_smooth(method = lm, level = 0.99)
# No confidence region
sp + geom_point() + stat_smooth(method = lm, se = FALSE)
# Change the colors
sp + geom_point(colour = "grey60") + stat_smooth(method = lm, se = FALSE, colour = "black")
# Locally weighted polynomial fit (loess)
sp + geom_point(colour = "grey60") + stat_smooth(method = loess)
(Figures, in order: basic scatter plot; lm() fit; no confidence region; recolored fit line; locally weighted polynomial fit.)
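To see roughly what `stat_smooth(method = lm)` and `stat_smooth(method = loess)` compute, here is a base-R sketch. It uses the built-in `cars` dataset instead of `heightweight`, so it runs without gcookbook. This is a substitution for illustration, not the post's data. `lm()` fits the straight line. `predict(..., interval = "confidence", level = 0.99)` gives the kind of band that the `level` argument controls. `loess()` gives the locally weighted fit.

```r
# Fit a straight line: stopping distance as a linear function of speed
fit <- lm(dist ~ speed, data = cars)
coef(fit)                # (Intercept) about -17.58, speed about 3.93

# Scatter plot plus the fitted line
plot(cars$speed, cars$dist, col = "grey60", xlab = "speed", ylab = "dist")
abline(fit, col = "black")

# 99% confidence band for the mean response, evaluated on a grid of speeds
grid_x <- data.frame(speed = seq(min(cars$speed), max(cars$speed),
                                 length.out = 50))
band <- predict(fit, newdata = grid_x, interval = "confidence", level = 0.99)
lines(grid_x$speed, band[, "lwr"], lty = 2)
lines(grid_x$speed, band[, "upr"], lty = 2)

# Locally weighted polynomial fit, as with method = loess
lo <- loess(dist ~ speed, data = cars)
lines(grid_x$speed, predict(lo, newdata = grid_x), col = "blue")
```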

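Appendix: common R coding-style conventions:

  1. File names: e.g. file_name_1.R
  2. Variable names: e.g. df.name.set1
  3. Put spaces around operators (=, +, -, <-, ...)
  4. No space before a comma; always one space after it
  5. Function documentation should follow a consistent format, e.g.: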
CalculateSampleCovariance <- function(x, y, verbose = TRUE) {
  # Computes the sample covariance between two vectors.
  #
  # Args:
  #   x: One of two vectors whose sample covariance is to be calculated.
  #   y: The other vector. x and y must have the same length, greater than one,
  #      with no missing values.
  #   verbose: If TRUE, prints sample covariance; if not, not. Default is TRUE.
  #
  # Returns:
  #   The sample covariance between x and y.
  n <- length(x)
  # Error handling
  if (n <= 1 || n != length(y)) {
    stop("Arguments x and y have different lengths: ",
         length(x), " and ", length(y), ".")
  }
  if (TRUE %in% is.na(x) || TRUE %in% is.na(y)) {
    stop(" Arguments x and y must not have missing values.")
  }
  covariance <- var(x, y)
  if (verbose)
    cat("Covariance = ", round(covariance, 4), ".\n", sep = "")
  return(covariance)
}
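Rules 2 to 4 above are easier to see side by side. The snippet below is a small sketch: the "avoid" lines are commented out because they are only there for contrast. R would accept them, but they break the conventions listed. The variable name `df.name.set1` simply follows the naming example in rule 2.

```r
# Rule 2: dotted lower-case variable names
df.name.set1 <- data.frame(a = 1:2, b = c(3, 4))

# Rule 3: spaces around operators
x <- c(1, 2, 3)          # good
# x<-c(1,2,3)            # avoid
total <- sum(x) + 1      # good
# total<-sum(x)+1        # avoid

# Rule 4: no space before a comma, one space after it
m <- matrix(1:6, nrow = 2)
m[1, ]                   # good: first row, i.e. 1 3 5
# m[1 ,]                 # avoid
```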

Sources:

  1. https://zhuanlan.zhihu.com/p/47718164
  2. https://google.github.io/styleguide/Rguide.xml

Some other excellent resources I came across:

  1. https://www.jianshu.com/p/bd1ed40919f4 
Last Modified: 2019-06-24 14:34