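Before learning R, here is a little background on the language:
"A free software environment for statistical computing and graphics" (the official R website)
R currently supports four graphics systems: base graphics, grid graphics, lattice graphics, and ggplot2 (since 2005).
CRAN is short for the Comprehensive R Archive Network. Its main site is in Austria; choose it if you need the latest version of a package. Mirrors in China include Hong Kong, Guangdong, Shanghai, and Lanzhou. If installing a package from a mirror fails, connecting directly to the main site (through a proxy if necessary) usually solves the problem.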
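As a minimal sketch of choosing a mirror in code rather than through the menu: the `repos` option controls where `install.packages()` downloads from. The Lanzhou URL below is the one that appears in the install log later in this post, and `https://cloud.r-project.org` is CRAN's own redirecting address. Whether either one is reachable from your network is an assumption.

```r
# Remember the current setting so it can be restored afterwards.
old_repos <- getOption("repos")

# Point the CRAN entry at a mirror (here: the Lanzhou mirror).
options(repos = c(CRAN = "https://mirror.lzu.edu.cn/CRAN/"))
chosen <- getOption("repos")[["CRAN"]]
chosen

# If the mirror misbehaves, another repository can be named for a single call:
# install.packages("gcookbook", repos = "https://cloud.r-project.org")

# Restore the previous setting.
options(repos = old_repos)
```

Setting the option affects the whole session, while the `repos` argument only affects one call, which is handy when a single package fails on a mirror.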
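With that background, let's look at reading data.
# Read the file; it has no header row, and fields are separated by spaces
data <- read.csv("datafile.csv", header = FALSE, sep = " ")
Because the file has no header row, you can assign the column names by hand:
# Manually assign column names to the data that was just read
names(data) <- c("c1", "c2", "c3")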
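To try the two steps above without having a `datafile.csv` at hand, here is a small self-contained sketch. The file contents are made-up sample values and the temporary file stands in for the real one.

```r
# Write a tiny space-separated file with no header row (made-up sample values).
tmp <- tempfile(fileext = ".csv")
writeLines(c("1 2.5 a", "2 3.5 b", "3 4.5 c"), tmp)

# Same call as above, pointed at the temporary file.
data <- read.csv(tmp, header = FALSE, sep = " ")
names(data)            # default names: "V1" "V2" "V3"

# Replace the default names with our own.
names(data) <- c("c1", "c2", "c3")
str(data)

unlink(tmp)            # remove the temporary file
```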
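So why is the function that builds a vector called c? It is commonly explained as short for "concatenate": it joins its arguments end to end. (R's own help page for c is titled "Combine Values into a Vector or List".)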
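A few quick lines show what "joining end to end" means in practice. The variable names are only for illustration.

```r
x <- c(1, 2, 3)          # three numbers joined into one vector
y <- c(x, 4, 5)          # an existing vector can be extended the same way
length(y)                # 5

z <- c("a", "b")
mixed <- c(x, z)         # mixing types coerces everything to character
mixed
```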
---
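Next, let's look at a few plotting examples:
Example 1: Drawing a scatter plot in R
# Example 1: scatter plot
library(ggplot2)
qplot(mtcars$wt, mtcars$mpg)
You may wonder where the mtcars data comes from, and why two lines of code are enough to draw a plot.
Reply: mtcars is one of R's built-in data sets; type mtcars at the R console to view it. The $ operator extracts the column whose name follows it.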
> mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
>
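To make the `$` explanation concrete, the sketch below pulls out the two columns used in the plot. It also redraws the scatter plot with `ggplot()`, because recent ggplot2 releases mark `qplot()` as deprecated (as far as I know, since version 3.4.0). It assumes ggplot2 is installed.

```r
library(ggplot2)

# `$` pulls one column out of the data frame as a plain vector.
wt <- mtcars$wt
length(wt)                          # 32, one value per car
head(mtcars[, c("wt", "mpg")], 3)   # first three rows of the two plotted columns

# The same scatter plot written with ggplot() instead of qplot().
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
print(p)
```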
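So how do you find out which built-in data sets are available?
Call the data() function: it lists the built-in data sets together with those in the packages currently loaded with library().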
Data sets in package 'datasets':
AirPassengers Monthly Airline Passenger Numbers 1949-1960
BJsales Sales Data with Leading Indicator
BJsales.lead (BJsales)
Sales Data with Leading Indicator
BOD Biochemical Oxygen Demand
CO2 Carbon Dioxide Uptake in Grass Plants
ChickWeight Weight versus age of chicks on different diets
DNase Elisa assay of DNase
EuStockMarkets Daily Closing Prices of Major European Stock
Indices, 1991-1998
Formaldehyde Determination of Formaldehyde
HairEyeColor Hair and Eye Color of Statistics Students
Harman23.cor Harman Example 2.3
Harman74.cor Harman Example 7.4
Indometh Pharmacokinetics of Indomethacin
InsectSprays Effectiveness of Insect Sprays
JohnsonJohnson Quarterly Earnings per Johnson & Johnson Share
LakeHuron Level of Lake Huron 1875-1972
LifeCycleSavings Intercountry Life-Cycle Savings Data
Loblolly Growth of Loblolly pine trees
Nile Flow of the River Nile
Orange Growth of Orange Trees
OrchardSprays Potency of Orchard Sprays
PlantGrowth Results from an Experiment on Plant Growth
Puromycin Reaction Velocity of an Enzymatic Reaction
Seatbelts Road Casualties in Great Britain 1969-84
Theoph Pharmacokinetics of Theophylline
Titanic Survival of passengers on the Titanic
ToothGrowth The Effect of Vitamin C on Tooth Growth in
Guinea Pigs
UCBAdmissions Student Admissions at UC Berkeley
UKDriverDeaths Road Casualties in Great Britain 1969-84
UKgas UK Quarterly Gas Consumption
USAccDeaths Accidental Deaths in the US 1973-1978
USArrests Violent Crime Rates by US State
USJudgeRatings Lawyers' Ratings of State Judges in the US
Superior Court
USPersonalExpenditure Personal Expenditure Data
UScitiesD Distances Between European Cities and Between
US Cities
VADeaths Death Rates in Virginia (1940)
WWWusage Internet Usage per Minute
WorldPhones The World's Telephones
ability.cov Ability and Intelligence Tests
airmiles Passenger Miles on Commercial US Airlines,
1937-1960
airquality New York Air Quality Measurements
anscombe Anscombe's Quartet of 'Identical' Simple Linear
Regressions
attenu The Joyner-Boore Attenuation Data
attitude The Chatterjee-Price Attitude Data
austres Quarterly Time Series of the Number of
Australian Residents
beaver1 (beavers) Body Temperature Series of Two Beavers
beaver2 (beavers) Body Temperature Series of Two Beavers
cars Speed and Stopping Distances of Cars
chickwts Chicken Weights by Feed Type
co2 Mauna Loa Atmospheric CO2 Concentration
crimtab Student's 3000 Criminals Data
discoveries Yearly Numbers of Important Discoveries
esoph Smoking, Alcohol and (O)esophageal Cancer
euro Conversion Rates of Euro Currencies
euro.cross (euro) Conversion Rates of Euro Currencies
eurodist Distances Between European Cities and Between
US Cities
faithful Old Faithful Geyser Data
fdeaths (UKLungDeaths)
Monthly Deaths from Lung Diseases in the UK
freeny Freeny's Revenue Data
freeny.x (freeny) Freeny's Revenue Data
freeny.y (freeny) Freeny's Revenue Data
infert Infertility after Spontaneous and Induced
Abortion
iris Edgar Anderson's Iris Data
iris3 Edgar Anderson's Iris Data
islands Areas of the World's Major Landmasses
ldeaths (UKLungDeaths)
Monthly Deaths from Lung Diseases in the UK
lh Luteinizing Hormone in Blood Samples
longley Longley's Economic Regression Data
lynx Annual Canadian Lynx trappings 1821-1934
mdeaths (UKLungDeaths)
Monthly Deaths from Lung Diseases in the UK
morley Michelson Speed of Light Data
mtcars Motor Trend Car Road Tests
nhtemp Average Yearly Temperatures in New Haven
nottem Average Monthly Temperatures at Nottingham,
1920-1939
npk Classical N, P, K Factorial Experiment
occupationalStatus Occupational Status of Fathers and their Sons
precip Annual Precipitation in US Cities
presidents Quarterly Approval Ratings of US Presidents
pressure Vapor Pressure of Mercury as a Function of
Temperature
quakes Locations of Earthquakes off Fiji
randu Random Numbers from Congruential Generator
RANDU
rivers Lengths of Major North American Rivers
rock Measurements on Petroleum Rock Samples
sleep Student's Sleep Data
stack.loss (stackloss)
Brownlee's Stack Loss Plant Data
stack.x (stackloss) Brownlee's Stack Loss Plant Data
stackloss Brownlee's Stack Loss Plant Data
state.abb (state) US State Facts and Figures
state.area (state) US State Facts and Figures
state.center (state) US State Facts and Figures
state.division (state)
US State Facts and Figures
state.name (state) US State Facts and Figures
state.region (state) US State Facts and Figures
state.x77 (state) US State Facts and Figures
sunspot.month Monthly Sunspot Data, from 1749 to "Present"
sunspot.year Yearly Sunspot Data, 1700-1988
sunspots Monthly Sunspot Numbers, 1749-1983
swiss Swiss Fertility and Socioeconomic Indicators
(1888) Data
treering Yearly Treering Data, -6000-1979
trees Diameter, Height and Volume for Black Cherry
Trees
uspop Populations Recorded by the US Census
volcano Topographic Information on Auckland's Maunga
Whau Volcano
warpbreaks The Number of Breaks in Yarn during Weaving
women Average Heights and Weights for American Women
Data sets in package 'forecast':
gas Australian monthly gas production
gold Daily morning gold prices
taylor Half-hourly electricity demand
wineind Australian total wine sales
woolyrnq Quarterly production of woollen yarn in
Australia
Data sets in package 'ggplot2':
diamonds Prices of 50,000 round cut diamonds
economics US economic time series
economics_long US economic time series
faithfuld 2d density estimate of Old Faithful data
luv_colours 'colors()' in Luv space
midwest Midwest demographics
mpg Fuel economy data from 1999 and 2008 for 38
popular models of car
msleep An updated and expanded version of the mammals
sleep dataset
presidential Terms of 11 presidents from Eisenhower to Obama
seals Vector field of seal movements
txhousing Housing sales in TX
Data sets in package 'plyr':
baseball Yearly batting records for all major league
baseball players
ozone Monthly ozone measurements over Central
America.
Use 'data(package = .packages(all.available = TRUE))'
to list the data sets in all *available* packages.
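The listing above can also be handled as data instead of being read off the screen. A minimal sketch using only base R: `data()` returns an object whose `results` component is a character matrix with `Item` and `Title` columns.

```r
# data() returns an object whose `results` matrix has one row per data set.
info <- data(package = "datasets")$results
head(info[, c("Item", "Title")], 3)

# Look up a single entry, for example mtcars.
mtcars_title <- info[info[, "Item"] == "mtcars", "Title"]
mtcars_title                 # "Motor Trend Car Road Tests"

# Every data set also has a help page:
# ?mtcars
```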
Example 2: Drawing a line chart
# Example 2: line chart
plot(pressure$temperature, pressure$pressure, type="l")
# Add the data points
points(pressure$temperature, pressure$pressure)
# Add a second line (half the pressure), in red
lines(pressure$temperature, pressure$pressure/2, col="red")
# Add the data points for the second line
points(pressure$temperature, pressure$pressure/2, col="red")
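Base graphics build the figure by calling one function after another. For comparison, here is a sketch of roughly the same figure in ggplot2, where each `plot()`/`points()`/`lines()` call becomes a layer. It assumes ggplot2 is installed, and the axis labels will differ from the base version.

```r
library(ggplot2)

# All layers share the x aesthetic; each geom supplies its own y.
p <- ggplot(pressure, aes(x = temperature)) +
  geom_line(aes(y = pressure)) +
  geom_point(aes(y = pressure)) +
  geom_line(aes(y = pressure / 2), colour = "red") +
  geom_point(aes(y = pressure / 2), colour = "red")
print(p)
```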
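Example 3: Bar chart
- Basic version (fill=Region colors the bars by group)
To avoid an error when assigning upc (the uspopchange data set would not be found), first install the gcookbook package from the console:
> install.packages("gcookbook")
trying URL 'https://mirror.lzu.edu.cn/CRAN/bin/windows/contrib/3.6/gcookbook_2.0.zip'
Content type 'application/zip' length 4012802 bytes (3.8 MB)
downloaded 3.8 MB
package 'gcookbook' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Administrator\AppData\Local\Temp\RtmpYnphvC\downloaded_packages
> # Bar chart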
> library(gcookbook)
> upc <- subset(uspopchange, rank(Change)>40)
> upc
State Abb Region Change
3 Arizona AZ West 24.6
6 Colorado CO West 16.9
10 Florida FL South 17.6
11 Georgia GA South 18.3
13 Idaho ID West 21.1
29 Nevada NV West 35.1
34 North Carolina NC South 18.5
41 South Carolina SC South 15.3
44 Texas TX South 20.6
45 Utah UT West 23.8
> ggplot(upc, aes(x=reorder(Abb, Change), y=Change, fill=Region))+
+ geom_bar(stat="identity", colour="black") +
+ scale_fill_manual(values=c("#5DADE2","#76D7C4")) +
+ xlab("State")
>
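We want to color the bars by group. The code is as follows:
# Load ggplot2 for plotting and gcookbook for the data
library(ggplot2)
library(gcookbook)
# Subset the data: keep the rows whose Change ranks above 40
upc <- subset(uspopchange, rank(Change)>40)
# Print the data
upc
# Draw the bar chart: the state abbreviation on the x-axis, the change value on the y-axis, and fill color grouped by region (South, West)
ggplot(upc, aes(x=Abb, y=Change, fill=Region)) + geom_bar(stat="identity")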
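The filter `rank(Change) > 40` may look cryptic. `rank()` gives 1 to the smallest value, so keeping ranks above 40 keeps the largest values, which in this data set leaves the ten rows printed earlier. A small sketch with made-up numbers (the names `change` and `top2` are hypothetical):

```r
change <- c(a = 3.1, b = 12.4, c = 7.8, d = 0.5, e = 9.9)  # hypothetical values

rank(change)        # a = 2, b = 5, c = 3, d = 1, e = 4

# Keep the two largest values: ranks greater than (length - 2).
top2 <- change[rank(change) > length(change) - 2]
top2                # the entries named b and e
```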
- Customized version
If the default colors are not ideal, we can pick our favorite colors manually. The source code is shown below.
# Bar chart
library(gcookbook)
upc <- subset(uspopchange, rank(Change)>40)
upc
ggplot(upc, aes(x=reorder(Abb, Change), y=Change, fill=Region))+
  geom_bar(stat="identity", colour="black") +
  scale_fill_manual(values=c("#5DADE2","#76D7C4")) +
  xlab("State")
We also changed the x-axis label to "State", which makes the chart easier to read.
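The `reorder(Abb, Change)` call is what sorts the bars: it reorders the factor levels of the first argument by the values of the second. A minimal base-R sketch, using three rows copied from the upc table printed earlier:

```r
abb    <- factor(c("AZ", "CO", "FL"))   # three states from the upc table
change <- c(24.6, 16.9, 17.6)

levels(abb)                     # alphabetical: "AZ" "CO" "FL"

ordered_abb <- reorder(abb, change)
levels(ordered_abb)             # by Change:    "CO" "FL" "AZ"
```

ggplot2 places discrete x values in level order, so the reordered factor yields bars sorted from the smallest to the largest change.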
Example 4: Adding a regression fit line to a scatter plot
# Add a regression fit line to a scatter plot
library(ggplot2)
library(gcookbook)
# List the data
heightweight[, c("ageYear", "heightIn")]
# Draw the basic scatter plot
ggplot(heightweight, aes(x=ageYear, y=heightIn)) + geom_point()
# Save the base plot, then add a linear-model fit line (default 95% confidence band)
sp <- ggplot(heightweight, aes(x=ageYear, y=heightIn))
sp + geom_point() + stat_smooth(method=lm)
# 99% confidence band
sp + geom_point() + stat_smooth(method=lm, level=0.99)
# No confidence band
sp + geom_point() + stat_smooth(method=lm, se=FALSE)
# Change the colors
sp + geom_point(colour="grey60") + stat_smooth(method=lm, se=FALSE, colour="black")
# Fit with loess (locally weighted polynomial regression)
sp + geom_point(colour="grey60") + stat_smooth(method=loess)
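`stat_smooth(method=lm)` draws the line fitted by `lm()` together with a confidence band for it. Here is a minimal sketch of the same kind of computation outside ggplot2. It uses the built-in `cars` data set instead of `heightweight`, purely so that it runs without gcookbook; that swap is my assumption, not part of the example above.

```r
# Fit stopping distance as a linear function of speed.
fit <- lm(dist ~ speed, data = cars)
coef(fit)                       # intercept and slope of the fitted line

# Confidence bands at two levels for two hypothetical speeds.
new_speed <- data.frame(speed = c(10, 20))
band95 <- predict(fit, new_speed, interval = "confidence", level = 0.95)
band99 <- predict(fit, new_speed, interval = "confidence", level = 0.99)

# The fitted values are the same; the higher level gives a wider band.
band95
band99
```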
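Appendix: common R style conventions:
- Name files like file_name_1.R
- Name variables like df.name.set1
- Put a space before and after operators (=, +, -, <-, ...)
- Do not put a space before a comma, and always put one after it
- Document functions in a consistent format, for example: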
CalculateSampleCovariance <- function(x, y, verbose = TRUE) {
  # Computes the sample covariance between two vectors.
  #
  # Args:
  #   x: One of two vectors whose sample covariance is to be calculated.
  #   y: The other vector. x and y must have the same length, greater than one,
  #      with no missing values.
  #   verbose: If TRUE, prints sample covariance; if not, not. Default is TRUE.
  #
  # Returns:
  #   The sample covariance between x and y.
  n <- length(x)
  # Error handling
  if (n <= 1 || n != length(y)) {
    stop("Arguments x and y have different lengths: ",
         length(x), " and ", length(y), ".")
  }
  if (TRUE %in% is.na(x) || TRUE %in% is.na(y)) {
    stop(" Arguments x and y must not have missing values.")
  }
  covariance <- var(x, y)
  if (verbose)
    cat("Covariance = ", round(covariance, 4), ".\n", sep = "")
  return(covariance)
}
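The function above delegates the actual computation to `var(x, y)`. As a quick check of what that call returns, the sketch below compares it with the textbook sample covariance formula, which divides by n - 1. The sample values are made up.

```r
x <- c(2, 4, 6, 8)     # made-up sample values
y <- c(1, 3, 2, 7)
n <- length(x)

# Sum of products of deviations from the means, divided by (n - 1).
manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
manual

all.equal(manual, var(x, y))   # TRUE
all.equal(manual, cov(x, y))   # TRUE: for two vectors, var() and cov() agree
```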
Excerpted from:
Along the way, I also found some excellent resources: