If you shut down R, you will need to change your working directory and run the following commands to load libraries,
library('ggplot2')
library('dplyr')
library('tidyr')
library('ISDSWorkshop')
workshop(launch_index=FALSE)
Read in the data files and create additional variables
# Read csv files and create additional variables
icd9df = read.csv("icd9.csv")
GI = read.csv("GI.csv") %>%
mutate(
date = as.Date(date),
weekC = cut(date, breaks="weeks"),
week = as.numeric(weekC),
weekD = as.Date(weekC),
facility = as.factor(facility),
icd9class = factor(cut(icd9,
breaks = icd9df$code_cutpoint,
labels = icd9df$classification[-nrow(icd9df)],
right = TRUE)),
ageC = cut(age,
breaks = c(-Inf, 5, 18, 45 ,60, Inf)),
zip3 = trunc(zipcode/100))
There are many ways to export tables (for use in Word). Here I will cover two basic approachess that are simple and work well.
There are more sophisticated approaches that I will not cover:
You can cut-and-paste directly from R.
ga_l <- GI %>%
group_by(gender, ageC) %>%
summarize(count = n())
ga_w <- ga_l %>%
spread(ageC, count) %>%
print(row.names = FALSE)
## Source: local data frame [2 x 6]
## Groups: gender [2]
##
## gender `(-Inf,5]` `(5,18]` `(18,45]` `(45,60]` `(60, Inf]`
## * <fctr> <int> <int> <int> <int> <int>
## 1 Female 2161 1942 3710 1590 1250
## 2 Male 2163 1860 3766 1543 1259
Cut-and-pasting this table is done in ASCII format. This looks good in a plain text document, e.g. Notepad, TextEdit, and some email, but will not look good in other formats, e.g. docx.
library('xtable')
tab = xtable(ga_w,
caption = "Total GI cases by Sex and Age Category",
label = "myHTMLanchor",
align = "ll|rrrrr") # rownames gets a column
Save the table to a file
print(tab, file="table.html", type="html", include.rownames=FALSE)
The HTML code looks like
print(tab, type="html", include.rownames=FALSE)
## <!-- html table generated in R 3.3.1 by xtable 1.8-2 package -->
## <!-- Thu Jan 19 12:50:30 2017 -->
## <table border=1>
## <caption align="bottom"> Total GI cases by Sex and Age Category </caption>
## <tr> <th> gender </th> <th> (-Inf,5] </th> <th> (5,18] </th> <th> (18,45] </th> <th> (45,60] </th> <th> (60, Inf] </th> </tr>
## <tr> <td> Female </td> <td align="right"> 2161 </td> <td align="right"> 1942 </td> <td align="right"> 3710 </td> <td align="right"> 1590 </td> <td align="right"> 1250 </td> </tr>
## <tr> <td> Male </td> <td align="right"> 2163 </td> <td align="right"> 1860 </td> <td align="right"> 3766 </td> <td align="right"> 1543 </td> <td align="right"> 1259 </td> </tr>
## <a name=myHTMLanchor></a>
## </table>
and the resulting table looks like
print(tab, type="html", include.rownames=FALSE)
gender | (-Inf,5] | (5,18] | (18,45] | (45,60] | (60, Inf] |
---|---|---|---|---|---|
Female | 2161 | 1942 | 3710 | 1590 | 1250 |
Male | 2163 | 1860 | 3766 | 1543 | 1259 |
Now you can
table.html
)Create a Word table for the number of cases by facility and age category.
# Summarize data by facility and age category
# Reshape data from long to wide format
# Create HTML table
# Save HTML to file
# Copy-and-paste table into Word
When you have completed the activity, compare your results to the solutions.
The packages ggplot2
and maps
can be used together to make maps.
Map of the continental US
library('maps')
states = map_data("state")
ggplot(states, aes(x = long, y = lat, group = group)) +
geom_polygon(color="white")
Map of the counties in the continental US
counties = map_data("county")
ggplot(counties, aes(x = long, y = lat, group = group)) +
geom_polygon(color = "white")
Map of the counties in Iowa
ggplot(counties %>% filter(region=="iowa"),
aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "gray", color = "black")
To make an informative map, we need to add data.
fluTrends = read.csv("fluTrends.csv", check.names=FALSE)
For simplicity, only keep the most recent 12 weeks on states.
nr = nrow(fluTrends)
flu_w = fluTrends[(nr-11):nr, c(1,3:53)]
dim(flu_w)
## [1] 12 52
Reshape to long format
flu_l <- flu_w %>%
gather(region, index, -Date)
The region names in map_data
are lower case, so use tolower()
to convert all the region names in flu_l to lowercase. Then the map_data
and fluTrends data are merged using merge()
.
head(unique(states$region))
## [1] "alabama" "arizona" "arkansas" "california" "colorado"
## [6] "connecticut"
flu_l$region = tolower(flu_l$region)
states_merged = merge(states, flu_l, sort=FALSE, by='region')
Some Google Flu Trend data.
states_merged$Date = as.Date(states_merged$Date)
mx_date = max(states_merged$Date)
g = ggplot(states_merged %>% filter(Date == mx_date),
aes(x = long, y = lat, group = group, fill = index)) +
geom_polygon() +
labs(title=paste('Google Flu Trends on', mx_date), x='', y='') +
theme_minimal() +
theme(legend.title = element_blank()) +
coord_map("cylindrical")
if (require('RColorBrewer'))
g = g + scale_fill_gradientn(colours=RColorBrewer::brewer.pal(9,"Reds")) # Thanks Annie Chen
## Loading required package: RColorBrewer
g
Last 12 weeks.
ggplot(states_merged,
aes(x=long, y=lat, group=group, fill=index)) +
geom_polygon() +
labs(title='Google Flu Trends', x='', y='') +
theme_minimal() +
theme(legend.title = element_blank()) +
facet_wrap(~Date) +
coord_map("cylindrical") +
scale_fill_gradientn(colours=brewer.pal(9,"Reds"))
Modify the code to determine what elements of the map are affected.
Advanced: Download the data directly from http://www.google.org/flutrends/about/data/flu/historic/us-historic-v3.txt using `read.csv’ and then construct a map of the most recent Google Flu Trends data.
# Construct Google Flu Trends map
When you have completed the activity, compare your results to the solutions.
A package provides additional functionality besides what base installation of R provides. You have already used a number of additional packages:
The Comprehensive R Archive Network (CRAN) has over 9,594 packages. These packages can be installed using the install.packages()
function, e.g.
install.packages("ggplot2")
For many reasons, packages may not be on CRAN but may be available from other sources:
To install a package from github, you can use the install_github()
function from the devtools
package. For example,
# install.packages("devtools")
library('devtools')
install_github('nandadorea/vetsyn')
Bioconductor provides tools for high-throughput genomic data. There are over 1,296 packages available from bioconductor. To install bioconductor, use
source("http://bioconductor.org/biocLite.R")
biocLite()
Then to install a package from bioconductor use
biocLite("edgeR")
The bioconductor repository may be useful in the future for public health biosurveillance.
All packages can be installed from their source. If a package is not available in a repository, then this is the only way to install the package. To install from source download the source file (.tar.gz) and then use the install.packages()
function with arguments repos=NULL
and type="source"
. For example, from a source version of this workshop, you would use the following command:
install.packages("ISDSWorkshop_0.3.tar.gz",
repos = NULL,
type = "source")
Some packages for performing surveillance are
The SpatialEpi
package implements the Kulldorff cluster detection method in the kulldorff()
function.
Setting up the analysis (taken directly from the example)
library('SpatialEpi')
## Loading required package: sp
## Load Pennsylvania Lung Cancer Data
data(pennLC)
data <- pennLC$data
## Process geographical information and convert to grid
geo <- pennLC$geo[,2:3]
geo <- latlong2grid(geo)
## Get aggregated counts of population and cases for each county
population <- tapply(data$population, data$county, sum)
cases <- tapply(data$cases, data$county, sum)
## Based on the 16 strata levels, compute expected numbers of disease
n.strata <- 16
expected.cases <- expected(data$population, data$cases, n.strata)
## Set Parameters
pop.upper.bound <- 0.5
n.simulations <- 999
alpha.level <- 0.05
plot <- TRUE
Plot the data
clr = cases/population
clr = 1-clr/max(clr) # make sure range is (0,1)
plot(pennLC$spatial.polygon, axes=TRUE, col=rgb(1, clr, clr))
Run the analysis and plot the results (directly from the example)
## Kulldorff using Binomial likelihoods
binomial <- kulldorff(geo, cases, population, NULL, pop.upper.bound, n.simulations,
alpha.level, plot)
cluster <- binomial$most.likely.cluster$location.IDs.included
## plot
plot(pennLC$spatial.polygon,axes=TRUE)
plot(pennLC$spatial.polygon[cluster],add=TRUE,col="red")
title("Most Likely Cluster")
surveillance
packageIf you are connected to the internet, try installing the surveillance
package. If you are successful, look at its help file.
# Install the surveillance package
When you have completed the activity, compare your results to the solutions.
Packages are typically a collection of functions. But you can write your own. For example,
add = function(a,b) {
return(a+b)
}
add(1,2)
## [1] 3
or
add_vector = function(v) {
sum = 0
for (i in 1:length(v)) sum = sum + v[i]
return(sum)
}
Here is a simple outbreak detection function
alert = function(y,threshold=100) {
# y is the time series
# an alert is issue for any y above the threshold (default is 100)
factor(ifelse(y>threshold, "Yes", "No"))
}
Run this on our weekly GI data.
GI_w <- GI %>%
group_by(weekD) %>%
summarize(count = n())
GI_w$alert = alert(y = GI_w$count, threshold = 150)
ggplot(GI_w, aes(x = weekD, y = count, color = alert)) +
geom_point() +
scale_color_manual(values = c("black","red"))
For routine analysis, it can be helpful to run R in batch mode rather than interactively. To do this, create a script and save the script with a .R extension, e.g. script.R
:
# Read in the data perhaps as a csv file
# Create some table and save them to html files
# Create some figures and save them to jpegs
# Run some outbreak detection algorithms and produce figures
Now, from the command line run
Rscript script.R
Or, from within R run
source("script.R")
Now each week you can update the csv file and then just run the script which will create all new tables and figures.
R can be used to create HTML applications through the shiny
package. The code is written entirely in R, although you can use cascading style sheets (CSS) if you want to update how it looks. Here is an example that allows visualization of CDC data.
These apps can also be run locally, e.g.
install.packages('shiny')
library('shiny')
runGitHub('NLMichaud/WeeklyCDCPlot')