Intermediate R

If you shut down R, you will need to change your working directory and run the following commands to load libraries,

library('ggplot2')
library('dplyr')
library('tidyr')
library('ISDSWorkshop')
workshop(launch_index=FALSE)

Read in the data files and create additional variables

# Read csv files and create additional variables
icd9df = read.csv("icd9.csv")
GI     = read.csv("GI.csv") %>%
  mutate(
    date      = as.Date(date),
    weekC     = cut(date, breaks="weeks"),
    week      = as.numeric(weekC),
    weekD     = as.Date(weekC),
    facility  = as.factor(facility),
    icd9class = factor(cut(icd9, 
                           breaks = icd9df$code_cutpoint, 
                           labels = icd9df$classification[-nrow(icd9df)], 
                           right  = TRUE)),
    ageC      = cut(age, 
                    breaks = c(-Inf, 5, 18, 45 ,60, Inf)),
    zip3      = trunc(zipcode/100))

Exporting tables

There are many ways to export tables (for use in Word). Here I will cover two basic approachess that are simple and work well.

cut-and-paste
create HTML table and cut-and-paste to Word

There are more sophisticated approaches that I will not cover:

Cut-and-paste

You can cut-and-paste directly from R.

ga_l <- GI %>%
  group_by(gender, ageC) %>%
  summarize(count = n())

ga_w <- ga_l %>%
  spread(ageC, count) %>%
  print(row.names = FALSE)

## Source: local data frame [2 x 6]
## Groups: gender [2]
## 
##   gender `(-Inf,5]` `(5,18]` `(18,45]` `(45,60]` `(60, Inf]`
## * <fctr>      <int>    <int>     <int>     <int>       <int>
## 1 Female       2161     1942      3710      1590        1250
## 2   Male       2163     1860      3766      1543        1259

Cut-and-pasting this table is done in ASCII format. This looks good in a plain text document, e.g. Notepad, TextEdit, and some email, but will not look good in other formats, e.g. docx.

Create HTML table

library('xtable')
tab = xtable(ga_w,
             caption = "Total GI cases by Sex and Age Category",
             label   = "myHTMLanchor",
             align   = "ll|rrrrr") # rownames gets a column

Save the table to a file

print(tab, file="table.html", type="html", include.rownames=FALSE)

Output for this HTML table

The HTML code looks like

print(tab, type="html", include.rownames=FALSE)

## <!-- html table generated in R 3.3.1 by xtable 1.8-2 package -->
## <!-- Thu Jan 19 12:50:30 2017 -->
## <table border=1>
## <caption align="bottom"> Total GI cases by Sex and Age Category </caption>
## <tr> <th> gender </th> <th> (-Inf,5] </th> <th> (5,18] </th> <th> (18,45] </th> <th> (45,60] </th> <th> (60, Inf] </th>  </tr>
##   <tr> <td> Female </td> <td align="right"> 2161 </td> <td align="right"> 1942 </td> <td align="right"> 3710 </td> <td align="right"> 1590 </td> <td align="right"> 1250 </td> </tr>
##   <tr> <td> Male </td> <td align="right"> 2163 </td> <td align="right"> 1860 </td> <td align="right"> 3766 </td> <td align="right"> 1543 </td> <td align="right"> 1259 </td> </tr>
##    <a name=myHTMLanchor></a>
## </table>

and the resulting table looks like

print(tab, type="html", include.rownames=FALSE)

Total GI cases by Sex and Age Category
gender	(-Inf,5]	(5,18]	(18,45]	(45,60]	(60, Inf]
Female	2161	1942	3710	1590	1250
Male	2163	1860	3766	1543	1259

Copy-and-paste table to Word

Now you can

Open the file (table.html)
Copy-and-paste this table to Word.

Activity - exporting tables

Create a Word table for the number of cases by facility and age category.

# Summarize data by facility and age category

# Reshape data from long to wide format

# Create HTML table

# Save HTML to file

# Copy-and-paste table into Word

When you have completed the activity, compare your results to the solutions.

Maps

The packages ggplot2 and maps can be used together to make maps.

Map of the continental US

library('maps')
states = map_data("state")
ggplot(states, aes(x = long, y = lat, group = group)) + 
  geom_polygon(color="white")

Map of the counties in the continental US

counties = map_data("county")
ggplot(counties, aes(x = long, y = lat, group = group)) + 
  geom_polygon(color = "white")

Map of the counties in Iowa

ggplot(counties %>% filter(region=="iowa"), 
       aes(x = long, y = lat, group = group)) + 
  geom_polygon(fill = "gray", color = "black")

Construct an appropriate data set

To make an informative map, we need to add data.

fluTrends = read.csv("fluTrends.csv", check.names=FALSE)

For simplicity, only keep the most recent 12 weeks on states.

nr = nrow(fluTrends)
flu_w = fluTrends[(nr-11):nr, c(1,3:53)]
dim(flu_w)

## [1] 12 52

Reshape to long format

flu_l <- flu_w %>%
  gather(region, index, -Date)

Merge fluTrends data with map_data

The region names in map_data are lower case, so use tolower() to convert all the region names in flu_l to lowercase. Then the map_data and fluTrends data are merged using merge().

head(unique(states$region))

## [1] "alabama"     "arizona"     "arkansas"    "california"  "colorado"   
## [6] "connecticut"

flu_l$region = tolower(flu_l$region)
states_merged = merge(states, flu_l, sort=FALSE, by='region')

Make the plots

Some Google Flu Trend data.

states_merged$Date = as.Date(states_merged$Date)
mx_date = max(states_merged$Date)
g = ggplot(states_merged %>% filter(Date == mx_date), 
       aes(x = long, y = lat, group = group, fill = index)) + 
  geom_polygon() + 
  labs(title=paste('Google Flu Trends on', mx_date), x='', y='') +
  theme_minimal() + 
  theme(legend.title = element_blank()) +
  coord_map("cylindrical")

if (require('RColorBrewer'))
  g = g + scale_fill_gradientn(colours=RColorBrewer::brewer.pal(9,"Reds")) # Thanks Annie Chen

## Loading required package: RColorBrewer

Last 12 weeks.

ggplot(states_merged, 
       aes(x=long, y=lat, group=group, fill=index)) + 
  geom_polygon() + 
  labs(title='Google Flu Trends', x='', y='') +
  theme_minimal() + 
  theme(legend.title = element_blank()) +
  facet_wrap(~Date) + 
  coord_map("cylindrical") + 
  scale_fill_gradientn(colours=brewer.pal(9,"Reds"))

Activity

Modify the code to determine what elements of the map are affected.

Advanced: Download the data directly from http://www.google.org/flutrends/about/data/flu/historic/us-historic-v3.txt using `read.csv’ and then construct a map of the most recent Google Flu Trends data.

# Construct Google Flu Trends map

When you have completed the activity, compare your results to the solutions.

Packages

A package provides additional functionality besides what base installation of R provides. You have already used a number of additional packages:

ggplot2
ISDSWorkshop
maps
dplyr
tidyr
xtable

The Comprehensive R Archive Network (CRAN) has over 9,594 packages. These packages can be installed using the install.packages() function, e.g.

install.packages("ggplot2")

For many reasons, packages may not be on CRAN but may be available from other sources:

Github
Bioconductor
Source file (tar.gz)

Github

To install a package from github, you can use the install_github() function from the devtools package. For example,

# install.packages("devtools")
library('devtools')
install_github('nandadorea/vetsyn')

Bioconductor

Bioconductor provides tools for high-throughput genomic data. There are over 1,296 packages available from bioconductor. To install bioconductor, use

source("http://bioconductor.org/biocLite.R")
biocLite()

Then to install a package from bioconductor use

biocLite("edgeR")

The bioconductor repository may be useful in the future for public health biosurveillance.

Source

All packages can be installed from their source. If a package is not available in a repository, then this is the only way to install the package. To install from source download the source file (.tar.gz) and then use the install.packages() function with arguments repos=NULL and type="source". For example, from a source version of this workshop, you would use the following command:

install.packages("ISDSWorkshop_0.3.tar.gz", 
                 repos = NULL, 
                 type  = "source")

Packages for surveillance

Some packages for performing surveillance are

SpatialEpi - Kulldorff example

The SpatialEpi package implements the Kulldorff cluster detection method in the kulldorff() function.

Setting up the analysis (taken directly from the example)

library('SpatialEpi')

## Loading required package: sp

## Load Pennsylvania Lung Cancer Data
data(pennLC)
data <- pennLC$data

## Process geographical information and convert to grid
geo <- pennLC$geo[,2:3]
geo <- latlong2grid(geo)

## Get aggregated counts of population and cases for each county
population <- tapply(data$population, data$county, sum)
cases      <- tapply(data$cases,      data$county, sum)

## Based on the 16 strata levels, compute expected numbers of disease
n.strata       <- 16
expected.cases <- expected(data$population, data$cases, n.strata)

## Set Parameters
pop.upper.bound <- 0.5
n.simulations   <- 999
alpha.level     <- 0.05
plot            <- TRUE

SpatialEpi - Kulldorff example

Plot the data

clr = cases/population
clr = 1-clr/max(clr) # make sure range is (0,1)
plot(pennLC$spatial.polygon, axes=TRUE, col=rgb(1, clr, clr))

SpatialEpi - Kulldorff example

Run the analysis and plot the results (directly from the example)

## Kulldorff using Binomial likelihoods
binomial <- kulldorff(geo, cases, population, NULL, pop.upper.bound, n.simulations, 
  alpha.level, plot)

cluster <- binomial$most.likely.cluster$location.IDs.included

## plot
plot(pennLC$spatial.polygon,axes=TRUE)
plot(pennLC$spatial.polygon[cluster],add=TRUE,col="red")
title("Most Likely Cluster")

Activity - Install the `surveillance` package

If you are connected to the internet, try installing the surveillance package. If you are successful, look at its help file.

# Install the surveillance package

When you have completed the activity, compare your results to the solutions.

Functions

Packages are typically a collection of functions. But you can write your own. For example,

add = function(a,b) {
  return(a+b)
}
add(1,2)

## [1] 3

add_vector = function(v) {
  sum = 0
  for (i in 1:length(v)) sum = sum + v[i]
  return(sum)
}

A simple outbreak detection function

Here is a simple outbreak detection function

alert = function(y,threshold=100) {
  # y is the time series 
  # an alert is issue for any y above the threshold (default is 100)
  factor(ifelse(y>threshold, "Yes", "No"))
}

Run this on our weekly GI data.

GI_w <- GI %>%
  group_by(weekD) %>%
  summarize(count = n())

GI_w$alert = alert(y = GI_w$count, threshold = 150)

ggplot(GI_w, aes(x = weekD, y = count, color = alert)) + 
  geom_point() + 
  scale_color_manual(values = c("black","red"))

R in batch

For routine analysis, it can be helpful to run R in batch mode rather than interactively. To do this, create a script and save the script with a .R extension, e.g. script.R:

# Read in the data perhaps as a csv file

# Create some table and save them to html files

# Create some figures and save them to jpegs

# Run some outbreak detection algorithms and produce figures

Now, from the command line run

Rscript script.R

Or, from within R run

source("script.R")

Now each week you can update the csv file and then just run the script which will create all new tables and figures.

Shiny apps

R can be used to create HTML applications through the shiny package. The code is written entirely in R, although you can use cascading style sheets (CSS) if you want to update how it looks. Here is an example that allows visualization of CDC data.

These apps can also be run locally, e.g.

install.packages('shiny')
library('shiny')
runGitHub('NLMichaud/WeeklyCDCPlot')

Intermediate R

Jarad Niemi

2017-01-19

Exporting tables

Cut-and-paste

Create HTML table

Output for this HTML table

Copy-and-paste table to Word

Activity - exporting tables

Maps

Construct an appropriate data set

Merge fluTrends data with map_data

Make the plots

Activity

Packages

Github

Bioconductor

Source

Packages for surveillance

SpatialEpi - Kulldorff example

SpatialEpi - Kulldorff example

SpatialEpi - Kulldorff example

Activity - Install the `surveillance` package

Functions

A simple outbreak detection function

R in batch

Shiny apps

Intermediate R

Jarad Niemi

2017-01-19

Exporting tables

Cut-and-paste

Create HTML table

Output for this HTML table

Copy-and-paste table to Word

Activity - exporting tables

Maps

Construct an appropriate data set

Merge fluTrends data with map_data

Make the plots

Activity

Packages

Github

Bioconductor

Source

Packages for surveillance

SpatialEpi - Kulldorff example

SpatialEpi - Kulldorff example

SpatialEpi - Kulldorff example

Activity - Install the surveillance package

Functions

A simple outbreak detection function

R in batch

Shiny apps

Activity - Install the `surveillance` package