This website is designed to host course material for STAT 330 - Probability and Statistics for Computer Science at Iowa State University. These materials will be available even after the course is over, although they may be updated.

Course Description

Topics from probability and statistics applicable to computer science. Basic probability; Random variables and their distributions; Stochastic processes including Markov chains; Queuing models; Basic statistical inference; Introduction to regression.

Prerequisite

This course requires MATH 166 (Calculus II). Integrals are needed to compute expectations of continuous random variables and derivatives are needed to obtain maximum likelihood estimators (MLEs).
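To show where the calculus comes in, here is a minimal Python sketch (the course itself uses R). It assumes an Exponential random variable with rate λ as the running example; the distribution and the function names `expectation_exponential`, `exponential_mle`, and `score` are illustrative choices, not course material. The expectation E[X] = ∫ x f(x) dx is approximated with a midpoint Riemann sum. The MLE comes from setting the derivative of the log-likelihood ℓ(λ) = n log λ − λ Σ xᵢ to zero, which gives λ̂ = n / Σ xᵢ.

```python
import math


def expectation_exponential(rate, upper=50.0, n=100_000):
    """Approximate E[X] = integral of x * f(x) dx for an Exponential(rate)
    random variable using a midpoint Riemann sum on [0, upper].
    The tail beyond `upper` is assumed negligible."""
    width = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * width                # midpoint of the i-th subinterval
        density = rate * math.exp(-rate * x)  # f(x) = rate * exp(-rate * x)
        total += x * density * width
    return total


def score(rate, data):
    """Derivative of the log-likelihood l(rate) = n*log(rate) - rate*sum(x)."""
    return len(data) / rate - sum(data)


def exponential_mle(data):
    """Solve score(rate, data) = 0 for rate, giving rate_hat = n / sum(x)."""
    return len(data) / sum(data)


if __name__ == "__main__":
    print(expectation_exponential(2.0))   # close to the exact value 1/2
    sample = [0.5, 1.0, 1.5]
    rate_hat = exponential_mle(sample)
    print(rate_hat, score(rate_hat, sample))
```

The numerical integral should agree closely with the exact answer 1/λ, and the score evaluated at λ̂ is zero, which is exactly the "set the derivative to zero" step.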

Textbook

The optional textbook for this course is Probability and Statistics for Computer Scientists by Michael Baron (3rd ed).

The main improvement in the 3rd edition is the inclusion of R code in addition to the MATLAB code. Since I write all of my code in R, this is a welcome addition.

Here are some free resources that can be used:

Other resources:

Software

No software is required for this course, but the instructor will use the statistical software R, with RStudio as the interface, when demonstrating concepts in class.

Install links:

I am currently recording videos to help new R users.

Videos

I have not recorded videos specifically for STAT 330, but my series of videos for STAT 587 has a lot of overlap, so the videos linked below are relevant to students in STAT 330. The main difference between the two courses is that STAT 587, as a graduate course, moves through the content much more quickly and therefore has time in the semester to cover multiple regression models in depth. In contrast, STAT 330 barely has time to discuss simple linear regression models.

Probability

For basic probability, I have a 1-hour video containing all the probability required for the course. Alternatively, I have a playlist that splits the basic probability topics into individual videos.

Discrete random variables

For random variables, I have an introductory video that describes the difference between discrete and continuous random variables. Then, I have a video about general discrete random variables and videos about

random variables and their distributions as well as a video about multiple discrete random variables.

Continuous random variables

Moving on to continuous random variables, I have a video about general continuous random variables and videos specifically about

random variables and their distributions.

The normal distribution is extremely important because of the Central Limit Theorem (CLT), which is introduced in this video and made a bit more practical in this follow-up video. There is a 3rd video in this CLT series, but in retrospect I don’t think it is very helpful.
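One way to see the CLT at work is a small simulation. The Python sketch below (an illustration, not taken from the videos) draws many samples of size n from a skewed Exponential(1) distribution, whose mean and standard deviation are both 1, standardizes each sample mean, and records how often the result lands within ±1.96. If the standardized mean is approximately standard normal, that proportion should be close to 0.95. The function name `clt_coverage`, the sample size of 30, and the seed are arbitrary assumptions.

```python
import math
import random


def clt_coverage(sample_size=30, reps=10_000, seed=330):
    """Fraction of standardized sample means of Exponential(1) draws
    that fall in [-1.96, 1.96]; about 0.95 if the CLT approximation holds."""
    rng = random.Random(seed)   # seeded generator for reproducibility
    mu, sigma = 1.0, 1.0        # mean and standard deviation of Exponential(1)
    inside = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        z = (xbar - mu) / (sigma / math.sqrt(sample_size))  # standardize the mean
        if abs(z) <= 1.96:
            inside += 1
    return inside / reps


if __name__ == "__main__":
    print(clt_coverage())
```

Even though individual exponential draws are far from normal, the proportion printed should sit near 0.95; shrinking `sample_size` toward 1 makes the approximation visibly worse.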

Stochastic processes

At the moment, I don’t have videos on Markov chains, Poisson processes, or queuing systems.

Statistical inference

The videos I have on statistical inference are much more in depth than we have time for in STAT 330. If you are interested in this depth, you can follow along with this playlist. From this playlist, the most relevant videos are

In the playlist, I also introduce Bayesian approaches to statistical inference. This approach is not required for this course, but some students may be interested in this alternative to p-values, confidence intervals, and hypothesis testing.

Simple linear regression

For the topic of simple linear regression, the main videos are

While this is the extent of simple linear regression in STAT 330, there are many other relevant videos in my regression playlist, including the first 7. The remaining videos in the playlist move into multiple regression, which, again, may be of interest to some students. Multiple regression models form the basis for many statistical and machine learning methods.
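For reference, the least squares estimates for the simple linear regression model y = β₀ + β₁x + ε have a closed form: β̂₁ = Sxy / Sxx and β̂₀ = ȳ − β̂₁x̄, where Sxy = Σ (xᵢ − x̄)(yᵢ − ȳ) and Sxx = Σ (xᵢ − x̄)². The Python sketch below computes them directly; in class the same fit would be done in R, and the function name `fit_simple_linear_regression` is hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) least squares estimates for y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    # Points lying exactly on y = 2 + 3x, so the fit recovers those values
    print(fit_simple_linear_regression([0, 1, 2, 3, 4], [2, 5, 8, 11, 14]))
```

Because the example points lie exactly on a line, the estimates match the true intercept and slope; with noisy data they would only approximate them.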