We used R to analyze all examples in Chapter 20. We put the code here so that you can too.
Binomial log-likelihood of a value for the probability of success:
dbinom(23, size = 32, p = 0.5, log = TRUE)
We only need ggplot2
for ordinary graphs.
library(ggplot2)
Maximum likelihood estimation of the proportion of parasitic wasp individuals that choose the mated butterflies in a choice test. Below, we also apply the log-likelihood ratio test to these data.
wasps <- read.csv(url("https://whitlockschluter3e.zoology.ubc.ca/Data/chapter20/chap20e3UnrulyPassengers.csv"))
head(wasps)
## choice
## 1 mated
## 2 mated
## 3 mated
## 4 mated
## 5 mated
## 6 mated
Frequency table of wasp choice.
table(wasps)
## wasps
## mated unmated
## 23 9
The likelihood and log-likelihood of \(p\) = 0.5.
dbinom(23, size = 32, p = 0.5)
## [1] 0.00653062
dbinom(23, size = 32, p = 0.5, log = TRUE)
## [1] -5.031253
The log-likelihood of a range of different values of \(p\) (Table 20.3-1) is obtained as follows.
proportion <- seq(0.1, 0.9, by = 0.1)
logLike <- dbinom(23, size = 32, p = proportion, log = TRUE)
data.frame(Proportion = proportion, Loglikelihood = logLike)
## Proportion Loglikelihood
## 1 0.1 -36.758245
## 2 0.2 -21.875908
## 3 0.3 -13.751993
## 4 0.4 -8.522661
## 5 0.5 -5.031253
## 6 0.6 -2.846150
## 7 0.7 -1.889823
## 8 0.8 -2.467786
## 9 0.9 -5.997101
The log-likelihood calculated using a narrower range of values for \(p\) (Table 20.3-2). The additional quantity dlogLike
is the difference between each likelihood and the maximum.
proportion <- seq(0.4, 0.9, by = 0.01)
logLike <- dbinom(23, size = 32, p = proportion, log = TRUE)
dlogLike <- logLike - max(logLike)
Let’s put the result into a data frame.
likeResults <- data.frame(Proportion = proportion, Loglikelihood = logLike,
diffLogLike = dlogLike)
head(likeResults)
## Proportion Loglikelihood diffLogLike
## 1 0.40 -8.522661 -6.659833
## 2 0.41 -8.105995 -6.243167
## 3 0.42 -7.705601 -5.842773
## 4 0.43 -7.320925 -5.458097
## 5 0.44 -6.951463 -5.088635
## 6 0.45 -6.596754 -4.733925
pHat <- proportion[logLike == max(logLike)]
pHat
## [1] 0.72
The log-likelihood curve (Figure 20.3-1) is plotted below.
ggplot(data = likeResults, aes(x = Proportion, y = Loglikelihood)) +
geom_line() +
geom_hline(yintercept = max(logLike), linetype = "dashed") +
geom_vline(xintercept = pHat, color = "firebrick", size = 1.5) +
labs(y = "Log-likelihood",
x = expression(paste("Hypothesized proportion, ", italic(p)))) +
theme_classic()
An approximate likelihood-based 95% confidence interval is obtained as follows.
lower <- max( proportion[proportion <= pHat & logLike <= max(logLike) - 1.92] )
upper <- min( proportion[proportion <= pHat & logLike <= max(logLike) - 1.92] )
c(lower = lower, upper = upper)
## lower upper
## 0.55 0.40
The log-likelihood ratio test to test the null hypothesis that the population proportion of wasps choosing mated female butterflies is 0.5. G
is the test statistic and P
is the P-value.
G <- 2 * ( dbinom(23, size = 32, p = pHat, log = TRUE)
- dbinom(23, size = 32, p = 0.5, log = TRUE) )
G
## [1] 6.33685
P <- 1 - pchisq(G, 1)
P
## [1] 0.01182547
Maximum likelihood estimation of elephant population size using mark-recapture data.
The data are on the 74 individuals collected in the second sample.
elephant <- read.csv(url("https://whitlockschluter3e.zoology.ubc.ca/Data/chapter20/chap20e4ConservationScoop.csv"))
head(elephant)
## class
## 1 recaptured
## 2 recaptured
## 3 recaptured
## 4 recaptured
## 5 recaptured
## 6 recaptured
The frequency table of the 74 recaptured or unmarked individuals in the second sample.
table(elephant)
## elephant
## recaptured unmarked
## 15 59
These are the quantities needed for the likelihood calculations.
m <- 27 # size of first sample (total marked individuals)
n2 <- 74 # size of second sample
Y <- 15 # number of recaptures in second sample
Choose a range of values of N
to try. N
is the (unknown) total number of individuals in the population. N
must be at least n2 + m - Y
(86, in the current example).
N <- 90:250 # The values of N to try
These are the likelihoods and log-likelihood of each N
.
like <- choose(m, Y) * choose(N - m, n2 - Y) / choose(N, n2)
logLike <- lchoose(m, Y) + lchoose(N - m, n2 - Y) - lchoose(N, n2)
Likelihood and log-likelihood can also be obtained using R’s built-in function for the hypergeometric distribution.
like <- dhyper(Y, m, N - m, n2)
logLike <- dhyper(Y, m, N - m, n2, log = TRUE)
The maximum likelihood estimate of elephant population size.
Nhat <- N[logLike == max(logLike)]
Nhat
## [1] 133
The log-likelihood curve (Figure 20.4-1) is produced as follows.
likeResults <- data.frame(N, like, logLike)
ggplot(data = likeResults, aes(x = N, y = logLike)) +
geom_line() +
labs(y = "Log-likelihood",
x = expression(paste("Population size estimate, ", italic(N)))) +
theme_classic()
The likelihood-based 95% confidence interval for elephant population size is approximately as follows.
lower <- max(N[N <= Nhat & logLike <= max(logLike) - 1.92])
upper <- min(N[N <= Nhat & logLike <= max(logLike) - 1.92])
c(lower = lower, upper = upper)
## lower upper
## 104 90