Chao et al. (2001) report data on an outbreak of Hepatitis A virus among students of a college. Investigators want to estimate N, the total number of students with Hepatitis A. Cases were reported from three sources: (1) a serum test conducted by the Institute of Preventive Medicine (P list), (2) local hospital records from the National Quarantine Service (Q list), and (3) records collected by epidemiologists (E list). The following table gives the counts from the three sources:
a. Suppose that only the P list and Q list had been collected, with $n_1 = 135$, $n_2 = 122$, and $m = 49$. Calculate $\hat{N}$, Chapman's estimate $\tilde{N}$, and the standard error for each estimate.
b. Fit log-linear models to the data. Using the deviance, evaluate the fit of these models. Is there evidence that the lists are dependent?
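
The arithmetic for part (a) can be sketched as follows. This is a minimal illustration, not the text's own solution: it assumes the Lincoln-Petersen form N-hat = n1*n2/m with the ratio-type variance estimate n1^2 * n2 * (n2 - m) / m^3, and Chapman's form N-tilde = (n1+1)(n2+1)/(m+1) - 1 with the commonly used variance estimate (n1+1)(n2+1)(n1-m)(n2-m) / ((m+1)^2 (m+2)). The function names are hypothetical, and if the course text defines the standard errors differently, those definitions should take precedence.

```python
import math


def lincoln_petersen(n1, n2, m):
    """Return (estimate, standard error) for the two-list estimator n1*n2/m.

    Assumed variance estimate: n1^2 * n2 * (n2 - m) / m^3.
    """
    n_hat = n1 * n2 / m
    var_hat = n1**2 * n2 * (n2 - m) / m**3
    return n_hat, math.sqrt(var_hat)


def chapman(n1, n2, m):
    """Return (estimate, standard error) for Chapman's estimator.

    Assumed variance estimate:
    (n1+1)(n2+1)(n1-m)(n2-m) / ((m+1)^2 (m+2)).
    """
    n_tilde = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var_tilde = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)
                 / ((m + 1) ** 2 * (m + 2)))
    return n_tilde, math.sqrt(var_tilde)


# Counts given in part (a): P list size, Q list size, and the overlap.
n1, n2, m = 135, 122, 49

lp_est, lp_se = lincoln_petersen(n1, n2, m)
ch_est, ch_se = chapman(n1, n2, m)

print(f"Lincoln-Petersen: N-hat   = {lp_est:.2f}, SE = {lp_se:.2f}")
print(f"Chapman:          N-tilde = {ch_est:.2f}, SE = {ch_se:.2f}")
```

Both estimators use only the two list sizes and their overlap, so the sketch applies to any pair of lists; Chapman's version adds one to each count, which keeps the estimate finite even when the overlap m is zero.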

