Incomplete Categorical Data Design: Non-Randomized Response Techniques for Sensitive Questions in Surveys [Paperback]

  • Format: Paperback / softback, 322 pages, height x width: 234x156 mm, weight: 508 g
  • Publication date: 07-Oct-2019
  • Publisher: Chapman & Hall/CRC
  • ISBN-10: 0367379627
  • ISBN-13: 9780367379629

Respondents to survey questions involving sensitive information, such as sexual behavior, illegal drug usage, tax evasion, and income, may refuse to answer the questions or provide untruthful answers to protect their privacy. This creates a challenge in drawing valid inferences from potentially inaccurate data. Addressing this difficulty, non-randomized response approaches enable sample survey practitioners and applied statisticians to protect the privacy of respondents and properly analyze the gathered data.



Incomplete Categorical Data Design: Non-Randomized Response Techniques for Sensitive Questions in Surveys is the first book on non-randomized response designs and statistical analysis methods. The techniques covered integrate the strengths of existing approaches, including randomized response models, incomplete categorical data design, the EM algorithm, the bootstrap method, and the data augmentation algorithm.





A self-contained, systematic introduction, the book shows you how to draw valid statistical inferences from survey data with sensitive characteristics. It guides you in applying the non-randomized response approach in surveys and new non-randomized response designs. All R code for the examples is available at www.saasweb.hku.hk/staff/gltian/.
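
As a rough illustration of how such a design can yield valid estimates without anyone disclosing their sensitive status, here is a minimal Python sketch of a crosswise-type estimator (the crosswise model is the subject of Chapter 2). It assumes each respondent reports only whether their answers to the sensitive question and to an unrelated non-sensitive question match, and that the non-sensitive trait has a known prevalence p different from 0.5. The function names, the truncation to [0, 1], and the simulation are assumptions made for this sketch; they are not taken from the book or its R code, and the book's notation may differ.

```python
import random


def crosswise_estimate(n_same: int, n: int, p: float) -> float:
    """Estimate the sensitive proportion pi from 'answers match' counts.

    Under the assumed design, P(match) = pi*p + (1 - pi)*(1 - p),
    which rearranges to pi = (P(match) + p - 1) / (2p - 1).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p must differ from 0.5, otherwise pi is not identifiable")
    lam_hat = n_same / n  # observed share of 'match' responses
    pi_hat = (lam_hat + p - 1) / (2 * p - 1)
    # Truncate to the valid range (a choice made for this sketch).
    return min(1.0, max(0.0, pi_hat))


def simulate_matches(n: int, pi: float, p: float, seed: int = 0) -> int:
    """Simulate n respondents and count how many report 'match'."""
    rng = random.Random(seed)
    n_same = 0
    for _ in range(n):
        sensitive = rng.random() < pi   # true sensitive status (never reported)
        innocuous = rng.random() < p    # non-sensitive trait with known prevalence
        n_same += sensitive == innocuous
    return n_same


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    n, true_pi, p = 20000, 0.2, 0.25
    n_same = simulate_matches(n, true_pi, p)
    print(round(crosswise_estimate(n_same, n, p), 3))
```

Only the match/mismatch indicator is used, so an individual's sensitive status is never observed directly; the price is a larger variance than direct questioning, which grows as p approaches 0.5.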

Preface
1 Introduction
1.1 Randomized Response Models
1.1.1 The Warner model
1.1.2 Other randomized response models
1.1.3 Limitations of the randomized response models
1.2 Item Count Techniques
1.2.1 Basic idea for the item count techniques
1.2.2 Some applications and generalizations
1.2.3 Limitations of the item count techniques
1.3 Non-randomized Response Models
1.3.1 Swensson's non-randomized response model
1.3.2 Takahasi and Sakasegawa's non-randomized response model
1.3.3 Non-randomized response models from a viewpoint of incomplete categorical data design
1.4 Scope of the Rest of the Book
2 The Crosswise Model
2.1 The Warner Model
2.1.1 The survey design
2.1.2 Point estimation
2.1.3 Relative efficiency
2.1.4 Degree of privacy protection
2.2 A Non-randomized Warner Model: The Crosswise Model
2.2.1 The survey design
2.2.2 Connection with the Warner model
2.2.3 Two asymptotic confidence intervals
2.2.4 Bootstrap confidence intervals
2.2.5 An asymptotic property of the modified MLE
2.3 Bayesian Methods for the Crosswise Model
2.3.1 Posterior moments
2.3.2 Posterior mode
2.3.3 Generation of i.i.d. posterior samples via the exact IBF sampling
2.4 Analyzing the Induced Abortion Data
2.5 An Experimental Survey Measuring Plagiarism
2.5.1 Survey data
2.5.2 Analyzing the survey data for partial plagiarism
2.5.3 Analyzing the survey data for severe plagiarism
3 The Triangular Model
3.1 The Triangular Design
3.1.1 The survey design
3.1.2 Alternative formulation
3.1.3 Variance of the estimator
3.1.4 Relative efficiency
3.1.5 Degree of privacy protection
3.2 Comparison with the Warner Model
3.2.1 The difference of two variances
3.2.2 Relative efficiency of the Warner model to the triangular model
3.2.3 Degree of privacy protection
3.3 Asymptotic Properties of the MLE
3.3.1 An alternative derivation of the MLE
3.3.2 Two asymptotic confidence intervals
3.3.3 Bootstrap confidence intervals
3.3.4 A modified MLE of π
3.4 Bayesian Methods for the Triangular Model
3.4.1 Posterior moments
3.4.2 Posterior mode
3.4.3 Generation of i.i.d. posterior samples via the exact IBF sampling
3.5 Analyzing the Sexual Behavior Data
3.6 Case Studies on Premarital Sexual Behavior
3.6.1 Questionnaire at Hong Kong Baptist University
3.6.2 Questionnaire at the Northeast Normal University
4 Sample Sizes for the Crosswise and Triangular Models
4.1 Precision and Power Analysis Methods
4.1.1 Type I error rate, Type II error rate and power
4.1.2 Precision analysis
4.1.3 Power analysis
4.2 The Triangular Model for One-sample Problem
4.2.1 A one-sided test
4.2.2 A two-sided test
4.2.3 Evaluation of the performance by comparing exact power with asymptotic power
4.2.4 Evaluation of the performance by calculating nT and nT/nD
4.3 The Crosswise Model for One-sample Problem
4.3.1 A one-sided test
4.3.2 Evaluation of the performance by calculating nC and nC/nD
4.4 Comparison for the Crosswise and Triangular Models
4.4.1 Comparison via the calculation of the ratio nC/nT
4.4.2 A theoretical justification
4.5 The Triangular Model for Two-sample Problem
4.6 An Example
5 The Multi-category Triangular Model
5.1 A Brief Literature Review
5.2 The Survey Design
5.2.1 Design of questionnaire
5.2.2 Determination of the non-sensitive question
5.3 Likelihood-based Inferences
5.3.1 MLEs via the EM algorithm
5.3.2 Asymptotic confidence intervals
5.3.3 Bootstrap confidence intervals
5.4 Bayesian Inferences
5.5 Questionnaire on Sexual Activities in Korean Adolescents
6 The Hidden Sensitivity Model
6.1 Background
6.2 The Survey Design
6.2.1 The issue
6.2.2 The design of questionnaire
6.3 Likelihood-based Inferences
6.3.1 MLEs via the EM algorithm
6.3.2 Bootstrap confidence intervals
6.3.3 Testing of association
6.4 Information Loss and Design Consideration
6.4.1 Information loss due to the introduction of the non-sensitive variate
6.4.2 Design of the cooperative parameters
6.5 Simulation Studies
6.5.1 Comparison of the likelihood ratio test with the chi-squared test
6.5.2 The probability of obtaining valid estimates
6.6 Bayesian Inferences under Dirichlet Prior
6.6.1 Posterior moments
6.6.2 Posterior mode
6.6.3 Generation of posterior samples via the DA algorithm
6.7 Bayesian Inferences under Other Priors
6.7.1 Orthogonal parameter space
6.7.2 Joint prior for modeling independence with constraints
6.7.3 Joint prior for modeling negative correlation structure
6.7.4 Joint prior for modeling positive correlation structure
6.8 Analyzing HIV Data in an AIDS Study
6.8.1 Likelihood-based methods
6.8.2 Bayesian methods
7 The Parallel Model
7.1 The Unrelated Question Model
7.1.1 The survey design
7.1.2 Estimation
7.1.3 Relative efficiency
7.1.4 Degree of privacy protection
7.2 A Non-randomized Unrelated Question Model: The Parallel Model
7.2.1 The survey design for the parallel model
7.2.2 Connection between the parallel model and the unrelated question model
7.2.3 Asymptotic properties of the MLE
7.3 Comparison with the Crosswise Model
7.3.1 The difference between variances
7.3.2 Relative efficiency of the crosswise model to the parallel model
7.3.3 Degree of privacy protection
7.4 Comparison with the Triangular Model
7.4.1 The difference between variances
7.4.2 Relative efficiency of the triangular model to the parallel model
7.4.3 Degree of privacy protection
7.5 Bayesian Inferences
7.5.1 Posterior moments in closed-form
7.5.2 Calculation of the posterior mode via the EM algorithm
7.5.3 Generation of i.i.d. posterior samples via the exact IBF sampling
7.6 An Example: Induced Abortion in Mexico
7.7 A Case Study on College Students' Premarital Sexual Behavior at Wuhan
7.8 A Case Study on Plagiarism at The University of Hong Kong
7.9 Discussion
8 Sample Size Calculation for the Parallel Model
8.1 Sample Sizes for One-sample Problem
8.1.1 A one-sided test
8.1.2 A two-sided test
8.1.3 Evaluation of the performance by comparing exact power with asymptotic power
8.1.4 Evaluation of the performance by calculating nP and nP/nD
8.2 Comparison with the Crosswise Model
8.2.1 Numerical comparisons
8.2.2 A theoretical justification
8.3 Comparison with the Triangular Model
8.3.1 Numerical comparisons
8.3.2 A theoretical justification
8.4 Sample Size for Two-sample Problem
8.5 An Example
9 The Multi-category Parallel Model
9.1 The Survey Design
9.2 Likelihood-based Inferences
9.2.1 MLEs via the EM algorithm
9.2.2 Two bootstrap confidence intervals
9.2.3 Explicit solutions to the valid estimators
9.2.4 Three asymptotic confidence intervals
9.3 Bayesian Inferences
9.3.1 Posterior moments
9.3.2 Calculation of the posterior mode via the EM algorithm
9.3.3 Generation of posterior samples via the data augmentation algorithm
9.4 A Special Case of the Multi-category Parallel Model
9.4.1 A four-category parallel model
9.4.2 Testing hypotheses for association
9.4.3 Comparison of the likelihood ratio test with the chi-squared test
9.5 Comparison with the Multi-category Triangular Model
9.5.1 The difference between the trace of two variance-covariance matrices
9.5.2 Degree of privacy protection
9.6 An Example
9.6.1 The income and sexual partner data
9.6.2 Likelihood-based analysis
9.6.3 Bayesian analysis
9.7 Discussion
10 A Variant of the Parallel Model
10.1 The Survey Design and Basic Properties
10.1.1 The survey design
10.1.2 Estimation
10.1.3 Relative efficiency
10.1.4 Degree of privacy protection
10.2 Statistical Inferences on π
10.2.1 An unbiased estimator of the variance of πV
10.2.2 Three asymptotic confidence intervals of π for large sample sizes
10.2.3 The exact (Clopper-Pearson) confidence interval
10.2.4 A modified MLE of π and its asymptotic property
10.3 Statistical Inferences on Θ
10.3.1 Three asymptotic confidence intervals of Θ for large sample sizes
10.3.2 The exact (Clopper-Pearson) confidence interval
10.3.3 Testing Hypotheses
10.4 Bootstrap Confidence Intervals
10.5 Bayesian Inferences
10.5.1 Posterior moments with explicit expressions
10.5.2 Calculation of the posterior modes via the EM algorithm
10.5.3 Generation of i.i.d. posterior samples via the exact IBF sampling
10.6 Comparison with the Crosswise Model
10.6.1 The difference of variances
10.6.2 Relative efficiency of the crosswise model to the variant of the parallel model
10.7 Comparison with the Triangular Model
10.7.1 The difference of variances
10.7.2 Relative efficiency of the triangular model to the variant of the parallel model
10.8 The Noncompliance Behavior
10.9 An Illustrative Example of Sexual Practices
10.10 Case Studies on Cheating Behavior in Examinations
10.10.1 Design and analysis under the assumption of complete compliance
10.10.2 Design and analysis under the consideration of noncompliance
10.11 Discussion
11 The Combination Questionnaire Model
11.1 The Survey Design
11.2 Likelihood-based Inferences
11.2.1 MLEs via the EM algorithm
11.2.2 Asymptotic confidence intervals
11.2.3 Bootstrap confidence intervals
11.2.4 The likelihood ratio test for testing association
11.3 Bayesian Inferences
11.4 Analyzing Cervical Cancer Data in Atlanta
11.4.1 Likelihood-based inferences
11.4.2 Bayesian inferences
11.5 Group Dirichlet Distribution
11.5.1 The mode of a group Dirichlet density
11.5.2 Sampling from a group Dirichlet distribution
Appendix A: The EM and DA Algorithms
Appendix B: The Exact IBF Sampling
Appendix C: Some Statistical Distributions
List of Figures
List of Tables
References
Author Index
Subject Index
Guo-Liang Tian is an associate professor of statistics in the Department of Statistics and Actuarial Science at the University of Hong Kong. Dr. Tian has published more than 60 (bio)statistical and medical papers in international peer-reviewed journals on missing data analysis, constrained parameter models and variable selection, sample surveys with sensitive questions, and cancer clinical trial and design. He is also the co-author of two books. He received a PhD in statistics from the Institute of Applied Mathematics, Chinese Academy of Sciences.





Man-Lai Tang is an associate professor in the Department of Mathematics at Hong Kong Baptist University. Dr. Tang is an editorial board member of Advances and Applications in Statistical Sciences and the Journal of Probability and Statistics; associate editor of Communications in Statistics-Theory and Methods and Communications in Statistics-Simulation and Computation; and editorial advisory board member of the Open Medical Informatics Journal. His research interests include exact methods for discrete data, equivalence/non-inferiority trials, and biostatistics. He received a PhD in biostatistics from UCLA.