Data Management Decisions

This page presents part of my research project for the Data Management and Visualization course on Coursera (Week 3 Assignment: Making Data Management Decisions). The research topic and data set are described here.

SAS Program Code


SAS Program Output


Summary of the Data Management Decisions Made

Three new variables, AgeGroup, AgeAlcUse, and AlcDisorder, have been created by collapsing the responses for AGE, S2AQ16A, and ALCABDEP12DX, respectively.
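
The collapsing described above could be sketched in SAS roughly as follows. This is an illustrative reconstruction, not the program used for this assignment. The library reference mydata, the data set name nesarc_pds, the numeric group codes, and the raw coding assumed for S2AQ16A (99 = unknown, blank = never drank) and ALCABDEP12DX (0 = no diagnosis, any other value = abuse and/or dependence) are all assumptions, and the variables are assumed to be numeric. The cut points follow the groups reported in this summary.

```sas
/* Hypothetical sketch: assumes LIBNAME mydata has already been assigned
   and that AGE, S2AQ16A, and ALCABDEP12DX are numeric variables. */
DATA new;
    SET mydata.nesarc_pds;   /* assumed library and data set name */

    /* AgeGroup: 1 = under 40, 2 = 40 to 59, 3 = 60 or older.
       Missing is handled first because SAS treats a missing numeric
       value as smaller than any number in comparisons. */
    IF AGE = . THEN AgeGroup = .;
    ELSE IF AGE < 40 THEN AgeGroup = 1;
    ELSE IF AGE < 60 THEN AgeGroup = 2;
    ELSE AgeGroup = 3;

    /* AgeAlcUse: 1 = 17 or younger, 2 = 18 to 20, 3 = 21 or older,
       9 = unknown (assumed raw code 99). Blank stays missing,
       i.e. respondents who have never drunk. */
    IF S2AQ16A = . THEN AgeAlcUse = .;
    ELSE IF S2AQ16A = 99 THEN AgeAlcUse = 9;
    ELSE IF S2AQ16A <= 17 THEN AgeAlcUse = 1;
    ELSE IF S2AQ16A <= 20 THEN AgeAlcUse = 2;
    ELSE AgeAlcUse = 3;

    /* AlcDisorder: 0 = no alcohol use disorder,
       1 = abuse and/or dependence (assumed: any nonzero raw code). */
    IF ALCABDEP12DX = . THEN AlcDisorder = .;
    ELSE IF ALCABDEP12DX = 0 THEN AlcDisorder = 0;
    ELSE AlcDisorder = 1;
RUN;

/* One frequency table per collapsed variable */
PROC FREQ DATA=new;
    TABLES AgeGroup AgeAlcUse AlcDisorder;
RUN;
```

By default PROC FREQ excludes missing values from the percentages and reports their count separately as "Frequency Missing", which fits the way never-drinkers are counted for AgeAlcUse.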

The distribution of the survey participants’ age is as follows: Of the sample, 40.86% are younger than 40 years old, 34.60% are between 40 and 59 years old, and 24.54% are 60 years or older.

The participants of the NESARC study were asked, “About how old were you when you first started drinking, not counting small tastes or sips of alcohol?” The most common response is the age range between 18 and 20 (29.06% of the people who have ever drunk alcohol). 30.91% started drinking at the age of 17 or younger, and 31.23% started drinking at the age of 21 or older, while 2.69% did not remember the age at which they started drinking. Note that the frequency of missing values (8,266) represents the number of people who have never drunk alcohol.

The third frequency distribution table shows the prevalence of alcohol use disorder (alcohol abuse and/or dependence). Of the respondents, 7.72% have an alcohol use disorder, while the remaining 92.28% do not. [Note that the ALCABDEP12DX variable (ALCOHOL ABUSE/DEPENDENCE IN LAST 12 MONTHS) is derived from a DSM–IV diagnosis based on an extensive list of symptom questions, not just a single question.]