Can the covariate values be taken at face value as integers, or was domain knowledge incorporated into the mechanisms used to generate treatment assignments and outcomes? Here are two examples: 1) An unordered categorical variable (e.g., MBRACE, FBRACE) is represented in the data by an integer. For the purpose of this competition, should these types of covariates be viewed as numeric or categorical? 2) For some variables, "99" encodes a missing value (e.g., PRECARE). For this competition, should this value be interpreted as "missing" or as a valid entry? Thank you.

Created by Susan Gruber sgruber
Dear Susan,
First, apologies for my late reply (there's been a holiday and a weekend, but here we are now). As for your question, the covariate values **are** taken at face value as integers. The data is presented as is. We discussed it between us, so here is our reasoning in a nutshell:
1. We wanted to keep the data closest to its original form.
2. We didn't want to do any preprocessing that wouldn't be supplied to the participants (that would be unfair; you shouldn't have to guess how we munged the data).
3. We didn't want to force any specific design matrix that would render the original data obsolete (relates to 1).
As a result, we used the covariates as is. For example, missing data were given an outlier-ish value so that the model could adjust for it.
Best regards, Ehud
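To make the distinction in this thread concrete, here is a minimal sketch of the two readings of the same covariates: the face-value integer reading described in the reply, and a categorical / missing-aware reading a participant might still choose to build on their own side. The column names MBRACE and PRECARE and the 99 = missing convention come from the question; the row values and the helper names `one_hot` and `missing_indicator` are made up for illustration and are not part of the challenge code.

```python
# Toy rows: the values are invented for illustration. Only the column names
# and the "99 means missing" convention for PRECARE come from the thread.
rows = [
    {"MBRACE": 1, "PRECARE": 3},
    {"MBRACE": 2, "PRECARE": 99},
    {"MBRACE": 1, "PRECARE": 1},
    {"MBRACE": 4, "PRECARE": 99},
]

# Assumption: missing-value codes per column, as mentioned in the question.
MISSING_CODES = {"PRECARE": 99}


def one_hot(rows, column):
    """Expand an integer-coded categorical column into 0/1 indicator columns."""
    levels = sorted({row[column] for row in rows})
    return [
        {f"{column}_{level}": int(row[column] == level) for level in levels}
        for row in rows
    ]


def missing_indicator(rows, column, code):
    """Return 1 for rows whose value equals the missing-value code, else 0."""
    return [int(row[column] == code) for row in rows]


# Reading 1: face value, every covariate is just an integer (99 included).
face_value = [[row["MBRACE"], row["PRECARE"]] for row in rows]

# Reading 2: MBRACE as unordered categories, PRECARE with an explicit flag.
mbrace_dummies = one_hot(rows, "MBRACE")
precare_missing = missing_indicator(rows, "PRECARE", MISSING_CODES["PRECARE"])
```

Since the reply says the covariates were used as is, the second reading is only a modelling choice on the participant's side; it does not describe how the data were generated.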

Do data generating mechanisms conform with the underlying meaning of the variables encodings?