Similarly the estimate for ‘A’ versus ‘B’ is the same as the other model ( 10.034082). The intercept continues to be the same as for the model with only ‘A’ and ‘B’, since it is still the mean of ‘A’ ( 39.7730349). groupĪs for the two-level data set, the first model here uses dummy coding.
SPSS CODE CATEGORICAL VARIABLES FULL
First, here is a summary of our full data set with three levels for ‘group’. Where this gets more complicated is when trying to recreate dummy coding estimates with contrast coding when you have a variable with three levels. ab_contrast.lm = lm(score ~ contrast_AvB, data_twolevel_contrast) The estimate for ‘group’ (here ‘contrast_AvB’) is the same as for the model with dummy coding, but the intercept is the mean of ‘A’ and ‘B’ together ( 44.7900759), not just the mean of ‘A’. The model below is the same as the first model, but using contrast coding. data_twolevel_contrast = data_twolevel %>% Other tutorials on contrast coding often use the ‘contrasts()’ call, both methods produce the same results in the model. Note, here I did this by making a new numeric column.
SPSS CODE CATEGORICAL VARIABLES CODE
The code below does this by making a new column where ‘A’ is equal to -0.5 and ‘B’ to 0.5. With contrast coding we can recode the number values for our levels so that 0 is in between each level, instead of being equal to a level. ab_dummy.lm = lm(score ~ group, data_twolevel) The estimate for ‘group’ is the difference between group ‘A’ and group ‘B’ ( 10.034082). As you can see, the intercept is 39.7730349, the same value as the mean of level ‘A’. As a result the intercept (when x is 0) is the value of the default level, here the mean of ‘A’. Since R codes variables alphabetically ‘A’ is our default (or 0) and ‘B’ is our non-default (or 1). With dummy coding the default of a level is coded as 0 and the non-default level is coded as 1. To see the power of contrast coding we’ll compare a model with dummy coding (the default method of coding in R for factors) to a model with contrast coding when ‘group’ only has two levels (‘A’ and ‘B’). The code for generating the data is below. For the first analysis we’ll only compare two levels, ‘A’ and ‘B’. Dataįor this problem we’ll generate some fake data with a variable ‘group’ with three levels, ‘A’, ‘B’, and ‘C’. Today’s post will focus on what happens when a categorical variable has three levels and when contrast coding doesn’t directly match the default coding system. This can also be useful when running models with an interaction of two variables, as it gets rid of the problem of baselines when trying to interpret model results (see blog post on Linear Models Part 1 for more details). Contrast coding allows for recentering of categorical variables such that the intercept of a model is not the mean of one level of a category, but instead the mean of all data points in the data set. One method to recode categorical variables that has recently become more popular is ‘contrast coding’.