This document is under a CC BY 4.0 license. The vignette was written in R markdown and the original script is available on my GitHub page. Comments and pull requests are welcome.
Reference:
Joly, Philippe. 2018. “Generations and Protest in Eastern Germany: Between Revolution and Apathy.” WZB Discussion Paper SP V 2018-101 (June). doi: 10.17605/OSF.IO/GJ53P.
Almost 30 years after the fall of the Berlin Wall, traces of the former east-west division are still visible everywhere in Germany: in the city landscapes, in the economy, and in the political culture. But how can we study differences in attitudes, beliefs, and behavior between East and West Germans?
In this vignette, I would like to introduce a technique I have developed in a recent paper to compare the political orientations of East and West Germans using the European Social Survey (ESS).
If you have ever worked with the ESS, you might know that there is one variable in the core dataset, intewde, indicating whether a respondent was interviewed in Eastern or Western Germany. For simple regional comparisons, this variable might work. However, if you are interested in how citizens of Germany have been marked by the historical division of their country, this variable is insufficient.
One major problem is that the current location of a respondent says little about his or her background. A respondent might just have moved to one region after having lived his or her entire life in the other. This approach does not take into account the massive east-west migration that took place during and after the Cold War.
When scholars compare East and West Germans, they usually want to assess differences produced by the experience of living in two different states with distinct economic and political systems. Furthermore, most observers agree that the length of exposure to a given system and the age at which a person was exposed matter. The literature on political socialization suggests that the period between mid-adolescence and early adulthood is crucial for the development of one’s political orientation and habits. Ideally, we would like to know at what point in their lives respondents lived in Eastern or Western Germany.
The ESS has all the information needed to determine whether a respondent was socialized in Eastern or Western Germany, but a lot of data manipulation is necessary to get there. This difficulty stems from three problems:
The script I present here reproduces in R the procedure I have implemented in my paper “Generations and Protest in Eastern Germany.” This paper compares the protest behavior of East and West Germans across generations and over time. It concludes that East Germans, especially those who grew up during the Cold War, participate less in protest activities than West Germans from the same generation after controlling for other individual characteristics. The paper defines an East German as someone who spent the majority of his or her early formative years, that is, between 15 and 25 years old, in Eastern Germany.
Before starting to work with R, go to the ESS website and then to Germany’s page to download the “country-specific data” for the rounds you need. For rounds 1 to 7, you will only have the option to download a POR datafile (click “Download SPSS”). For round 8, download the SAV file (again, click “Download SPSS”). Save and decompress all the datafiles in data/raw/.
In R, load the necessary packages for this demonstration (if needed, install them with install.packages).
# install.packages("dplyr")
# install.packages("essurvey")
# install.packages("foreign")
# install.packages("ggplot2")
# install.packages("magrittr")
# install.packages("stringr")
# install.packages("tibble")
library(dplyr) # Used for data wrangling
library(essurvey) # Downloads main ESS datafiles
library(foreign) # Converts SPSS files to R objects
library(ggplot2) # Used for Data visualization
library(magrittr) # Allows pipe operator
library(stringr) # Performs string operations
library(tibble) # Works with tibble dataframes
The script below browses the files in data/raw/ and produces a character vector of the names of the files saved as ESS*csDE with a .por or a .sav extension.
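# Keep only file names of the form ESS<round>csDE.por or ESS<round>csDE.sav;
# the dot is escaped and the pattern is anchored to avoid partial matches
spssfiles <- file.path("data", "raw") %>%
  list.files() %>%
  .[str_detect(., "^ESS[:digit:]csDE\\.(por|sav)$")]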
If you downloaded and decompressed the ESS country-specific data properly, you should get the following vector:
spssfiles
## [1] "ESS1csDE.por" "ESS2csDE.por" "ESS3csDE.por" "ESS4csDE.por"
## [5] "ESS5csDE.por" "ESS6csDE.por" "ESS7csDE.por" "ESS8csDE.sav"
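The same selection can also be sketched with base R alone. The example below is only an illustration: it creates a few empty, hypothetical files in a temporary folder (the folder name ess_raw_demo and the object spss_demo are made up for this sketch) and lets list.files filter them with an anchored pattern, so that stray files such as ZIP archives are ignored.

```r
# Hypothetical temporary folder standing in for data/raw/
tmp <- file.path(tempdir(), "ess_raw_demo")
dir.create(tmp, showWarnings = FALSE)

# Create empty placeholder files (names invented for the illustration)
invisible(file.create(file.path(
  tmp, c("ESS1csDE.por", "ESS8csDE.sav", "ESS8csDE.zip", "notes.txt")
)))

# list.files() accepts a regular expression through its `pattern` argument
spss_demo <- list.files(tmp, pattern = "^ESS[0-9]csDE\\.(por|sav)$")
spss_demo
```

Only the two SPSS files should remain in spss_demo; the archive and the text file do not match the pattern.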
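We let R loop over the vector of file names. We load the SPSS datafiles in the R environment and save them in data/ with saveRDS. The script gives them an .Rda extension, but they are single-object RDS files and must be read back with readRDS.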
for (i in seq_along(spssfiles)) {
  # Drop the four-character extension (.por or .sav)
  rootname <- str_sub(spssfiles[i], end = -5)
  spssfilepath <- file.path("data", "raw", spssfiles[i])
  rdafilepath <- file.path("data", paste0(rootname, ".Rda"))
  read.spss(spssfilepath, use.value.labels = FALSE, to.data.frame = TRUE) %>%
    as_tibble() %>%
    saveRDS(file = rdafilepath)
}
If everything went smoothly, you should now have the following files in the data/ folder:
file.path("data") %>%
list.files()
## [1] "ESS1csDE.Rda" "ESS2csDE.Rda" "ESS3csDE.Rda" "ESS4csDE.Rda"
## [5] "ESS5csDE.Rda" "ESS6csDE.Rda" "ESS7csDE.Rda" "ESS8csDE.Rda"
## [9] "raw"
Next, we take advantage of the essurvey package, which allows downloading the main ESS datafiles directly from the ESS website. Save your ESS email as an environment variable with the essurvey::set_email function (make sure to register your email on the ESS website beforehand).
# set_email("your@email.com")
The function essurvey::show_country_rounds displays the ESS rounds available for Germany. We save them as a numeric vector (alternatively, you can select the rounds you need for your own analysis).
rounds <- show_country_rounds("Germany")
rounds
## [1] 1 2 3 4 5 6 7 8
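If only some rounds are needed, the vector can be written by hand. The sketch below uses a hypothetical selection (rounds_demo) and only builds the output paths, without downloading anything. It illustrates that any later loop should iterate over the round numbers themselves rather than over their positions in the vector.

```r
# Hypothetical selection of rounds (not the full set used in this vignette)
rounds_demo <- c(1, 5, 8)

# Positions in the vector are 1, 2, 3 and do not match the round numbers
positions_demo <- seq_along(rounds_demo)

# Build one output path per round number
paths_demo <- file.path("data", paste0("ESS", rounds_demo, "DE.Rda"))
paths_demo
```

With this selection, paths_demo points to the files for rounds 1, 5, and 8, whereas indexing by position would have produced rounds 1, 2, and 3.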
We then loop over the selected rounds, load the datasets in the R environment (with the function essurvey::import_country), and save them as separate RDA files.
# Loop over the round numbers themselves, not their positions in the vector
for (i in rounds) {
  rootname <- paste0("ESS", i, "DE")
  rdafilepath <- file.path("data", paste0(rootname, ".Rda"))
  import_country(country = "Germany", rounds = i) %>%
    saveRDS(file = rdafilepath)
}
## Downloading ESS1
## Downloading ESS2
## Downloading ESS3
## Downloading ESS4
## Downloading ESS5
## Downloading ESS6
## Downloading ESS7
## Downloading ESS8
We now have the following files in the data/ folder:
file.path("data") %>%
list.files()
## [1] "ESS1csDE.Rda" "ESS1DE.Rda" "ESS2csDE.Rda" "ESS2DE.Rda"
## [5] "ESS3csDE.Rda" "ESS3DE.Rda" "ESS4csDE.Rda" "ESS4DE.Rda"
## [9] "ESS5csDE.Rda" "ESS5DE.Rda" "ESS6csDE.Rda" "ESS6DE.Rda"
## [13] "ESS7csDE.Rda" "ESS7DE.Rda" "ESS8csDE.Rda" "ESS8DE.Rda"
## [17] "raw"
We are ready to merge the main datafiles with the country-specific data. For the rest of this demonstration, we will only work with the eighth ESS round, but the procedure would be valid for any round.
We start by loading the main datafile and the respective country-specific data in two separate R objects.
ess8main <- file.path("data", "ESS8DE.Rda") %>%
readRDS()
ess8cs <- file.path("data", "ESS8csDE.Rda") %>%
readRDS()
We define the function merge_ess_cs, which takes as arguments two datasets: the main country-file and the country-specific data. The function renames the variables in the country-specific data in lowercase, merges the two datasets using respondents’ id and country as keys, and recodes the “labelled” variables as “numeric” (numeric variables are better handled by some R functions).
In the last part of the function, we rename variables whose name varies depending on the ESS round. We will work with three variables: wherebefore1990, yrmovedwest, and yrmovedeast.
merge_ess_cs <- function(main, cs) {
  # Rename variables in the country-specific data to lowercase
  names(cs) <- tolower(names(cs))
  # Merge main and country-specific data by respondent id and country
  merged <- left_join(main, cs, by = c("idno", "cntry")) %>%
    recode_missings()
  # Recode labelled variables to numeric; inherits() is used because a
  # column can carry more than one class (e.g. "haven_labelled")
  for (i in seq_along(merged)) {
    if (inherits(merged[[i]], c("labelled", "haven_labelled"))) {
      merged[[i]] <- as.numeric(merged[[i]])
    }
  }
  # Rename variables so that everything is clearer and harmonized across ESS
  # rounds
  if ("splow5de" %in% names(merged)) { # Names for ESS 1, 2, 3, 4, 6, 7, 8
    merged <- merged %>%
      mutate(
        wherebefore1990 = splow2de,
        yrmovedwest = splow4de,
        yrmovedeast = splow5de
      )
  } else if ("n3" %in% names(merged)) { # Names for ESS 5
    merged <- merged %>%
      mutate(
        wherebefore1990 = n3,
        yrmovedwest = n5a_1,
        yrmovedeast = n5b_1
      )
  } else {
    # Abort with an error if the expected variables are missing
    stop("Wrong data!")
  }
  merged
}
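To see what the merge step does, here is a minimal sketch on toy data. The two small data frames and their values are invented for illustration, and base R's merge with all.x = TRUE stands in for dplyr's left_join; it is not the function defined above.

```r
# Toy main file: three respondents (values invented for the illustration)
main_demo <- data.frame(idno = c(1, 2, 3), cntry = "DE", agea = c(40, 70, 34))

# Toy country-specific file with upper-case names and one respondent missing
cs_demo <- data.frame(IDNO = c(1, 2), CNTRY = "DE", SPLOW2DE = c(1, 2))

# Harmonize the variable names before merging
names(cs_demo) <- tolower(names(cs_demo))

# Keep every row of the main file, as a left join would
merged_demo <- merge(main_demo, cs_demo, by = c("idno", "cntry"), all.x = TRUE)
merged_demo
```

All three respondents of the main file are kept, and the respondent without country-specific data gets a missing value on splow2de.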
The three variables are based on the following questions.
wherebefore1990 (named splow2de or n3 in the original datasets) asked respondents “where did you live before 1990?” with four possible answers. The codes used by the function further below are:
1: East Germany / East Berlin
2: West Germany / West Berlin
3: a category the function treats as non-native
6: born after 1990
If the respondent was interviewed in Western Germany, but answered ‘1’ to the previous question, a follow-up question, yrmovedwest (named splow4de or n5a_1 in the original datasets), asked “when did you move to Western Germany?”
If the respondent was interviewed in Eastern Germany, but answered ‘2’ to the previous question, a follow-up question, yrmovedeast (named splow5de or n5b_1 in the original datasets), asked “when did you move to Eastern Germany?”
We apply the function merge_ess_cs to the main ESS 8 country file and its respective country-specific data (you can repeat this operation for the rounds you need).
ess8merged <- merge_ess_cs(ess8main, ess8cs)
We now turn to the core of this demonstration. Below, we define a function get_east_west_var, which transforms the ESS data and returns a dataset containing a variable that categorizes respondents as East or West Germans. The function has three arguments.
essdata is the merged ESS datafile (the main country-file merged with country-specific data).
agemin is the age at which respondents begin their early formative years.
agemax is the age at which the early formative years end.
agemin and agemax are guided by theory. Based on mainstream political socialization literature, the early formative years are set, by default, between 15 and 25 years old. Users, however, are free to change these values. The important point here is that being an “East German” or a “West German” means having spent a given number of years, at a certain age, in one region of Germany. You can define the age bracket that makes more sense for your own research.
Let’s go through the function step by step.
(1) We define a maximum number of years of early political socialization. By default, the age bracket that matters most spans 11 years (agemax is included as a full year, which is why we add 1).
(2) To prevent future errors, we save in the R environment the last year during which the survey was conducted.
(3) We generate a series of new variables by transforming existing ones.
(3.1) eastintv indicates whether the respondent was interviewed in Eastern Germany.
(3.2) eastbefore1990 indicates whether the respondent lived in East Germany before 1990.
(3.3) agemovedeast indicates the age at which a respondent moved to Eastern Germany (if he or she did so).
(3.4) agemovedwest indicates the age at which a respondent moved to Western Germany (if he or she did so).
(3.5) soctotyears counts the number of years of socialization the respondent went through. This variable takes different values depending on the age of the respondent.
(3.6) socyearseast counts, of these years of socialization, how many were experienced in Eastern Germany. There are four patterns to consider.
Case 1: The respondent lived in the GDR and is still living in Eastern Germany.
Case 2: The respondent lived in the GDR, but moved to Western Germany.
Case 3: The respondent lived in Western Germany before 1990 and is still living in Western Germany.
Case 4: The respondent lived in Western Germany, but moved to Eastern Germany.
(3.7) eastsoc indicates whether the respondent spent the majority of his or her formative years in Eastern Germany. If the respondent spent the same number of years in Eastern and Western Germany, more weight is given to the current location of the respondent. This variable is non-missing for native East or West Germans born before 1990 and older than the minimum age of socialization (agemin).
(3.8) eastsocall adds other categories to eastsoc for younger and non-native citizens. In the end, the variable has 5 categories:
1: West German
2: East German
3: born before 1990, but too young for socialization
4: born after 1990
5: non-native
You can save the function in the R environment.
get_east_west_var <- function(essdata, agemin = 15, agemax = 25) {
# (1) Define the maximum number of years of early political socialization
# (by default: 11)
rangemax <- agemax - agemin + 1
# (2) Look for the last survey year
yrsurveymax <- max(essdata[["inwyye"]], na.rm = TRUE)
# (3) Generate new variables
essdata <- essdata %>%
mutate(
# (3.1) Respondent interviewed in Eastern Germany? yes (1) / no (0)
eastintv = case_when(
intewde == 1 ~ 1,
intewde == 2 ~ 0),
# (3.2) Lived in East Germany before 1990? yes (1) / no (0)
eastbefore1990 = case_when(
# Lived in East Germany / East Berlin
wherebefore1990 == 1 ~ 1,
# Lived in West Germany / West Berlin
wherebefore1990 == 2 ~ 0),
# (3.3) Age when moved to East Germany
agemovedeast = case_when(
yrmovedeast <= yrsurveymax & (yrmovedeast - yrbrn) > 0
~ yrmovedeast - yrbrn),
# (3.4) Age when moved to West Germany
agemovedwest = case_when(
yrmovedwest <= yrsurveymax & (yrmovedwest - yrbrn) > 0
~ yrmovedwest - yrbrn),
# (3.5) Total years of early political socialization [0-rangemax]
soctotyears = case_when(
agea > agemax ~ rangemax,
agea >= agemin & agea <= agemax ~ agea - agemin,
!is.na(agea) ~ 0),
# (3.6) Years of political socialization in Eastern Germany [0-rangemax]
socyearseast = case_when(
# Case 1: Lived in the GDR before 1990, still living in Eastern Germany
agea > agemax & eastintv == 1 & eastbefore1990 == 1 ~ rangemax,
agea >= agemin & agea <= agemax & eastintv == 1 & eastbefore1990 == 1
~ agea - agemin,
# Case 2: Lived in the GDR before 1990, moved to Western Germany
agemovedwest < agemin ~ 0,
agemovedwest >= agemin & agemovedwest <= agemax ~ agemovedwest - agemin,
agemovedwest > agemax ~ rangemax,
# Case 3: Lived in West Germany before 1990, still living in West Germ.
eastintv == 0 & eastbefore1990 == 0 ~ 0,
# Case 4: Lived in West Germany before 1990, moved to Eastern Germany
agemovedeast < agemin & agea > agemax ~ rangemax,
agemovedeast < agemin & agea >= agemin & agea <= agemax ~ agea - agemin,
agemovedeast >= agemin & agemovedeast <= agemax &
agea > agemax ~ rangemax - (agemovedeast - agemin),
agemovedeast >= agemin & agemovedeast <= agemax & agea >= agemin &
agea <= agemax ~ agea - agemovedeast,
agemovedeast > agemax ~ 0
),
# (3.7) Lived most formative years in Eastern Germany? yes (1) / no (0)
eastsoc = case_when(
soctotyears != 0 & (socyearseast / soctotyears) < 0.5 ~ 0, # West German
soctotyears != 0 & (socyearseast / soctotyears) > 0.5 ~ 1, # East German
soctotyears != 0 & (socyearseast / soctotyears) == 0.5 ~ eastintv),
# (3.8) Add other categories to eastsoc for younger and non-native citizens
eastsocall = case_when(
eastsoc == 0 ~ 1, # West German
eastsoc == 1 ~ 2, # East German
agea <= agemin & !is.na(eastbefore1990) ~ 3, # born before 1990, but too
# young for socialization
wherebefore1990 == 6 ~ 4, # born after 1990
wherebefore1990 == 3 ~ 5) # non-native
)
# Return the transformed dataset explicitly
essdata
}
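Step (3.5) can be checked in isolation. The helper below, soc_total_years, is a hypothetical name used only for this sketch; it mirrors the soctotyears rule in base R and assumes the default bracket of 15 to 25.

```r
# Hypothetical stand-alone version of the soctotyears rule (step 3.5)
soc_total_years <- function(agea, agemin = 15, agemax = 25) {
  rangemax <- agemax - agemin + 1
  ifelse(agea > agemax, rangemax,           # bracket fully completed
         ifelse(agea >= agemin, agea - agemin, # bracket still in progress
                0))                            # bracket not yet started
}

# Ages chosen to cover each branch of the rule
soc_demo <- soc_total_years(c(14, 15, 20, 25, 26, 60))
soc_demo
```

A respondent aged 25 has completed 10 of the 11 years, and everyone older than agemax gets the full 11 years.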
Finally, we apply the get_east_west_var function to the data we previously merged and save a new dataset, finaldata.
finaldata <- get_east_west_var(ess8merged)
Let’s look at the data a bit more. The two commands below list observations of respondents who moved from Eastern to Western Germany and vice versa.
finaldata %>%
select(eastbefore1990, eastintv, agea, agemovedwest,
socyearseast, eastsoc) %>%
filter(!is.na(agemovedwest))
## # A tibble: 67 x 6
## eastbefore1990 eastintv agea agemovedwest socyearseast eastsoc
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0 40 17 2 0
## 2 1 0 70 65 11 1
## 3 1 0 68 49 11 1
## 4 1 0 34 7 0 0
## 5 1 0 52 38 11 1
## 6 1 0 49 24 9 1
## 7 1 0 38 12 0 0
## 8 1 0 49 24 9 1
## 9 1 0 31 5 0 0
## 10 1 0 41 16 1 0
## # ... with 57 more rows
finaldata %>%
select(eastbefore1990, eastintv, agea, agemovedeast,
socyearseast, eastsoc) %>%
filter(!is.na(agemovedeast))
## # A tibble: 49 x 6
## eastbefore1990 eastintv agea agemovedeast socyearseast eastsoc
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 1 43 22 4 0
## 2 0 1 47 38 0 0
## 3 0 1 56 32 0 0
## 4 0 1 34 31 0 0
## 5 0 1 32 31 0 0
## 6 0 1 55 37 0 0
## 7 0 1 33 20 6 1
## 8 0 1 59 54 0 0
## 9 0 1 44 30 0 0
## 10 0 1 39 27 0 0
## # ... with 39 more rows
The script classified respondents as East or West Germans correctly by taking into account the age at which they moved across regions.
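As a sanity check, the rules for movers can be condensed with pmin and pmax and compared with the ten rows printed in each table above. This is a sketch rather than the function itself: the two helper names are hypothetical, the second helper is only valid for respondents older than agemax (true for all rows shown), and the default bracket of 15 to 25 is assumed.

```r
# Years spent in the East before moving west (Case 2), hypothetical helper
years_east_before_move_west <- function(agemovedwest, agemin = 15, agemax = 25) {
  pmax(0, pmin(agemovedwest, agemax + 1) - agemin)
}

# Years spent in the East after moving east (Case 4), respondents older than
# agemax only, hypothetical helper
years_east_after_move_east <- function(agemovedeast, agemin = 15, agemax = 25) {
  rangemax <- agemax - agemin + 1
  rangemax - pmax(0, pmin(agemovedeast, agemax + 1) - agemin)
}

# Ages at move taken from the first ten rows of the two tables above
moved_west <- c(17, 65, 49, 7, 38, 24, 12, 24, 5, 16)
moved_east <- c(22, 38, 32, 31, 31, 37, 20, 54, 30, 27)

east_years_w <- years_east_before_move_west(moved_west)
east_years_e <- years_east_after_move_east(moved_east)

# All twenty respondents are older than 25, so soctotyears is 11 for each
eastsoc_w <- as.numeric(east_years_w / 11 > 0.5)
eastsoc_e <- as.numeric(east_years_e / 11 > 0.5)
```

The resulting vectors match the socyearseast and eastsoc columns of the printed rows, for example 2 years and West German for the respondent who moved west at 17, and 6 years and East German for the respondent who moved east at 20.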
We finish this demonstration by plotting two examples of east-west differences. We start by converting the variable eastsoc to a factor variable named eastsocfac. This will help with labelling.
finaldata <- finaldata %>%
mutate(eastsocfac = factor(eastsoc,
levels = c(0,1),
labels = c("West German", "East German")))
In the first graph below, we see the relation between age and ideology (a left-right scale going from 0, extreme left, to 10, extreme right). The graph suggests that older East Germans (aged 60 to 80) are more left-leaning than West Germans of the same age. This could be a legacy of the GDR.
ggplot(data = filter(finaldata, !is.na(agea) & !is.na(lrscale)
& !is.na(eastsocfac))) +
geom_smooth(mapping = aes(x = agea,
y = lrscale,
color = eastsocfac),
method = "loess") +
theme(legend.title=element_blank()) +
labs(x = "Age of respondent",
y = "Left–right self-positioning [0-10]",
caption = "(Based on local polynomial regressions. Source: ESS 2016)")
In the second graph, we plot the number of hours of paid work done by female respondents as a function of age. We see that East German women are more active on the labor market than West German women. Again, this could be a legacy of the different family models promoted during the Cold War in East and West Germany.
ggplot(data = filter(finaldata, !is.na(agea) & !is.na(wkhtot)
& !is.na(eastsocfac) & gndr==2
& agea >= 18 & agea <= 67)) +
geom_smooth(mapping = aes(x = agea,
y = wkhtot,
color = eastsocfac),
method = "loess") +
theme(legend.title=element_blank()) +
labs(x = "Age of respondent (female respondents only)",
y = "Total hours normally worked per week in main job",
caption = "(Based on local polynomial regressions. Source: ESS 2016)")
To conclude, the procedure introduced in this vignette offers a more consistent way of categorizing East and West Germans in the ESS and opens many possibilities of research on the legacies of the Cold War division in Germany.