Lab 6
Tidy Data
Data comes in many formats, but R prefers just one: tidy data.
A data set is tidy if and only if:
- Every variable is in its own column
- Every observation is in its own row
- Every value is in its own cell (which follows from the above)
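Here is a minimal sketch of the three rules on a small made-up table (not the LOTR data). It assumes the tidyr and tibble packages from the tidyverse are installed. The year variable is hidden in the column names 2019 and 2020, so each row holds two observations. tidyr::pivot_longer() moves those column names into a year column and their values into a cases column. The names country, year, and cases are hypothetical.

```r
# Illustrative sketch with made-up numbers (not the LOTR data).
library(tibble)
library(tidyr)

# Untidy: the "year" variable is stored in the column names,
# so each row holds two observations (one per year).
untidy <- tibble(
  country = c("A", "B"),
  `2019`  = c(10, 7),
  `2020`  = c(12, 9)
)

# Tidy: one column per variable (country, year, cases) and
# one row per observation (a country-year pair).
tidy <- untidy %>%
  pivot_longer(
    cols      = -country,   # every column except country
    names_to  = "year",     # old column names become values of "year"
    values_to = "cases"     # old cell values become values of "cases"
  )

print(tidy)
```

After pivoting, each row is one country-year observation, so every variable has its own column, every observation its own row, and every value its own cell.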
Lab Questions
- Download the lotr_untidy.csv data and save it to your data folder.
About the data:
Publication: J.R.R. Tolkien. The Lord of the Rings. Ballantine Books, New York. Copyright 1954-1974. Volume I. The Fellowship of the Ring. Volume II. The Two Towers. Volume III. The Return of the King.
Downloaded from: jennybc on GitHub
Variables:
- Film: The Lord of the Rings film
- Race: The race of the characters
- Female: Number of words spoken by female characters in LOTR
- Male: Number of words spoken by male characters in LOTR
- FOTR_ROTK_TTT: Words spoken in each film: The Fellowship of the Ring (FOTR), The Return of the King (ROTK), and The Two Towers (TTT).
- Load both of the files. Explain what makes each data frame untidy.
- Using the skills learned in class, make lotr_untidy1 tidy.
- What’s the total number of words spoken by male hobbits?
- Does one Race dominate a movie (that is, speak the most words in it)? Does the dominant Race differ across the movies? How could you visualize this?
- Using the skills learned in class, make lotr_untidy2 tidy.
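Once the data are tidy, a question such as the total number of words spoken by male hobbits comes down to filtering on two conditions and summing. The sketch below shows that pattern with dplyr, assuming the dplyr and tibble packages are installed. It uses a made-up sales table: the store, product, month, and units columns and their values are hypothetical, not the LOTR columns.

```r
# Illustrative sketch with made-up numbers (not the LOTR data).
library(tibble)
library(dplyr)

# Hypothetical tidy data: one row per store-product-month.
sales <- tibble(
  store   = c("North", "North", "South", "South", "North"),
  product = c("apple", "pear",  "apple", "pear",  "apple"),
  month   = c(1,       1,       1,       1,       2),
  units   = c(5,       3,       8,       2,       4)
)

# Keep only the rows that match BOTH conditions, then add up the counts.
north_apples <- sales %>%
  filter(store == "North", product == "apple") %>%
  summarise(total_units = sum(units))

print(north_apples)
```

Separate conditions passed to filter() are combined with a logical AND, so only rows for North apples (5 and 4 units) are summed.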
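For the question about whether one Race dominates each movie, one possible approach (an assumption, not necessarily the method taught in class) is to total the counts per group, keep the largest total within each category with dplyr::slice_max(), and compare the totals with a dodged bar chart in ggplot2. The sketch assumes dplyr, tibble, and ggplot2 are installed. It uses a made-up points table in which season plays the role of the movie and team plays the role of the Race; all names and numbers are hypothetical.

```r
# Illustrative sketch with made-up numbers (not the LOTR data).
library(tibble)
library(dplyr)
library(ggplot2)

# Hypothetical tidy data: one row per season-team-game.
points <- tibble(
  season = rep(c("S1", "S2"), each = 4),
  team   = rep(c("Red", "Blue"), times = 4),
  score  = c(10, 20, 15, 30, 25, 5, 20, 10)
)

# Total score for each season-team combination.
totals <- points %>%
  group_by(season, team) %>%
  summarise(total = sum(score), .groups = "drop")

# Within each season, keep the team with the largest total.
top_team <- totals %>%
  group_by(season) %>%
  slice_max(total, n = 1) %>%
  ungroup()

print(top_team)

# Side-by-side bars make it easy to see which team leads in each season.
p <- ggplot(totals, aes(x = season, y = total, fill = team)) +
  geom_col(position = "dodge") +
  labs(x = "Season", y = "Total score", fill = "Team")

print(p)
```

In this toy data the leader changes between seasons (Blue in S1, Red in S2), which is the kind of pattern the bar chart makes visible at a glance. If two groups tie for the top total, slice_max() keeps both rows by default.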