Lab 6

Due by 11:59 PM on Friday, March 5, 2021

Tidy Data

Data comes in many formats, but R prefers just one: tidy data.

A data set is tidy if and only if:

- Every variable is in its own column
- Every observation is in its own row
- Every value is in its own cell (which follows from the above)
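
As a rough illustration of these rules, here is a minimal sketch using a made-up data frame (not the lab data). It assumes the tidyr package is installed; the names `untidy`, `tidy`, `branch`, `year`, and `loans` are hypothetical and chosen only for this example. The year is a variable, but in the untidy version its values are spread across column names, so `pivot_longer()` is used to gather them into a single column.

```r
library(tidyr)

# Hypothetical toy data: one column per year, so the variable "year"
# is hidden in the column names instead of having its own column
untidy <- data.frame(
  branch = c("North", "South"),
  y2019  = c(120, 80),
  y2020  = c(150, 95)
)

# Gather the year columns into two new columns:
# "year" holds the old column names (minus the "y" prefix),
# "loans" holds the values that were in those columns
tidy <- pivot_longer(
  untidy,
  cols         = c(y2019, y2020),
  names_to     = "year",
  names_prefix = "y",
  values_to    = "loans"
)

print(tidy)

# With one observation per row, a total is a simple filter and sum
north_total <- sum(tidy$loans[tidy$branch == "North"])
```

After reshaping, each row is one branch in one year, which is what makes the filter-and-sum at the end straightforward.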

Lab Questions

Download the lotr_untidy.csv data and save to your data folder.

About the data:

Publication: J.R.R. Tolkien. The Lord of the Rings. Ballantine Books, New York. Copyright 1954-1974. Volume I. The Fellowship of the Ring. Volume II. The Two Towers. Volume III. The Return of the King.

Downloaded from: jennybc on GitHub

Variables:
- Film: the Lord of the Rings film
- Race: the race of the characters
- Female: words spoken by female characters in LOTR
- Male: words spoken by male characters in LOTR
- FOTR_ROTK_TTT: words spoken in each of The Fellowship of the Ring, The Return of the King, and The Two Towers

Load both of the files. Explain what makes each data frame untidy.
Using the skills learned in class, make lotr_untidy1 tidy.
What’s the total number of words spoken by male hobbits?
Does a certain Race dominate a movie (that is, speak the most words)? Does the dominant Race differ across the movies? Is there a way you can visualize this?
Using the skills learned in class, make lotr_untidy2 tidy.

Last updated on March 4, 2021
