Final Report

Due by 11:59 PM on Monday, April 19, 2021
  • 04/19/2021: Each group submits a zipped folder to Canvas containing all of the group's analyses, scripts, and inputs/data in the appropriate directory structure. I should be able to run everything after unzipping the project folder.

Final Report Requirements

The entire report may be no more than 10 pages (12-point font, 1-inch margins). The References and any other appendices do not count toward the page limit.

The project report must be formatted using R Markdown and knitted directly to PDF.
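
As a starting point, a YAML header along the following lines is one way to set up an R Markdown file that knits to PDF with the font size and margins described above. This is only a sketch: the title, author, and date values are placeholders, and the fontsize and geometry settings assume the default LaTeX template used by pdf_document.

```yaml
---
title: "Project Title (placeholder)"
author: "Names of all group members (placeholder)"
date: "Date (placeholder)"
output: pdf_document   # knit directly to PDF
fontsize: 12pt         # 12-point font
geometry: margin=1in   # 1-inch margins on all sides
---
```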

The due date for the report is the start of your final exam period. It will be a by-name group assignment so one person can submit on behalf of the group.

Your report should be well written. Your goal should be to tell a story with the data (“we thought we would see A, we actually observed B, and we think we saw this because of C.”).

Your report should be reproducible and well-organized.

No R code should be included in the report, unless it is really cool R code or can help elucidate your process.

Elements of the Final Report

Your final report must include all of the following elements:

  1. Title Page with Executive Summary:
  • Include the title of the project, the type of analysis (research or application), the date, and the names and course of all group members.
  • Include an Executive Summary of no more than 1/2 page summarizing the context, the primary questions of interest, the data source, the overall approach, and significant findings and/or recommendations.
  2. Introduction: Provide context for why and how you are doing the analysis.
  • What is the motivation for the analysis and what are the overarching research/business questions of interest?
  • What are the data source, the general nature of the variables, and the time period of the data?
  • Include a brief literature review and cite peer-reviewed or widely-published articles related to your questions of interest.
    • You may want to use a publication search engine such as Google Scholar.
    • Cite as many articles as there are individuals in your group.
  3. Initial Hypotheses: State what variables and relationships your group specifically wants to analyze based on your literature review and questions of interest, before tidying or looking at the data, e.g., is variable X associated with variable Y, or is the association mediated through a third variable?
  4. Data Preparation: Describe any significant challenges in tidying the data and your approach to overcoming the challenges.
  5. Exploratory Data Analysis and Statistical Analysis: Describe your analysis of the initial and/or adjusted hypotheses and the key results and findings.
  • Do not simply include all of your R output, as you will run out of room.
  • Describe your EDA on key variables and relationships using numerical and graphical summaries.
  • Show important graphs and numerical summaries from any statistical analyses you chose to perform (these are allowed but not required).
  • Use captions and/or text to provide your interpretations.
  • If the analysis changed your hypotheses, explain the rationale for the change.
  • State new hypotheses and any results.
  • Summarize your analysis and results in specific findings for each hypothesis.
  6. Summary:
  • Include a brief discussion of how your findings fit in the overall context of the literature and the research or business situation.
  • Discuss your findings relative to your questions of interest and any recommendations for future work or implementation.
  7. Appendix A: References:
  • List the references cited in your report. Use a standard format for references (such as APA or MLA).
  8. Appendix B (Optional):
  • Any additional R code you want to reference in the main body of the report.
  • There is no need to provide all the code, the data, or all the figures you used in your analysis in the appendix.

Grading Rubric - Report (30 Pts)

| Question.Part | Points | Topic |
|---|---|---|
| Final Report.Overall Format | 4 | Proper file structure, R code, and R Markdown to be reproducible |
| Final Report.Title Page | 1 | Correct Title Page elements |
| Final Report.Executive Summary | 2.5 | Complete Executive Summary of proper length |
| Final Report.Introduction | 4 | Complete Motivation, Data Description |
| Final Report.Introduction | 1.5 | Concise Literature Review with sufficient sources |
| Final Report.Initial Hypotheses | 1 | Clear hypotheses about relationships among variables |
| Final Report.Data Preparation | 3 | Clear discussion of Data Preparation challenges and solutions |
| Final Report.EDA and Statistical Analysis | 10 | Logical application of appropriate EDA and statistical methods with proper interpretations |
| Final Report.Summary | 2 | Concise and consistent statement of findings related to questions of interest |
| Final Report.References | 1 | References relevant and properly formatted |
| Final Report.Optional Appendices | 0 | Concise |
| Total | 30 | |

Appendices

Appendix 1: R Markdown Code Chunk Options to Control Display of Code and Chunk Output

  • One of the beauties of R Markdown is that you can use code chunk options to determine which portions of your analysis are included in a report. For more information, see the knitr documentation.
  • Options go after the r inside the braces of the chunk header, separated by a comma and a space,
    • e.g., {r, echo = FALSE, eval = TRUE, include = FALSE}

Four options of interest

  • echo = determines if the code will be shown in the output.
    • If FALSE, no code will be shown
  • eval = determines if the code will be evaluated (run) during the production of the output.
    • If FALSE, the chunk will not be run during the knit process so no output will be displayed and it will not be available for later code chunks
  • include = determines whether to include the chunk output in the output document.
    • If FALSE, no output from the chunk will be written into the output document, but the code is still evaluated (assuming eval = TRUE)
    • Plot files are generated if there are any plots in the chunk, so you can manually insert figures later
  • message = determines whether messages generated by the code are kept in the output.
    • If FALSE, any messages from the code will not show in the output.

Examples

  1. You don’t want to show the code or run it when producing the final document. Use {r, echo = FALSE, eval = FALSE}
  2. You don’t want to show the code, but you do want to show its output in the final document. Use {r, echo = FALSE, eval = TRUE}
  3. You need to run the code so its results can be used later in the analysis, but you don’t want to show either the code or its output in the final document. Use {r, echo = FALSE, eval = TRUE, include = FALSE}
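
The three cases above might look like the following in an R Markdown source file. This is a minimal sketch rather than a required template: the built-in mtcars data set and the object name fit are stand-ins chosen purely for illustration, and the chunks are ordered so that the silently run chunk comes before the chunk that uses its result.

````markdown
```{r, echo = FALSE, eval = FALSE}
# Example 1: neither shown nor run (e.g., scratch work kept for reference)
head(mtcars)
```

```{r, echo = FALSE, eval = TRUE, include = FALSE}
# Example 3: run silently; `fit` is available to later chunks,
# but neither the code nor any output appears in the report
fit <- lm(mpg ~ wt, data = mtcars)
```

```{r, echo = FALSE, eval = TRUE}
# Example 2: code hidden, output shown in the report
summary(fit)$coefficients
```
````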

Structuring Your Code Chunks

  • You can use these chunk options in R Markdown files that produce normal documents or presentations.
  • Using them can save you a lot of cutting and pasting, which can create logic errors.
  • They also mean you should shape your code chunks to split actions into multiple pieces.
  • That allows you to do your analysis in a logical sequence without showing the initial pieces, and then show only the final plot or numerical summary.
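
One possible way to structure chunks along these lines is sketched below. It assumes you want code hidden by default, which is set here with knitr::opts_chunk$set() in a setup chunk. The built-in mtcars data, the object name mpg_by_cyl, and the caption text are illustrative placeholders, not part of the assignment.

````markdown
```{r, include = FALSE}
# Setup: hide code and messages by default for every later chunk
knitr::opts_chunk$set(echo = FALSE, message = FALSE)
```

```{r, include = FALSE}
# Preparation step: runs during knitting but shows nothing in the report
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
```

```{r, fig.cap = "Mean miles per gallon by number of cylinders."}
# Final piece: only the plot and its caption appear in the report
barplot(mpg_by_cyl$mpg, names.arg = mpg_by_cyl$cyl,
        xlab = "Cylinders", ylab = "Mean MPG")
```
````

Because the preparation lives in its own chunk, it can be revised without touching the chunk that produces the figure.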

Appendix 2: Data Sources

Here are some possible sources for data. I will not accept small curated datasets, or those that come from a paper where the data has already been analyzed.

I will accept other sources as long as they are comparably large and complicated. For example, there are many open data portals from the federal government, states, and cities, e.g., https://ohio.gov/wps/portal/gov/site/government/resources/ohio-data-analytics.