Final Report
- 04/19/2021: Each group submits a zipped folder to Canvas containing all of your analyses, scripts, and inputs/data in the appropriate directory structure. I should be able to run everything after unzipping the project folder.
Final Report Requirements
The entire report may be no more than 10 pages (12 point font, 1 inch margins) not counting appendices. The References and any other Appendices do not count toward your page limits.
The project report must be formatted using R Markdown and knitted directly to PDF.
The due date for the report is the start of your final exam period. It will be a by-name group assignment so one person can submit on behalf of the group.
Your report should be well written. Your goal should be to tell a story with the data (“we thought we would see A, we actually observed B, and we think we saw this because of C.").
Your report should be reproducible and well-organized.
No R code should be be included in the report, unless it is really cool R code or can help elucidate your process.
Elements of the Final Report
Your final report must include all of the following elements:
- Title Page with Executive Summary:
- include the title of the project, the type of analysis (research or application), the date, and the names and course of all group members.
- Include an Executive Summary of no more than 1/2 page summarizing the context, the primary questions of interest, the data source, the overall approach, and significant findings and/or recommendations.
- Introduction: provide context for why and how you are doing the analysis.
- What is the motivation for the analysis and what are the overarching research/business questions of interest?
- What is the data source, the general nature of the variables, and the time period of the data.
- Include a brief literature review and cite peer-reviewed or widely-published articles related to your questions of interest.
- You may want to use a publication search engine such as Google Scholar.
- Cite as many articles as individuals in your study group.
-
Initial Hypotheses: State what variables and relationships your group specifically wants to analyze based on your literature review and questions of interest, before tidying or looking at the data, e.g., is variable X associated with variable Y, or is the association mediated through a third variable.
-
Data Preparation: Describe any significant challenges in tidying the data and your approach to overcoming the challenges.
-
Exploratory Data Analysis and Statistical Analysis: Describe your analysis of the initial and/or adjusted hypotheses and the key results and findings
- Do not just include all your output from R as you will run out of room.
- Describe your EDA on key variables and relationships using numerical and graphical summaries.
- Show important graphs and any numerical summaries from statistical analyses you could (but are not required to) have performed
- Use Captions and/or texts to provide your interpretations
- If the analysis changed your hypotheses, explain the rationale for change
- State new hypotheses and any results.
- Summarize your analysis and results in specific findings for each hypothesis.
- Summary:
- Include a brief discussion of how your findings fit in the overall context of the literature and the research or business situation.
- Discuss your findings relative to your questions of interest and any recommendations for future work or implementation.
- Appendix A: References:
- List the references cited in your report. Use a standard format for references (such as APA or MLA)
- Appendix B (Optional):
- Any additional R code you want to reference in the main body of the report.
- There is no need to provide all the code, the data, or all the figures you used in your analysis in the appendix.
Grading Rubric - Report (30 Pts)
Question.Part: | Points | Topic |
---|---|---|
Final Report.Overall Format | 4. | Proper File structure, R Code and R Markdown to be Reproducible |
Final Report.Title Page | 1. | Correct Title Page Elements |
Final Report.Executive Summary | 2.5 | Complete Exec Summary of proper length |
Final Report.Introduction | 4. | Complete Motivation, Data Description, |
Final Report.Introduction | 1.5 | Concise Literature Review with sufficient sources |
Final Report.Initial Hypotheses | 1. | Clear hypotheses about relationships among variables |
Final Report.Data Preparation | 3. | Clear discussion of Data Preparation Challenges and solutions |
Final Report.EDA and Statistical Analysis | 10. | Logical application of appropriate EDA and Statistical methods with proper interpretations |
Final Report.Summary | 2. | Concise and consistent statement of findings related to questions of interest |
Final Report.References | 1. | References Relevant and properly formatted |
Final Report.Optional Appendixes | . | Concise |
Total | 30 |
Appendices
Appendix 1: R Markdown Code Chunk Options to Control Display of Code and Chunk Output
- One of the beauties of R Markdown is you can use Code Chunk options to determine which portions of your analysis get included in a report. For more info see: knitr
- Options go after the r, in the `{r} separated by a comma and space,
- e.g.,
{r, echo = FALSE, eval = TRUE, include = FALSE}
- e.g.,
Four options of interest
echo =
determines if the code will be shown in the output.- If FALSE, no code will be shown
eval =
determines if the code will be evaluated (run) during the production of the output.- If FALSE, the chunk will not be run during the knit process so no output will be displayed and it will not be available for later code chunks
include =
determines whether to include the chunk output in the output document.- If FALSE, no output from the chunk will be written into the output document, but the code is still evaluated (assuming
eval = TRUE
) - Plot files are generated if there are any plots in the chunk, so you can manually insert figures later
- If FALSE, no output from the chunk will be written into the output document, but the code is still evaluated (assuming
message =
: whether to preserve messages.- If FALSE, any messages from the code will not show in the output.
Examples
- You don’t want to show the code or run it for producing the final document. Use `{r, echo = FALSE, eval = FALSE}
- You don’t want to show the code but you want to show the code output in the final document. Use
{r, echo = FALSE, eval = TRUE}
- You don’t want to show the code but you need to run it to use the results later in the analysis but you don’t want to show the code output in the final document. Use`{r, echo = FALSE, eval = TRUE, include = FALSE}
Structuring Your Code Chunks
- You can use these tags for R Markdown files producing normal documents or presentations
- Using them can save you a lot of cutting and pasting which can creating logic errors
- They also mean you should shape your code chunks to split actions into multiple pieces.
- That allows you to do your analysis in a logical sequence but not show the initial pieces and then show the final plot or numerical summary.
Appendix 2: Data Sources
Here are some possible sources for data. I will not accept small curated datasets, or those that come from a paper where the data has already been analyzed.
I will accept other sources as long as they are comparably large and complicated. For example, there are many open data portals from the federal governments, states and cities., e.g. (https://ohio.gov/wps/portal/gov/site/government/resources/ohio-data-analytics).
-
Data quest Blog has pointers to many data sets
-
Federal bridge inspections from the Federal Highway Admin:
-
Drinking water violations from the EPA:
https://www.epa.gov/ground-water-and-drinking-water/drinking-water-data-and-reports
-
Contracts or grants from USA Spending:
-
DC Crime incident data from the DC police:
-
SBA disaster loans from the Small Business Administration:
https://www.sba.gov/offices/headquarters/oda/resources/1407821
-
Fatal accidents from the NHTSA:
https://www.nhtsa.gov/research-data/fatality-analysis-reporting-system-fars
-
Baltimore crime data from the Baltimore police:
-
Aircraft animal strikes from the FAA:
-
Bank health data from the FDIC (also see banktracker at IRW):
-
USGS water quality data from the USGS:
-
Prison data:
-
College Scorecard:
-
Reporting Carrier On-Time Performance (1987-present) from the Bureau of Transportation Statistics:
https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236
-
The General Social Survey from the non-partisan and objective research organization at the University of Chicago (NORC):