---
title: "PA 10: COVID-19 Infections and California"
author: "Add Names Here"
format: html
embed-resources: true
code-tools: true
toc: true
editor: source
execute: 
  error: false
  echo: true
---

```{r}
#| label: setup
#| include: false
library(tidyverse)
library(scales) # provides nicer axis labels for graphs
```


***This task is complex. It requires many different types of abilities. Everyone will be good at some of these abilities but nobody will be good at all of them. In order to solve this puzzle, you will need to use the skills of each member of your group.***


<!-- The partner who **had** the most relaxing spring break plans (this is, of course, relative) will start as the Computer (typing and listening to instructions from the Coder).  -->

- Starting Computer:  

- Starting Coder:    


## Goals for the Activity  

- Use the `dplyr` verbs to transform your data  
- Use other `tidyverse` functions to prepare and plot the data  
- Use `lubridate` to work with dates and `stringr` to work with text  


**THROUGHOUT THE ACTIVITY**, be sure to follow the Style Guide by doing the following:  

- load the appropriate packages at the beginning of the Quarto document  
- use proper spacing  
- *add labels* to all code chunks  
- comment at least once in each code chunk to describe why you made your coding decisions  
- add appropriate labels to all graphic axes  


## Data Description - United States COVID-19 Cases and Deaths

Starting in January 2020, the New York Times began reporting on COVID-19 infections in the United States and eventually created a [GitHub repository](https://github.com/nytimes/covid-19-data/) of the data they used and reported on in their stories (a field called "data journalism"). They ended their data collection in March 2023 and switched to using data from national reporting systems. 

We will use their data to evaluate COVID-19 and how it varied across different states. Here are the NY Times data on cumulative cases by date and state (including territories).

```{r}
#| label: read-data
#| message: false
#| eval: false
# data comes from NY Times GitHub Repository - ends March 2023
# not evaluating this code chunk because we will clean the data and use the clean data set
cases <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")
```

Note that both `cases` and `deaths` are cumulative by date, so we will want to extract just the new cases per day in order to compute totals across the states.  

If we want to extract the daily cases, we can use the following code, which calculates the difference between each day's (cumulative) cases and the previous day's using the `diff()` function from `base` R.  

<!-- Your team should create a data subfolder in your project to store the created data and to upload the population data to as well. -->


```{r}
#| label: calculate-new-daily-cases
#| eval: false
# calculates the unique new daily cases and deaths for each state
# eval is false since we created a new data set we will use going forward
cases |> 
  group_by(state) |> 
  arrange(state, date) |> 
  mutate(cases_daily = c(cases[1], diff(cases)),
         deaths_daily = c(deaths[1], diff(deaths))) |> 
  ungroup() |> 
  write_csv("data/covid-cases-us.csv")
```
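
To see what `c(cases[1], diff(cases))` does, here is a minimal sketch on a made-up cumulative series. The tibble `toy`, its column `day`, and all of its values are hypothetical and are not taken from the NY Times data. Because `diff()` returns one fewer element than its input, the first cumulative value is placed in front to stand in for the first day's count.

```r
library(dplyr)

# Hypothetical cumulative counts for two made-up states
toy <- tibble(
  state = c("A", "A", "A", "B", "B", "B"),
  day   = c(1, 2, 3, 1, 2, 3),
  cases = c(2, 5, 9, 1, 1, 4)
)

toy_daily <- toy |>
  group_by(state) |>
  arrange(state, day) |>
  # diff() gives day-over-day changes; the first day keeps its cumulative value
  mutate(cases_daily = c(cases[1], diff(cases))) |>
  ungroup()

toy_daily$cases_daily
```

Within each state, the daily values add back up to the last cumulative value (2 + 3 + 4 = 9 for state A), which is a quick way to check the calculation.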

Now that our data are in the format we need (sort of), we will start our analysis.  

## California COVID-19 Cases

First, we need to read in our clean data.

```{r}
#| label: read-clean-covid-cases-solution
#| eval: true
cases <- read_csv("data/covid-cases-us.csv", show_col_types = FALSE)
```

<!-- Swap roles -- Computer becomes Coder, Coder becomes Computer! -->


### California Monthly COVID-19 Cases from January 2020 - March 2023

We want to create a graph that plots the number of cases per month in California. The x-axis should show the month/year (e.g., Mar '21) and the y-axis should show the number of cases. The months/years should be in chronological order.

Below is some code to help you get started working with the data. Use functions from `lubridate` to extract the right information, along with functions you have learned from other packages such as `dplyr`. You will want a variable that gives the month and year as text (e.g., "Jun '20") and a date-type variable that you can use to reorder the month and year variable within your graph (otherwise R will default to alphabetical ordering). Double-check that your result looks like the data provided in the instructions.
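
The ordering problem can be seen in a small base R sketch. The vectors `month_labels` and `month_starts` are hypothetical names with made-up values, used only to illustrate why a date-type helper variable is useful.

```r
# Hypothetical month labels and the matching first-of-month dates
month_labels <- c("Nov '20", "Dec '20", "Jan '21", "Feb '21")
month_starts <- as.Date(c("2020-11-01", "2020-12-01", "2021-01-01", "2021-02-01"))

# Sorting the text labels puts them in alphabetical order, not date order
sort(month_labels)

# Ordering the labels by their dates keeps them chronological
month_labels[order(month_starts)]
```

The first call starts with "Dec '20" and ends with "Nov '20", while the second keeps November 2020 through February 2021 in sequence.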

```{r}
#| label: calculate-totals-month
ca_only_month <- cases |> 
 # code to filter out just California  |> 
  ______(date = ____(date), # make sure date is treated as a year-month-day date
         month = ______(date, label = , abbr = ), # extract the month in abbreviated label form, e.g., Jan
         year = ______(date), # extract the year
         y_m = _______(date, unit = "month"), # find a function that sets every date to the first day of its month, i.e., the floor of the month

# Swap roles -- Computer becomes Coder, Coder becomes Computer! 

         yr = _________(year, pattern = "^20", replacement = "'")) |>  # replace the leading 20 in each year with a ' so that 2021 reads as '21
  ________(col = "month_year", month, yr, sep = " ") |>  # create a variable called month_year that joins the abbreviated month with the shortened year, e.g., Apr '21
  # calculate the total cases per month, retaining both the month_year variable and the y_m variable in addition to a new total_cases calculation |> 
  ungroup()
```



<!-- Swap roles -- Computer becomes Coder, Coder becomes Computer! -->

### Minimal Goal Graph

At minimum, try to recreate the following graph using your data from above. Here are some useful functions to consider.

- `fct_reorder()` to reorder the character variable `month_year` by the `y_m` variable, which is automatically ordered by date.  
- `label_comma()` from the `scales` package for changing your y-axis.  
- `theme()` with additional arguments for `axis.text.x` to adjust your label orientation (`angle` and `vjust`).  
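
As a rough sketch of how these three functions fit together, the example below uses a small made-up data frame. The name `toy_months`, its values, the choice of `geom_col()`, and the specific `angle` and `vjust` settings are all assumptions for illustration, not the settings used in the target graph.

```r
library(ggplot2)
library(forcats)
library(scales)

# Hypothetical monthly totals; the values are made up for illustration
toy_months <- data.frame(
  month_year  = c("Nov '20", "Dec '20", "Jan '21", "Feb '21"),
  y_m         = as.Date(c("2020-11-01", "2020-12-01", "2021-01-01", "2021-02-01")),
  total_cases = c(12000, 45000, 38000, 9000)
)

toy_plot <- ggplot(toy_months,
                   aes(x = fct_reorder(month_year, y_m), y = total_cases)) +
  geom_col() +
  # label_comma() adds thousands separators to the y-axis labels
  scale_y_continuous(labels = label_comma()) +
  # rotate the x-axis labels so they do not overlap
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) +
  labs(x = "Month", y = "Total cases")

toy_plot
```

Without `fct_reorder()`, the bars would appear in alphabetical order of `month_year` rather than in date order.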

```{r}
#| label: covid-graph-cali-solution-minor

```

#### Canvas Quiz Question 1

**Which month in California had the most COVID-19 cases between January 2020 and March 2023?**

>Insert Answer Here


#### Canvas Quiz Question 2

**Which month in 2021 had the fewest cases in California?**

>Insert Answer Here


<!-- Swap roles -- Computer becomes Coder, Coder becomes Computer! -->

### Major Goal Graph (if time)

Want a challenge? Try recreating the graph in the instructions online.

```{r}
#| label: covid-graph-cali-major
```



<!-- Swap roles -- Computer becomes Coder, Coder becomes Computer! -->
<!-- If you didn't do the Major Goal Graph, then do not switch -->


### What Did California COVID-19 Cases Look Like in 2021

Create a graph that provides information about the number of cases in California in 2021 only. You can use the complete data set information originally provided, but your graph should only represent some aspect of COVID-19 in 2021.

Be sure to follow all guidelines for creating good graphics. 

You will submit that graph as a PNG file in your Canvas quiz.
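
One possible way to write a plot to a PNG file is `ggsave()` from `ggplot2`. The sketch below uses a made-up plot and a temporary file path. The names `toy_plot` and `png_path` and the size settings are assumptions for illustration, so choose a file name and dimensions that suit your own graph.

```r
library(ggplot2)

# Hypothetical plot built from made-up values
toy_plot <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x = x, y = y)) +
  geom_line()

# Write to a temporary file here; in your project, choose your own path
png_path <- tempfile(fileext = ".png")

# width and height are in inches by default
ggsave(filename = png_path, plot = toy_plot, width = 8, height = 5, dpi = 300)
```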

```{r}
#| label: ca-2021-covid

```


<!-- Swap roles -- Computer becomes Coder, Coder becomes Computer! -->
<!-- If you don't have time to do the last part then do not switch -->


### Extra Challenge: Create your own graph (if time)

Using the COVID-19 data, create your own graph to answer your own research question. You will submit that graph as a PNG file in your Canvas quiz. Include your research question as the subtitle of the graph.  
