Homework 3

The Homework 3 problem statement is available as a web page or as a Markdown page:

Web page: https://aselshall.github.io/eds/HW/HW3
Markdown page: aselshall/eds

Problem 1

Each student will have a different solution based on the selected dataset.

Problem 1 can be submitted in a separate notebook.

Problem 2

Problem statement

Task 1: Plot the maximum weekly concentration of Karenia brevis (cells per liter) over the whole dataset for each of the two regions, Tampa Bay and Charlotte Harbor estuary.

Task 2: FWRI classifies Karenia brevis abundance based on cell counts, as described here:

Index	Description	K. brevis abundance	Possible effects (K. brevis only)
0	NOT PRESENT- BACKGROUND	background levels of 1,000 cells or less	no effects anticipated
1	VERY LOW	> 1,000 - 10,000 cells/L	possible respiratory irritation; shellfish harvesting closures when cell abundance equals or exceeds 5,000 cells/L
2	LOW	> 10,000 - 100,000 cells/L	respiratory irritation; shellfish harvesting closures; possible fish kills; probable detection of chlorophyll by satellites at upper range of cell abundance
3	MEDIUM	> 100,000 - 1,000,000 cells/L	respiratory irritation; shellfish harvesting closures; probable fish kills; detection of surface chlorophyll by satellites
4	HIGH	> 1,000,000 cells/L	as above, plus water discoloration

Create new columns Weekly_Index_Tampa and Weekly_Index_Naples, and use the maximum concentration of K. brevis (cells/L) in each week to classify the weekly bloom impact in the two regions, Tampa Bay and Charlotte Harbor estuary, over the whole dataset. For example, if the maximum concentration in week 1 is 50,000 cells/L in Tampa Bay and 1,500,000 cells/L in Charlotte Harbor estuary, then the first rows of Weekly_Index_Tampa and Weekly_Index_Naples will have the values 2 and 4, respectively. If the maximum concentration in a given week is 0, the index is 0, and so on.
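The following minimal sketch implements the classification rule from the table above. The hypothetical helper `fwri_index` maps concentrations (cells/L) to the FWRI index using `pandas.cut`. It assumes right-closed bins, so exactly 1,000 cells/L stays at index 0 and exactly 10,000 cells/L is index 1, which matches the "> 1,000 - 10,000" wording. Weeks with no samples (NaN) stay missing rather than being forced to 0.

```python
import numpy as np
import pandas as pd

# Bin edges in cells/L; right-closed intervals: (-inf, 1e3] -> 0, (1e3, 1e4] -> 1,
# (1e4, 1e5] -> 2, (1e5, 1e6] -> 3, (1e6, inf) -> 4, following the FWRI table.
FWRI_BINS = [-np.inf, 1_000, 10_000, 100_000, 1_000_000, np.inf]


def fwri_index(cells_per_liter):
    """Hypothetical helper: FWRI abundance index (0-4) for an array-like of cells/L.

    NaN inputs (e.g., weeks without samples) come back as <NA>.
    """
    values = pd.Series(cells_per_liter, dtype=float)
    # labels=False returns the integer position of the bin each value falls in
    codes = pd.cut(values, bins=FWRI_BINS, labels=False, right=True)
    return codes.astype("Int64")  # nullable integers so NaN survives as <NA>


# The worked example from the task: 50,000 -> 2 and 1,500,000 -> 4; 0 -> 0
print(fwri_index([50_000, 1_500_000, 0]).tolist())
```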

Create a histogram of the index values 1 to 4 (that is, excluding index 0) for each of the two regions.

Dataset

Red tides are caused by harmful algal blooms of Karenia brevis. For Karenia brevis cell count data, you can use the current dataset of physical and biological data collected along the Texas, Mississippi, Alabama, and Florida Gulf coasts in the Gulf of Mexico as part of the Harmful Algal BloomS Observing System from 1953-08-19 to 2023-07-06 (NCEI Accession 0120767). For direct download, use this data link and this data documentation link. Alternatively, FWRI documents Karenia brevis blooms from 1953 to the present. That dataset has more than 200,000 records and is updated daily. To request it, email HABdata@MyFWC.com. To learn more about the data, see the FWRI Red Tide Current Status page.

Study areas

Conduct your analysis in Tampa Bay and Charlotte Harbor estuary. For Tampa Bay, restrict the Karenia brevis measurements to latitudes from 27° N to 28° N and longitudes from 85° W to the coast. For Charlotte Harbor estuary, restrict them to latitudes from 25.5° N up to (but not including) 27° N and longitudes from 85° W to the coast.

Solution

#As a good coding practice, we add all the libraries that we will use at the beginning of our code

1. Data reading and data wrangling

# Read a csv file with Pandas

# Display column labels

# List of column names to focus on

# Filter the DataFrame to include only the selected columns

# Convert the "SAMPLE_DATE" column to datetime format using pd.to_datetime()

# Set the "SAMPLE_DATE" column as the index of your DataFrame

#Display DataFrame
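Here is a minimal sketch of the wrangling steps above. It assumes the CSV has columns named SAMPLE_DATE, LATITUDE, LONGITUDE, and CELLCOUNT (in cells/L). The three inline rows are hypothetical stand-ins so the snippet runs without the real file. With the downloaded data, pass the file path to `pd.read_csv` in place of the `io.StringIO` buffer.

```python
import io

import pandas as pd

# Hypothetical toy rows standing in for the real CSV; column names are assumed.
csv_text = """STATE_ID,LATITUDE,LONGITUDE,SAMPLE_DATE,CELLCOUNT
FL,27.5,-82.7,2020-01-03 10:00:00,5000
FL,26.5,-82.2,2020-01-02 09:30:00,150000
FL,29.0,-84.0,2020-01-04 11:00:00,0
"""

# Read the CSV (replace io.StringIO(csv_text) with the path to the real file)
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())

# Keep only the columns needed for this analysis
columns = ["SAMPLE_DATE", "LATITUDE", "LONGITUDE", "CELLCOUNT"]
df = df[columns].copy()

# Parse dates and make them a sorted DatetimeIndex so resample() works later
df["SAMPLE_DATE"] = pd.to_datetime(df["SAMPLE_DATE"])
df = df.set_index("SAMPLE_DATE").sort_index()
print(df)
```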

2. Task 1: Plot the maximum concentration

2.1 Assign region names based on latitude and longitude values

Create a new column REGION with default value ‘Other’. Then use a Boolean mask to change ‘Other’ in each row to ‘Tampa Bay’ or ‘Charlotte Harbor’ based on latitude and longitude values.

# Create a new column 'REGION' with default value 'Other'

# Mask for dicing: Define mask for Tampa Bay region based on latitude and longitude values

# Assign value 'Tampa Bay' to rows matching Tampa Bay mask 

# Mask for dicing: Define mask for Charlotte Harbor estuary region based on latitude and longitude values

# Assign value 'Charlotte Harbor' to rows matching Charlotte Harbor mask

#Display dataframe
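The masking steps above might look like the sketch below. It makes three assumptions. First, longitudes are stored as negative degrees east, so 85° W is -85. Second, "85° W to coast" is read as LONGITUDE >= -85 with no explicit eastern bound. Third, the four rows are hypothetical. Tampa Bay uses an inclusive 27-28° N range and Charlotte Harbor stops just below 27° N, so a sample at exactly 27° N falls in Tampa Bay only.

```python
import pandas as pd

# Hypothetical toy data: datetime index plus LATITUDE, LONGITUDE (west is negative), CELLCOUNT.
df = pd.DataFrame(
    {
        "LATITUDE": [27.5, 26.5, 29.0, 25.0],
        "LONGITUDE": [-82.7, -82.2, -84.0, -81.5],
        "CELLCOUNT": [5_000, 150_000, 0, 2_000_000],
    },
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"]),
)

# Default region for every row
df["REGION"] = "Other"

# Assumed reading of "85° W to coast": everything east of (or on) 85° W
east_of_85w = df["LONGITUDE"] >= -85

# Tampa Bay: 27° N to 28° N, both ends inclusive
tampa_mask = df["LATITUDE"].between(27, 28) & east_of_85w
df.loc[tampa_mask, "REGION"] = "Tampa Bay"

# Charlotte Harbor estuary: 25.5° N up to, but not including, 27° N
charlotte_mask = (df["LATITUDE"] >= 25.5) & (df["LATITUDE"] < 27) & east_of_85w
df.loc[charlotte_mask, "REGION"] = "Charlotte Harbor"

print(df)
```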

2.2 Select the data for each region

Create a new DataFrame for each region: charlotte_harbor_plot_data and tampa_bay_plot_data. Each DataFrame should contain the maximum cell count per week for the whole period.

#Get rows for charlotte harbor region

#Get rows for tampa bay region

#For these rows, find the maximum cellcount per week for the study period 

#For these rows, find the maximum cellcount per week for the study period 

# display data

# display data
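One way to get the weekly maxima is to filter by REGION and then call `resample("W").max()`. The sketch below shows this on hypothetical rows. It needs the DatetimeIndex built in step 1. Note that the pandas `"W"` frequency labels each week by the Sunday that ends it. Weeks with no samples in a region come out as NaN rather than 0.

```python
import pandas as pd

# Hypothetical toy data already carrying the REGION column from step 2.1.
df = pd.DataFrame(
    {
        "CELLCOUNT": [5_000, 150_000, 20_000, 2_000_000, 800],
        "REGION": ["Tampa Bay", "Charlotte Harbor", "Tampa Bay", "Charlotte Harbor", "Other"],
    },
    index=pd.to_datetime(["2020-01-06", "2020-01-07", "2020-01-08", "2020-01-14", "2020-01-07"]),
).sort_index()

# Rows for each region, then the maximum CELLCOUNT in each Sunday-ending week
charlotte_harbor_plot_data = (
    df.loc[df["REGION"] == "Charlotte Harbor", "CELLCOUNT"].resample("W").max()
)
tampa_bay_plot_data = df.loc[df["REGION"] == "Tampa Bay", "CELLCOUNT"].resample("W").max()

print(charlotte_harbor_plot_data)
print(tampa_bay_plot_data)
```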

2.3 Plot data

I sought assistance from ChatGPT 3.5 Turbo to learn how to add a legend to a pandas figure with two plots.

# Plot the data with customization
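Below is a sketch of one common way to get a shared legend. Draw both Series on the same Matplotlib Axes through the `ax=` argument, give each a `label`, and call `ax.legend()`. The weekly values are hypothetical. The `Agg` backend is only there so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weekly maxima (cells/L) standing in for the outputs of step 2.2
weeks = pd.date_range("2020-01-05", periods=4, freq="W")
tampa_bay_plot_data = pd.Series([2_000, 50_000, 0, 300_000], index=weeks)
charlotte_harbor_plot_data = pd.Series([1_500_000, 80_000, 5_000, 0], index=weeks)

fig, ax = plt.subplots(figsize=(10, 4))
# Same Axes for both calls, each with its own label, so one legend can list both lines
tampa_bay_plot_data.plot(ax=ax, label="Tampa Bay")
charlotte_harbor_plot_data.plot(ax=ax, label="Charlotte Harbor")

ax.set_xlabel("Week")
ax.set_ylabel("Max K. brevis (cells/L)")
ax.set_title("Maximum weekly Karenia brevis concentration")
ax.legend()
```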

2.4 Management question

Which region experienced more severe red tides in the last 10 years?

Add your analysis here
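As a hedged starting point for this question, the hypothetical helper `severity_summary` works on a weekly-maximum series. Over the last `years` years of that series, it counts how many weeks exceeded 100,000 cells/L (FWRI index 3 or 4) and how many exceeded 1,000,000 cells/L (index 4). The toy series are made up. "Severity" here is only one possible definition. Others, such as bloom duration or peak concentration, are equally defensible.

```python
import pandas as pd


def severity_summary(weekly_max: pd.Series, years: int = 10) -> dict:
    """Hypothetical helper: summarize weekly max cell counts (cells/L) over the last `years` years."""
    start = weekly_max.index.max() - pd.DateOffset(years=years)
    recent = weekly_max.sort_index().loc[start:].dropna()  # skip weeks without samples
    return {
        "weeks_sampled": int(recent.size),
        "weeks_medium_or_high": int((recent > 100_000).sum()),  # FWRI index 3 or 4
        "weeks_high": int((recent > 1_000_000).sum()),  # FWRI index 4
        "peak_cells_per_L": float(recent.max()),
    }


# Hypothetical weekly maxima; in the notebook these would come from step 2.2
dates = pd.to_datetime(["2012-06-03", "2018-08-05", "2018-08-12", "2021-07-04", "2023-07-02"])
tampa_bay_plot_data = pd.Series([3_000_000, 50_000, 200_000, 1_200_000, 0], index=dates)
charlotte_harbor_plot_data = pd.Series([10_000, 2_500_000, 900_000, 150_000, 1_000], index=dates)

summary = pd.DataFrame(
    {
        "Tampa Bay": severity_summary(tampa_bay_plot_data),
        "Charlotte Harbor": severity_summary(charlotte_harbor_plot_data),
    }
)
print(summary)
```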

3. Task 2: Classify red tide blooms

3.1 Assign a bloom index based on cell count

Index	Description	K. brevis abundance	Possible effects (K. brevis only)
0	NOT PRESENT- BACKGROUND	background levels of 1,000 cells or less	no effects anticipated
1	VERY LOW	> 1,000 - 10,000 cells/L	possible respiratory irritation; shellfish harvesting closures when cell abundance equals or exceeds 5,000 cells/L
2	LOW	> 10,000 - 100,000 cells/L	respiratory irritation; shellfish harvesting closures; possible fish kills; probable detection of chlorophyll by satellites at upper range of cell abundance
3	MEDIUM	> 100,000 - 1,000,000 cells/L	respiratory irritation; shellfish harvesting closures; probable fish kills; detection of surface chlorophyll by satellites
4	HIGH	> 1,000,000 cells/L	as above, plus water discoloration

# Create a new column 'BLOOM_CLASS' with default value 0

# Dicing: fill in 'BLOOM_CLASS' using Boolean masks on the cell count

#Display dataframe
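The dicing step can be sketched with cumulative lower-bound masks. Each `.loc` assignment raises the class of every row above a threshold, so the highest matching class wins. The CELLCOUNT values below are hypothetical and chosen to land on the boundaries of the table.

```python
import pandas as pd

# Hypothetical cell counts (cells/L) that hit each class boundary
df = pd.DataFrame(
    {"CELLCOUNT": [0, 1_000, 1_001, 10_000, 50_000, 100_001, 1_000_000, 1_500_000]}
)

# Default: index 0, background (1,000 cells/L or less)
df["BLOOM_CLASS"] = 0

# Apply thresholds in increasing order; later masks overwrite earlier ones
df.loc[df["CELLCOUNT"] > 1_000, "BLOOM_CLASS"] = 1  # very low
df.loc[df["CELLCOUNT"] > 10_000, "BLOOM_CLASS"] = 2  # low
df.loc[df["CELLCOUNT"] > 100_000, "BLOOM_CLASS"] = 3  # medium
df.loc[df["CELLCOUNT"] > 1_000_000, "BLOOM_CLASS"] = 4  # high

print(df)
```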

3.2 Select the data for each region

Create a new DataFrame for each region: charlotte_harbor_hist_data and tampa_bay_hist_data. Each DataFrame should contain the maximum cell count per week for the whole period.

#Get rows for both regions 

#It is always a good idea to sort your index 

#Select the numeric columns that you want to resample (numeric only)

#Apply resample with max to numeric column

#View data
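The next block sketches the resampling for both regions. It assumes the REGION and BLOOM_CLASS columns from the previous steps and uses toy rows. The index mapping never decreases as cell count rises. The weekly maximum of BLOOM_CLASS therefore equals the index of the weekly maximum CELLCOUNT, so either route gives the same weekly index. The last lines assemble the Weekly_Index_Tampa and Weekly_Index_Naples columns requested in Task 2. They treat the Charlotte Harbor region as "Naples", which is an assumption based on the column naming.

```python
import pandas as pd

# Hypothetical rows with REGION (step 2.1) and BLOOM_CLASS (step 3.1) already assigned
df = pd.DataFrame(
    {
        "CELLCOUNT": [5_000, 150_000, 20_000, 2_000_000],
        "BLOOM_CLASS": [1, 3, 2, 4],
        "REGION": ["Tampa Bay", "Charlotte Harbor", "Tampa Bay", "Charlotte Harbor"],
    },
    index=pd.to_datetime(["2020-01-06", "2020-01-07", "2020-01-08", "2020-01-14"]),
)

# Rows for both regions, with a sorted DatetimeIndex
both = df[df["REGION"].isin(["Tampa Bay", "Charlotte Harbor"])].sort_index()

# Resample only the numeric columns, taking the weekly maximum
numeric_cols = ["CELLCOUNT", "BLOOM_CLASS"]
charlotte_harbor_hist_data = (
    both.loc[both["REGION"] == "Charlotte Harbor", numeric_cols].resample("W").max()
)
tampa_bay_hist_data = both.loc[both["REGION"] == "Tampa Bay", numeric_cols].resample("W").max()

# Side-by-side weekly indices aligned on week; weeks without samples become NaN
weekly_index = pd.DataFrame(
    {
        "Weekly_Index_Tampa": tampa_bay_hist_data["BLOOM_CLASS"],
        "Weekly_Index_Naples": charlotte_harbor_hist_data["BLOOM_CLASS"],
    }
)
print(weekly_index)
```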

3.3 Plot the histograms

#Mask to filter out rows with BLOOM_CLASS equal to zero for Charlotte Harbor

#Plot data using Pandas plot for charlotte harbor region 

#Mask to filter out rows with BLOOM_CLASS equal to zero for Tampa Bay

#Plot data using Pandas plot for tampa bay region
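Here is a sketch of the histogram step with pandas plotting. It assumes weekly index columns like those built in step 3.2, and the values are hypothetical. Rows with index 0 or NaN are dropped first. Bin edges at half-integers then center one bar on each index from 1 to 4.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weekly indices standing in for the step 3.2 output
weekly_index = pd.DataFrame(
    {
        "Weekly_Index_Tampa": [0, 1, 2, 2, 0, 4],
        "Weekly_Index_Naples": [3, 0, 4, 4, 1, 0],
    }
)

columns = ["Weekly_Index_Tampa", "Weekly_Index_Naples"]
titles = ["Tampa Bay", "Charlotte Harbor"]
index_counts = {}

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, col, title in zip(axes, columns, titles):
    # Mask out background weeks (index 0) and weeks without samples
    values = weekly_index[col].dropna()
    nonzero = values[values > 0]
    index_counts[col] = nonzero.value_counts().reindex(range(1, 5), fill_value=0).tolist()

    # Half-integer edges give one bar per index 1, 2, 3, 4
    nonzero.plot.hist(ax=ax, bins=[0.5, 1.5, 2.5, 3.5, 4.5], rwidth=0.8)
    ax.set_xticks([1, 2, 3, 4])
    ax.set_xlabel("FWRI bloom index")
    ax.set_title(title)

print(index_counts)
```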

3.4 Management question