import os
import pandas as pd
Load data files#
All data were downloaded from weather.gc.ca
using the web API. See download_can_weather_data.py
for details. I could not quickly find a list of valid station IDs, so I instead iterated over what seemed to be valid station IDs and downloaded all valid files. On inspection, most fields are empty for most of the data files. However, daily temperature and precipitation values are well-populated for some weather stations since 1990-01-01. Some details about these stations’ data files are stored in station_metadata.csv
and the weather data itself is stored in weather_data.csv
.
The following code can be used to replicate the process to build the two file station_metadata.csv
and weather_data.csv
in case updates or additional data from the raw files are needed.
data_dir = "/Users/ethan/climate_data"
files = list(os.walk(data_dir))[0][2]
station_list = pd.read_csv("./station_metadata.csv")['STATION_NAME'].tolist()
Iterate over all of the files in the raw data directory, but only retain data with STATION_NAME
belonging to the those in the station_list
. Here I’m cheating a bit because I already know which station names have the data I’m looking for. Set station_list = None
to enable iteration over all data files in the data directory.
all_station_dfs = {}
for f in files:
try:
df = pd.read_csv(f"{data_dir}/{f}")
if station_list is not None:
if df.STATION_NAME[0] in station_list:
df.insert(loc=3, column='FILENAME', value=f)
df = df.set_index("LOCAL_DATE")
df.index = pd.DatetimeIndex(df.index)
df = df.loc[(df.index >= '1990-01-01') & (df.index <= '2022-01-01')]
if len(df) > 0:
all_station_dfs[f] = df
else:
df.insert(loc=3, column='FILENAME', value=f)
df = df.set_index("LOCAL_DATE")
df.index = pd.DatetimeIndex(df.index)
df = df.loc[(df.index >='1990-01-01') & (df.index <= '2022-01-01')]
if len(df) > 0:
all_station_dfs[f] = df
all_station_dfs[f] = df
except Exception as e:
print(f"Error processing {f}")
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (23) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (13,15,17,19,21,23) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (19) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (15) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (25,27,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (17,19,21) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (21) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (17,19) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (15,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (13) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,19,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (17,19,21,23) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (23,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,23,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,17,23,29,31,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (23,25,27,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
Error processing .DS_Store
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,23,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,17,19,21,23,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,23,29,31,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,17,19,23,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (27) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (25,27) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,25,27,29,31,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (19,21,23) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,29,31,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,21,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,17,19,21,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (13,23,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (25) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,21,23,25,27,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (13,23) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (13,17,19,21) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,15,23,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (13,21) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (19,21,23,25,27,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,27,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,15,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (23,25,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (13,15) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (15,23) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,21,23,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,25,27,29,31,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,21,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (15,19) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (27,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (13,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (23,25,27) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (21,23) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,25,27,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
Create dataframe containing station metadata. For this application, we’re mainly interested in assessing station locations and data coverage.
all_station_metadata = []
for station_name, df in all_station_dfs.items():
station_data = {
'x': df.x[0],
'y': df.y[0],
'STATION_NAME': df.STATION_NAME[0],
'FILENAME': df.FILENAME[0],
'PROVINCE_CODE': df.PROVINCE_CODE[0],
'min_date': df.index.min(),
'max_date': df.index.max(),
'mean_temp_coverage': 1 - (df.MEAN_TEMPERATURE.isna().sum() / len(df)),
'max_temp_coverage': 1 - (df.MAX_TEMPERATURE.isna().sum() / len(df)),
'total_precipitation_coverage': 1 - (df.TOTAL_PRECIPITATION.isna().sum() / len(df)),
'direction_max_gust_coverage': 1 - (df.DIRECTION_MAX_GUST.isna().sum() / len(df)),
'speed_max_gust_coverage': 1 - (df.SPEED_MAX_GUST.isna().sum() / len(df)),
'min_rel_humiudity_coverage': 1 - (df.MIN_REL_HUMIDITY.isna().sum() / len(df)),
'max_rel_humiudity_coverage': 1 - (df.MAX_REL_HUMIDITY.isna().sum() / len(df)),
}
all_station_metadata.append(pd.Series(station_data))
metadata_df = pd.DataFrame(all_station_metadata)
metadata_df
x | y | STATION_NAME | FILENAME | PROVINCE_CODE | min_date | max_date | mean_temp_coverage | max_temp_coverage | total_precipitation_coverage | direction_max_gust_coverage | speed_max_gust_coverage | min_rel_humiudity_coverage | max_rel_humiudity_coverage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -75.666667 | 44.600000 | BROCKVILLE PCC | station_4236_data.csv | ON | 1990-01-01 | 2022-01-01 | 0.998624 | 0.999570 | 0.999398 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
1 | -124.500278 | 49.834169 | POWELL RIVER A | station_327_data.csv | BC | 1990-01-01 | 2022-01-01 | 0.982635 | 0.982981 | 0.999914 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
2 | -97.166667 | 50.116667 | STONY MOUNTAIN | station_3678_data.csv | MB | 1990-01-01 | 2022-01-01 | 0.999825 | 1.000000 | 0.999912 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
3 | -82.933333 | 42.333333 | WINDSOR RIVERSIDE | station_4715_data.csv | ON | 1993-12-01 | 2022-01-01 | 1.000000 | 1.000000 | 0.999902 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
4 | -99.416667 | 50.733333 | MCCREARY | station_3822_data.csv | MB | 1990-01-01 | 1994-08-31 | 0.000000 | 0.000000 | 0.989286 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
106 | -93.966667 | 48.633333 | BARWICK | station_3932_data.csv | ON | 1990-01-01 | 2022-01-01 | 0.998912 | 0.999275 | 0.999637 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
107 | -110.283333 | 54.416667 | COLD LAKE A | station_2832_data.csv | AB | 1990-01-01 | 2022-01-01 | 0.999914 | 0.999914 | 0.999914 | 0.908731 | 0.927346 | 0.999052 | 0.999224 |
108 | -125.445556 | 50.333194 | CHATHAM POINT | station_153_data.csv | BC | 1990-01-01 | 2022-01-01 | 0.999359 | 0.999680 | 0.999786 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
109 | -64.916667 | 44.983333 | GREENWOOD A | station_6354_data.csv | NS | 1990-01-01 | 2022-01-01 | 0.999743 | 0.999743 | 0.999743 | 0.965780 | 0.966293 | 0.998460 | 0.998802 |
110 | -111.848897 | 50.555297 | BROOKS | station_2180_data.csv | AB | 1990-01-01 | 2022-01-01 | 0.993552 | 0.993937 | 0.974112 | 0.494274 | 0.494274 | 0.344240 | 0.344048 |
111 rows × 14 columns
Only select station data with good coverage for key features between 1990 and the present day.
qualifying_stations = metadata_df.loc[(metadata_df.mean_temp_coverage >= 0.95) & (metadata_df.total_precipitation_coverage >= 0.95)]
qualifying_stations.to_csv('station_metadata.csv')
qualifying_stations
x | y | STATION_NAME | FILENAME | PROVINCE_CODE | min_date | max_date | mean_temp_coverage | max_temp_coverage | total_precipitation_coverage | direction_max_gust_coverage | speed_max_gust_coverage | min_rel_humiudity_coverage | max_rel_humiudity_coverage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -75.666667 | 44.600000 | BROCKVILLE PCC | station_4236_data.csv | ON | 1990-01-01 | 2022-01-01 | 0.998624 | 0.999570 | 0.999398 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
1 | -124.500278 | 49.834169 | POWELL RIVER A | station_327_data.csv | BC | 1990-01-01 | 2022-01-01 | 0.982635 | 0.982981 | 0.999914 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
2 | -97.166667 | 50.116667 | STONY MOUNTAIN | station_3678_data.csv | MB | 1990-01-01 | 2022-01-01 | 0.999825 | 1.000000 | 0.999912 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
3 | -82.933333 | 42.333333 | WINDSOR RIVERSIDE | station_4715_data.csv | ON | 1993-12-01 | 2022-01-01 | 1.000000 | 1.000000 | 0.999902 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
5 | -77.533333 | 44.116667 | TRENTON A | station_5126_data.csv | ON | 1990-01-01 | 2022-01-01 | 0.953286 | 0.953286 | 0.953114 | 0.889711 | 0.899948 | 0.926445 | 0.926531 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
106 | -93.966667 | 48.633333 | BARWICK | station_3932_data.csv | ON | 1990-01-01 | 2022-01-01 | 0.998912 | 0.999275 | 0.999637 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
107 | -110.283333 | 54.416667 | COLD LAKE A | station_2832_data.csv | AB | 1990-01-01 | 2022-01-01 | 0.999914 | 0.999914 | 0.999914 | 0.908731 | 0.927346 | 0.999052 | 0.999224 |
108 | -125.445556 | 50.333194 | CHATHAM POINT | station_153_data.csv | BC | 1990-01-01 | 2022-01-01 | 0.999359 | 0.999680 | 0.999786 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
109 | -64.916667 | 44.983333 | GREENWOOD A | station_6354_data.csv | NS | 1990-01-01 | 2022-01-01 | 0.999743 | 0.999743 | 0.999743 | 0.965780 | 0.966293 | 0.998460 | 0.998802 |
110 | -111.848897 | 50.555297 | BROOKS | station_2180_data.csv | AB | 1990-01-01 | 2022-01-01 | 0.993552 | 0.993937 | 0.974112 | 0.494274 | 0.494274 | 0.344240 | 0.344048 |
109 rows × 14 columns
Visualize data coverage for key features#
qualifying_stations.total_precipitation_coverage.plot(kind='hist', bins=20)
<matplotlib.axes._subplots.AxesSubplot at 0x7f7bfc8aaa10>
![../../_images/4094fa1f1753c8c70102398b8d412151bb28ef0a76651711a863e0b9f8a329dd.png](../../_images/4094fa1f1753c8c70102398b8d412151bb28ef0a76651711a863e0b9f8a329dd.png)
qualifying_stations.mean_temp_coverage.plot(kind='hist', bins=20)
<matplotlib.axes._subplots.AxesSubplot at 0x7f7bfdfa1350>
![../../_images/f1947f808ec368b6d3db24596446b899f96715d6c139f9a5b6be10280e283f96.png](../../_images/f1947f808ec368b6d3db24596446b899f96715d6c139f9a5b6be10280e283f96.png)
Build main data frame#
For this use case, we’re only selecting temperature and total precipitation features since there is very good coverage across the selected stations. A small number of stations have good coverage for additional features, like wind gust speed and direction, but the point of this dataset is to have consistent features across stations.
all_dfs_dict = {}
for filename in qualifying_stations.FILENAME:
try:
df = pd.read_csv(f"{data_dir}/{filename}", index_col=5)
df.index = pd.DatetimeIndex(df.index)
df = df.loc[(df.index >='1990-01-01') & (df.index <= '2022-01-01')]
all_dfs_dict[df['STATION_NAME'][0]] = df
except Exception as e:
print(f"Error processing {f}")
keep_columns = [
'MEAN_TEMPERATURE',
'MIN_TEMPERATURE',
'MAX_TEMPERATURE',
'TOTAL_PRECIPITATION'
]
station_dfs = {}
for station_name, df in all_dfs_dict.items():
station_dfs[station_name] = df[keep_columns]
data_df = pd.concat(station_dfs, axis=1)
data_df.columns = data_df.columns.to_flat_index()
data_df.columns = [f"{col[0]};{col[1]}" for col in data_df.columns]
data_df.to_csv('./weather_data.csv')
data_df
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (15) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (23) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (23,25,27,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (13) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (25,27,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (25,27) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (19,21,23) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,29,31,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (11,13,15,23,29,31) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (19,21,23,25,27,33,35) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
/usr/local/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (13,23) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
BROCKVILLE PCC;MEAN_TEMPERATURE | BROCKVILLE PCC;MIN_TEMPERATURE | BROCKVILLE PCC;MAX_TEMPERATURE | BROCKVILLE PCC;TOTAL_PRECIPITATION | POWELL RIVER A;MEAN_TEMPERATURE | POWELL RIVER A;MIN_TEMPERATURE | POWELL RIVER A;MAX_TEMPERATURE | POWELL RIVER A;TOTAL_PRECIPITATION | STONY MOUNTAIN;MEAN_TEMPERATURE | STONY MOUNTAIN;MIN_TEMPERATURE | ... | CHATHAM POINT;MAX_TEMPERATURE | CHATHAM POINT;TOTAL_PRECIPITATION | GREENWOOD A;MEAN_TEMPERATURE | GREENWOOD A;MIN_TEMPERATURE | GREENWOOD A;MAX_TEMPERATURE | GREENWOOD A;TOTAL_PRECIPITATION | BROOKS;MEAN_TEMPERATURE | BROOKS;MIN_TEMPERATURE | BROOKS;MAX_TEMPERATURE | BROOKS;TOTAL_PRECIPITATION | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LOCAL_DATE | |||||||||||||||||||||
1990-01-01 | -5.0 | -10.0 | 0.0 | 0.0 | 3.6 | 2.0 | 5.2 | 0.0 | -13.0 | -25.0 | ... | 3.5 | 1.2 | 2.2 | -5.0 | 9.3 | 8.4 | 0.5 | -2.2 | 3.1 | 0.0 |
1990-01-02 | -4.0 | -9.0 | 1.0 | 0.0 | 0.9 | -2.0 | 3.8 | 1.8 | -10.5 | -13.0 | ... | 4.2 | 13.2 | -3.3 | -8.0 | 1.4 | 0.0 | -8.6 | -15.2 | -1.9 | 0.0 |
1990-01-03 | 0.5 | -4.0 | 5.0 | 0.0 | 3.7 | 1.9 | 5.5 | 3.2 | -15.0 | -18.0 | ... | 5.5 | 20.2 | -1.7 | -6.4 | 3.1 | 0.0 | -7.1 | -15.4 | 1.2 | 0.0 |
1990-01-04 | 4.0 | 2.0 | 6.0 | 2.4 | 5.9 | 3.8 | 8.0 | 1.0 | -21.0 | -25.0 | ... | 6.9 | 11.8 | -0.3 | -7.1 | 6.6 | 3.0 | -12.3 | -15.1 | -9.4 | 0.0 |
1990-01-05 | -3.0 | -4.0 | -2.0 | 0.0 | 5.7 | 2.2 | 9.2 | 14.0 | -18.5 | -25.0 | ... | 8.5 | 34.2 | -0.7 | -8.0 | 6.6 | 0.0 | -10.3 | -15.1 | -5.4 | 0.7 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-12-28 | -3.0 | -9.0 | 3.0 | 0.0 | -5.0 | -7.0 | -3.0 | 0.6 | -22.0 | -26.5 | ... | -3.0 | 0.0 | -2.4 | -7.2 | 2.5 | 4.2 | -27.3 | -37.3 | -17.4 | 0.2 |
2021-12-29 | -2.5 | -4.0 | -1.0 | 0.0 | -7.5 | -12.0 | -3.0 | 8.0 | -30.0 | -35.0 | ... | -1.5 | 5.0 | 0.2 | -1.5 | 1.8 | 0.4 | -24.0 | -27.7 | -20.2 | 0.0 |
2021-12-30 | -1.0 | -3.0 | 1.0 | 0.0 | -3.3 | -5.0 | -1.5 | 0.0 | -29.5 | -36.0 | ... | 0.0 | 0.0 | 0.4 | -1.5 | 2.2 | 0.0 | -24.9 | -31.3 | -18.6 | 0.2 |
2021-12-31 | NaN | NaN | NaN | NaN | -6.0 | -10.5 | -1.5 | 0.0 | -31.5 | -35.0 | ... | -0.5 | 0.0 | 0.9 | 0.0 | 1.8 | 3.0 | -27.3 | -33.3 | -21.3 | 0.0 |
2022-01-01 | -0.5 | -2.5 | 1.5 | 8.8 | -3.8 | -10.5 | 3.0 | 11.4 | -32.3 | -38.0 | ... | 4.0 | 29.2 | 3.9 | 0.6 | 7.1 | 5.3 | -19.4 | -33.9 | -4.9 | 0.0 |
11689 rows × 428 columns