I pay my electricity bill weekly, its based on wholesale spot pricing, so can change quite a lot, quite quickly. I’m not quite sure what the underlying factors are that make it change so dramatically and the Electricity bill is not of much help.
I sometimes look on the Electric Authority Website for wholesale prices HERE and filter graphs to only look at my specific region. The graph below shows a screenshot of the webpage.
Gathering the Data
I decided to download the Data on the next tab on the web page and see if I could do some basic analysis with python Pandas. So I went to the download page and in the 4 week period that I’d selected for the graph I get all country regions showing prices for every 30 minutes for the 4 week time period as a .CSV file.
I then took that dataset and filtered it down to the 2 local substations in Wellington , Central Park 111 and 331. I used Excel filter to filter the relevant substations into 2 separate CSV files so that I could upload them to Pandas Dataframe for analysis.
Uploading & analysing the data
I wanted to see if the data reflected the Average cost I was paying on my bill.
I had a dataset of 4 weeks of data at 30 minute intervals.
On my bill the data is split into different periods throughout the day:
Morning, 7am to 12pm
Avo, 12pm to 5pm
Evening, 5pm to 9pm
late Night, 9pm to 11pm
Night , 11pm to 7am
The morning and evening Rates are usually a lot higher as there is higher demand at these times and the night rate is usually low as demand is low between 11pm to 7am.
So after loading the data into a Dataframe I needed to do a few things:
Filter the data so the time period matched that of the bill, so although I had 4 weeks of data I only needed a specific week of data
Cluster the data into daily bins that reflect the different rates. one awkward thing about this is the day starts at 12am and the morning rate starts at 7am, so there is from 12am to 7am which is night period 1 and there is from 11pm to 12am at the end of the day which is night period 2. So instead of 5 bins we end up with 6 bins that include night 1 & 2. Not a major issue.
Groupby the the different bins and then get the mean (average) of the price of each of those bins.
This process went reasonably smoothly, although I did step 1 at the end after seeing that the process worked.
I did the exercise for both Region substations and I found that Central Park 331 matched my bill more accurately than the Central Park 111 substation did.
If you look at the screenshot below that compares the 2 substation results compared to the bill you will see the match. Note, there is a difference in the results to the Bill, the Data I was processing uses $/MWh whilst the bill is in $/kWh, so divide by 1000 to match in $/kWh,
Predicting the next bill
It is Wednesday 8, February, 2023 and the bill is for 26/1/2023 to 1/2/2023, so a week previous.
I can download the current weeks wholesale spot price and use the script to see if the rates are likely to be higher or lower than the week before.
I in fact did this for the current data and got the following results:
So it looks like my bill for power will be lower for the next bill as the rates are lower, so if I keep my consumption the same as the week prior then that is fine.
The code I used is as follows:
import pandas as pd
import numpy as np # for test data
import matplotlib.pyplot as plt
#TODO: 1. SET VARIABLES- PATH & FILE TO READ
# path2Files = 'AHK-Working\Timesheeet-logger\'
path2Files = 'Electricity/'
file1R = 'Wholesale_price_20230202_20230208.csv'
# file2R = 'Wholesale_price_2_20230208145237.csv'
# file2W= "g2.csv"
#TODO: 2. To Dataframe
# read contents of csv file
df = pd.read_csv(path2Files + file1R)
# convert strings to datetimes
df['Period start']= pd.to_datetime(df['Period start'], dayfirst=True)
df['Period end']= pd.to_datetime(df['Period end'], dayfirst=True)
# select start/end datetimes
# df =df[(df['Period start']>'2023-01-26 00:00:00') & (df['Period start']<'2023-02-01 23:30:00')]
# create bins based on hour periods
bins = [0, 7, 12, 17, 21,23,24]
# add custom labels
labels = ['night1', 'morning', 'avo', 'evening', 'late night', 'night2']
# add the bins to the dataframe
df['Time Bin'] = pd.cut(df['Period start'].dt.hour, bins, labels=labels, right=False)
# groupby the separate bins and get mean price value
df1 = df.groupby('Time Bin', as_index=False)['Price ($/MWh)'].agg([list, 'mean'])
This ended up going quite smoothly. I am very pleased with the result.
In the past I’ve had a lot of difficulty trying to get this information and the pandas binning really makes this so easy to do. I definitely wouldn’t want to try it in Excel.
There are a couple of things I want to do after this:
Start doing a bit of plotting,either histograms or line diagrams.
Take previous years datasets and try and predict when there are higher seasonal demands so that I can plan a bit further in the future to see when there are likely to be higher demands so that 1/ I can budget for them & 2/ see if I can change my power usage behaviour to lessen the impact.
In fact for item 2.2 above I’ve already modified how I use power, using the night period for my Hot Water Heating to get a lower rate.