Electricity Consumption Prediction

Overview

Electricity suppliers stand to save millions of shillings if they can reduce their peak demand charges. This is only possible if they can predict electricity consumption accurately in advance. Our project aims to solve this problem by building a model that predicts electricity consumption on an hourly basis.

We will rely on natural factors to build the model. These factors include temperature, pressure and wind speed.

Other objectives include:

  1. To highlight current electricity consumption by customers

  2. To predict the required electrical energy to be produced

  3. To identify opportunities, based on the customers’ behavior, to optimize electrical energy consumption

  4. To identify patterns in electrical energy consumption

Definition of Terms

Some of the common technical terms that you will come across include:

  1. Energy is measured in joules (J) and kilojoules (kJ). Power is the rate at which energy is used and is measured in watts (W) and kilowatts (kW). Electricity bills show the energy used in kilowatt-hours (kWh), and its cost can be calculated if the price per kWh is known.

  2. Current is the rate of flow of electric charge. A potential difference (voltage) across an electrical component is needed to make a current flow through it.

An electric current flows when electrons move through a conductor, such as a metal wire. The moving electrons can collide with the ions in the metal. This makes it more difficult for the current to flow and causes resistance.
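As a worked example of the first definition, the sketch below converts power and running time into energy in kWh, then into kilojoules and into a cost. The 2 kW rating, 3-hour run time and tariff of 250 shillings per kWh are hypothetical values chosen for illustration, not figures from our data.

```python
# Hypothetical example: a 2 kW appliance running for 3 hours at an assumed tariff.
POWER_KW = 2.0          # power drawn by the appliance, in kilowatts (assumed)
HOURS = 3.0             # running time, in hours (assumed)
PRICE_PER_KWH = 250.0   # assumed tariff, in shillings per kWh


def energy_kwh(power_kw: float, hours: float) -> float:
    """Energy (kWh) = power (kW) x time (h)."""
    return power_kw * hours


def kwh_to_kj(kwh: float) -> float:
    """1 kWh = 1000 W x 3600 s = 3,600,000 J = 3600 kJ."""
    return kwh * 3600.0


def cost(kwh: float, price_per_kwh: float) -> float:
    """Cost = energy used (kWh) x price per kWh."""
    return kwh * price_per_kwh


used = energy_kwh(POWER_KW, HOURS)
print(f"Energy: {used} kWh = {kwh_to_kj(used)} kJ")
print(f"Cost: {cost(used, PRICE_PER_KWH)} shillings")
```

With these assumed values the appliance uses 6 kWh (21,600 kJ), which costs 1,500 shillings.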

Importance of solving this problem

The power and lighting company would like to understand which environmental factors cause electricity consumption to fluctuate when all other factors are held constant (ceteris paribus).

Apart from the environmental impact on energy production and propagation, other factors could also cause electricity consumption to vary.

These causes could be:

  1. The client factor - Some customers may use dubious and illegal methods to alter their recorded energy consumption, for example by bypassing electricity meters and tokens to consume power directly.

  2. The government factor - Governments may introduce regulations that affect supply and demand.

Impact of this project

This project aims to help the Lighting Company understand energy consumption in terms of external factors such as temperature and pressure. Through this project, the Lighting Company can understand the following:

  1. The trend in energy and electricity consumption between 2013 and 2017.

  2. The impact of each of the environmental factors on electricity consumption.

  3. How to leverage the project results to improve energy production and propagation to the clients and end-users.

  4. How accurate predictions allow better decisions in terms of cost and energy efficiency.

Project management

Resources

a) Dataset

  1. We obtained our data from an electricity generation company that supplies electricity to a Ugandan city. The data can be found at the following link:

Electricity Consumption Predictor

b) Software

  1. Google Colab - Data analysis, visualizations and modelling

  2. Predicting_Electricity_consumption - The main repository for our work

  3. Google Docs - Project documentation

  4. Trello, Google sheets - Project management

Assumptions

To carry out our analysis, we made a few assumptions on our data:

  1. The data provided by the company was accurate and up to date.

  2. The data was consistent.

  3. Data collection was comprehensive, and reliable data collection methods were used.

Constraints

Some of the data was concealed for privacy reasons, which made it challenging to understand how those variables affected our analysis.

Data Analysis

After cleaning the data and converting it into the format we needed, we carried out our analysis. The analysis is broken into three stages, as shown below.

Univariate Analysis

  1. Central tendencies: From our analysis of the central tendencies of each variable, we found that electricity consumption spikes in the seventh month of the year (July).

  2. Frequency distributions: We carried out an analysis of some of the categorical data. When plotting Var2, we found that A was the most frequent value in the column.
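The two univariate checks above can be sketched with the Python standard library. The records, their (timestamp, consumption, Var2) layout and the timestamp format are hypothetical stand-ins for the real dataset, and the toy values are chosen so that the peak falls in July; the point is the grouping by calendar month for central tendencies and the category count for the frequency distribution.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean, median

# Hypothetical hourly records: (timestamp, consumption, Var2 category).
# Layout, format and values are illustrative assumptions, not the real dataset.
records = [
    ("2013-06-30 23:00:00", 210.0, "A"),
    ("2013-07-01 00:00:00", 260.0, "A"),
    ("2013-07-01 01:00:00", 240.0, "B"),
    ("2013-08-01 00:00:00", 190.0, "A"),
    ("2013-08-01 01:00:00", 200.0, "C"),
]


def consumption_by_month(rows):
    """Group consumption values by calendar month (1 = January, ..., 12 = December)."""
    groups = defaultdict(list)
    for ts, value, _ in rows:
        month = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").month
        groups[month].append(value)
    return groups


def central_tendencies(rows):
    """Mean and median consumption for each month present in the data."""
    return {m: (mean(v), median(v)) for m, v in sorted(consumption_by_month(rows).items())}


def most_frequent_category(rows):
    """Most common Var2 value together with its count."""
    return Counter(category for _, _, category in rows).most_common(1)[0]


stats = central_tendencies(records)
peak_month = max(stats, key=lambda m: stats[m][0])  # month with the highest mean
print(stats, peak_month)
print(most_frequent_category(records))
```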

Bivariate Analysis

  1. Pearson correlation coefficient: Pearson's correlation coefficient measures the strength and direction of the linear relationship between two quantities. In our analysis, we found that var1 was positively correlated with temperature, while pressure and temperature were negatively correlated.

  2. R squared: R², like correlation, tells you how strongly two things are related. However, we tend to use R² because it is easier to interpret: it is the proportion of the variation in one variable that is explained by its relationship with the other, and it ranges from 0 to 1 (often expressed as a percentage).

  3. Average electricity consumption by day of the month

  4. Average electricity consumption by month

  5. Average electricity consumption by day of the week

  6. Average electricity consumption by hour

  7. Effect of temperature on electricity consumption
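The bivariate measures listed above can be sketched as follows: Pearson's r as the covariance divided by the product of the standard deviations, R² as its square (which holds for a simple linear fit of one variable on the other), and a generic helper that averages consumption by any calendar key such as hour, day of the month, month or weekday. All readings and timestamps are hypothetical illustration values.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def r_squared(xs, ys):
    """For a simple linear fit of y on x, R^2 equals the squared Pearson r."""
    return pearson_r(xs, ys) ** 2


def average_by(timestamps, values, key):
    """Average value per group, where key maps a datetime to a group label."""
    groups = defaultdict(list)
    for ts, v in zip(timestamps, values):
        groups[key(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))].append(v)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# Hypothetical readings for illustration only.
temperature = [10.0, 12.0, 15.0, 18.0, 20.0]
pressure = [1015.0, 1012.0, 1010.0, 1006.0, 1004.0]
print(pearson_r(temperature, pressure), r_squared(temperature, pressure))

stamps = ["2013-07-01 00:00:00", "2013-07-01 01:00:00", "2013-07-02 00:00:00"]
consumption = [250.0, 230.0, 270.0]
print(average_by(stamps, consumption, key=lambda d: d.hour))  # average by hour
```

Passing `key=lambda d: d.weekday()` (Monday is 0), `key=lambda d: d.day` or `key=lambda d: d.month` gives the day-of-week, day-of-month and monthly averages in the same way.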

Modelling

Our project was mainly a time series problem: we needed to build a model that predicts electricity consumption on an hourly basis. We also came across other models that could be used, and found that we could work with LSTM (long short-term memory) networks and XGBoost because both can work with multiple variables.

Seasonality and Stationarity

Before modelling, we checked for autocorrelation and seasonality, since both affect which models are appropriate. When checking for seasonality, we found that the trend in electricity consumption is almost stationary, as it neither increases nor decreases. We also looked at the autocorrelation (ACF) and partial autocorrelation (PACF) plots and found that even after 40 lags the autocorrelation does not fall inside the confidence interval, which indicates strong, persistent autocorrelation; we took this to mean that the data does not have a clear seasonal pattern.
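A minimal sketch of the autocorrelation check, assuming the common white-noise approximation for the 95% band, ±1.96/√n. Plotting tools such as statsmodels' `plot_acf` may draw a wider, lag-dependent band by default, so this approximates what the plots show rather than reproducing them. The toy series is a pure linear trend, chosen (as an illustrative assumption, not our data) because its autocorrelation decays slowly and stays outside the band for many lags.

```python
from math import sqrt


def acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (standard biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def lags_outside_band(series, max_lag, z=1.96):
    """Lags k >= 1 whose |r_k| exceeds the white-noise band z / sqrt(n)."""
    bound = z / sqrt(len(series))
    r = acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


# Toy series: a pure linear trend (illustrative assumption, not the project data).
trend = [float(t) for t in range(200)]
print(lags_outside_band(trend, 40) == list(range(1, 41)))
```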

Univariate models

  1. ARIMA

  2. SARIMA

Multivariate models

  • VARMAX

The VARMAX class in statsmodels allows estimation of VAR, VMA, and VARMA models (through the order argument), optionally with a constant term (via the trend argument).

  • Vector Autoregression (VAR)

The Vector Autoregression (VAR) method models the next step in each time series using an AR model. It is the generalization of AR to multiple parallel time series, i.e. multivariate time series.
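To show what "models the next step in each time series" means in the simplest case, here is a one-step VAR(1) forecast, ŷ(t+1) = c + A·y(t), written with plain lists. The two-variable system and its coefficients are hypothetical, not estimates from our data; in practice they are estimated from the observed series.

```python
from typing import List


def var1_forecast(c: List[float], A: List[List[float]], y_t: List[float]) -> List[float]:
    """Each series' next value is a constant plus a weighted sum of ALL series' current values."""
    k = len(y_t)
    if len(c) != k or len(A) != k or any(len(row) != k for row in A):
        raise ValueError("dimension mismatch")
    return [c[i] + sum(A[i][j] * y_t[j] for j in range(k)) for i in range(k)]


def var1_forecast_path(c, A, y_t, steps):
    """Iterate the one-step forecast to obtain a multi-step forecast path."""
    path = []
    for _ in range(steps):
        y_t = var1_forecast(c, A, y_t)
        path.append(y_t)
    return path


# Hypothetical 2-variable system: [consumption, temperature]
c = [10.0, 1.0]
A = [[0.5, 2.0],   # consumption depends on its own lag and on lagged temperature
     [0.0, 0.8]]   # temperature depends only on its own lag
print(var1_forecast(c, A, [100.0, 20.0]))
print(var1_forecast_path(c, A, [100.0, 20.0], steps=3))
```

Each forecast uses the lagged values of every series, which is what distinguishes VAR from fitting a separate AR model to each series.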

  • Vector Moving Average (VMA)

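The Vector Moving Average (VMA) method models the next step in each time series as a linear function of past forecast errors (residuals); it is the generalization of MA to multiple parallel time series. In our VMA setup we left out the exogenous regressor but included the constant term.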

  • Vector Autoregression Moving Average (VARMA)

This method models the next step in each time series using an ARMA model. It is the generalization of ARMA to multiple parallel time series, i.e. multivariate time series. However, all of the above models work best with a small number of variables: the more features we add, the longer the model takes to train.
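For reference, the textbook VARMA(p, q) model for a k-dimensional series y_t can be written as follows (standard notation, not taken from our notebook):

```latex
\mathbf{y}_t = \mathbf{c}
  + \sum_{i=1}^{p} A_i \, \mathbf{y}_{t-i}
  + \sum_{j=1}^{q} M_j \, \boldsymbol{\varepsilon}_{t-j}
  + \boldsymbol{\varepsilon}_t
```

Setting q = 0 gives VAR(p) and p = 0 gives VMA(q); VARMAX adds a term B·x_t for exogenous regressors x_t. Each A_i and M_j is a k-by-k matrix, so the number of coefficients grows with k²(p + q), which is one reason training slows down as more variables are added.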

  • XGBoost

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework.
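XGBoost is not a time-series model on its own, so one common approach (an assumption here; the source does not say how the features were built) is to frame the hourly series as a supervised table of lagged values. The helper below is hypothetical, and the lags of 1, 2 and 24 hours are illustrative choices.

```python
def make_lag_features(series, lags):
    """Return (X, y) where each row of X holds series[t - lag] for every lag, and y holds series[t]."""
    if not lags or min(lags) < 1:
        raise ValueError("lags must be positive integers")
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - lag] for lag in lags])
        y.append(series[t])
    return X, y


# Toy series standing in for hourly consumption (illustrative only).
hourly = [float(v) for v in range(30)]
X, y = make_lag_features(hourly, lags=[1, 2, 24])
print(len(X), X[0], y[0])
```

The resulting X and y, converted to NumPy arrays and optionally extended with weather columns such as temperature and pressure, could then be passed to `xgboost.XGBRegressor().fit(X, y)`. Rows should be split chronologically, training on earlier hours and testing on later ones, so the model never sees the future.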

  • LSTM

The Sequential model class is a linear stack of layers. You can create a Sequential model and define all of its layers in the constructor. We used the following layers:

  1. LSTM layer

  2. Dense layer

Why use an LSTM layer? A typical RNN uses information from previous steps to predict the output. When the output depends on information from many steps back (a long-term dependency), training a standard RNN across all those steps runs into the exploding/vanishing gradient problem. LSTM layers mitigate this by using gates that control what information is kept in, or removed from, a cell state.
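To make the long-term-dependency argument concrete, here is a single-unit LSTM cell step written from the standard gate equations (input, forget and output gates plus a candidate value). The scalar weights are hypothetical; a real LSTM layer learns weight matrices. The key detail is the additive cell-state update c = f·c_prev + i·g, which lets information pass through many steps when the forget gate stays close to 1.

```python
from math import exp, tanh


def sigmoid(x: float) -> float:
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM cell.

    w maps each gate name to a hypothetical (input weight, recurrent weight, bias) triple.
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("input"))      # input gate: how much new information to write
    f = sigmoid(pre("forget"))     # forget gate: how much of the old cell state to keep
    o = sigmoid(pre("output"))     # output gate: how much of the cell state to expose
    g = tanh(pre("candidate"))     # candidate value to write into the cell
    c = f * c_prev + i * g         # additive cell-state update
    h = o * tanh(c)                # new hidden state
    return h, c


# Hypothetical weights: a large forget-gate bias keeps f close to 1, so the cell remembers.
weights = {
    "input": (1.0, 0.0, 0.0),
    "forget": (0.0, 0.0, 5.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
signal = [1.0, 0.0, 0.0, 0.0, 0.0]  # one input pulse followed by zeros
h, c = 0.0, 0.0
for x in signal:
    h, c = lstm_step(x, h, c, weights)
print(round(c, 3))
```

If the forget-gate bias is changed to -5.0, the stored value decays to almost zero within a few steps. Learning when to keep and when to discard information is what lets an LSTM carry context across many time steps.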

The above work can be found at the following link: Electricity consumption predictor model

Conclusion

The XGBoost model performed quite well, with:

  1. RMSE score of 82.44

  2. MAPE of 19.4%

LSTM Performance:

  1. RMSE score of 126.69

  2. MAPE of 25.39%
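The two metrics reported above can be computed as follows. RMSE is expressed in the same units as the consumption values rather than as a percentage, while MAPE is a percentage and is undefined whenever an actual value is zero. The sample values are hypothetical.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted hourly consumption values.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(round(rmse(actual, predicted), 2), round(mape(actual, predicted), 2))
```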

From the summary above, we concluded that XGBoost is the better model, since it had a lower MAPE (and a lower RMSE) than the LSTM.

We therefore trust the XGBoost model's performance more than the LSTM model's.

However, further hyperparameter tuning and cross-validation could be done to reduce the model's mean absolute percentage error (MAPE).
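Ordinary k-fold cross-validation lets a model train on observations that come after its test block, so for forecasting a walk-forward (expanding-window) scheme is the usual choice. The sketch below is a hypothetical helper, similar in spirit to scikit-learn's `TimeSeriesSplit`: each fold trains on all earlier observations and tests on the block that immediately follows.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) pairs where every training index precedes every test index."""
    first_test_start = n_samples - n_splits * test_size
    if n_splits < 1 or test_size < 1 or first_test_start < 1:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        start = first_test_start + k * test_size
        train_idx = list(range(start))                    # all earlier observations
        test_idx = list(range(start, start + test_size))  # the next block in time
        yield train_idx, test_idx


# Hypothetical example: 10 hourly observations, 3 folds of 2 hours each.
for train_idx, test_idx in expanding_window_splits(10, 3, 2):
    print(train_idx, test_idx)
```

Each fold's RMSE and MAPE can then be averaged to give a more reliable estimate than a single train/test split, and candidate hyperparameters can be compared on that averaged score.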