__STYLES__

Hello darkness, my old friend [Finalist]

Tools used in this project
Hello darkness, my old friend [Finalist]

Power BI dashboard

About this project

Objective

Role: Senior Analytics Consultant hired by the U.S. Department of Energy (DOE).

Time range: 2022 - 1H 2023

Task: to amalgamate and refine the raw data, subsequently crafting a comprehensive dashboard or report. This analytical tool will aid in unraveling patterns and trends related to outages, quantifying their impact on communities, and pinpointing potential vulnerabilities in the grid. Second - transparently highlight any caveats or assumptions you make concerning data quality issues or the presence of missing values.Data set

  1. Form DOE-417,

  2. Excel spreadsheet containing the annual summaries,

  3. 3 PDF documents for reference (the survey form, instructions, and documentation for online form submissions).

Data cleaning

  1. Merging files

The first task was to combine 22 sheets into one, with the same number and column names as the 2023 sheet. For this purpose it was necessary to remove or add columns.

  1. Adding an ID number

Each row received its own ID number, which was useful later, when it was necessary to divide one large sheet into several smaller ones.

  1. Dates

Divide the dates into 4 columns:

  • Date Event Began,
  • Time Event Began,
  • Date of Restoration,
  • Time of Restoration

and remove the rows separating individual months.

Replace values:

  • “12:00 noon” -> PM
  • “a.m.” -> AM
  • “p.m.” -> PM
  • “evening” -> “6:00 PM”
  1. States and NERC region

Based on the "area" column, I added a column containing the names of the States. If the rows concerned several states, it was divided into several rows - separate for each state. The value of lost [MW] assigned to them and the number of customers affected were divided in proportion to the number of states.

I removed rows from areas outside the USA, for example Nova Scotia or Manitoba.

Similarly for NERC regions.

I used a map to divide into NERC regions: https://www.google.com/maps/d/u/0/viewer?mid=16rZqG5BrdTh3Lpkhmjs6oafMPaM&hl=en&ll=45.79816950728205%2C-98.59130871875001&z=4

Additionally: PR (Puerto Rico), HICC (Hawai).

  1. Type of event

Based on the "Type of Disturbance" column, I divided the event types into:

  1. Environmental factors:

  2. Snow/Ice,

  3. Wind,

  4. Storm,

  5. Severe weather,

  6. Natural disaster,

  7. Heatwave.

  8. Equipment failure:

  9. Operator action,

  10. Fire,

  11. Technical defect,

  12. Fuel supply emergencies,

  13. Transmission interruption,

  14. System operations.

  15. Human factor:

  16. Vandalism,

  17. Cyber attack,

  18. Sabotage,

  19. Physical attack,

  20. Suspicious activity.

  21. Other:

  22. Public appeal,

  23. Other,

  24. Unknown.

  1. Loss and the number of customers affected

I deleted words before or after the values, e.g. "approx", "at peak" or "PG&E".

Rows with a number range I replaced with the maximum value.

Values such as: “133 on 5/21/04 between 3:00 a.m. and 4:00 a.m., 392 on 5/21/04 between 4:00 p.m. and 5:00 p.m.” I replaced the sum of these numbers.

  1. Missing Data

  2. “-“ replaced with 0.

  3. N/A or NA replaced with 0.

  4. "UNK" or "Unknown" replaced with NULL value.

Dashboard

Used tools: Power Bi & Figma

KPIs:

  • Number of events,
  • Energy loss [MW],
  • Number of customers affected,
  • Avg. restoration time [h].

Slicers:

  • Year,
  • Event type.

Visualizations:

  • Map of the USA and clustered bar chart - NERC Region in three versions: number of events, energy loss [MW] and number of customers affected.
  • Line chart - types of event
  • Heatmap - Most frequent time and month
  • Matrix - Type of event/Year number of events, energy loss [MW], number of customers affected and restoration time.

Conclusions

Patterns and trends around outages

An analysis of outages from 2002 to 2023 shows that California and Texas experience the most power outages. They are also the largest and most populous states (except Alaska). Therefore, it is more reliable to compare the number of customers affected to the population in a state. Such calculations show that the most people affected are in Nevada, Vermont and South Carolina. However, it should be noted that data for Alaska is missing.

The greatest number of outages occur in the summer months, in the afternoon, although the differences between other times of the day and months are not that significant.

Nearly half of outages are due to severe weather, mainly caused by storms and strong winds such as Hurricane Katrina and Sandy. Since 2016, there has been an increase in the number of failures, which poses a challenge to network stability.

What is interesting, however, is the decrease in the number of customers affected due to power outages since 2020, which may indicate the improvement of corrective actions and preventive measures.

The "other" category shows the highest average power loss and the number of customers affected, which emphasizes the need to precisely describe these events. Accurate data in this category is crucial to better avoid similar incidents in the future.

The number of failures caused by vandalism and intentional human actions is also increasing, and cyberattacks are becoming more frequent, which imposes additional challenges on maintaining network security.

Weak points in the grid

Most failures in the power grid result from weather factors beyond our control. Few actions are possible to protect the network against hurricanes or other natural disasters. However, we can focus on improving the equipment we use, because almost a quarter of all failures are caused by damaged equipment. These incidents translate into power outages for almost 40 million people (over 10% of the total).

Data analysis shows that after a significant increase in failures in 2018-2022, their number decreased in the first half of 2023. This is a positive trend, although the problem remains significant.

Additionally, an issue worth noting is the location of power plants in the USA, visible on the map below. The WECC region, which is also the largest area, is most vulnerable to failures, and at the same time has relatively the fewest number of power plants.

A partial solution to this problem may be investing in renewable energy sources, such as photovoltaics. Equipping households with such technologies would reduce the inconvenience of power outages, especially on sunny days. Investments in new technologies can therefore contribute to increasing the resilience of the electricity grid to failures.

undefinedSource: https://atlas.eia.gov/apps/5039a1a01ec34b6bbf0ab4fd57da5eb4/explore

Recommendations for data collection

Shared file with outage data

It is recommended to create a shared file that will serve as a data source for Power BI reports. The entered data should be systematically added to this file to enable real-time updates of the reports.

State names column

For better data readability, it is proposed to add a separate column containing state names. This will facilitate data identification and analysis, as well as expedite understanding of the geographical context associated with power outages.

Precise data

It is recommended to limit the data in columns related to losses and the number of people to numerical values only. Such precision will simplify data analysis, eliminating potential errors related to text formatting and facilitating subsequent calculations and visualizations.

Limiting the use of "Other"

In the case of the "Other" category, it is recommended to precisely describe and complete the data to avoid ambiguity. At the same time, minimizing the use of this category is encouraged by classifying outages in as much detail as possible, enabling more precise analyses.

Complete of power restoration date information

This information will not only allow for a more accurate determination of the duration of power outages but also enable tracking the efficiency and response time in the case of repair actions. Attention to the accuracy and completeness of this data is crucial for a comprehensive analysis and identification of areas requiring improvements in the power delivery system.

Additional project images

Discussion and feedback(2 comments)
comment-564-avatar
Rakesh Kumar Sahoo
Rakesh Kumar Sahoo
7 months ago
Great work and nice representation of visualization.

comment-574-avatar
Aaron Parry
Aaron Parry
7 months ago
Nice nod to Simon & Garfunkel!
2000 characters remaining