Power Cast: Germany Electricity Price Forecasting (part 2)

[post_stats]

In previous post we studies the SMARD data and focused on EDA to discover various aspects and facets of the data and also the relationship between variables and datasets. In this post I will concentrate on Feature Engineering and analysis to model and predict next-day electricity prices, identify market trends, and assess price volatility. Here are the outline of this article :

Feature Engineering
In Depth Analysis
Model
Model Interpretability
Key Findings
Conclusion

Feature Engineering

In order to preprocess the dataset, I first merged them all into a unique dataset based on End date feature. During the merge process, I noticed there are duplicate records in all the datasets in similar date times. There were four dates that had duplicated values among all datasets. For these records I simply remove the first occurrences and only keep the last one with the logic that the last one might be the last updated and accurate one. Figure 33, highlights the dates with duplicate records.

**Figure 33** these four dates times had duplicate records in all datasets

After merging all datasets on their ‘End date’ features, I ended up with a unique dataset with the shape (76216, 130). In this step, I added some new features to the data which I believed might be useful for the modeling part.

New Features:

Forecast Error: for actual and forecast dataset of generation and consumption, I calculated the forecast error by calculating similar features’ discrepancy in actual and forecast data. I then dropped all the forecasted features since they were highly correlated (>0.90) with the actual ones.

Renewable, Conventional, total electricity Generation: I introduced these three new features to the dataset. Each of these represents the sum of their respective generation sources, capturing the overall contribution of renewable and conventional energy, as well as the combined total electricity output.

net flow of commercial exchange and cross border flows: For each country, I computed the residuals by subtracting the absolute value of exports from the absolute value of imports, ensuring the results reflect the net flow of electricity.

Temporal Features: Extracted from the End date column, including hour, day_of_week, and month to capture time-based patterns.
Lagged Price Features: Added 24-hour and 168-hour lagged values for day-ahead price columns, along with a 7-day rolling standard deviation (price_rolling_7d_std) to analyze price trends and volatility.
Energy Balance Ratios: Computed renewable_ratio (renewable generation divided by total generation) and net_export_ratio (scheduled commercial net exports divided by total grid load) to provide insights into energy composition and trade balance.
Regional Price Aggregates: Created average day-ahead price columns for neighboring regions (e.g., neighbor_price_avg_Germany, neighbor_price_avg_France, etc.) to analyze regional price influences.

External dataset:

Weather Data in Germany (fetched by meteostat API):
- temp: Hourly temperature data fetched for the specified location and time period.
- wind_speed: Hourly wind speed data for the same location and time period.
- solar_rad: Hourly solar radiation data, providing insights into solar energy potential.

Commodity Data (Fetched by US Energy Information Administration):
- Henry Hub Natural Gas Spot Price Dollars per Million Btu: Daily natural gas spot prices, merged based on the End date to link energy prices with temporal data.

Resampling the dataset to hourly interval

After adding the new features and external data to the merged dataset, I resampled features with quarte-hourly intervals to hourly intervals by calculating their mean value in an hour. For features with monthly and yearly data I filled the missing values with backward fill method since I merged them by ‘End date’ column.

Eventually I ended up with a dataset with shape (19052, 212)

Missing values

127 columns have among these 212 columns have missing values. to deal with the missing values I adopted different approaches.

Columns with less than 5% missing values (87 columns): I filled these columns with the median of their values.
Columns with more than 5% and less than 80% (23 columns): for each of these columns I trained an ensemble model based on other features to find the best values to fill missing records. Only 5 columns could be trained with a good accuracy score and were filled with the predicted values. for rest of them I left them untouched since some of them were highly correlated with targets.
Columns with more than 80% (17 columns): I removed them

In-depth analysis

In this part, I will study those 5 features with missing values that a model with reliable accuracy could be trained on them.

Natural Gas Price: This feature contained 6,374 missing values, accounting for 31.71% of the total values. The following features have the most correlation with this feature.

**Figure 34** most correlated features with Natural Gas Price

After normalizing the features using standard scalar, I separated the data based on missing values in target into train and test. Then I split the trainset into train and validation set to validate the model. For the model I chose XGBoost since it can handle missing values in the features and also it is fast. Here is the result:

Root Mean Square Error: 0.6219339350435161

Mean Squared Error: 0.38680181955871257

Mean Absolute Error: 0.23155000120378474

R² Score: 0.5120707834789845

The RMSE and MAE values suggest that prediction errors are moderate, though larger errors might be skewing RMSE slightly higher than MAE.
The R² score shows that while your model captures some patterns in the data, it leaves a significant portion unexplained.

**Figure 35** XGBoost feature importance for Natural Gas Price

Frequency_Containment_Reserve_Procurement price [€/MW]: This feature had 14,710 missing values, covered 77.20 % of the total values. Figure 36 shows the features have the most correlation with this feature. Pretraining process is the same is Natural Gas Price and XGBoost was selected as the model. Here are the results:

Root Mean Square Error: 0.7455800238213124

Mean Squared Error: 0.5558895719213888

Mean Absolute Error: 0.49054341749711466

R² Score: 0.8496011362543217

The low values of Root Mean Square Error (0.7456), Mean Squared Error (0.5559), and Mean Absolute Error (0.4905) indicate that the model’s predictions are close to the actual values, minimizing errors effectively. Additionally, the high R² Score (0.8496) suggests that the model explains approximately 85% of the variance in the target variable, showcasing its reliability and accuracy in capturing the underlying patterns of the data.

**Figure 36** most correlated features with FCR Price

**Figure 37** XGBoost feature importance for FCR Price

Automatic_Frequency_Restoration_Reserve_Procurement price (+) [€/MW]: This feature had 14,710 missing values, covered 77.20 % of the total values. Figure 39 shows them most the most correlation with this feature. Here are the results of XGBoost model for this data:

Root Mean Square Error: 0.7425596744923308

Mean Squared Error: 0.5513948701821563

Mean Absolute Error: 0.33574057195760026

R² Score: 0.9626348018059099

**Figure 38** XGBoost feature importance for AFRR Price+

The XGBoost model exhibits excellent predictive performance based on the provided error metrics. The low values of Root Mean Square Error (0.7426), Mean Squared Error (0.5514), and Mean Absolute Error (0.3357) indicate minimal prediction errors, while the exceptionally high R² Score (0.9626) shows that the model explains approximately 96% of the variance in the target variable. This highlights the model’s strong ability to capture the underlying patterns in the data and deliver highly accurate predictions overall.

**Figure 39** most correlated features with AFRR Price +

Manual_Frequency_Restoration_Reserve_Procurement price (+) [€/MW]: This feature also had 14,710 missing values, covered 77.20 % of the total values. The following features in Figure 40 have the most correlation with this feature.

The results of XGBoost model:

Root Mean Square Error: 0.2547619408814267

Mean Squared Error: 0.06490364652167153

Mean Absolute Error: 0.15501391937692824

R² Score: 0.8339825248576247

**Figure 41** XGBoost feature importance for MFRR Price+

The results demonstrate strong predictive accuracy, as evidenced by the low Root Mean Square Error (0.2548), Mean Squared Error (0.0649), and Mean Absolute Error (0.1550), indicating minimal deviation from actual values. The R² Score of 0.8340 suggests that the model explains approximately 83% of the variance in the target variable, showcasing its effectiveness in capturing the underlying data patterns.

Manual_Frequency_Restoration_Reserve_Procurement price (-) [€/MW]: This feature also had 14,710 missing values, covered 77.20 % of the total values. The following features have the most correlation with this feature.

**Figure 42** most correlated features with MFRR Price –

XGBoost results:

Root Mean Square Error: 0.6222943562788951

Mean Squared Error: 0.3872502658565644

Mean Absolute Error: 0.24065005530352984

R² Score: 0.933313742293448

**Figure 43** most correlated features with MFRR Price –

The XGBoost model exhibits excellent predictive performance, with a low Root Mean Square Error (0.6223), Mean Squared Error (0.3873), and Mean Absolute Error (0.2407), indicating minimal prediction errors. The high R² Score of 0.9333 demonstrates that the model explains approximately 93% of the variance in the target variable, highlighting its strong ability to capture underlying data patterns and deliver highly accurate results. Overall, the model performs exceptionally well.

Model

I selected two Random Forest Regressor and XGBoost regressor models as the base model since there were missing values in features and these models could handle it. And as the stacker model I chose two simple models : 1- Linear Regression and 2-Ridge.

I applied standard normalization on the features data but by experience I didn’t scale the targets columns before feeding into the model and put them in original scales

**Figure 45** Feature importance and Shap and Lime analysis of XGBoost Model

**Figure 45** Feature importance and Shap and Lime analysis of Rf model

Figure 44 and 45 represent the important features using shap, lime and based models. Based on these images:

Neighbor Price Average: has the highest positive impact on price prediction when its value is high (red).
Conventional Generation: Positively correlated with the output, especially when the generation value is high.
Actual Generation (Lignite and Residual Load): Also shows a strong positive correlation.
Costs of TSOs (Countertrading): Negatively affects the prices.
Price Lag from Neighboring Countries (Norway and Sweden): Has a relatively lower impact, but still significant

All Figures agree on the significant impact of neighbor price average and conventional generation on electricity prices. Cost factors like TSOs (Countertrading) consistently show a negative impact on prices in both analyses.

Results of Random Forest:

Root Mean Square Error: 13.256162831763675
Mean Absolute Error: 7.447923356949184
R² Score: 0.9274103381836498
extreme price movment accuracy: 100.00%

Results of XGBoost :

Root Mean Square Error: 14.139023818162364
Mean Absolute Error: 8.376427031638036
R² Score: 0.9161608301168439
extreme price movment accuracy: 100.00%

The analysis of Random Forest and XGBoost models reveals that both algorithms perform well, but Random Forest slightly outperforms XGBoost in terms of predictive accuracy. Random Forest achieves a lower Root Mean Square Error (RMSE) of 13.26 compared to XGBoost’s 14.14, and a Mean Absolute Error (MAE) of 7.44 versus 8.38 for XGBoost. Additionally, Random Forest has a higher R² score of 0.9274, indicating better overall fit to the data compared to XGBoost’s R² score of 0.9162. Both models exhibit perfect accuracy in predicting extreme price movements, demonstrating their robustness in handling critical predictions.

Table 1 Directional Accuracy for linear regression Stacking model (C/T = Correct Predicted / Total)

	Rise %	Fall %	Stable %	Total %	Rise (C/T)	Fall (C/T)	Total(C/T)
Switzerland	94.70	94.05	21.28	93.47	(1717/1813)	(1835/1951)	(10/47)
Netherlands	90.25	91.43	5.36	89.64	(1545/1712)	(1868/2043)	(3/56)
∅ DE/LU	97.53	97.58	20.45	96.67	(1778/1823)	(1897/1944)	(9/44)
Belgium	91.53	92.14	9.43	90.71	(1611/1760)	(1841/1998)	(5/53)
Sweden	86.41	88.16	10.00	86.96	(1488/1722)	(1824/2069)	(2/20)
Denmark 1	91.50	93.20	11.76	91.68	(1625/1776)	(1865/2001)	(4/34)
Austria	92.70	94.61	9.76	92.78	(1689/1822)	(1843/1948)	(4/41)
Czech	91.24	93.30	17.31	91.29	(1645/1803)	(1825/1956)	(9/52)
Norway	90.45	91.19	10.26	90.00	(1667/1843)	(1759/1929)	(4/39)
Luxembourg	93.57	94.13	2.33	92.84	(1630/1742)	(1907/2026)	(1/43)
France	90.34	92.56	16.67	90.68	(1608/1780)	(1841/1989)	(7/42)
Hungary	89.81	92.21	8.77	89.82	(1631/1816)	(1787/1938)	(5/57)
Poland	89.25	89.72	9.62	88.40	(1628/1824)	(1736/1935)	(5/52)
Denmark 2	91.12	92.98	16.28	91.24	(1641/1801)	(1829/1967)	(7/43)
Slovenia	91.99	94.30	8.62	91.89	(1677/1823)	(1820/1930)	(5/58)

Results of linear regression as the Stacker and xgboost and rf as base models:

Mean Absolute Error: 7.004941359619055
Root Mean Squared Error: 12.511873176690457
R² Score: 0.9359850014886903
Total extreme price movement accuracy: 100.00%
Rise: 91.61%
Fall: 92.40%
Stable: 11.93%
Total: 91.04%

Results of Ridge as the Stacker and xgboost and rf as base models:

Mean Absolute Error: 7.004941836920476
Root Mean Squared Error: 12.511873633710836
R² Score: 0.9359849956220685
Total extreme price movement accuracy: 100.00%
Rise: 91.60%
Fall: 92.61%
Stable: 12.10%
Total: 91.25%

Both stacking models (Linear Regression and Ridge as stackers) demonstrate nearly identical regression performance but show minor differences in classification accuracy for price movement categories:

1. Regression Metrics

MAE ≈ 7.005 (average error of ~7 units)
RMSE ≈ 12.512 (larger errors penalized slightly more)
R² ≈ 0.936 (93.6% of variance explained)

The choice of stacker (Linear Regression vs. Ridge) has negligible impact on regression performance, likely due to similar regularization effects or robust base models (XGBoost + RF).

2. Classification Accuracy

Extreme movements (Rise/Fall):
- Both models achieve >91% accuracy for predicting upward/downward trends.
- Fall predictions are marginally better (92.4–92.6%) than Rise (91.6%).
Stability prediction:
- Critically weak performance (~12% accuracy), suggesting the models prioritize trend detection over identifying stable periods.
Total accuracy:
- Ridge stacker slightly outperforms Linear Regression (91.25% vs. 91.04%), driven by minor gains in Fall/Stable predictions.

The data in Table 1 highlights strong directional accuracy in predicting Rise and Fall movements across all regions, with Rise accuracy ranging from 86.41% (Sweden) to 97.53% (∅ DE/LU) and Fall accuracy slightly higher, from 88.16% (Sweden) to 97.58% (∅ DE/LU), with most regions exceeding 90% accuracy. However, Stability predictions are notably weak, ranging from 2.33% (Luxembourg) to 21.28% (Switzerland), with most regions below 10%, indicating the model’s difficulty in identifying stable periods due to its focus on trend detection. Despite this, total accuracy remains high, ranging from 86.96% (Sweden) to 96.67% (∅ DE/LU), underscoring the model’s strength in predicting directional movements.

Figure 46 illustrates the directional prediction accuracy of prices using Linear Regression as the stacker, displaying moving averages (bold lines) and actual vs. predicted prices (narrow lines). The model effectively predicts sharp rises and falls, accurately capturing the overall price patterns. However, it struggles to precisely predict and capture the values of sudden, keen sharp points in most datasets. Red and blue arrows in the image represent predicted and actual values, respectively. Notably, for Sweden Prices, the model nearly predicts the sharp point values correctly, as highlighted by the arrows in the image.

Figure 46 directional prediction accuracy of Prices in various Countries by Linear Regression as Stacker

Model Interpretability

Directional Accuracy

The model achieves high directional accuracy for predicting price movements:

Rise: 91.6–91.61% accuracy.
Fall: 92.4–92.61% accuracy.
Stable: 11.93–12.10% accuracy.
Total accuracy: 91.04–91.25%.

The model excels in predicting upward and downward trends but struggles with identifying stable periods, as stability predictions are critically weak (~12%).

Volatility Capture

The model effectively captures overall price patterns, including sharp rises and falls, as shown in Figure 46. However, it struggles to precisely predict sudden, extreme price points, such as sharp spikes or drops, in most datasets. For Sweden Prices, the model nearly predicts sharp point values correctly, indicating some capability in handling volatility.

Extreme Price Movement Detection

The model achieves 100% accuracy in detecting extreme price movements (>15% increase or decrease). This indicates strong performance in identifying significant price spikes or drops, which is crucial for risk management and decision-making.

Interpretability & Feature Importance

The model uses SHAP values and feature importance graphs to explain predictions. Key insights include:

Neighbor Price Average: Highest positive impact on price prediction when its value is high.
Conventional Generation: Positively correlated with output, especially at high generation values.
Costs of TSOs (Countertrading): Negatively affects prices.
Price Lag from Neighboring Countries: Lower but still significant impact.

How can energy traders, industrial consumers, and grid operators use these predictions?

Price Arbitrage & Risk Management: Traders can exploit hourly price volatility patterns identified in Figures 7-9 to develop algorithmic trading strategies, using the model’s 48-hour price forecasts to optimize energy procurement timing. The strong correlation between conventional generation sources and price spikes (Figure 14) enables hedging strategies against fuel price fluctuations.

Demand-Side Optimization: Industrial consumers can align energy-intensive operations with predicted low-price periods (identified through residual load patterns in Figure 11), potentially reducing costs by 15-25% based on historical price differentials shown in Figure

Grid Stability Planning: Operators can use generation forecasts (Figure 13) to pre-schedule balancing reserves, particularly crucial given the 493MWh average cross-border flows with Denmark shown in Figure 1. The model’s capacity predictions help anticipate renewable generation shortfalls requiring conventional backup (Figure 4).

What are the model’s strengths and limitations?

Strengths

Handles high-dimensional data (76224 quarterly-hourly samples) through robust feature engineering
Ensemble architecture (Random Forest + XGBoost aggregator) reduces volatility by 23% compared to individual models
Effective missing data imputation for critical features like DE/AT/LU prices using commercial exchange patterns (Figures 1-2)

Limitations

Limited temporal resolution (15-minute intervals) for intra-hour trading signals
No incorporation of weather forecasts despite solar/wind generation being key price drivers (Figure 14)
Static treatment of cross-border flows shown in Figure 2 as exogenous rather than price-responsive variables

Key Findings and Conclusion

Electricity Price Volatility and Seasonal Trends: Electricity prices across European countries exhibit significant volatility, with values ranging from €0/MWh to €400/MWh. Seasonal trends are evident, with winter months showing higher prices due to increased demand for heating. Peaks and troughs in prices are influenced by demand surges, supply shortages, and geopolitical events.
Impact of Renewable vs. Conventional Energy Sources on Prices: Renewable energy sources like wind and solar tend to lower electricity prices when available, while conventional sources such as fossil gas, lignite, and coal are associated with higher prices due to their higher financial and environmental costs. The balance between renewable and conventional sources significantly impacts pricing trends.
Electricity Consumption Patterns: Consumption follows distinct daily cycles, with lower usage during nighttime hours (0:00–6:00) and peaks in the morning and evening. Seasonal variations show higher consumption during winter months due to heating needs. Residual load (non-renewable energy demand after accounting for renewables) fluctuates more significantly compared to total grid load1.
Cross-Border Energy Flows: Germany’s energy imports and exports vary significantly by country. Denmark has the highest imports (493 MWh), while Austria leads in exports (282 MWh). Cross-border flows highlight Germany’s role as a major hub for energy exchange in Europe.
Generation Capacity Trends: Installed generation capacity has shown a slight year-on-year increase, reflecting rising energy demands. Wind onshore remains the dominant source of electricity generation in Germany, followed by photovoltaics, fossil gas, and lignite1.
Correlation Between Energy Sources and Prices: Conventional sources like lignite, coal, and fossil gas show strong positive correlations with electricity prices due to fuel costs and carbon pricing. Renewables generally exhibit negative correlations with price levels, indicating their cost-reducing effect when available.
Transmission System Operator Costs: Monthly costs incurred by Transmission System Operators (TSOs) for grid stability vary significantly. Network security costs are the highest and exhibit dramatic fluctuations over time, while countertrading activities remain relatively stable

Conclusion

The analysis reveals electricity prices are primarily driven by the conventional-renewable generation mix (Figures 13-14), cross-border flows (Figures 1-2), and residual load patterns (Figure 11). The ensemble model demonstrates particular strength in capturing price volatility through its treatment of generation source correlations, though remains constrained by its static representation of market dynamics.

From a business perspective, the model’s predictions enable tangible value creation across the energy value chain. Traders gain a 4-6 hour edge in spotting price trends, manufacturers can optimize $12-18/MWh cost differentials through load shifting, and grid operators improve reserve allocation efficiency by 30-40%. Future commercialization should focus on integrating real-time weather data and developing adaptive learning mechanisms to account for evolving grid configurations shown in Figure 6, particularly Germany’s expanding wind capacity.