Saturday, May 9, 2026

The normal distribution is probably the most widely used distribution, yet sadly much real-world data is not normal. When faced with heavily skewed data, it is tempting to apply a log transformation to normalize the distribution and stabilize the variance. I recently worked on a project using Epoch AI data to analyze the energy consumption of AI model training [1]. Since there is no official data on energy usage for each model, I estimated each model's energy by multiplying its hardware power draw by the training time. The new variable, energy (kWh), was heavily right-skewed, with extreme outliers stretching far to the right (Fig. 1).
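As a minimal sketch of that kind of estimate, with purely hypothetical numbers (the variable names and the 300 W per-chip figure are illustrative, not from the Epoch AI dataset):

```r
# Energy (kWh) ~ per-chip power draw (W) x number of chips x training hours / 1000
# Hypothetical example: 300 W per chip, 1,000 chips, 100 hours of training
chip_power_W      <- 300
hardware_quantity <- 1000
training_hours    <- 100

energy_kWh <- chip_power_W * hardware_quantity * training_hours / 1000
energy_kWh  # 30000 kWh
```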

Figure 1. Histogram of energy consumption (kWh)

To address this skewness and heteroscedasticity, my first instinct was to apply a log transformation to the energy variable. The distribution of log(energy) looks much more normal (Fig. 2), and the Shapiro-Wilk test did not reject normality (p ≈ 0.5).
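The same kind of check can be reproduced on simulated data (a deterministic log-normal stand-in, since the real energy column is not reproduced here):

```r
# A perfectly log-normal sample as a deterministic stand-in for Energy_kWh
# (quantile-based rather than random, so the result does not depend on a seed)
energy <- exp(qnorm(ppoints(200), mean = 10, sd = 1.5))

shapiro.test(energy)$p.value       # tiny: the raw values are far from normal
shapiro.test(log(energy))$p.value  # large: the log is, by construction, normal
```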

Figure 2. Histogram of log energy consumption (kWh)

The Modeling Dilemma: Log Transformation or Log Link?

The visualization looked good, but when I moved on to modeling, I faced a dilemma: should you model the log-transformed response variable (log(Y) ~ X), or model the original response variable with a log link function (Y ~ X, link = "log")? I also tried two distributions, Gaussian (normal) and gamma, and combined each with both log approaches. This gave the four models below, all fitted with R's generalized linear model function glm().

all_gaussian_log_link <- glm(Energy_kWh ~ Parameters +
      Training_compute_FLOP +
      Training_dataset_size +
      Training_time_hour +
      Hardware_quantity +
      Training_hardware, 
    family = gaussian(link = "log"), data = df)
all_gaussian_log_transform <- glm(log(Energy_kWh) ~ Parameters +
                          Training_compute_FLOP +
                          Training_dataset_size +
                          Training_time_hour +
                          Hardware_quantity +
                          Training_hardware, 
                         data = df)
all_gamma_log_link  <- glm(Energy_kWh ~ Parameters +
                    Training_compute_FLOP +
                    Training_dataset_size +
                    Training_time_hour +
                    Hardware_quantity +
                    Training_hardware + 0, 
                  family = Gamma(link = "log"), data = df)
all_gamma_log_transform  <- glm(log(Energy_kWh) ~ Parameters +
                    Training_compute_FLOP +
                    Training_dataset_size +
                    Training_time_hour +
                    Hardware_quantity +
                    Training_hardware + 0, 
                  family = Gamma(), data = df)

Model comparison: AIC and diagnostic plots

The four models were compared using the Akaike Information Criterion (AIC), an estimator of prediction error. In general, the lower the AIC, the better the model.

AIC(all_gaussian_log_link, all_gaussian_log_transform, all_gamma_log_link, all_gamma_log_transform)

                           df       AIC
all_gaussian_log_link      25 2005.8263
all_gaussian_log_transform 25  311.5963
all_gamma_log_link         25 1780.8524
all_gamma_log_transform    25  352.5450

Of the four models, the ones fitted to the log-transformed response have much lower AIC values than the ones using a log link. (Strictly speaking, AIC is not directly comparable between a model of Y and a model of log(Y), because the likelihoods are on different response scales, so these numbers should be read with caution.) Because the differences in AIC between the log-transformed and log-link models were substantial (311 and 352 vs. 1780 and 2005), I also examined diagnostic plots to further check whether the log-transformed models fit better.

Figure 4. Diagnostic plots for the log-link Gaussian model. The residuals-vs-fitted plot suggests linearity despite some outliers, but the Q-Q plot shows a marked deviation from the theoretical line, suggesting non-normality.
Figure 5. Diagnostic plots for the log-transformed Gaussian model. The Q-Q plot shows a much better fit and supports normality, but there is a dip to -2 in the residuals-vs-fitted plot, which may suggest nonlinearity.
Figure 6. Diagnostic plots for the log-link gamma model. The Q-Q plot looks acceptable, but the residuals-vs-fitted plot shows clear signs of nonlinearity.
Figure 7. Diagnostic plots for the log-transformed gamma model. The residuals-vs-fitted plot looks good, apart from a small dip to -0.25 at the start, but the Q-Q plot shows deviations at both tails.

Based on the AIC values and diagnostic plots, I decided to move forward with the log-transformed gamma model: although it had the second-lowest AIC, its residuals-vs-fitted plot looked better than that of the log-transformed Gaussian model.
I then explored which explanatory variables were useful and which interactions mattered. The final model I chose is:

glm(formula = log(Energy_kWh) ~ Training_time_hour * Hardware_quantity + 
    Training_hardware + 0, family = Gamma(), data = df)

Interpreting the coefficients

However, when I began interpreting the model's coefficients, something felt off. Since only the response variable was log-transformed, predictor effects are multiplicative, and the coefficients need to be exponentiated back to the original scale: a one-unit increase in x multiplies the response by exp(β) [2].
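As a quick numeric sketch of that back-transformation, using the Training_time_hour estimate reported in the model output below (-1.587e-05):

```r
beta_time  <- -1.587e-05              # slope for Training_time_hour, from the model output
multiplier <- exp(beta_time)          # multiplicative effect of one extra training hour
pct_change <- 100 * (multiplier - 1)  # percentage change in energy per extra hour
round(pct_change, 5)                  # about -0.00159%: essentially zero
```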

Looking at the results table for the model below: Training_time_hour, Hardware_quantity, and their interaction term Training_time_hour:Hardware_quantity are continuous variables, so their coefficients represent slopes. On the other hand, since I specified +0 in the model formula, every level of the categorical variable Training_hardware acts as an intercept. In other words, each hardware type served as the intercept β₀ when its dummy variable was active.

> glm(formula = log(Energy_kWh) ~ Training_time_hour * Hardware_quantity + 
    Training_hardware + 0, family = Gamma(), data = df)

Coefficients:
                                                 Estimate Std. Error t value Pr(>|t|)    
Training_time_hour                             -1.587e-05  3.112e-06  -5.098 5.76e-06 ***
Hardware_quantity                              -5.121e-06  1.564e-06  -3.275  0.00196 ** 
Training_hardwareGoogle TPU v2                  1.396e-01  2.297e-02   6.079 1.90e-07 ***
Training_hardwareGoogle TPU v3                  1.106e-01  7.048e-03  15.696  < 2e-16 ***
Training_hardwareGoogle TPU v4                  9.957e-02  7.939e-03  12.542  < 2e-16 ***
Training_hardwareHuawei Ascend 910              1.112e-01  1.862e-02   5.969 2.79e-07 ***
Training_hardwareNVIDIA A100                    1.077e-01  6.993e-03  15.409  < 2e-16 ***
Training_hardwareNVIDIA A100 SXM4 40 GB         1.020e-01  1.072e-02   9.515 1.26e-12 ***
Training_hardwareNVIDIA A100 SXM4 80 GB         1.014e-01  1.018e-02   9.958 2.90e-13 ***
Training_hardwareNVIDIA GeForce GTX 285         3.202e-01  7.491e-02   4.275 9.03e-05 ***
Training_hardwareNVIDIA GeForce GTX TITAN X     1.601e-01  2.630e-02   6.088 1.84e-07 ***
Training_hardwareNVIDIA GTX Titan Black         1.498e-01  3.328e-02   4.501 4.31e-05 ***
Training_hardwareNVIDIA H100 SXM5 80GB          9.736e-02  9.840e-03   9.894 3.59e-13 ***
Training_hardwareNVIDIA P100                    1.604e-01  1.922e-02   8.342 6.73e-11 ***
Training_hardwareNVIDIA Quadro P600             1.714e-01  3.756e-02   4.562 3.52e-05 ***
Training_hardwareNVIDIA Quadro RTX 4000         1.538e-01  3.263e-02   4.714 2.12e-05 ***
Training_hardwareNVIDIA Quadro RTX 5000         1.819e-01  4.021e-02   4.524 3.99e-05 ***
Training_hardwareNVIDIA Tesla K80               1.125e-01  1.608e-02   6.993 7.54e-09 ***
Training_hardwareNVIDIA Tesla V100 DGXS 32 GB   1.072e-01  1.353e-02   7.922 2.89e-10 ***
Training_hardwareNVIDIA Tesla V100S PCIe 32 GB  9.444e-02  2.030e-02   4.653 2.60e-05 ***
Training_hardwareNVIDIA V100                    1.420e-01  1.201e-02  11.822 8.01e-16 ***
Training_time_hour:Hardware_quantity            2.296e-09  9.372e-10   2.450  0.01799 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 0.05497984)

    Null deviance:    NaN  on 70  degrees of freedom
Residual deviance: 3.0043  on 48  degrees of freedom
AIC: 345.39

When the slopes were back-transformed to rates of change in the response, the effect of each continuous variable was essentially zero and slightly negative.

And all the intercepts back-transformed to roughly 1 kWh on the original scale. The results were meaningless: at least one slope should grow with energy consumption this large. I suspected that a log-link model with the same predictors might give different results, so I refit the model.

glm(formula = Energy_kWh ~ Training_time_hour * Hardware_quantity + 
    Training_hardware + 0, family = Gamma(link = "log"), data = df)

Coefficients:
                                                 Estimate Std. Error t value Pr(>|t|)    
Training_time_hour                              1.818e-03  1.640e-04  11.088 7.74e-15 ***
Hardware_quantity                               7.373e-04  1.008e-04   7.315 2.42e-09 ***
Training_hardwareGoogle TPU v2                  7.136e+00  7.379e-01   9.670 7.51e-13 ***
Training_hardwareGoogle TPU v3                  1.004e+01  3.156e-01  31.808  < 2e-16 ***
Training_hardwareGoogle TPU v4                  1.014e+01  4.220e-01  24.035  < 2e-16 ***
Training_hardwareHuawei Ascend 910              9.231e+00  1.108e+00   8.331 6.98e-11 ***
Training_hardwareNVIDIA A100                    1.028e+01  3.301e-01  31.144  < 2e-16 ***
Training_hardwareNVIDIA A100 SXM4 40 GB         1.057e+01  5.635e-01  18.761  < 2e-16 ***
Training_hardwareNVIDIA A100 SXM4 80 GB         1.093e+01  5.751e-01  19.005  < 2e-16 ***
Training_hardwareNVIDIA GeForce GTX 285         3.042e+00  1.043e+00   2.916  0.00538 ** 
Training_hardwareNVIDIA GeForce GTX TITAN X     6.322e+00  7.379e-01   8.568 3.09e-11 ***
Training_hardwareNVIDIA GTX Titan Black         6.135e+00  1.047e+00   5.862 4.07e-07 ***
Training_hardwareNVIDIA H100 SXM5 80GB          1.115e+01  6.614e-01  16.865  < 2e-16 ***
Training_hardwareNVIDIA P100                    5.715e+00  6.864e-01   8.326 7.12e-11 ***
Training_hardwareNVIDIA Quadro P600             4.940e+00  1.050e+00   4.705 2.18e-05 ***
Training_hardwareNVIDIA Quadro RTX 4000         5.469e+00  1.055e+00   5.184 4.30e-06 ***
Training_hardwareNVIDIA Quadro RTX 5000         4.617e+00  1.049e+00   4.401 5.98e-05 ***
Training_hardwareNVIDIA Tesla K80               8.631e+00  7.587e-01  11.376 3.16e-15 ***
Training_hardwareNVIDIA Tesla V100 DGXS 32 GB   9.994e+00  6.920e-01  14.443  < 2e-16 ***
Training_hardwareNVIDIA Tesla V100S PCIe 32 GB  1.058e+01  1.047e+00  10.105 1.80e-13 ***
Training_hardwareNVIDIA V100                    9.208e+00  3.998e-01  23.030  < 2e-16 ***
Training_time_hour:Hardware_quantity           -2.651e-07  6.130e-08  -4.324 7.70e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 1.088522)

    Null deviance: 2.7045e+08  on 70  degrees of freedom
Residual deviance: 1.0593e+02  on 48  degrees of freedom
AIC: 1775

Now Training_time_hour and Hardware_quantity behave sensibly: total energy consumption increases by about 0.18% per additional training hour and 0.07% per additional chip. Their interaction term, meanwhile, slightly reduces energy use (the estimate is about -2.7e-07 per unit of the product). These results make more sense given that Training_time_hour reaches up to 7,000 hours and Hardware_quantity up to 16,000 units.
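Those percentages come from exponentiating the log-link coefficients reported above:

```r
beta_time <- 1.818e-03  # Training_time_hour, from the log-link model output
beta_hw   <- 7.373e-04  # Hardware_quantity

100 * (exp(beta_time) - 1)  # ~0.182% more energy per extra training hour
100 * (exp(beta_hw)   - 1)  # ~0.074% more energy per extra chip
```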

To better visualize the difference, I created two plots comparing each model's predictions (shown as dashed lines) with the raw data. The left panel uses the log-transformed gamma GLM: its dashed lines are almost flat, close to zero, and nowhere near the fitted lines of the raw data. The right panel uses the log-link gamma GLM: here the dashed lines align much more closely with the actual fitted lines.

test_data <- df[, c("Training_time_hour", "Hardware_quantity", "Training_hardware")]
prediction_data <- df %>%
  mutate(
    pred_energy1 = exp(predict(glm3, newdata = test_data)),                    # log-transformed gamma model
    pred_energy2 = predict(glm3_alt, newdata = test_data, type = "response")   # log-link gamma model
  )
y_limits <- c(min(df$Energy_kWh, prediction_data$pred_energy1, prediction_data$pred_energy2),
              max(df$Energy_kWh, prediction_data$pred_energy1, prediction_data$pred_energy2))

p1 <- ggplot(df, aes(x = Hardware_quantity, y = Energy_kWh, colour = Training_time_group)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(data = prediction_data, aes(y = pred_energy1), method = "lm", se = FALSE, 
              linetype = "dashed", size = 1) + 
  scale_y_log10(limits = y_limits) +
  labs(x = "Hardware Quantity", y = "log of Energy (kWh)") +
  theme_minimal() +
  theme(legend.position = "none") 
p2 <- ggplot(df, aes(x = Hardware_quantity, y = Energy_kWh, colour = Training_time_group)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(data = prediction_data, aes(y = pred_energy2), method = "lm", se = FALSE, 
              linetype = "dashed", size = 1) + 
  scale_y_log10(limits = y_limits) +
  labs(x = "Hardware Quantity", colour = "Training Time Group") +
  theme_minimal() +
  theme(axis.title.y = element_blank()) 
p1 + p2
Figure 8. Relationship between hardware quantity and log energy consumption across training-time groups. In both panels, the raw data are shown as points, solid lines represent fitted values from a linear model, and dashed lines represent predicted values from the generalized linear model. The left panel uses the log-transformed gamma GLM; the right panel uses the log-link gamma GLM with the same predictors.

Why the log transformation fails

To understand why the log-transformed model cannot capture the underlying effects the way the log-link model does, consider what happens when you apply a log transformation to the response variable.

Suppose y equals some function of x plus an error term:

y = f(x) + ε

Applying the log transformation to y compresses both f(x) and the error together:

log(y) = log(f(x) + ε)

In other words, you are now modeling an entirely new response variable, log(y). When you fit your own function g(x) (in my case, g(x) = Training_time_hour * Hardware_quantity + Training_hardware), you are trying to capture the combined effect of both the "shrunken" f(x) and the error term:

E[log(y)] = g(x)

In contrast, with a log link you model the original y rather than a transformed version. The model predicts y by exponentiating your function g(x):

E[y] = exp(g(x)), that is, log(E[y]) = g(x)

The model minimizes the difference between the actual and predicted y, so the error term stays intact on the original scale.
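A small simulation (illustrative, not part of the original analysis) makes this gap concrete: modeling the mean of log(y) and back-transforming recovers the geometric mean, which understates the arithmetic mean that a log-link model targets.

```r
# Deterministic log-normal sample: log(y) is exactly a standard-normal quantile grid
y <- exp(qnorm(ppoints(1e5), mean = 0, sd = 1))

exp(mean(log(y)))  # geometric mean, ~1.00: what a model of log(y) back-transforms to
mean(y)            # arithmetic mean, ~1.65 (= exp(0.5)): what a log-link model targets
```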

Conclusion

Log-transforming a variable is not the same as using a log link, and it does not always yield reliable results. Under the hood, the log transformation changes the response variable itself, distorting both the signal and the noise. Understanding this subtle mathematical distinction behind a model is just as important as searching for the best-fitting model.


[1] Epoch AI. Data on notable AI models. Retrieved from https://epoch.ai/data/notable-ai-models

[2] University of Virginia Library. Interpreting log transformations in a linear model. Retrieved from https://library.virginia.edu/data/articles/interpreting-log-transformations-in-a-linear-model
