Decompositions and Dueto reports are extremely useful for explaining what drove a change in a business metric.
They are typically presented as bar or waterfall charts.
Because they measure all contributions in the same units (dollars) and can be aggregated arbitrarily, they are much easier to interpret than raw regression results.
Modern software can calculate them automatically.
The most striking thing about the business world is its jargon. It does not have a monopoly on this, since we live in a world of claptrap. Universities, the media, and psychoanalysts are masters of the genre. Still, business jargon is particularly deadly, enough to utterly discourage the workplace hero, the Stakhanovite, lying dormant in you. -- Corinne Maier
Vector notation: $$ f(x_{k}) = a + x'L + h(x) $$
... just don't add the columns together, e.g. cross out the $1$.
SUM() / GROUP BY
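For a linear model the whole recipe is a couple of lines of pandas; a toy sketch with made-up column names and coefficients:

```python
import pandas

# Toy fitted linear model: y ≈ X @ beta (the data and coefficients here are made up)
X = pandas.DataFrame({"price": [1.0, 1.1, 1.2, 1.3], "volume": [10, 12, 9, 11]})
beta = pandas.Series({"price": 2.0, "volume": 0.5})
year = pandas.Series([2019, 2019, 2020, 2020])

# The decomposition keeps the per-column products instead of summing them up
decomp = X * beta

# ... and can then be aggregated any way you like (the SUM() / GROUP BY step)
decomp.groupby(year).sum()
```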
If not using OLS, the above is not justified.
But, combining a Taylor expansion and the chain rule for each element of $Y$:
$$ y_i = a + \sum_k x_{ik} \left.\frac{\partial y}{\partial x_k}\right|_0 + h(x_i) $$

And back to matrix notation:

$$ Y = (X \circ \nabla Y)\, 1_k + h(X) $$
where $\circ$ is the Hadamard product and $\nabla Y$ is the derivative of $Y$ with respect to $X$.
Note: The error term is not "nice" anymore. There are many (mostly ad hoc) methods for dealing with it.
Note: Instead of dealing with $\nabla Y$ analytically (annoying and/or hard), use modern software to carry out the procedure numerically, as sketched below.
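One simple numerical option is central differences; this is just a sketch, with `predict` standing in for whatever fitted model you have (the Keras example later uses automatic differentiation instead):

```python
import numpy

def numerical_gradient(predict, X, eps=1e-4):
    """Central-difference estimate of dY/dX; predict(X) should return a 1-D array."""
    X = numpy.asarray(X, dtype=float)
    grads = numpy.zeros_like(X)
    for k in range(X.shape[1]):
        up, down = X.copy(), X.copy()
        up[:, k] += eps
        down[:, k] -= eps
        grads[:, k] = (predict(up) - predict(down)) / (2 * eps)
    return grads

# decomp = X * numerical_gradient(predict, X)   # the Hadamard product above
```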
Let's back up, go left at Albuquerque.
$$ \frac{\partial f}{\partial t} = \lim_{dt \to 0} \frac{f(x, t + dt) - f(x, t)}{t + dt - t} \approx \frac{f(x, t{:=}1) - f(x, t{:=}0)}{1 - 0} $$

$$ \approx I(t=1)\, \text{decomp}(Y) - I(t=0)\, \text{decomp}(Y) $$

$$ \approx \left[ I(t=1) - I(t=0) \right] \text{decomp}(Y) $$
The LADWP may owe another $40 million to consumers after overcharging them for years. The FBI raided the LADWP and City Attorney's offices last Monday, investigating bribery, kickbacks, extortion, and money laundering. Consumer Watchdog President Jamie Court says there is no doubt crimes were committed and that the consumers are owed millions.
| | yyyymm | days | tier1 | usage | cpi | AvgTemp |
|---|---|---|---|---|---|---|
0 | 2016-02-15 | 64 | 0.14414 | 1670 | 1.000000 | 56.93 |
1 | 2016-04-15 | 57 | 0.14140 | 473 | 0.999544 | 60.09 |
2 | 2016-06-15 | 61 | 0.13650 | 429 | 1.044552 | 61.70 |
3 | 2016-08-15 | 59 | 0.13883 | 869 | 1.068874 | 70.50 |
4 | 2016-10-15 | 62 | 0.14087 | 693 | 1.070519 | 70.10 |
5 | 2016-12-15 | 62 | 0.14370 | 738 | 1.025528 | 63.71 |
6 | 2017-02-15 | 63 | 0.14729 | 1183 | 1.017346 | 55.64 |
7 | 2017-04-15 | 58 | 0.14922 | 691 | 1.033938 | 59.86 |
8 | 2017-06-15 | 61 | 0.15091 | 473 | 1.074467 | 62.29 |
9 | 2017-08-15 | 59 | 0.14874 | 1161 | 1.072076 | 70.47 |
10 | 2017-10-15 | 62 | 0.14887 | 310 | 1.075681 | 70.35 |
11 | 2017-12-15 | 61 | 0.15276 | 644 | 1.034617 | 64.57 |
Dep. Variable: | tier1 | R-squared: | 0.957 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.950 |
Method: | Least Squares | F-statistic: | 123.4 |
Date: | Tue, 25 Aug 2020 | Prob (F-statistic): | 9.81e-15 |
Time: | 19:08:09 | Log-Likelihood: | 72.840 |
No. Observations: | 27 | AIC: | -135.7 |
Df Residuals: | 22 | BIC: | -129.2 |
Df Model: | 4 | |
Covariance Type: | nonrobust | |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
t | 0.0112 | 0.001 | 15.214 | 0.000 | 0.010 | 0.013 |
heat | 0.0034 | 0.002 | 1.793 | 0.087 | -0.001 | 0.007 |
cool | 0.0018 | 0.002 | 1.074 | 0.295 | -0.002 | 0.005 |
cpi | 1.6116 | 0.163 | 9.865 | 0.000 | 1.273 | 1.950 |
base | -2.0877 | 0.014 | -151.252 | 0.000 | -2.116 | -2.059 |
Omnibus: | 2.320 | Durbin-Watson: | 1.004 |
---|---|---|---|
Prob(Omnibus): | 0.313 | Jarque-Bera (JB): | 1.976 |
Skew: | 0.634 | Prob(JB): | 0.372 |
Kurtosis: | 2.613 | Cond. No. | 625. |
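For reference, a summary like the one above comes from a statsmodels fit along these lines; the log transform of tier1 is an assumption on my part, inferred from the `numpy.exp(results.fittedvalues)` in the decomposition code below:

```python
import numpy
import statsmodels.api as sm

# X holds the regressors shown above (t, heat, cool, cpi, base);
# how they are constructed from df is not shown here
results = sm.OLS(numpy.log(df["tier1"]), X).fit()
print(results.summary())
```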
import numpy
# Linear version: the contribution of column k is x_k * beta_k
decomp = X.apply(lambda x: x * results.params[x.name])
# Log-linear version: the chain rule gives d y / d x_k = exp(X beta) * beta_k
decomp = X.apply(lambda x: x * numpy.exp(results.fittedvalues) * results.params[x.name])
decomp.head()
| | t | heat | cool | cpi | base |
|---|---|---|---|---|---|
0 | 0.015383 | 0.000000 | 0.002034 | 0.000000 | 0.123987 |
1 | 0.017194 | 0.000000 | 0.001245 | -0.000104 | 0.123987 |
2 | 0.005322 | 0.000000 | 0.000824 | 0.009765 | 0.123987 |
3 | 0.000000 | 0.002647 | 0.000000 | 0.015097 | 0.123987 |
4 | 0.001123 | 0.002477 | 0.000000 | 0.015585 | 0.123987 |
# Scale the rate decomposition by usage and average within each year
yearly_decomp = (decomp.drop(columns="yyyymm")
                 .multiply(df['usage'], axis='index')
                 .groupby(df['yyyy'])
                 .mean())

# Year-over-year dueto: 2020 versus 2019 (same convention as the Keras version below)
yoy_dueto = yearly_decomp.loc[2020] - yearly_decomp.loc[2019]
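As mentioned at the start, this is the point where you would draw a bar or waterfall chart; a minimal matplotlib sketch of the waterfall:

```python
import matplotlib.pyplot as plt

# Waterfall: each bar starts where the previous contribution ended
contrib = yoy_dueto.sort_values(ascending=False)
bottoms = contrib.cumsum() - contrib
plt.bar(contrib.index, contrib.values, bottom=bottoms.values)
plt.axhline(0, color="black", linewidth=0.8)
plt.ylabel("contribution to year-over-year change")
plt.title("Year-over-year dueto")
plt.show()
```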
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Lambda
import tensorflow

def log_log_builder():
    # Log-transform the inputs, one hidden ReLU layer, then exponentiate the outputs
    # (DIM, X, Y, and scaled_mse are defined elsewhere in the notebook)
    model = Sequential([
        Lambda(tensorflow.keras.backend.log),
        Dense(DIM, input_shape=(X.shape[1],), kernel_initializer='normal', activation='relu'),
        Dense(Y.shape[1]),
        Lambda(tensorflow.keras.activations.exponential, output_shape=[2]),
    ])
    model.compile(loss=scaled_mse, optimizer='adam')
    return model
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor

estimator = KerasRegressor(build_fn=log_log_builder, epochs=10000, batch_size=20, verbose=0)
estimator.fit(X, Y)
yhat = estimator.predict(X)
# Differentiate the fitted model with respect to its inputs
with tensorflow.GradientTape() as tape:
    Xk = tensorflow.keras.backend.constant(X, shape=X.shape)
    tape.watch(Xk)
    yhat = estimator.model(Xk[:, :])
# Jacobian of every output with respect to every input row
derivs = tape.jacobian(yhat, Xk)
derivs = derivs.numpy()
(27, 2, 27, 4)
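That shape is (n_obs, n_outputs, n_obs, n_features): the full Jacobian relates every output to every input row. Since each prediction depends only on its own row, the per-observation derivatives sit on the diagonal of the two observation axes; one way to extract them (picking output 0 here is my choice for illustration):

```python
import numpy
import pandas

# Diagonal over the two observation axes: (n_obs, n_outputs, n_features)
point_derivs = numpy.einsum('iaib->iab', derivs)
# (tensorflow.GradientTape.batch_jacobian would give this shape directly)

# Derivatives of the first output with respect to each input, labelled like X
pandas.DataFrame(point_derivs[:, 0, :], columns=X.columns, index=X.index).head()
```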
| | t | days | AvgTemp | cpi |
|---|---|---|---|---|
0 | 0.001541 | -0.033964 | -0.011730 | 0.145384 |
1 | 0.001603 | -0.035326 | -0.012201 | 0.151212 |
2 | 0.001675 | -0.036910 | -0.012748 | 0.157993 |
3 | 0.001738 | -0.038289 | -0.013224 | 0.163895 |
4 | 0.001740 | -0.038335 | -0.013240 | 0.164094 |
| | t | days | AvgTemp | cpi |
|---|---|---|---|---|
22 | 6.159162 | 707.881206 | 305.696271 | -2884.599848 |
23 | 7.097186 | 815.689652 | 352.253009 | -3323.916817 |
24 | 39.473219 | 664.365496 | 272.278571 | -2494.512985 |
25 | 39.553072 | 665.709491 | 272.829384 | -2499.559326 |
26 | 40.253026 | 677.490268 | 277.657530 | -2543.792962 |
# Reuse the trained layers, but replace the final exp() with one that
# combines the two log-scale outputs into a single total bill
total_bill = KerasRegressor(lambda: None)
total_bill.model = Sequential(estimator.model.layers[:-1] + [
    Lambda(lambda x: tensorflow.keras.activations.exponential(x[:, 0] + x[:, 1]), output_shape=[1]),
])
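The decomposition step below needs per-observation derivatives of this combined model with the same shape as X; a sketch of one way to get them (since each row's prediction depends only on that row's inputs, the gradient of the summed output is exactly the per-row derivative):

```python
import pandas
import tensorflow

Xk = tensorflow.keras.backend.constant(X, shape=X.shape)
with tensorflow.GradientTape() as tape:
    tape.watch(Xk)
    total = total_bill.model(Xk)
# gradient() sums the per-row outputs, so this is the row-wise derivative, shaped like X
derivs = pandas.DataFrame(tape.gradient(total, Xk).numpy(),
                          columns=X.columns, index=X.index)
```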
# Contribution of each input = derivative times input value; average within each year
decomp_bill = (derivs * X).groupby(df['yyyy']).mean()
# Dueto: how much each input contributed to the 2020-vs-2019 change
dueto_keras = decomp_bill.loc[2020] - decomp_bill.loc[2019]
Decompositions and Dueto reports are extremely useful for explaining what drove a change in a business metric.
They are typically presented as bar or waterfall charts.
Because they measure all contributions in the same units (dollars) and can be aggregated arbitrarily, they are much easier to interpret than raw regression results.
Modern software can calculate them automatically.
# Questions ?
This is a great question that I don't have a good answer for.
Additive decompositions have been around in mathematics for a long time, but in the modern sense they popped up in economics in the postwar era, as people tried to figure out how inflation indexes were behaving, and then spread to psychology via quantitative marketing.
For Duetos, it's unclear to me who invented them, who named them, or when and where; I learned them on the job at a company with ex-Deloitte founders.
In Explainability / Fairness, you decompose the "contribution" of each variable for a specific point by inspecting the partial (directional) derivative at that point.
Decomps and duetos take the derivative at the reference group / origin / time 0 for all points and aggregate.
Easy: Propagate the error and incorporate it into the plots.
Medium: Extend the Keras model to incorporate Tier 2 and Tier 3 pricing. (Hint: use ReLU layers.)
Medium: Incorporate time series components (trend, seasonality) as in Hyndman.
Impossible: Interpret cable bill
In OLS, if you have any $Y$ error remaining unallocated, add that group to the set of predictors.
For the nonlinear case, the easiest way is to calculate the residual and carry it along as its own column, for example:
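A sketch of that bookkeeping for the Keras decomposition above; `actual` is a placeholder for whatever observed series is being decomposed:

```python
# Per-row contributions plus a residual column, so each row adds back to the actual value
row_decomp = derivs * X
row_decomp["residual"] = actual - row_decomp.sum(axis=1)   # "actual" is a placeholder

# Any aggregation of the columns now sums to the corresponding aggregate of the actuals
row_decomp.groupby(df['yyyy']).mean()
```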
log1p and other sign-preserving transforms and activation functions.

Jupyter Lab does not embed custom.css (issue 8793), but you can manually dump style rules into the notebook.
from IPython.core.display import display, HTML

# _css holds the stylesheet text (e.g. read from custom.css)
display(HTML(f'<style>{_css}</style>'))
For logos, use base64 data URLs inside the CSS.
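A small helper for building such a data URL (the file name is just an example):

```python
import base64

# Encode a logo as a data URL that can be dropped into the CSS
with open("logo.png", "rb") as fh:          # example file name
    encoded = base64.b64encode(fh.read()).decode("ascii")
css_snippet = f"background-image: url('data:image/png;base64,{encoded}');"
```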