A DID-based Machine Learning Method of Estimating CATE

Research Question

Given a delivery capacity constraint, which customers should be prioritized for home delivery?

From a revenue-management perspective, one criterion is how responsive a customer's purchasing behavior is to the delivery service.
Along this line of thinking, we should estimate the individual-level treatment effect (or conditional average treatment effect, CATE) of last-mile delivery: $\delta(x) = \mathbb{E}\left [ Y(1) - Y(0) | X=x\right ].$ Since the treatment variable in the data can be continuous, we are also interested in the following object: $\delta(x,d) = \mathbb{E}\left [ Y(d) - Y(0) | X=x, D=d\right ]$ , where $d$ is the continuous treatment intensity (for example, the percentage of orders delivered to the customer's home).

A DID-based Framework

The standard DiD model in panel regression form is $Y_{it} = \alpha_i + \tau_t + \delta D_{it} + \epsilon_{it}.$ However, this linear model may not be sufficiently flexible for estimating highly nonlinear CATEs. Instead, we assume $Y_{it} = \alpha_i + \tau_t(\tilde X_i) + \delta(X_i,D_{it}) + \epsilon_{it},$ where

we allow the treatment effect to depend on a set of individual-level features $X_i$
we allow the features and the treatment to interact nonlinearly
we allow the time trend to be individual-specific ( $\tilde X_i$ can be a subset of $X_i$ ). This configuration is more robust because the parallel-trends assumption often fails in practice

More on the Model

We assume there are two periods, where $t \in \{ 0, 1\}$ .

For the treated group, we have $\Delta Y^1_i(d) = Y_{i1}(d)-Y_{i0}(0) = \Delta\tau(\tilde X_i) + \delta(X_i,d) + \Delta \epsilon_{i},$ where $\Delta\tau(\tilde X_i) = \tau_1(\tilde X_i)-\tau_0(\tilde X_i)$ and $\Delta \epsilon_{i} = \epsilon_{i1}-\epsilon_{i0}.$
For the control group, we have $\Delta Y^0_i = Y_{i1}(0)-Y_{i0}(0) = \Delta\tau(\tilde X_i) + \delta(X_i,0) + \Delta \epsilon_{i} = \Delta\tau(\tilde X_i) + \Delta \epsilon_{i},$ where it is natural to assume $\delta(X_i,0) = 0$ .

Thus, we have $\mathbb{E} [Y_1(d)-Y_0(0)|X=x,D=d] = \Delta \tau(\tilde x) + \delta(x,d)$ and $\mathbb{E} [Y_1(0)-Y_0(0)|X=x,D=0] = \Delta \tau(\tilde x).$

Taking the difference in differences, we eliminate the individual-specific time trends and obtain the individual-level treatment effect: $\mathbb{E} [Y_1(d)-Y_0(0)|X=x,D=d]-\mathbb{E} [Y_1(0)-Y_0(0)|X=x,D=0] = \delta(x,d).$
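The cancellation can be checked numerically. The sketch below uses noise-free outcomes and hypothetical functional forms for $\Delta\tau$ and $\delta$ (both are assumptions made only for illustration, as are the fixed-effect values); it shows that the before-after change of a treated unit is contaminated by the trend, while the difference in differences recovers $\delta(x,d)$.

```python
import math

# Hypothetical functional forms, chosen only for illustration.
def delta_tau(x_tilde):
    """Assumed feature-dependent trend, tau_1(x) - tau_0(x)."""
    return 0.5 * math.sin(3.0 * x_tilde)

def delta(x, d):
    """Assumed nonlinear treatment effect with delta(x, 0) = 0."""
    return d * (1.0 + x * x)

def outcome(alpha, x, d, t):
    """Noise-free Y_it = alpha_i + tau_t(x) + delta(x, D_it), with tau_0 = 0.

    Treatment is only received in period t = 1.
    """
    tau = delta_tau(x) if t == 1 else 0.0
    effect = delta(x, d) if t == 1 else 0.0
    return alpha + tau + effect

x, d = 0.4, 0.7
alpha_treated, alpha_control = 3.0, -1.0  # arbitrary fixed effects

# First differences remove the fixed effects alpha_i.
change_treated = outcome(alpha_treated, x, d, 1) - outcome(alpha_treated, x, 0.0, 0)
change_control = outcome(alpha_control, x, 0.0, 1) - outcome(alpha_control, x, 0.0, 0)

# The second difference removes the common trend delta_tau(x).
did = change_treated - change_control

print("before-after (treated):", change_treated)
print("difference in differences:", did)
print("true delta(x, d):", delta(x, d))
```

The before-after change alone equals $\Delta\tau(\tilde x) + \delta(x,d)$, so it overstates or understates the effect whenever the trend is nonzero; subtracting the control change at the same $\tilde x$ removes that term.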

Then we can estimate the right-hand side, $\delta(x,d)$, from the observable quantities on the left-hand side using any machine learning method.

Estimation algorithm

take first differences to eliminate the individual-level fixed effect: $\Delta Y^1_i = Y_{i1}(d)-Y_{i0}(0)$ for treated units and $\Delta Y^0_i = Y_{i1}(0)-Y_{i0}(0)$ for control units
match the treated units with the control units based on the features $\tilde X_i$ . For example, one can use KNN matching
calculate the difference in differences using the matched sample, $\Delta Y^1_i - \widehat{\Delta Y^0_i}$ , where $\widehat{\Delta Y^0_i}$ is the outcome change of the matched control units
regress $\Delta Y^1_i - \widehat{\Delta Y^0_i}$ on $(x,d)$ to estimate the conditional average treatment effect $\delta(x,d)$ . This step can be performed with any machine learning method that is more flexible in prediction than linear regression
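The four steps above can be sketched end to end on simulated data. Everything specific here is an assumption for illustration: the data-generating functions `trend` and `delta_true`, the scalar feature (so that $\tilde X_i = X_i$), the sample sizes, and the choices of $k$. The final regression uses a plain KNN regressor written with the standard library only to keep the example self-contained; it stands in for whichever learner one prefers.

```python
import math
import random

random.seed(0)

# Hypothetical data-generating process (illustration only).
def trend(x):
    """Assumed feature-dependent time trend, Delta tau(x)."""
    return 0.5 * math.sin(3.0 * x)

def delta_true(x, d):
    """Assumed nonlinear treatment effect with delta(x, 0) = 0."""
    return d * (1.0 + x * x)

def simulate_unit(is_treated):
    """Return (x, d, y0, y1) for one unit observed in periods 0 and 1."""
    x = random.uniform(-1.0, 1.0)
    d = random.uniform(0.1, 1.0) if is_treated else 0.0
    alpha = random.gauss(0.0, 2.0)  # individual fixed effect
    y0 = alpha + random.gauss(0.0, 0.05)
    y1 = alpha + trend(x) + delta_true(x, d) + random.gauss(0.0, 0.05)
    return x, d, y0, y1

treated = [simulate_unit(True) for _ in range(600)]
control = [simulate_unit(False) for _ in range(600)]

# Step 1: first differences remove the fixed effect alpha_i.
treated_diff = [(x, d, y1 - y0) for x, d, y0, y1 in treated]
control_diff = [(x, y1 - y0) for x, _, y0, y1 in control]

# Step 2: KNN matching on the feature; the matched controls' mean change
# serves as the estimated counterfactual trend for a treated unit.
def matched_trend(x, k=5):
    nearest = sorted(control_diff, key=lambda c: abs(c[0] - x))[:k]
    return sum(dy for _, dy in nearest) / k

# Step 3: difference in differences on the matched sample.
did = [(x, d, dy - matched_trend(x)) for x, d, dy in treated_diff]

# Step 4: regress the DiD outcome on (x, d). A KNN regressor is used here
# as a simple nonparametric stand-in for any machine learning method.
def predict_cate(x, d, k=15):
    nearest = sorted(did, key=lambda r: (r[0] - x) ** 2 + (r[1] - d) ** 2)[:k]
    return sum(r[2] for r in nearest) / k

for x, d in [(0.5, 0.5), (-0.3, 0.7)]:
    print(f"x={x}, d={d}: estimate={predict_cate(x, d):.3f}, "
          f"truth={delta_true(x, d):.3f}")
```

With a multi-dimensional $X_i$, the features would typically be rescaled before computing distances, and the KNN regressor in the last step could be swapped for a tree ensemble or a neural network without changing the first three steps.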