From a revenue-management perspective, one criterion is how responsive a customer's purchasing behavior is to the delivery service.
Along this line of thinking, we should estimate the individual-level treatment effect (or conditional average treatment effect, CATE) of last-mile delivery:
δ(x)=E[Y(1)−Y(0)∣X=x].
Since the treatment variable in the data could be continuous, we are also interested in the following object:
δ(x,d)=E[Y(d)−Y(0)∣X=x,D=d],
where d is a continuous treatment intensity (for example, the percentage of orders delivered to the home).
A DiD-based Framework
The standard DiD model in panel regression form:
Yit=αi+τt+δDit+ϵit.
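As a rough illustration of the panel regression above, δ can be estimated on a balanced panel with a two-way within (demeaning) transformation. The sketch below runs on simulated data. The function name `twfe_did`, the sample sizes, and the true effect of 2.0 are assumptions made for illustration and are not part of the original analysis.

```python
import numpy as np

def twfe_did(Y, D):
    """Estimate delta in Y_it = alpha_i + tau_t + delta * D_it + eps_it.

    Y, D: arrays of shape (n_units, n_periods) from a balanced panel.
    """
    def demean(A):
        # Two-way within transformation: remove unit means and period means.
        return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()

    Yd, Dd = demean(Y), demean(D)
    # OLS slope of demeaned Y on demeaned D.
    return float((Dd * Yd).sum() / (Dd ** 2).sum())

# Hypothetical simulated panel with two periods.
rng = np.random.default_rng(0)
n, T = 500, 2
alpha = rng.normal(size=(n, 1))            # unit fixed effects
tau = np.array([[0.0, 1.5]])               # common time effects
treated = rng.random(n) < 0.5
D = np.zeros((n, T))
# Treated units receive a continuous intensity in period 1 only.
D[treated, 1] = rng.uniform(0.1, 1.0, size=treated.sum())
Y = alpha + tau + 2.0 * D + rng.normal(scale=0.1, size=(n, T))

delta_hat = twfe_did(Y, D)                 # should be close to the simulated 2.0
```

Note that this estimator returns a single number, which is exactly the restriction the more flexible model is meant to relax.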
However, this linear model imposes a single constant effect δ and may not be flexible enough to estimate highly nonlinear CATEs.
Instead, we assume
Yit=αi+τt(X~i)+δ(Xi,Dit)+ϵit,
where
we allow the treatment effect to depend on a set of individual-level features Xi
we allow the features and the treatment to interact nonlinearly
we allow the time trend to be individual-specific (X~i can be a subset of Xi)
This configuration is more robust because the unconditional parallel-trends assumption often fails in practice; here, trends only need to be parallel among units that share the same X~i.
More on the Model
We assume there are two periods, t∈{0,1}; no unit is treated in period 0, so the baseline outcome is Yi0(0).
For the treated group, we have
ΔYi1(d)=Yi1(d)−Yi0(0)=Δτ(X~i)+δ(Xi,d)+Δϵi,
where Δτ(X~i)=τ1(X~i)−τ0(X~i) and Δϵi=ϵi1−ϵi0.
For the control group, we have
ΔYi0=Yi1(0)−Yi0(0)=Δτ(X~i)+δ(Xi,0)+Δϵi=Δτ(X~i)+Δϵi,
where it is natural to assume δ(Xi,0)=0.
Thus, provided the differenced error satisfies E[Δϵi∣Xi,Di]=0, we have
E[Y1(d)−Y0(0)∣X=x,D=d]=Δτ(x~)+δ(x,d) and
E[Y1(0)−Y0(0)∣X=x,D=0]=Δτ(x~)
Taking the difference in differences, we can eliminate the individual-specific time trends and get the individual-level treatment effects:
E[Y1(d)−Y0(0)∣X=x,D=d]−E[Y1(0)−Y0(0)∣X=x,D=0]=δ(x,d)
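The identity above can be checked numerically. The sketch below simulates the two-period model with a binary feature and three treatment levels, so that each conditional expectation is just a cell mean. The functional forms in `delta` and `trend`, the treatment probabilities, and the noise levels are hypothetical choices made only for this check.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def delta(x, d):
    # Hypothetical nonlinear treatment effect with delta(x, 0) = 0.
    return (1.0 + x) * d ** 2

def trend(x):
    # Hypothetical feature-dependent trend, playing the role of Delta tau(x~).
    return 1.0 + 2.0 * x

x = rng.integers(0, 2, size=n)                    # binary feature; here X~ = X
p_treat = np.where(x == 1, 0.7, 0.3)              # treatment uptake depends on x
d = np.where(rng.random(n) < p_treat, rng.choice([0.5, 1.0], size=n), 0.0)
alpha = 3.0 * x + rng.normal(size=n)              # fixed effect correlated with x

y0 = alpha + rng.normal(scale=0.5, size=n)        # period 0: nobody treated, tau_0 = 0
y1 = alpha + trend(x) + delta(x, d) + rng.normal(scale=0.5, size=n)
dy = y1 - y0                                      # first difference removes alpha

def did(xv, dv):
    # E[dY | X=x, D=d] - E[dY | X=x, D=0], estimated by cell means.
    treated_mean = dy[(x == xv) & (d == dv)].mean()
    control_mean = dy[(x == xv) & (d == 0.0)].mean()
    return treated_mean - control_mean

estimates = {(xv, dv): did(xv, dv) for xv in (0, 1) for dv in (0.5, 1.0)}

# For contrast: a comparison that ignores x mixes trend differences into the effect.
naive = dy[d > 0].mean() - dy[d == 0].mean()
```

Each entry of `estimates` should be close to `delta(x, d)` for the same cell, while `naive` also absorbs the fact that treated units are more likely to have the steeper trend.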
Because the left-hand side involves only observed outcomes, δ(x,d) can then be estimated with any machine learning method.
Estimation Algorithm
take the first difference over time to eliminate the individual-level fixed effect αi:
ΔYi1=Yi1(d)−Yi0(0), and
ΔYi0=Yi1(0)−Yi0(0)
match the treated units with the control units based on the features X~i.
For example, one can use KNN matching.
calculate the difference in differences using the matched sample, pairing each treated unit's ΔYi1 with the ΔYi0 of its matched control unit(s):
ΔYi1−ΔYi0
regress ΔYi1−ΔYi0 on (x,d) to estimate the conditional average treatment effect δ(x,d). This step can be performed with any machine learning method that is more flexible than linear regression.
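The steps above can be sketched end to end as follows. This is a minimal illustration on simulated data, not the authors' implementation: the helper names (`knn_indices`, `matched_did`, `fit_knn_regressor`), the neighbor counts, and the simulated trend and effect are all assumptions. A plain k-nearest-neighbor regressor stands in for "any machine learning method" in the final step so that the example depends only on NumPy.

```python
import numpy as np

def knn_indices(query, ref, k):
    """Indices of the k nearest rows of `ref` (Euclidean) for each row of `query`."""
    dist = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(dist, axis=1)[:, :k]

def matched_did(dy, d, x_tilde, k=5):
    """Match each treated unit to k controls on x_tilde and return dY_i1 - mean(dY_i0)."""
    treated, control = d > 0, d == 0
    idx = knn_indices(x_tilde[treated], x_tilde[control], k)
    control_change = dy[control][idx].mean(axis=1)   # matched controls' average change
    return dy[treated] - control_change

def fit_knn_regressor(features, target, k=25):
    """Stand-in nonparametric learner: average the target over the k nearest points."""
    def predict(new_points):
        idx = knn_indices(np.atleast_2d(new_points), features, k)
        return target[idx].mean(axis=1)
    return predict

# Hypothetical simulated data; dy is already the first difference Y_i1 - Y_i0.
rng = np.random.default_rng(2)
n = 3000
x = rng.uniform(-1.0, 1.0, size=n)                   # single feature, X~ = X
d = np.where(rng.random(n) < 0.5, rng.uniform(0.2, 1.0, size=n), 0.0)

def true_delta(x, d):
    return d * (1.0 + x ** 2)                        # assumed effect, zero at d = 0

dy = 1.0 + 2.0 * x + true_delta(x, d) + rng.normal(scale=0.1, size=n)

# Difference in differences on the matched sample.
did_outcome = matched_did(dy, d, x[:, None], k=5)

# Regress the DiD outcome on (x, d) for the treated units.
treated = d > 0
predict = fit_knn_regressor(np.column_stack([x[treated], d[treated]]), did_outcome)

grid = np.array([[0.3, 0.6], [-0.5, 0.8]])           # (x, d) points to evaluate
delta_hat = predict(grid)
```

In practice the k-NN regressor could be swapped for a tree ensemble or neural network, and the brute-force distance matrix for a spatial index, once the sample is too large for pairwise distances to fit in memory.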