Using contextual MAB’s to maximize community census engagement
Adding context and classifiers to traditional multi-armed-bandits.
data science
Published
June 8, 2020
The 2020 Census
The year is 2020 and while the Coronavirus pandemic has demanded the attention of our nation, there has been another important event taking place in the background. This year marks the 24th time that the United States will conduct a nation-wide census, aimed at counting each and every person in the U.S., and its five territories. Though it has not garnered much media attention, the numbers aggregated in the census responses form a working model of our nation’s population, upon which governing leaders base their decisions about pertinent issues like how to distribute local government funding, and how many seats each state has in the House of Representatives.
Collecting survey data on this scale is inherently challenging, and the obstacle has only been compounded in the wake of the epidemic and the decreasing level of trust that Americans have in their government. For this reason, local governing authorities have been working hard to get the word out, and improve census response rates on a community-by-community level.
One such technique that could be used to increase community awareness and engagement with the census is an email campaign. An integral piece to any email marketing campaign is the messaging used. The reality is that even in small communities, there may be sub-regions that respond to email messaging differently than others. So the question is
Important
How can local governing authorities use data to identify the optimal messaging to maximize census response rates on a community level?
In this post, I will be proposing the use of contextual multi-armed bandits to identify the messaging that leads to maximal census response rates per community.
Code
class MAB:""" Base Multi-Armed Bandit class. """def__init__(self, slots: List[scipy.stats.bernoulli]):self.slots = slotsself.k =len(self.slots)self.choices = [list(0for _ inrange(self.k))]self.rewards = [list(0for _ inrange(self.k))]# initialize estimated probs uniformlyself.estimated_probs = [[1/len(slots) for _ inrange(self.k)]]def pull(self, t):# needs to be implemented individually for each type of MABpassdef update_history(self, arm, reward): choice = [1if arm == i else0for i inrange(self.k)]self.choices.append(choice) r = [reward if arm == i else0for i inrange(self.k)]self.rewards.append(r)returnNonedef update_probs(self, t): estimated_prob = []for slot inrange(self.k): slot_choices = [c[slot] for c inself.choices] slot_rewards = [r[slot] for r inself.rewards] estimated_prob.append((np.sum(slot_rewards) / (np.sum(slot_choices) +1)))self.estimated_probs.append(estimated_prob)returnNonedef play(self, n):for t inrange(n): arm, reward =self.pull(t)self.update_probs(t)
Epsilon-greedy Multi-Armed-Bandits
Note
Before we start adding context to our multi-armed-bandits, let’s begin by introducing how one might use what are referred to as ε-greedy MABs to identify the optimal email messaging.
Why not just use A/B testing?
If, for example, we were solely interested in performing research to answer the question of which message evokes the maximal census response rate per community, we would likely reach for a randomized control trial, which (assuming we can collect data at the necessary scale) would enable us to leverage statistical rigor in conclusively determining which email message is optimal, and potentially even shed light on the causal relationship between certain messages and a citizen’s propensity to respond to the census.
In a vacuum, this sounds perfect. In the real-world, unfortunately, there is one huge downside. That is, in order to conclusively determine the optimal message to a degree of statistical certainty, we would be required to show the sub-optimal message(s) to many citizens, potentially wreaking havoc on the census results. This is a classic problem known as the exploration-exploitation tradeoff.
In order to balance the objective of maximizing census response rates, while simultaneously exploring how citizens respond to different messaging, we could use a multi-armed-bandit (from here forward, referred to as MABs).
In the MAB framework, each potential email message can be viewed as a candidate arm that our “bandit” could pull upon deciding which email to send to each citizen.
Let’s assume that a small town’s local government has developed two potential email messages aimed at improving census response rates.
The first emphasizes the citizen’s civic duty in participating in the census, in addition to the long-term implications of its results.
The second hones in on the data privacy and security measures that are in place to ensure each citizens rights are respected.
Our ε-greedy MAB will begin by randomly choosing which message to send out, then as citizens begin to responding to the census, it will show the message with the best response rate 1-ε% of the time, then show the other message in the remaining ε% of emails.
Code
class EpsilonGreedy(MAB):def__init__(self, slots: List[scipy.stats.bernoulli], epsilon: float, epsilon_decay: bool=False, ):""" inputs: slots: list of bernoulli random variables representing each message epsilon: float, where exploit with prob 1-ε epsilon_decay: bool, whether or not to decrease epsilon over time """super().__init__(slots)self.epsilon = epsilonself.exploit = (lambda t: bernoulli(1- (self.epsilon / np.log(t +2)))if epsilon_decayelse bernoulli(1-self.epsilon) )def pull(self, t):ifself.exploit(t).rvs():# pull from argmax of estimated probs arm = np.argmax(self.estimated_probs[t]) reward =self.slots[arm].rvs()self.update_history(arm, reward)else:# pull from random arm arm = np.random.randint(0, len(self.slots)) reward =self.slots[arm].rvs()self.update_history(arm, reward)return arm, reward
Let’s imagine that for this small town, on average the civic-duty message yields census responses 80% of the time, and the data-privacy message evokes a response 30% of the time.
The chart above is dynamic. Hover your cursor over the chart to interact.
From the plot above, we can see that our MAB gained a fairly stable estimate of the civic-duty message’s response rate after about 300 emails, yet it took a full 800 emails to get an accurate approximation of the data-privacy message’s response rate. While a completely random email allocation strategy would likely have benefitted us with more accurate approximations of response rates for each message in a fraction of the time, the approach inherent to the MAB allowed us to show the civic-duty message to the majority of folks, generating a much larger citizen turn out.
Contextual MABs
It may very well be the case, however, that certain subregions within this small town will respond to email messaging differently. Let’s assume that by using email metadata, you are able to locate census respondents as being from either the East or West side of town. For illustrative purposes, let’s also assume that about 75% of the town’s population resides on the Eastern side, while only 25% lives on the West side (lopsided, I know).
Though in general the civic-duty message illicits about 80% response rate, and the data-privacy message receives about a 30% response rate, these statistics are not particularly helpful in resolving whether or not the East and West sides of town prefer different messaging. Let’s suppose that the probability a citizen in the town responds to a certain message can be summarized in the table below.
Civic Duty
Data Privacy
East
0.9
0.177
West
0.5
0.67
Note
The aggregate statistics remain true, as we can verify by carrying out a weighted average of the message response rates per region. \((0.9\cdot0.75) + (0.5\cdot0.25)=0.8\) and \((0.177\cdot0.75) + (0.67\cdot0.25)\approx 0.3\)
As you’ll notice, given these numbers, the optimal message for residents on the East side is overwhelmingly the civic-duty message, while those from the West side prefer that data-privacy message. This is a stark difference, but one that our ε-greedy MAB had no way of capturing. For this reason, we now introduce the contextual MAB, which fits a predictive model using historical context (ie, which side of town the email recipient was on) as well as whether or not the recipient responded to the census to approximate the response rate per message per community, as opposed to simply calculating historical frequencies.
Let’s code it up
Code
from sklearn.linear_model import LogisticRegressionfrom scipy.stats import randint
As alluded to, the “context” will refer to which side of town a given email recipient is from, and the predictive model will be a simple logistic regression model. Additionally, we will encode the context and “reward” (ie whether or not the recipient responds to the email) with the following two functions, which will be useful to simulate the delivery of emails.
Code
def context_fn():""" Returns a 0 25% of the time (East side) Returns a 1 75% of the time (West side) """ context = bernoulli(p=0.75).rvs()return contextdef reward_fn(context: int, message: int):# from the West sideif context:# data-privacy messageif message: p =0.67# civic-duty messageelse: p =0.5# from the East sideelse:# data-privacy messageif message: p =0.177# civic duty messageelse: p =0.9return bernoulli(p).rvs()
And finally, it is time to translate our contextual MAB into code. At it’s core, it’s nearly identical to the ε-greedy implementation, except it now fits a logistic regression model for each message at each update. I did include some additional utilites for easier plotting. I’d encourage you to skim over each function, but don’t get bogged down in the details.
Code
class ContextualEpsilonGreedy:def__init__(self, context_fn: Callable[[], int], reward_fn: Callable[[int, int], int], epsilon: float, contexts: List[str], arms: List[str], k: float=2, epsilon_decay: bool=False, ):""" inputs: context_fn: function returning binary context reward_fn: function returning binary reward, given context and message epsilon: float, where exploit with prob 1-ε contexts: names of each context arms: names of each message k: number of messages epsilon_decay: bool, whether or not to decrease epsilon over time """# utilites to run simulationsself.context_fn = context_fnself.reward_fn = reward_fnself.epsilon = epsilonself.exploit = (lambda t: bernoulli(1- (self.epsilon / np.log(t +2)))if epsilon_decayelse bernoulli(1-self.epsilon) )# contexts/arms are helpful for saving history appropriatelyself.contexts = contextsself.arms = armsself.k =len(self.arms)# instantiating one model per message and historyself.models = [ LogisticRegression(random_state=742, solver="lbfgs") for _ inrange(self.k) ]self.history = pd.DataFrame(columns=["context", "arm", "reward"], dtype=int)# dataframe to save history of models' predicted probsself.context_arm_combos = [f"{context}, {arm}"for context inself.contexts for arm inself.arms ]self.estimated_probs = pd.DataFrame(dict(**{"time step": [-1]},**{context_arm: 1/self.k for context_arm inself.context_arm_combos}, ), dtype=float, )def pull(self, t, context):ifself.exploit(t).rvs() and t >0:# pull from argmax of estimated probs contexts_arms = [ context_armfor context_arm inself.context_arm_combosifself.contexts[context] in context_arm ] probs_given_context =self.estimated_probs[contexts_arms].iloc[-1].values arm = probs_given_context.argmax() reward =self.reward_fn(context, arm)else:# pull from random arm arm = np.random.randint(0, self.k) reward =self.reward_fn(context, arm)return arm, rewarddef update_history(self, context, arm, reward):self.history =self.history.append( {"context": context, "arm": arm, "reward": reward}, ignore_index=True )returnNonedef update_probs(self, t): updated_probs = {"time step": t}# ensuring there has been at least one of each arm-reward comboifself.history.groupby(["arm", "reward"]).size().shape[0] ==4:# re-training each message's modelfor a, arm inenumerate(self.arms): data =self.history.loc[(self.history["arm"] == a)] X = np.array(data[["context"]]) y = np.array(data["reward"])self.models[a].fit(X, y)for c, context inenumerate(self.contexts): updated_probs[f"{context}, {arm}"] = cmab.models[a].predict_proba( np.array([[c]]) )[0, 1]else:for arm inself.arms:for context inself.contexts: updated_probs[f"{context}, {arm}"] =1/self.k# updating history of model's predictionsself.estimated_probs =self.estimated_probs.append( updated_probs, ignore_index=True )returnNonedef play(self, n):# simulating n pulls from the MABfor t inrange(n): context =self.context_fn() arm, reward =self.pull(t, context)self.update_history(context, arm, reward)self.update_probs(t)
At this point, we have all the tools necessary to simulate the delivery of emails. Below, we will instantiate our contextual MAB, then have it shoot out 1,000 emails, according to our ε-greedy policy.
# instantiating and simulating 1,000 emails sent from contextual MABcmab = ContextualEpsilonGreedy(context_fn, reward_fn, epsilon=.1, contexts=["East", "West"], arms=["Civic duty", "Data privacy"])cmab.play(1_000)# plotting estimated probs for each context/messageplot_cmab_probs(cmab)
Using the email recipient’s location, our contextual MAB was able to get a reliable estimate of the traction associated with each message on each side of town. This sort of analysis balances the exploration-exploitation tradeoff in a way that is both practical (delivering the optimal message the majority of the time), and informative to local governing authorities!
A few notes
Notice, it takes many more emails to get an accurate estimate of the conversion probabilities for location-message combinations that perform poorly, because we do not explore them as much.
In this case, we only leverage a single feature, location, and use a simple logistic regression model. In a real-world scenario, it is likely that we would have access to more features. It is straightforward to extend this implementation to handle more features and more complex models.
Conclusion
In this post, I outlined a technique to identify the optimal messaging to include in an email campaign in order to improve census response rates on a community level, using contextual multi-armed bandits. Though this concludes the post, there are many extensions we could add to the approaches proposed in this article to optimize messaging delivery.
Possible extensions
Instead of using point estimates for the coefficients in the logistic regression models associated with each message, we could model each as its own distribution, and leverage thompson sampling as Guilherme Duarte Marmerola does in his awesome blog post.
In order to find the optimal ε to use in our ε-greedy algorithms, we could carry out a more formal regret analysis to balance the tradeoff between exploration and exploitation.
In the algorithms implemented above, we update our models after each new data point. In reality, we would likely want to do some sort of batching.
Assumptions
We have assumed that we immediately know whether the email recipient responded to the census upon sending the email. In reality, this process is much more lagged, and attributing the census response to an email is more involved. Attribution is a challenging problem, and one that we ignored for simplicity in the examples above.