Using contextual MAB’s to maximize community census engagement

Adding context and classifiers to traditional multi-armed-bandits.
data science
Published

June 8, 2020

The 2020 Census

The year is 2020 and while the Coronavirus pandemic has demanded the attention of our nation, there has been another important event taking place in the background. This year marks the 24th time that the United States will conduct a nation-wide census, aimed at counting each and every person in the U.S., and its five territories. Though it has not garnered much media attention, the numbers aggregated in the census responses form a working model of our nation’s population, upon which governing leaders base their decisions about pertinent issues like how to distribute local government funding, and how many seats each state has in the House of Representatives.

Collecting survey data on this scale is inherently challenging, and the obstacle has only been compounded in the wake of the epidemic and the decreasing level of trust that Americans have in their government. For this reason, local governing authorities have been working hard to get the word out, and improve census response rates on a community-by-community level.

One such technique that could be used to increase community awareness and engagement with the census is an email campaign. An integral piece to any email marketing campaign is the messaging used. The reality is that even in small communities, there may be sub-regions that respond to email messaging differently than others. So the question is

Important

How can local governing authorities use data to identify the optimal messaging to maximize census response rates on a community level?

In this post, I will be proposing the use of contextual multi-armed bandits to identify the messaging that leads to maximal census response rates per community.

Code
class MAB:
    """
    Base Multi-Armed Bandit class.
    """

    def __init__(self, slots: List[scipy.stats.bernoulli]):
        self.slots = slots
        self.k = len(self.slots)
        self.choices = [list(0 for _ in range(self.k))]
        self.rewards = [list(0 for _ in range(self.k))]
        # initialize estimated probs uniformly
        self.estimated_probs = [[1 / len(slots) for _ in range(self.k)]]

    def pull(self, t):
        # needs to be implemented individually for each type of MAB
        pass

    def update_history(self, arm, reward):
        choice = [1 if arm == i else 0 for i in range(self.k)]
        self.choices.append(choice)
        r = [reward if arm == i else 0 for i in range(self.k)]
        self.rewards.append(r)
        return None

    def update_probs(self, t):
        estimated_prob = []
        for slot in range(self.k):
            slot_choices = [c[slot] for c in self.choices]
            slot_rewards = [r[slot] for r in self.rewards]
            estimated_prob.append((np.sum(slot_rewards) / (np.sum(slot_choices) + 1)))
        self.estimated_probs.append(estimated_prob)
        return None

    def play(self, n):
        for t in range(n):
            arm, reward = self.pull(t)
            self.update_probs(t)

Epsilon-greedy Multi-Armed-Bandits

Note

Before we start adding context to our multi-armed-bandits, let’s begin by introducing how one might use what are referred to as ε-greedy MABs to identify the optimal email messaging.

Why not just use A/B testing?

If, for example, we were solely interested in performing research to answer the question of which message evokes the maximal census response rate per community, we would likely reach for a randomized control trial, which (assuming we can collect data at the necessary scale) would enable us to leverage statistical rigor in conclusively determining which email message is optimal, and potentially even shed light on the causal relationship between certain messages and a citizen’s propensity to respond to the census.

In a vacuum, this sounds perfect. In the real-world, unfortunately, there is one huge downside. That is, in order to conclusively determine the optimal message to a degree of statistical certainty, we would be required to show the sub-optimal message(s) to many citizens, potentially wreaking havoc on the census results. This is a classic problem known as the exploration-exploitation tradeoff.

In order to balance the objective of maximizing census response rates, while simultaneously exploring how citizens respond to different messaging, we could use a multi-armed-bandit (from here forward, referred to as MABs).

Let’s code it up

Tip

If you’re unfamiliar with MABs, I’d recommend checking out this short introduction from Optimizely before proceeding.

In the MAB framework, each potential email message can be viewed as a candidate arm that our “bandit” could pull upon deciding which email to send to each citizen.

Let’s assume that a small town’s local government has developed two potential email messages aimed at improving census response rates.

  1. The first emphasizes the citizen’s civic duty in participating in the census, in addition to the long-term implications of its results.

  2. The second hones in on the data privacy and security measures that are in place to ensure each citizens rights are respected.

Our ε-greedy MAB will begin by randomly choosing which message to send out, then as citizens begin to responding to the census, it will show the message with the best response rate 1-ε% of the time, then show the other message in the remaining ε% of emails.

Code
class EpsilonGreedy(MAB):
    def __init__(
        self,
        slots: List[scipy.stats.bernoulli],
        epsilon: float,
        epsilon_decay: bool = False,
    ):
        """
        inputs:
            slots: list of bernoulli random variables representing each message
            epsilon: float, where exploit with prob 1-ε
            epsilon_decay: bool, whether or not to decrease epsilon over time
        """
        super().__init__(slots)
        self.epsilon = epsilon
        self.exploit = (
            lambda t: bernoulli(1 - (self.epsilon / np.log(t + 2)))
            if epsilon_decay
            else bernoulli(1 - self.epsilon)
        )

    def pull(self, t):
        if self.exploit(t).rvs():
            # pull from argmax of estimated probs
            arm = np.argmax(self.estimated_probs[t])
            reward = self.slots[arm].rvs()
            self.update_history(arm, reward)
        else:
            # pull from random arm
            arm = np.random.randint(0, len(self.slots))
            reward = self.slots[arm].rvs()
            self.update_history(arm, reward)
        return arm, reward

Let’s imagine that for this small town, on average the civic-duty message yields census responses 80% of the time, and the data-privacy message evokes a response 30% of the time.

Civic Duty Data Privacy
probability of response 0.8 0.3
# initializing candidate messages
civic_duty = bernoulli(0.8)
data_privacy = bernoulli(0.3)

# initializing MAB and simulating 1,000 emails
eg = EpsilonGreedy(slots=[civic_duty, data_privacy], epsilon=0.2)
eg.play(1_000)

# plotting estimated probabilites
plot_eg_probs(eg)
Note

The chart above is dynamic. Hover your cursor over the chart to interact.

From the plot above, we can see that our MAB gained a fairly stable estimate of the civic-duty message’s response rate after about 300 emails, yet it took a full 800 emails to get an accurate approximation of the data-privacy message’s response rate. While a completely random email allocation strategy would likely have benefitted us with more accurate approximations of response rates for each message in a fraction of the time, the approach inherent to the MAB allowed us to show the civic-duty message to the majority of folks, generating a much larger citizen turn out.

Contextual MABs

It may very well be the case, however, that certain subregions within this small town will respond to email messaging differently. Let’s assume that by using email metadata, you are able to locate census respondents as being from either the East or West side of town. For illustrative purposes, let’s also assume that about 75% of the town’s population resides on the Eastern side, while only 25% lives on the West side (lopsided, I know).

Though in general the civic-duty message illicits about 80% response rate, and the data-privacy message receives about a 30% response rate, these statistics are not particularly helpful in resolving whether or not the East and West sides of town prefer different messaging. Let’s suppose that the probability a citizen in the town responds to a certain message can be summarized in the table below.

Civic Duty Data Privacy
East 0.9 0.177
West 0.5 0.67
Note

The aggregate statistics remain true, as we can verify by carrying out a weighted average of the message response rates per region. \((0.9\cdot0.75) + (0.5\cdot0.25)=0.8\) and \((0.177\cdot0.75) + (0.67\cdot0.25)\approx 0.3\)

As you’ll notice, given these numbers, the optimal message for residents on the East side is overwhelmingly the civic-duty message, while those from the West side prefer that data-privacy message. This is a stark difference, but one that our ε-greedy MAB had no way of capturing. For this reason, we now introduce the contextual MAB, which fits a predictive model using historical context (ie, which side of town the email recipient was on) as well as whether or not the recipient responded to the census to approximate the response rate per message per community, as opposed to simply calculating historical frequencies.

Let’s code it up

Code
from sklearn.linear_model import LogisticRegression
from scipy.stats import randint

As alluded to, the “context” will refer to which side of town a given email recipient is from, and the predictive model will be a simple logistic regression model. Additionally, we will encode the context and “reward” (ie whether or not the recipient responds to the email) with the following two functions, which will be useful to simulate the delivery of emails.

Code
def context_fn():
    """
    Returns a 0 25% of the time (East side)
    Returns a 1 75% of the time (West side)
    """
    context = bernoulli(p=0.75).rvs()
    return context


def reward_fn(context: int, message: int):
    # from the West side
    if context:
        # data-privacy message
        if message:
            p = 0.67
        # civic-duty message
        else:
            p = 0.5
    # from the East side
    else:
        # data-privacy message
        if message:
            p = 0.177
        # civic duty message
        else:
            p = 0.9
    return bernoulli(p).rvs()

And finally, it is time to translate our contextual MAB into code. At it’s core, it’s nearly identical to the ε-greedy implementation, except it now fits a logistic regression model for each message at each update. I did include some additional utilites for easier plotting. I’d encourage you to skim over each function, but don’t get bogged down in the details.

Code
class ContextualEpsilonGreedy:
    def __init__(
        self,
        context_fn: Callable[[], int],
        reward_fn: Callable[[int, int], int],
        epsilon: float,
        contexts: List[str],
        arms: List[str],
        k: float = 2,
        epsilon_decay: bool = False,
    ):
        """
        inputs:
            context_fn: function returning binary context
            reward_fn: function returning binary reward, given context and message
            epsilon: float, where exploit with prob 1-ε
            contexts: names of each context
            arms: names of each message
            k: number of messages
            epsilon_decay: bool, whether or not to decrease epsilon over time
        """
        # utilites to run simulations
        self.context_fn = context_fn
        self.reward_fn = reward_fn
        self.epsilon = epsilon
        self.exploit = (
            lambda t: bernoulli(1 - (self.epsilon / np.log(t + 2)))
            if epsilon_decay
            else bernoulli(1 - self.epsilon)
        )

        # contexts/arms are helpful for saving history appropriately
        self.contexts = contexts
        self.arms = arms
        self.k = len(self.arms)

        # instantiating one model per message and history
        self.models = [
            LogisticRegression(random_state=742, solver="lbfgs") for _ in range(self.k)
        ]
        self.history = pd.DataFrame(columns=["context", "arm", "reward"], dtype=int)

        # dataframe to save history of models' predicted probs
        self.context_arm_combos = [
            f"{context}, {arm}" for context in self.contexts for arm in self.arms
        ]
        self.estimated_probs = pd.DataFrame(
            dict(
                **{"time step": [-1]},
                **{context_arm: 1 / self.k for context_arm in self.context_arm_combos},
            ),
            dtype=float,
        )

    def pull(self, t, context):
        if self.exploit(t).rvs() and t > 0:
            # pull from argmax of estimated probs
            contexts_arms = [
                context_arm
                for context_arm in self.context_arm_combos
                if self.contexts[context] in context_arm
            ]
            probs_given_context = self.estimated_probs[contexts_arms].iloc[-1].values
            arm = probs_given_context.argmax()
            reward = self.reward_fn(context, arm)
        else:
            # pull from random arm
            arm = np.random.randint(0, self.k)
            reward = self.reward_fn(context, arm)
        return arm, reward

    def update_history(self, context, arm, reward):
        self.history = self.history.append(
            {"context": context, "arm": arm, "reward": reward}, ignore_index=True
        )
        return None

    def update_probs(self, t):
        updated_probs = {"time step": t}

        # ensuring there has been at least one of each arm-reward combo
        if self.history.groupby(["arm", "reward"]).size().shape[0] == 4:

            # re-training each message's model
            for a, arm in enumerate(self.arms):
                data = self.history.loc[(self.history["arm"] == a)]
                X = np.array(data[["context"]])
                y = np.array(data["reward"])
                self.models[a].fit(X, y)
                for c, context in enumerate(self.contexts):
                    updated_probs[f"{context}, {arm}"] = cmab.models[a].predict_proba(
                        np.array([[c]])
                    )[0, 1]
        else:
            for arm in self.arms:
                for context in self.contexts:
                    updated_probs[f"{context}, {arm}"] = 1 / self.k

        # updating history of model's predictions
        self.estimated_probs = self.estimated_probs.append(
            updated_probs, ignore_index=True
        )
        return None

    def play(self, n):
        # simulating n pulls from the MAB
        for t in range(n):
            context = self.context_fn()
            arm, reward = self.pull(t, context)
            self.update_history(context, arm, reward)
            self.update_probs(t)

At this point, we have all the tools necessary to simulate the delivery of emails. Below, we will instantiate our contextual MAB, then have it shoot out 1,000 emails, according to our ε-greedy policy.

# instantiating and simulating 1,000 emails sent from contextual MAB
cmab = ContextualEpsilonGreedy(context_fn, reward_fn, epsilon=.1, contexts=["East", "West"], arms=["Civic duty", "Data privacy"])
cmab.play(1_000)

# plotting estimated probs for each context/message
plot_cmab_probs(cmab)

Using the email recipient’s location, our contextual MAB was able to get a reliable estimate of the traction associated with each message on each side of town. This sort of analysis balances the exploration-exploitation tradeoff in a way that is both practical (delivering the optimal message the majority of the time), and informative to local governing authorities!

A few notes

  • Notice, it takes many more emails to get an accurate estimate of the conversion probabilities for location-message combinations that perform poorly, because we do not explore them as much.
  • In this case, we only leverage a single feature, location, and use a simple logistic regression model. In a real-world scenario, it is likely that we would have access to more features. It is straightforward to extend this implementation to handle more features and more complex models.

Conclusion

In this post, I outlined a technique to identify the optimal messaging to include in an email campaign in order to improve census response rates on a community level, using contextual multi-armed bandits. Though this concludes the post, there are many extensions we could add to the approaches proposed in this article to optimize messaging delivery.

Possible extensions

  • Instead of using point estimates for the coefficients in the logistic regression models associated with each message, we could model each as its own distribution, and leverage thompson sampling as Guilherme Duarte Marmerola does in his awesome blog post.
  • In order to find the optimal ε to use in our ε-greedy algorithms, we could carry out a more formal regret analysis to balance the tradeoff between exploration and exploitation.
  • In the algorithms implemented above, we update our models after each new data point. In reality, we would likely want to do some sort of batching.

Assumptions

  • We have assumed that we immediately know whether the email recipient responded to the census upon sending the email. In reality, this process is much more lagged, and attributing the census response to an email is more involved. Attribution is a challenging problem, and one that we ignored for simplicity in the examples above.