Part 2: Technical Definitions and (Some) Implementation

In my last post, we introduced the concept of optimizing an AirBnB listing using a statistical technique called Multi-Armed Bandits. In this post, we’ll refine the problem with more technical language and implement a bandit with Python that can interact with the AirBnB listing. But first, some definitions:

  • Message: the content that gets displayed to users.
  • Variables: the various “knobs” we can turn to alter the message— title, description, photos, captions, pricing… etc.
  • Variant: a particular combination of variable settings present in the message. In the language of multi-armed bandits, a variant is also called an arm.
  • Reward: a measurable (and possibly noisy) signal indicating the result of a variant’s application.
  • Context: variables we observe every trial that may impact a user’s behavior—day of the week, month of the year, etc.
  • Agent: the algorithm that decides which variant to present at the start of each trial. We’ll be using contextual bandits. Technically, the agent is responsible for learning a policy, which maps states to actions.
  • Trial: a single episode in which a user encounters the listing (with a variant of the message selected by the agent) and the reward is measured.

Our goal is to develop an agent that learns from past trials to select the variant with the highest expected reward.
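To make these definitions concrete, here is a minimal Python sketch of what a variant (arm) might look like for our listing. The field names are hypothetical, chosen to match the "knobs" listed above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    """One combination of variable settings for the message: a single arm."""
    title: str
    cover_photo_id: str
    nightly_price: float
    instant_book: bool

# Two arms that differ only in the title:
arm_a = Variant("Private room near downtown", "photo_1", 85.0, True)
arm_b = Variant("Room near downtown", "photo_1", 85.0, True)
```

The agent's job is then to learn, from past trials, which of these arms has the highest expected reward in a given context.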

But What Are We Actually Optimizing?

To make this setup work, we need a couple of things:

  1. Variables that actually impact the reward and can be altered at the start of a trial
  2. A measurable reward signal correlated to something we care about (has specific business value)

This requires us to understand the guest “journey” as they interact with AirBnB and decide to spend their money or not—as well as understanding the metrics and key performance indicators that hosts use to evaluate listing performance.

Based on my own use of AirBnB for personal and work travel, my typical experience follows this pattern:

  1. Open the AirBnB app or website
  2. Enter some search parameters (e.g. travel date and number of guests)
  3. Select some filter parameters (e.g. amenities, instant-book, superhost)
  4. Scroll through the results and maybe thumb through some photos
  5. Click on listings that catch my eye, thumb through their photos and read the listing description and host bio
  6. “Favorite” my top candidates
  7. Review the candidates and book one of them
  8. Message the host if necessary

Android app: Search Filters → Results → Listing Details

Remember, we are searching for knobs we can turn for my listing that will influence the user’s search and booking experience. A few things already catch my eye:

  • The first impression a user forms about my listing occurs when they see its card in the search results, which only contains the listing title, cover photo, and nightly price. These will be our three most important variables.
  • The room type and city are displayed above the listing title. This probably makes the word “private” in my title redundant. A variant without that word is something our agent could test!
  • Hosting forums often claim that Instant Booking (allowing a guest to book without host approval) improves a listing’s search placement and booking rate. We could test variants that toggle it.
  • A guest has to click + scroll a bit more to find the room description and host bio. While these variables could still be quite influential to our guest, I don’t think they should take priority.

Metrics are quantities we can measure, like the number of views my listing gets each day. Key Performance Indicators (KPIs) are metrics or functions of metrics that indicate how well the project is meeting its objectives. For me, the top KPIs are average monthly revenue and revenue variance. Here’s what AirBnB gives us on the host performance dashboard:

The AirBnB host performance dashboard

Let’s review some candidate metrics we could use for our reward signal, starting with the three displayed on the AirBnB host performance dashboard:

  • 30-Day Views: I assume this means “how many users clicked into our listing page”, either from web or mobile. From our study of the user conversion funnel, we know that only the listing title, cover photo, and nightly price are displayed to a user before they might do that (if they arrive from the search page). So those are the only three variables we’d get to optimize if we used views as our reward. More importantly, I don’t care about views. I care about consistent monthly revenue. While it’s true that a user can’t book the listing without viewing it, increased viewership is no guarantee of increased booking revenue.
  • 30-Day Bookings: The number of bookings is more directly associated with revenue, but it doesn’t tell me how much each booking is worth. If our agent drops the nightly price to $1, we’d have a ton of bookings but little revenue.
  • Booking Rate: It’s just n_bookings / n_views. This metric seems to capture how well the message in my listing details “sells” my listing to a user.

One easy-to-miss problem: see in the bottom left where it says “Data may be delayed up to 3 days”? That means we must design our agent to handle delayed rewards if we scrape these metrics.
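As a sketch of one way to handle that delay (the buffer below and its trial-id scheme are my own invention, not anything AirBnB provides), we can hold finished trials until their metrics finally show up:

```python
from collections import deque

class DelayedRewardBuffer:
    """Holds (context, arm) pairs until their reward, which may be
    reported up to ~3 days late, becomes available."""
    def __init__(self):
        self.pending = deque()

    def add_trial(self, trial_id, context, arm):
        self.pending.append((trial_id, context, arm))

    def resolve(self, rewards_by_id):
        """rewards_by_id: {trial_id: reward} scraped from the dashboard.
        Returns completed (context, arm, reward) tuples ready for
        agent updates; everything else stays pending."""
        done, still_pending = [], deque()
        for trial_id, context, arm in self.pending:
            if trial_id in rewards_by_id:
                done.append((context, arm, rewards_by_id[trial_id]))
            else:
                still_pending.append((trial_id, context, arm))
        self.pending = still_pending
        return done
```

The agent then trains only on resolved trials, a few days behind real time.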

So far, these metrics are pretty lame. We will need to find more granular metrics hidden within the AirBnB API if we are going to make this work.

Reverse-Engineering AirBnB’s API

AirBnB doesn’t expose their official API to little fish like me, so I needed to find another way to programmatically interact with my listing. Fortunately, the age-old trick of monitoring HTTP traffic from my browser as I clicked around my hosting page turned up the appropriate endpoints for PUTting changes to my listing settings and GETting all the data I’d need to measure performance. Here’s everything I dug up:

  • Listing: Basic info about the listing. Kinda useless.
  • Images: Image content, captions, and arrangement/order. Can PUT and GET.
  • Host stats: The 30-day views and bookings (same as what’s on the dashboard). GET only.
  • Calendar: Listing pricing and availability. Can PUT and GET.
  • Reservations: Data for every booking, including booking date and revenue. GET only.
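Here is a rough sketch of a client that builds those requests. The base URL, paths, and header names below are placeholders (the real ones come from the network inspector and are listing-specific), so treat this as shape, not a working client:

```python
import json
from urllib.request import Request

class AirbnbClient:
    """Builds requests against the unofficial endpoints found by
    watching browser traffic. All paths here are placeholders."""
    BASE = "https://www.airbnb.com/api/v2"  # placeholder base URL

    def __init__(self, listing_id, api_key):
        self.listing_id = listing_id
        self.headers = {"X-Airbnb-API-Key": api_key,
                        "Content-Type": "application/json"}

    def host_stats_request(self):
        # GET-only endpoint: 30-day views and bookings
        return Request(f"{self.BASE}/host_stats/{self.listing_id}",
                       headers=self.headers, method="GET")

    def calendar_update_request(self, start, end, nightly_price):
        # PUT endpoint: pricing and availability
        body = json.dumps({"start_date": start, "end_date": end,
                           "daily_price": nightly_price}).encode()
        return Request(f"{self.BASE}/calendars/{self.listing_id}",
                       data=body, headers=self.headers, method="PUT")
```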

It’s no surprise I didn’t uncover a way to serve variants to individual viewers, so we’ll need to redefine “trial”. Instead of having our agent optimize the message variant for every user view, let’s define a trial period, say 24 hours, during which the agent runs a single variant and we capture metrics for that window. This “aggregated” trial still captures a signal relating variants to expected reward.
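Wiring that together, a single aggregated trial might look like the sketch below. The greedy stub agent and the `apply_variant`/`scrape_stats` hooks are placeholders for the real bandit and the reverse-engineered endpoints:

```python
import datetime

class GreedyStubAgent:
    """Stand-in for the bandit: tracks mean reward per arm and always
    exploits (no exploration; just for wiring things up)."""
    def __init__(self, n_arms):
        self.totals = [0.0] * n_arms
        self.counts = [0] * n_arms

    def select(self, context):
        means = [t / c if c else 0.0
                 for t, c in zip(self.totals, self.counts)]
        return means.index(max(means))

    def update(self, context, arm, reward):
        self.totals[arm] += reward
        self.counts[arm] += 1

def run_daily_trial(agent, apply_variant, scrape_stats, today):
    """One aggregated trial: the agent's pick runs for 24 hours and is
    scored by that day's booking rate."""
    context = {"weekday": today.weekday()}
    arm = agent.select(context)
    apply_variant(arm)              # PUT the chosen variant live
    stats = scrape_stats()          # a day later: GET the metrics
    reward = stats["bookings"] / max(stats["views"], 1)
    agent.update(context, arm, reward)
    return arm, reward
```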

The Linear Upper Confidence Bound (LinUCB) Agent

If we assume the reward is a linear function of our variables, we can train the LinUCB algorithm to select the right variant for each trial. For every variant, the algorithm learns a linear model reward ~ variables. The variables can be contextual variables and/or per-variant variables (explained later). Importantly, the algorithm also computes a confidence ellipsoid for each variant (centered on that variant’s reward estimate, scaled by the variance in the estimate), and each trial it chooses the arm whose ellipsoid has the highest upper confidence bound. This strategy is called Optimism in the Face of Uncertainty, and it explains the algorithm’s name: a linear model computes the centers of the ellipsoids, and the variant with the highest upper confidence bound is selected. Notice this means that a variant with a lower estimated reward but higher variance can be preferred to one with a higher estimate but lower variance. This is how the algorithm can efficiently exploit and explore the action space. After each trial, the estimates and ellipsoids are recomputed with the addition of the new data collected in the trial. I used the tf-agents implementation of LinUCB.
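To make the mechanics concrete, here is a from-scratch sketch of the disjoint (one-model-per-arm) form of LinUCB described above; the tf-agents implementation I actually used wraps the same math in its Agent/Policy framework:

```python
import numpy as np

class LinUCBArm:
    """Per-arm ridge regression with an upper-confidence bonus."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)      # X^T X + I (ridge Gram matrix)
        self.b = np.zeros(dim)    # X^T y
        self.alpha = alpha        # width of the confidence bound

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                       # center of the ellipsoid
        bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # uncertainty term
        return theta @ x + bonus

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def select_arm(arms, x):
    """Optimism in the face of uncertainty: take the best upper bound."""
    return max(range(len(arms)), key=lambda i: arms[i].ucb(x))
```

An arm that has never been pulled has a large `bonus`, so it gets tried; once data accumulates, `theta @ x` dominates and the agent exploits.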

Contextual Variables

External factors may influence booking behavior, and we call this context. Perhaps users are more likely to make a booking in the afternoon on a sunny day, or are willing to pay more before a holiday.

To make sure our agent can model context appropriately, let’s simulate an environment where the reward is a linear function of both the (one-hot encoded) month of the year and the day of the week, plus Gaussian noise. Two arms will have a higher reward during the first half of the week, and two will have a higher reward during the first half of the year—four arms total. One-hot encoding the weekday and month (dropping the first category of each) leaves (7–1)+(12–1) = 17 free parameters our LinUCB agent will learn.

A contextual bandit running for 1000 simulated daily trials learns the correct arm for each day of the week and month of the year… after 2 years…

We can see it takes about a year of simulated daily trials for the agent to start acting consistently optimally. That’s how long it took for regret (the difference between the reward of the optimal arm and the reward of the arm the agent chose) to converge.

Instead of using the month of the year as a context variable, we can reduce model complexity and hopefully speed up convergence by instead mapping the day of the year onto the unit circle and using its sin and cos as features. This not only reduces our context space dimensionality by 9, but also captures seasonality—January is “closer” to December than to June; anecdotally, short-term vacation rental bookings tend to peak in the summer and trough in the winter. Here we use four arms again, keeping the same reward structure for the day of the week, but using a term for the sin of the day of the year (no cos term, to keep things simple).
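The unit-circle version of the encoding, which shrinks the context from 17 features to 8, looks like this:

```python
import numpy as np

def circular_day(day_of_year, year_length=365):
    """Map the day of the year onto the unit circle, so December 31
    lands next to January 1 in feature space."""
    angle = 2 * np.pi * day_of_year / year_length
    return np.array([np.sin(angle), np.cos(angle)])

def context_vector(weekday, day_of_year):
    """6 drop-first weekday features + 2 circular features = 8 total,
    down from the 17 that one-hot weekday + month required."""
    weekday_feats = np.zeros(6)
    if weekday > 0:
        weekday_feats[weekday - 1] = 1.0
    return np.concatenate([weekday_feats, circular_day(day_of_year)])
```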

The contextual bandit running 1000 daily trials, with a unit-circle feature representation of the day of the year. The agent converged much faster!

We see that regret converges a little quicker — after about 6 months of simulated daily trials. Neat!

Per-Arm Features

What if we could also parameterize the variants/arms? That way, the agent only needs to learn one reward function, instead of one per arm, which makes more efficient use of the data and simplifies the model. This is called Hybrid LinUCB, because some of its coefficients apply to every arm while others are specific to each arm. It also lets us add, modify, or remove variants without requiring the agent to relearn a model for each. In a future post, we will explore this modification. For now, vanilla LinUCB should serve as a good baseline.

Next Steps

Now that we have an implementation of a toy model and an understanding of the training dynamics in a toy environment, the next step would be to deploy the thing. The devil is in the details here, and so I will reserve that for a future post.




Jake Schmidt, Machine Learning Engineer at Recursion Pharmaceuticals