# Part 1: An introduction

In October, I decided to convert my guest bedroom into an AirBnB. After months of designing, painting, furnishing, and installing, I thought I was finally finished with the project. But when I listed the room on AirBnB, I realized how many knobs I could turn to maximize revenue, including:

- Listing title
- Listing description
- Pricing
- Discounts
- Cover photo
- Photo order
- Photo captions

I didn’t want to leave cash on the table, but I didn’t want to evaluate every combo by hand. Luckily, there are smarter, easier ways to find the optimum using some scriptin’ n’ stats.

Jump to Part 2 (coming soon!) for a more technical specification of the problem and an implementation of the solution, or read on for an introduction to the family of algorithms we will use to maximize listing revenue.

## Optimization with Multi-Armed Bandits

A multi-armed bandit is a statistical approach to figuring out which choices to make, over and over, to achieve “optimal performance,” for some definition of optimal performance. I define optimal performance for my AirBnB in Part 2, but for now, let’s briefly review the basics of multi-armed bandits using the example that gave them their name.

## Hacking Vegas

Imagine you’re a gambler on the floor of the MGM Grand, sauntering towards the slot machines (known as one-armed bandits). While the house always wins on average, let’s say there are slight variations between the slot machines that make some “luckier” than others: they hit the jackpot more often. Which machine would you play? The luckiest one! How would you determine which one that is? Well, it makes sense to:

- Play each machine a few times (collect data samples).
- Form an opinion about how lucky each machine is (estimate each machine’s expected value).
- Decide which machine is luckiest, then stick to that one.
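The three steps above describe a strategy often called *explore-then-commit*. Here’s a minimal Python sketch of it, assuming each pull is a simple win/lose outcome (1 for a jackpot, 0 otherwise) and using made-up jackpot probabilities to simulate the machines. The function `explore_then_commit` and its parameters are hypothetical names for illustration, not code from Part 2.

```python
import random


def explore_then_commit(payout_probs, explore_plays, total_plays, seed=0):
    """Explore-then-commit on simulated slot machines (illustrative sketch).

    payout_probs: hypothetical true jackpot probability for each machine. The
        gambler never sees these; they are only used to simulate pulls.
    explore_plays: how many times to play each machine before committing.
    total_plays: overall budget of plays.
    Returns (index of the chosen machine, total jackpots won).
    """
    n = len(payout_probs)
    if explore_plays < 1 or total_plays < n * explore_plays:
        raise ValueError("need at least one play per machine within the budget")
    rng = random.Random(seed)

    def pull(arm):
        # Simulated pull: 1 for a jackpot, 0 otherwise.
        return 1 if rng.random() < payout_probs[arm] else 0

    wins = [0] * n
    total_won = 0

    # Step 1: play each machine a few times (collect data samples).
    for arm in range(n):
        for _ in range(explore_plays):
            reward = pull(arm)
            wins[arm] += reward
            total_won += reward

    # Step 2: estimate each machine's expected value as its sample mean.
    estimates = [w / explore_plays for w in wins]

    # Step 3: pick the luckiest-looking machine and play it for the rest of the budget.
    best = max(range(n), key=lambda arm: estimates[arm])
    for _ in range(total_plays - n * explore_plays):
        total_won += pull(best)

    return best, total_won
```

For example, `explore_then_commit([0.02, 0.03, 0.05], explore_plays=200, total_plays=10_000)` spends 600 plays exploring before committing to one machine. The hard part is choosing `explore_plays`: too few and you may commit to the wrong machine, too many and you waste money on machines you already suspect are unlucky.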

However, there’s a problem. You burn some of your hard-earned cash (and your time) every time you play an unlucky machine, but the *quality* of your opinion about each machine’s luckiness depends on how many times you’ve played it. In other words, you’re spending money to collect information about each machine, so how much information do you need before you can tell which machine is luckiest? This is the *exploration/exploitation tradeoff*: exploring means playing machines to learn more about them, while exploiting means playing the machine that currently looks best. Multi-armed bandit algorithms balance the two to maximize your total winnings in the long run (over many plays).
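One of the simplest bandit strategies for striking that balance is *epsilon-greedy*: on each play, explore a randomly chosen machine with a small probability `epsilon`, and otherwise exploit the machine with the best estimate so far. The sketch below is illustrative only and makes the same assumptions as a basic slot-machine simulation (win/lose pulls, hypothetical jackpot probabilities); the function name and defaults are made up here, and this isn’t necessarily the algorithm used for the listing in Part 2.

```python
import random


def epsilon_greedy(payout_probs, total_plays, epsilon=0.1, seed=0):
    """Epsilon-greedy on simulated slot machines (illustrative sketch).

    payout_probs: hypothetical true jackpot probability for each machine,
        used only to simulate pulls.
    Returns (estimated value per machine, plays per machine, total jackpots won).
    """
    rng = random.Random(seed)
    n = len(payout_probs)
    counts = [0] * n        # how many times each machine has been played
    estimates = [0.0] * n   # running sample mean of each machine's payouts
    total_won = 0

    for _ in range(total_plays):
        if rng.random() < epsilon:
            # Explore: try a machine chosen uniformly at random.
            arm = rng.randrange(n)
        else:
            # Exploit: play the machine that currently looks luckiest
            # (ties go to the lowest index).
            arm = max(range(n), key=lambda a: estimates[a])

        reward = 1 if rng.random() < payout_probs[arm] else 0  # 1 = jackpot

        # Update the running mean without storing every past reward.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_won += reward

    return estimates, counts, total_won
```

For instance, `epsilon_greedy([0.02, 0.03, 0.05], total_plays=10_000)` returns each machine’s estimated jackpot rate, how often it was played, and the total winnings. Because epsilon-greedy never stops exploring, its estimates keep improving over time, at the cost of spending roughly an `epsilon` fraction of your plays on randomly chosen machines.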

Hopefully, you can see the connection between the gambling scenario and the AirBnB listing. In each scenario, an **agent** decides on an **action** to (repeatedly) take, and we can collect data to figure out which action will likely give us the best **reward** going forward.

We haven’t yet defined exactly what “reward” means for our listing optimizer, because it’s not as straightforward as counting the money coming out of a slot machine. We will do that and more in Part 2, where we give a more technical specification of the AirBnB Listing Optimization problem along with an initial implementation in Python. Stay tuned!