In a reinvigorated effort to look in more detail at the implementation of Bayesian Bandits, today I came across a paper from last fall (2013) by some folks at LinkedIn: Automatic Ad Format Selection via Contextual Bandits by L. Tang, R. Rosales, A. Singh, and D. Agarwal. Tang is a Ph.D. student at Florida International, and the others are in the Applied Relevance Science group at LinkedIn.
This paper does not describe the use of Bandits for the choice of ads, as that seems to be driven more by other methods - auction, etc. Rather, they look at (offline) methods to determine the "best" format for the ad itself when it is placed in its slot on the page. This speaks again to the general applicability of Bandits.
They compare a bunch of different Bandit algorithms (close to 50), but these are generally various parameterizations of a small group of approaches: UCB, ε-greedy, softmax, and ε-softmax. There is also a single "Thompson Sampling" approach included (which is the one I am most familiar with). The results for each method are summarized in Figures 4 and 5 of the paper.
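To make the families above a bit more concrete for myself, here is a minimal sketch of how three of them pick an arm from running estimates of each arm's reward. These are the textbook forms, not the paper's contextual variants or parameterizations; the function names and the `means` / `counts` arguments are my own assumptions for illustration.

```python
import math
import random


def epsilon_greedy(means, epsilon, rng):
    """With probability epsilon pick a random arm, otherwise the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    return max(range(len(means)), key=lambda i: means[i])


def softmax(means, temperature, rng):
    """Sample an arm with probability proportional to exp(mean / temperature)."""
    top = max(means)  # subtract the max for numerical stability
    weights = [math.exp((m - top) / temperature) for m in means]
    return rng.choices(range(len(means)), weights=weights)[0]


def ucb1(means, counts):
    """Pick the arm with the highest mean plus exploration bonus (textbook UCB1)."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # play every arm once before trusting the bonus term
    total = sum(counts)
    return max(
        range(len(means)),
        key=lambda i: means[i] + math.sqrt(2 * math.log(total) / counts[i]),
    )
```

The knobs here (`epsilon`, `temperature`) are the kind of thing that gets swept to produce many parameterizations from a handful of approaches.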
There's lots for me still to digest here, but here's the bottom line as I read it:
"Bernoulli Thompson sampling is overall the best policy in our evaluation for ad format selection." (end of section 4.1 of Automatic Ad Format Selection via Contextual Bandits, Oct/Nov 2013)
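For reference, this is roughly what Bernoulli Thompson sampling looks like in its simplest, non-contextual form: keep a Beta posterior over each arm's click probability, draw one sample per arm, and play the arm with the largest draw. This is a sketch under my own assumptions (a uniform Beta(1, 1) prior, made-up click rates, hypothetical function names), not the paper's implementation, which also uses context.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw from each arm's Beta posterior and return the arm with the largest draw."""
    # Beta(1, 1) prior (assumed), so the posterior is Beta(successes + 1, failures + 1).
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda i: draws[i])


def simulate(true_rates, rounds, seed=0):
    """Run the policy against simulated Bernoulli arms; return per-arm counts."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        if rng.random() < true_rates[arm]:  # simulated click
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    # Hypothetical click-through rates for three ad formats.
    wins, losses = simulate([0.02, 0.05, 0.10], rounds=5000)
    print([w + l for w, l in zip(wins, losses)])  # pulls per format
```

There is no exploration parameter to tune: the width of each posterior does that job, which is part of why I find this approach appealing.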