Q: Did you see Elon Musk is trying to buy or maybe not buy Twitter?
A: No, I have been on Mars for the last month, in a cave, with my eyes shut and my fingers in my ears.
Q: <poop emoji>. But the bots? Sampling 100 accounts and no AI?
A: There are two separate issues here: estimating the number of bots, and removing spam accounts.
Q: But don’t you need to know how many there are to remove them?
A: Not at all. You block porn bots and crypto spammers and terfs, right?
Q: Yes?
A: How many?
Q: Basically all the ones I run across.
A: That’s what Twitter does, too. Well, obviously not the same categories. And they use automation for that. Their court filing says they suspend over a million accounts a day (paragraph 65).
Q: But the 100 accounts?
A: They also manually inspect about 100 accounts per day, taken from the accounts that they are counting as real people — or as they call us, “monetizable daily active users” — to see if they are bots. Some perfectly nice accounts are bots — like @pomological or @ThreeBodyBot or @geonet or the currently dormant @tuureiti — but bots aren’t likely to read ads with the same level of interest as monetizable daily active users do, so advertisers care about the difference.
Q: Why not just use AI for estimation, too?
A: One reason is that you need representative samples of bots and non-bots to train the AI, and you need to keep coming up with these samples over time as the bots learn to game the AI.
Q: But how can 100 be enough when there are 74.3 bazillion Twitter users?
A: The classic analogy is that you only need to taste a teaspoon of soup to know if it’s salty enough. Random sampling really works, if you can do it. In many applications, it’s hard to do: election polls try to take a random sample, but most of the people they sample don’t cooperate. In this case, Twitter should be able to do a genuine random sample of the accounts they are counting as monetizable daily active users, and taking a small sample allows them to put more effort into each account. It’s a lot better to look at 100 accounts carefully than to do a half-arsed job on 10,000.
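The soup point can be checked with a little arithmetic. This is a rough sketch, not anything from Twitter: the 5% bot rate and the population sizes are made-up numbers for illustration, and `standard_error` is just a helper name chosen here. It computes the standard error of a proportion from a simple random sample of 100, including the finite population correction, for pots of soup of very different sizes.

```python
import math


def standard_error(p, n, population):
    """Standard error of a sample proportion from a simple random sample
    of size n, including the finite population correction."""
    fpc = math.sqrt((population - n) / (population - 1))
    return math.sqrt(p * (1 - p) / n) * fpc


# Hypothetical true bot rate, chosen only for illustration.
assumed_rate = 0.05

# Hypothetical population sizes: a small pot, a big pot, a very big pot.
for population in (10_000, 1_000_000, 200_000_000):
    se = standard_error(assumed_rate, 100, population)
    print(f"population {population:>11,}: standard error {100 * se:.2f} percentage points")
```

The standard error is essentially the same for every population size: what matters is the size of the sample, not the size of the pot.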
Q: 100, though? Really?
A: 100 per day. They report the proportion every 90 days, which adds up to about 9000 accounts, and 9000 is plenty. They’ll get good estimates of the average even over a couple of weeks.
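To put rough numbers on "plenty", here is a minimal sketch using the usual normal-approximation margin of error for a proportion. The 5% bot rate is an assumption picked for illustration, not a figure from this post, and `margin_of_error` is a helper name invented here.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical true bot rate, chosen only for illustration.
assumed_bot_rate = 0.05
accounts_per_day = 100  # the daily sample size described above

for days in (1, 14, 90):
    n = accounts_per_day * days
    moe = margin_of_error(assumed_bot_rate, n)
    print(f"{days:>2} days (n = {n:>4}): +/- {100 * moe:.2f} percentage points")
```

Under that assumption, one day of sampling gives a margin of a bit over 4 percentage points, two weeks gets it to roughly 1, and a full 90 days to under half a percentage point. The normal approximation is crude for a single day's sample when the rate is this low, but it is reasonable for the larger totals.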