July 26, 2022

Briefly

  • Derek Lowe writes: “Late last week came this report in Science about doctored images in a series of very influential papers on amyloid and Alzheimer’s disease. That’s attracted a lot of interest, as well it should, and as a longtime observer of the field (and onetime researcher in it), I wanted to offer my own opinions on the controversy.” As he says, the interest in amyloid is not driven just (or even primarily) by the allegedly fraudulent research. There’s a lot of support for the importance of beta-amyloid from genetics: mutations that cause early-onset Alzheimer’s and, perhaps even more convincingly, a mutation found in Icelanders that protects against Alzheimer’s. The alleged fraud is bad, as is the current complete failure of research into treatments, but the link between the two isn’t as strong as some people are implying.
  • Prof Casey Fiesler, who teaches in the area of tech ethics and governance, is developing a TikTok-based tech ethics and privacy course
  • ESR’s Covid wastewater dashboard is live. This is important because Everyone Poops. We don’t have an exact conversion from measured virus levels to active cases, and the conversion could vary with the strain of Covid and with the age of the patients, but at least it won’t depend on who decides to get tested and report their test results.
  • The wastewater data will be an excellent complement to the prevalence survey that the Ministry of Health is starting up. The survey, assuming that a reasonable fraction of people agree to be tested, will give a direct estimate of the true population infection rate, but it will not be as detailed as the wastewater data, which can give estimates for relatively small areas and short time frames.
  • Briefing on the Data and Statistics Bill from the NZ Council of Civil Liberties. If you follow StatsChat you’ve seen these points before. And you will see them again.

NRL Predictions for Round 20

 

 

Team Ratings for Round 20

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Current Rating Rating at Season Start Difference
Panthers 14.17 14.26 -0.10
Storm 11.32 19.20 -7.90
Rabbitohs 6.28 15.81 -9.50
Sharks 3.35 -1.10 4.50
Roosters 2.59 2.23 0.40
Sea Eagles 2.07 10.99 -8.90
Cowboys 1.90 -12.27 14.20
Broncos 0.62 -8.90 9.50
Eels -0.52 2.54 -3.10
Raiders -1.53 -1.10 -0.40
Dragons -2.91 -7.99 5.10
Bulldogs -6.09 -10.25 4.20
Titans -7.16 1.05 -8.20
Knights -7.27 -6.54 -0.70
Wests Tigers -9.27 -10.94 1.70
Warriors -9.54 -8.99 -0.60

 

Performance So Far

So far there have been 144 matches played, 98 of which were correctly predicted, a success rate of 68.1%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Eels vs. Broncos Jul 21 14 – 36 5.00 FALSE
2 Dragons vs. Sea Eagles Jul 22 20 – 6 -4.10 FALSE
3 Knights vs. Roosters Jul 22 12 – 42 -3.80 TRUE
4 Raiders vs. Warriors Jul 23 26 – 14 13.80 TRUE
5 Panthers vs. Sharks Jul 23 20 – 10 14.50 TRUE
6 Rabbitohs vs. Storm Jul 23 24 – 12 -4.00 FALSE
7 Bulldogs vs. Titans Jul 24 36 – 26 2.90 TRUE
8 Cowboys vs. Wests Tigers Jul 24 27 – 26 16.00 TRUE
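One way to read the Correct column, as a minimal sketch: a prediction counts as correct when its sign agrees with the actual home-minus-away margin. How draws are scored isn't stated here, so the sketch counts a draw as incorrect; that is an assumption, and the function name `is_correct` is hypothetical. The scores and predictions are last week's games from the table above.

```python
# Last week's games from the table above:
# (home team, away team, home score, away score, predicted home margin)
games = [
    ("Eels", "Broncos", 14, 36, 5.00),
    ("Dragons", "Sea Eagles", 20, 6, -4.10),
    ("Knights", "Roosters", 12, 42, -3.80),
    ("Raiders", "Warriors", 26, 14, 13.80),
    ("Panthers", "Sharks", 20, 10, 14.50),
    ("Rabbitohs", "Storm", 24, 12, -4.00),
    ("Bulldogs", "Titans", 36, 26, 2.90),
    ("Cowboys", "Wests Tigers", 27, 26, 16.00),
]


def is_correct(home_score, away_score, predicted_margin):
    """Hypothetical scoring rule: correct when predicted and actual margins
    have the same sign. A draw (actual margin 0) counts as incorrect here,
    which is an assumption, not necessarily the author's rule."""
    actual_margin = home_score - away_score
    return actual_margin * predicted_margin > 0


n_correct = sum(is_correct(home_pts, away_pts, pred)
                for _, _, home_pts, away_pts, pred in games)
print(f"Last week: {n_correct} of {len(games)} correct")

# Season to date, using the totals quoted in the text.
played, correct = 144, 98
print(f"Success rate: {100 * correct / played:.1f}%")
```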

 

Predictions for Round 20

Here are the predictions for Round 20. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Sea Eagles vs. Roosters Jul 28 Sea Eagles 2.50
2 Warriors vs. Storm Jul 29 Storm -15.40
3 Eels vs. Panthers Jul 29 Panthers -11.70
4 Titans vs. Raiders Jul 30 Raiders -2.60
5 Sharks vs. Rabbitohs Jul 30 Sharks 0.10
6 Broncos vs. Wests Tigers Jul 30 Broncos 12.90
7 Knights vs. Bulldogs Jul 31 Knights 1.80
8 Dragons vs. Cowboys Jul 31 Cowboys -1.80
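A minimal sketch of how a margin of this kind can be formed from the ratings table: home rating minus away rating plus a home ground advantage. The 3-point advantage here is an assumption, chosen because it reproduces most of this week's figures; the author selects the actual value from past data, and the Warriors vs. Storm figure does not match this simple rule, so it is left out. The names `RATINGS`, `HOME_ADVANTAGE` and `predict_margin` are hypothetical.

```python
# Current ratings from the Round 20 table above (subset).
RATINGS = {
    "Sea Eagles": 2.07, "Roosters": 2.59, "Eels": -0.52, "Panthers": 14.17,
    "Broncos": 0.62, "Wests Tigers": -9.27, "Dragons": -2.91, "Cowboys": 1.90,
}

# Assumed home ground advantage in points (hypothetical, not the author's value).
HOME_ADVANTAGE = 3.0


def predict_margin(home, away, hga=HOME_ADVANTAGE):
    """Expected home-minus-away points difference; positive favours the home team."""
    return RATINGS[home] - RATINGS[away] + hga


for home, away in [("Sea Eagles", "Roosters"), ("Eels", "Panthers"),
                   ("Broncos", "Wests Tigers"), ("Dragons", "Cowboys")]:
    margin = predict_margin(home, away)
    winner = home if margin > 0 else away
    print(f"{home} vs. {away}: {winner} {margin:.2f}")
```

Rounded to one decimal place these give 2.5, -11.7, 12.9 and -1.8, the same as the table.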

 

July 19, 2022

NRL Predictions for Round 19

 

Team Ratings for Round 19

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Current Rating Rating at Season Start Difference
Panthers 15.25 14.26 1.00
Storm 11.64 19.20 -7.60
Sea Eagles 6.52 10.99 -4.50
Rabbitohs 6.30 15.81 -9.50
Roosters 3.96 2.23 1.70
Sharks 3.85 -1.10 5.00
Cowboys 1.96 -12.27 14.20
Eels 0.71 2.54 -1.80
Broncos 0.02 -8.90 8.90
Raiders -0.73 -1.10 0.40
Dragons -6.88 -7.99 1.10
Titans -7.55 1.05 -8.60
Bulldogs -7.64 -10.25 2.60
Knights -9.13 -6.54 -2.60
Warriors -9.37 -8.99 -0.40
Wests Tigers -10.90 -10.94 0.00

Performance So Far

So far there have been 136 matches played, 96 of which were correctly predicted, a success rate of 70.6%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Cowboys vs. Sharks Jul 15 12 – 26 3.20 FALSE
2 Eels vs. Warriors Jul 15 28 – 18 16.60 TRUE
3 Roosters vs. Dragons Jul 16 54 – 26 11.90 TRUE
4 Sea Eagles vs. Knights Jul 16 42 – 12 17.00 TRUE
5 Titans vs. Broncos Jul 16 12 – 16 -4.70 TRUE
6 Wests Tigers vs. Panthers Jul 17 16 – 18 -25.90 TRUE
7 Storm vs. Raiders Jul 17 16 – 20 18.00 FALSE
8 Bulldogs vs. Rabbitohs Jul 17 28 – 36 -11.50 TRUE

Predictions for Round 19

Here are the predictions for Round 19. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Eels vs. Broncos Jul 21 Eels 3.70
2 Dragons vs. Sea Eagles Jul 22 Sea Eagles -10.40
3 Knights vs. Roosters Jul 22 Roosters -10.10
4 Raiders vs. Warriors Jul 23 Raiders 14.10
5 Panthers vs. Sharks Jul 23 Panthers 14.40
6 Rabbitohs vs. Storm Jul 23 Storm -2.30
7 Bulldogs vs. Titans Jul 24 Bulldogs 2.90
8 Cowboys vs. Wests Tigers Jul 24 Cowboys 15.90
July 18, 2022

Briefly

  • Training data for emotions/sentiment from Google appears to be badly wrong (Inconceivable!)
  • About 12% of people surveyed in the UK said they knew “a great deal” or “a fair amount” about a non-existent candidate for leader of the Conservative Party.  More reassuringly, the proportion who had ‘never heard of’ this candidate was much higher than for the real candidates.
  • The New York Times asks what’s the chance that Trump adversaries McCabe and Comey got tax audits — and, much more usefully, shows how the answer to this question depends on how you define the comparison
  • Hilda Bastian looks at the evidence on whether female national leaders handled the pandemic better, now that we have more follow-up
  • From the President of the Royal Society (of London), the need for data literacy, but also the need to “avoid shoehorning everything to do with numbers into a box labelled ‘Maths’, which has negative connotations for many. If you use that box as a place to pigeonhole quantitative literacy, you are shooting yourself in the foot.” (disclaimer: he’s a statistician)
  • A re-analysis suggests that the vaccine effectiveness data for the Sputnik coronavirus vaccine cannot possibly be correct. Among other red flags, the estimated effectiveness in different age groups was far more similar than would be expected even if the true effectiveness was identical in the groups.
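To see why near-identical estimates across age groups are suspicious, here is a rough illustration with made-up numbers (not the Sputnik data, and not necessarily the re-analysis's method): the usual large-sample confidence interval for vaccine effectiveness, VE = 1 − relative risk, in one hypothetical age group with 2000 people per arm, 10 cases among the vaccinated and 40 in the placebo group. The function name `ve_interval` is hypothetical.

```python
import math


def ve_interval(cases_vac, n_vac, cases_plac, n_plac, z=1.96):
    """Point estimate and approximate 95% interval for vaccine effectiveness,
    VE = 1 - relative risk, using the standard large-sample standard error
    of log(relative risk)."""
    rr = (cases_vac / n_vac) / (cases_plac / n_plac)
    se_log_rr = math.sqrt(1 / cases_vac - 1 / n_vac + 1 / cases_plac - 1 / n_plac)
    rr_low = rr * math.exp(-z * se_log_rr)
    rr_high = rr * math.exp(z * se_log_rr)
    # A higher relative risk means a lower VE, so the interval ends swap.
    return 1 - rr, 1 - rr_high, 1 - rr_low


ve, lower, upper = ve_interval(cases_vac=10, n_vac=2000, cases_plac=40, n_plac=2000)
print(f"VE = {ve:.0%} (95% interval {lower:.0%} to {upper:.0%})")
```

The interval runs from roughly 50% to 87%. So even if every age group had exactly the same true effectiveness, estimates based on around 50 cases per group would routinely differ by several percentage points; groups that agree far more closely than that are a red flag.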

Sampling and automation

Q: Did you see Elon Musk is trying to buy or maybe not buy Twitter?

A: No, I have been on Mars for the last month, in a cave, with my eyes shut and my fingers in my ears

Q: <poop emoji>.  But the bots? Sampling 100 accounts and no AI?

A: There are two issues here: estimating the number of bots, and removing spam accounts

Q: But don’t you need to know how many there are to remove them?

A: Not at all. You block porn bots and crypto spammers and terfs, right?

Q: Yes?

A: How many?

Q: Basically all the ones I run across.

A: That’s what Twitter does, too. Well, obviously not the same categories.  And they use automation for that.  Their court filing says they suspend over a million accounts a day (paragraph 65)

Q: But the 100 accounts?

A: They also manually inspect about 100 accounts per day, taken from the accounts that they are counting as real people — or as they call us, “monetizable daily active users” — to see if they are bots.  Some perfectly nice accounts are bots — like @pomological or @ThreeBodyBot or @geonet or the currently dormant @tuureiti — but bots aren’t likely to read ads with the same level of interest as monetizable daily active users do, so advertisers care about the difference.

Q: Why not just use AI for estimation, too?

A: One reason is that you need representative samples of bots and non-bots to train the AI, and you need to keep coming up with these samples over time as the bots learn to game the AI

Q: But how can 100 be enough when there are 74.3 bazillion Twitter users?

A: The classic analogy is that you only need to taste a teaspoon of soup to know if it’s salty enough.   Random sampling really works, if you can do it.  In many applications, it’s hard to do: election polls try to take a random sample, but most of the people they sample don’t cooperate.  In this case, Twitter should be able to do a genuine random sample of the accounts they are counting as monetizable daily active users, and taking a small sample allows them to put more effort into each account.  It’s a lot better to look at 100 accounts carefully than to do a half-arsed job on 10,000.

Q: 100, though? Really?

A: 100 per day.  They report the proportion every 90 days, and 9000 is plenty.  They’ll get good estimates of the average even over a couple of weeks
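The arithmetic behind “100 per day” can be sketched with the standard error of a proportion from a simple random sample, sqrt(p(1 − p)/n). The 5% bot rate below is an assumed round number for illustration only. The number of Twitter users appears nowhere in the formula, which is the soup-tasting point (strictly there is a finite-population correction, but it is negligible when the sample is a tiny fraction of the population).

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)


p = 0.05  # assumed bot rate, a round number for illustration only
for label, n in [("one day", 100), ("two weeks", 14 * 100), ("90 days", 90 * 100)]:
    se = standard_error(p, n)
    print(f"{label:>10}: n = {n:5d}, standard error = {100 * se:.2f} percentage points")
```

Under the assumed rate, one day's 100 accounts give a standard error of about 2.2 percentage points, two weeks bring it to about 0.6, and a full 90 days to about 0.2.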

 

July 12, 2022

NRL Predictions for Round 18

Team Ratings for Round 18

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Current Rating Rating at Season Start Difference
Panthers 16.64 14.26 2.40
Storm 12.93 19.20 -6.30
Rabbitohs 6.58 15.81 -9.20
Sea Eagles 5.71 10.99 -5.30
Cowboys 2.99 -12.27 15.30
Roosters 2.98 2.23 0.70
Sharks 2.81 -1.10 3.90
Eels 1.24 2.54 -1.30
Broncos 0.08 -8.90 9.00
Raiders -2.02 -1.10 -0.90
Dragons -5.90 -7.99 2.10
Titans -7.60 1.05 -8.60
Bulldogs -7.92 -10.25 2.30
Knights -8.32 -6.54 -1.80
Warriors -9.90 -8.99 -0.90
Wests Tigers -12.29 -10.94 -1.40

 

Performance So Far

So far there have been 128 matches played, 90 of which were correctly predicted, a success rate of 70.3%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Sharks vs. Storm Jul 07 28 – 6 -10.80 FALSE
2 Knights vs. Rabbitohs Jul 08 28 – 40 -11.90 TRUE
3 Wests Tigers vs. Eels Jul 09 20 – 28 -11.00 TRUE
4 Broncos vs. Dragons Jul 10 32 – 18 8.00 TRUE

 

Predictions for Round 18

Here are the predictions for Round 18. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Cowboys vs. Sharks Jul 15 Cowboys 3.20
2 Eels vs. Warriors Jul 15 Eels 16.60
3 Roosters vs. Dragons Jul 16 Roosters 11.90
4 Sea Eagles vs. Knights Jul 16 Sea Eagles 17.00
5 Titans vs. Broncos Jul 16 Broncos -4.70
6 Wests Tigers vs. Panthers Jul 17 Panthers -25.90
7 Storm vs. Raiders Jul 17 Storm 18.00
8 Bulldogs vs. Rabbitohs Jul 17 Rabbitohs -11.50

 

July 5, 2022

NRL Predictions for Round 17

Team Ratings for Round 17

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Current Rating Rating at Season Start Difference
Panthers 16.64 14.26 2.40
Storm 14.78 19.20 -4.40
Rabbitohs 6.57 15.81 -9.20
Sea Eagles 5.71 10.99 -5.30
Cowboys 2.99 -12.27 15.30
Roosters 2.98 2.23 0.70
Eels 1.48 2.54 -1.10
Sharks 0.96 -1.10 2.10
Broncos -0.40 -8.90 8.50
Raiders -2.02 -1.10 -0.90
Dragons -5.42 -7.99 2.60
Titans -7.60 1.05 -8.60
Bulldogs -7.92 -10.25 2.30
Knights -8.31 -6.54 -1.80
Warriors -9.90 -8.99 -0.90
Wests Tigers -12.53 -10.94 -1.60

 

Performance So Far

So far there have been 124 matches played, 87 of which were correctly predicted, a success rate of 70.2%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Sea Eagles vs. Storm Jun 30 36 – 30 -7.80 FALSE
2 Knights vs. Titans Jul 01 38 – 12 -0.80 FALSE
3 Panthers vs. Roosters Jul 01 26 – 18 17.90 TRUE
4 Bulldogs vs. Sharks Jul 02 6 – 18 -4.70 TRUE
5 Cowboys vs. Broncos Jul 02 40 – 26 5.30 TRUE
6 Rabbitohs vs. Eels Jul 02 30 – 12 6.70 TRUE
7 Warriors vs. Wests Tigers Jul 03 22 – 2 6.50 TRUE
8 Dragons vs. Raiders Jul 03 12 – 10 -0.90 FALSE

 

Predictions for Round 17

Here are the predictions for Round 17. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Sharks vs. Storm Jul 07 Storm -10.80
2 Knights vs. Rabbitohs Jul 08 Rabbitohs -11.90
3 Wests Tigers vs. Eels Jul 09 Eels -11.00
4 Broncos vs. Dragons Jul 10 Broncos 8.00

 

June 21, 2022

NRL Predictions for Round 16

Team Ratings for Round 16

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Current Rating Rating at Season Start Difference
Panthers 17.27 14.26 3.00
Storm 15.63 19.20 -3.60
Rabbitohs 5.86 15.81 -10.00
Sea Eagles 4.87 10.99 -6.10
Cowboys 2.43 -12.27 14.70
Roosters 2.35 2.23 0.10
Eels 2.20 2.54 -0.30
Sharks 0.38 -1.10 1.50
Broncos 0.16 -8.90 9.10
Raiders -1.79 -1.10 -0.70
Dragons -5.65 -7.99 2.30
Titans -6.06 1.05 -7.10
Bulldogs -7.34 -10.25 2.90
Knights -9.86 -6.54 -3.30
Warriors -10.73 -8.99 -1.70
Wests Tigers -11.70 -10.94 -0.80

 

Performance So Far

So far there have been 116 matches played, 82 of which were correctly predicted, a success rate of 70.7%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Dragons vs. Rabbitohs Jun 16 32 – 12 -12.10 FALSE
2 Sea Eagles vs. Cowboys Jun 17 26 – 28 6.50 FALSE
3 Storm vs. Broncos Jun 17 32 – 20 19.70 TRUE
4 Sharks vs. Titans Jun 18 18 – 10 9.70 TRUE
5 Warriors vs. Panthers Jun 18 6 – 40 -20.90 TRUE
6 Eels vs. Roosters Jun 18 26 – 16 1.80 TRUE
7 Raiders vs. Knights Jun 19 20 – 18 12.40 TRUE
8 Bulldogs vs. Wests Tigers Jun 19 36 – 12 5.10 TRUE

 

Predictions for Round 16

Here are the predictions for Round 16. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Sea Eagles vs. Storm Jun 30 Storm -7.80
2 Knights vs. Titans Jul 01 Titans -0.80
3 Panthers vs. Roosters Jul 01 Panthers 17.90
4 Bulldogs vs. Sharks Jul 02 Sharks -4.70
5 Cowboys vs. Broncos Jul 02 Cowboys 5.30
6 Rabbitohs vs. Eels Jul 02 Rabbitohs 6.70
7 Warriors vs. Wests Tigers Jul 03 Warriors 6.50
8 Dragons vs. Raiders Jul 03 Raiders -0.90

 

Top 14 Predictions for the Final

Team Ratings for the Final

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Current Rating Rating at Season Start Difference
La Rochelle 7.78 6.78 1.00
Stade Toulousain 7.44 6.83 0.60
Bordeaux-Begles 6.06 5.42 0.60
Racing-Metro 92 5.75 6.13 -0.40
Clermont Auvergne 4.78 5.09 -0.30
RC Toulonnais 4.50 1.82 2.70
Montpellier 3.93 -0.01 3.90
Castres Olympique 3.61 0.94 2.70
Lyon Rugby 3.61 4.15 -0.50
Stade Francais Paris -0.71 1.20 -1.90
Section Paloise -1.81 -2.25 0.40
USA Perpignan -3.05 -2.78 -0.30
Brive -4.06 -3.19 -0.90
Biarritz -10.48 -2.78 -7.70

 

Performance So Far

So far there have been 185 matches played, 133 of which were correctly predicted, a success rate of 71.9%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Castres Olympique vs. Stade Toulousain Jun 18 24 – 18 -4.60 FALSE
2 Montpellier vs. Bordeaux-Begles Jun 19 19 – 10 -3.00 FALSE

 

Predictions for the Final

Here are the predictions for the Final. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Castres Olympique vs. Montpellier Jun 25 Montpellier -0.30

 

June 18, 2022

Some Questions About Rugby and Rugby League Predictions

I have been asked a number of questions about the predictions I post for Rugby and Rugby League competitions. Here are some questions and my answers.

Fascinated by your model and appreciate your input, ever considered doing AFL? (Daniel Levis)

The methodology I use is based on the work of Stephen Clarke and others at Swinburne University, which was used for predicting AFL.

My predictions grew out of a request for purely statistical predictions for a TV program. I looked for an approach I could implement quickly and chose exponential smoothing because of its minimal data requirements and because I knew it to be an excellent forecasting method that is simple to implement.
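A minimal sketch of what an exponential-smoothing rating update can look like for points margins. This is a generic form in the spirit of the approach described, not the author's exact model; the smoothing constant `alpha`, the home advantage `hga`, the team names and the function name `update_ratings` are all hypothetical. After each game both ratings move a fraction `alpha` of the prediction error, so the influence of older results fades roughly geometrically, which is the exponential smoothing idea.

```python
def update_ratings(ratings, home, away, home_score, away_score, hga, alpha):
    """Update ratings in place after one game and return the pre-game prediction.

    The prediction is home rating - away rating + hga. The prediction error
    (actual margin minus predicted margin) is shared: the home rating rises
    by alpha * error and the away rating falls by the same amount.
    """
    predicted = ratings[home] - ratings[away] + hga
    error = (home_score - away_score) - predicted
    ratings[home] += alpha * error
    ratings[away] -= alpha * error
    return predicted


ratings = {"Home FC": 0.0, "Away FC": 0.0}  # hypothetical teams, level to start
predicted = update_ratings(ratings, "Home FC", "Away FC", 20, 10, hga=3.0, alpha=0.1)
print(f"predicted {predicted:.1f}; new ratings: "
      + ", ".join(f"{team} {r:.2f}" for team, r in ratings.items()))
```

Home FC won by 10 when 3 was predicted, so each rating moves by 0.1 × 7 = 0.7 points.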

I am very familiar with AFL as a lifelong Collingwood supporter who grew up in Victoria. I am predicting a number of competitions now, which is only possible because I have been able to automate most of the work of obtaining data, creating the predictions and posting them. As just one person I have enough to do with that, so I am not keen to take on extra competitions. The cancellations and variations during the covid epidemic have also been very time-consuming, because they can’t easily be automated but require a lot of individual modification and data entry, which is slow and error-prone.

Hi David, when assessing a team, how many points do you think a ‘Home Ground’ advantage should be? (Dr Douglas Wilde)

I aim for my predictions to be statistically based and as free of subjectivity as possible, so I select the home ground advantage as a parameter in my models based on past data.

For each competition I do a grid search over all the parameters in the model to select the values which give the best predictions over a number of years. For some of the competitions there are actually two home ground advantage parameters: one for games between two teams from the same country and another for games between teams from different countries.
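A grid search of this kind can be sketched as follows: run the ratings forward through past games for every combination of candidate parameter values, score each combination, and keep the best. The scoring rule (mean absolute error of the predicted margin), starting every team at a rating of zero, the candidate grids and the six results are all assumptions for illustration; the author's criterion isn't spelled out here, and a real search would use several seasons and a second home advantage for cross-country games.

```python
from itertools import product


def mean_abs_error(games, alpha, hga):
    """Run exponential-smoothing ratings through the games in order and return
    the mean absolute error of the pre-game margin predictions.
    All teams start at a rating of 0 (an assumption)."""
    ratings = {}
    total = 0.0
    for home, away, margin in games:
        r_home = ratings.get(home, 0.0)
        r_away = ratings.get(away, 0.0)
        error = margin - (r_home - r_away + hga)
        total += abs(error)
        ratings[home] = r_home + alpha * error
        ratings[away] = r_away - alpha * error
    return total / len(games)


# Hypothetical past results: (home team, away team, home margin).
games = [("A", "B", 12), ("C", "A", -6), ("B", "C", 4),
         ("A", "C", 20), ("B", "A", -2), ("C", "B", 8)]

alphas = [0.05, 0.1, 0.2, 0.3]  # candidate smoothing constants
hgas = [0.0, 2.0, 4.0, 6.0]     # candidate home advantages (points)

best = min(product(alphas, hgas), key=lambda params: mean_abs_error(games, *params))
print(f"best alpha = {best[0]}, best home advantage = {best[1]}, "
      f"MAE = {mean_abs_error(games, *best):.2f}")
```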

Home ground advantage is a question where some subjectivity arises. In the NRL, for example, when two Sydney teams play each other, should home ground advantage apply? Based on my reading about home ground advantage in other sports, I do apply it, except when the two teams share the same home ground. And what should be done about the Warriors based in Australia? Or, in Super Rugby, the Fijian Drua based in Australia?

Can you explain briefly how we could use the percentage/performance to do calculations ourselves, or is it too difficult, so that it can only be done by you? (Eugene Matthew)

My short answer is that I don’t think you could do that.

First of all, there is the data problem. My predictions essentially give the mean points difference between the first team’s score and the second team’s score. To assign probabilities, even just the probability of the first team winning, you need to model the distribution around that mean value, and for that you need the errors observed in past games as the basic data.

You then have to model that distribution, which is not as straightforward as you might imagine, because the distribution of errors is not normal; it is heavy-tailed. As it happens, I am quite experienced at modelling heavy-tailed distributions, since I have written a number of R packages to handle distributions of that sort, which are commonly used in mathematical finance. I have in the past done a preliminary exercise modelling the errors, but nothing suitable for prime time.

There have also been requests for probabilities of margins being in particular ranges: 0 to 12, or larger. Here you start to run into problems of accuracy. My guess, without having done any actual investigation, is that those probabilities would be highly variable.
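The idea of using past errors can be sketched without assuming any particular distribution: take the errors (actual margin minus predicted margin) from past games, add each one to a new prediction, and count how often the resulting margin falls in the range of interest. The eight past games below are the results and predictions listed as last week's games in the Round 19 post above, and the new prediction of 2.5 is the Round 20 Sea Eagles vs. Roosters figure. The function name `prob_margin_between` is hypothetical, and eight errors are far too few, which is exactly the variability problem just described.

```python
# Past games: (home score, away score, predicted home margin), taken from the
# "last week's games" table in the Round 19 post above.
past_games = [
    (12, 26, 3.20), (28, 18, 16.60), (54, 26, 11.90), (42, 12, 17.00),
    (12, 16, -4.70), (16, 18, -25.90), (16, 20, 18.00), (28, 36, -11.50),
]
errors = [(home - away) - predicted for home, away, predicted in past_games]


def prob_margin_between(prediction, errors, low, high):
    """Empirical probability that the home margin lies in (low, high],
    treating prediction + each past error as an equally likely outcome."""
    outcomes = [prediction + e for e in errors]
    return sum(low < m <= high for m in outcomes) / len(outcomes)


prediction = 2.5  # Sea Eagles vs. Roosters, Round 20
p_home_win = prob_margin_between(prediction, errors, 0, float("inf"))
p_win_by_up_to_12 = prob_margin_between(prediction, errors, 0, 12)
print(f"P(home win) ~ {p_home_win:.3f}; "
      f"P(home wins by up to 12) ~ {p_win_by_up_to_12:.3f}")
```

With only eight errors, each one moves a probability by 12.5 percentage points, so estimates like these are very unstable; many more past games, or a fitted heavy-tailed distribution, would be needed.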

That sort of calculation is likely done by betting companies, which have qualified statisticians and substantial computing power at their disposal. Betting company statisticians have told me that, even outside the betting companies, professional gamblers these days use a lot of data and computer analysis to inform their betting.

It is important to remember here that the only data I use are the scores of past games and which team had home ground advantage.

All in all I think your predictions are very good, and they could be near perfect if players also had values, like the home advantage value. It would make some games more accurate, since players are the most important part of the predictions/stats.
For example: Cowboys vs. Storm a few weeks back, when Papenhuyzen, R. Smith, Solomona and Hughes didn’t play, causing your stats to be significantly wrong. (Eugene Matthew)

There are two reasons why I don’t do this.

The first is practical: much, much more data would be required, which would have to be collected and then used. I would need team sheets for all games and would have to assign some sort of value to each player. That would be very difficult to automate for a single part-time person. It is, however, the sort of thing a betting company would do, because they have the resources.

The second reason is more philosophical. The idea that using more data will give you better predictions is somewhat misplaced. Team sports have an inherent randomness that cannot be dealt with no matter how much data is available. It is easy to point to games where you would never pick a team to win using any methodology, but they still do win. Extra data and more sophisticated methods generally bring only marginal improvements in accuracy. Statistically, the more data is used, the more effects have to be estimated, and the more variability is introduced in estimating them.

You can always simply use some subjective estimates:

I give -5 points for a fullback or five-eighth missing from the 17, -3 points for a marquee player missing and -2 points for a regular player missing. I also give scores to aspects like home/away game, average points scored in a game, average unforced errors and so on. I have 30 key markers/aspects I score to make my prediction. (Dr Doug Wilde)

I would always want sound, statistically based estimates of any quantities I used in a model; I am a statistician, after all.

My aim in producing these predictions is to show the efficacy of very simple statistical methods using only limited data, and exponential smoothing in particular. I don’t recommend betting using these predictions and I never bet myself. I do know that a number of people use the predictions in tipping competitions. My advice is to use the predictions to indicate the form of teams in the competition adjusted for home ground advantage, then if there are other factors you consider important (injuries, the weather, a long-standing voodoo, …) modify my forecasts as you see fit.
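As a sketch of that last suggestion, subjective adjustments can simply be added to the home-minus-away margin. The point values below are the reader's subjective scale quoted above, not statistically estimated quantities; the function name `adjust_forecast` and the role labels are hypothetical. Because a missing player weakens their own team, an absence on the home side lowers the margin and one on the away side raises it. The example takes the Round 20 Dragons vs. Cowboys prediction of -1.80 with a hypothetical Cowboys absence.

```python
# The reader's subjective point values quoted above (not statistical estimates).
ADJUSTMENTS = {"fullback or five-eighth": 5.0, "marquee player": 3.0, "regular player": 2.0}


def adjust_forecast(predicted_margin, home_missing=(), away_missing=()):
    """Adjust a home-minus-away margin for missing players.

    A missing home player lowers the margin; a missing away player raises it.
    """
    margin = predicted_margin
    for role in home_missing:
        margin -= ADJUSTMENTS[role]
    for role in away_missing:
        margin += ADJUSTMENTS[role]
    return margin


# Dragons vs. Cowboys (Round 20), with a hypothetical Cowboys fullback absence.
adjusted = adjust_forecast(-1.80, away_missing=["fullback or five-eighth"])
print(f"Adjusted margin: {adjusted:.1f}")
```

Here the adjustment flips the tip from the Cowboys to the Dragons, which shows how much weight such subjective values carry relative to the statistical margin.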