I have only skimmed the paper, but one thing I don't see discussed is whether komi (the points given to white as compensation for going second) is correct.
They do say the rules used for all games, including self-play, set komi consistently to 7.5.
If the strongest AI were winning predominantly with one color, it would be an indication that komi isn't fair for the best play.
Of the 20 games released for the strongest play it appears white won 14 times and black 6. I don't think that is enough to be conclusive but maybe komi is too high.
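To put a rough number on "not enough to be conclusive", here is a minimal sketch of an exact two-sided binomial test on the 14:6 split. It assumes the 20 games are independent and that a fair komi means each game is a 50/50 coin flip, which is a simplification; the function name is made up for illustration.

```python
from math import comb

def binomial_two_sided_p(wins: int, games: int) -> float:
    """Two-sided p-value for `wins` out of `games` under a fair 50/50 null."""
    # How far the observed count is from the expected count (games / 2).
    distance = abs(wins - games / 2)
    # Count every outcome at least that far from the middle, on either side.
    extreme = sum(
        comb(games, k)
        for k in range(games + 1)
        if abs(k - games / 2) >= distance
    )
    return extreme / 2 ** games

# Assumption: 14 white wins out of the 20 released games, as counted above.
p = binomial_two_sided_p(14, 20)
print(f"p = {p:.3f}")  # prints "p = 0.115"
```

A split at least this lopsided shows up about 11.5% of the time even if komi were perfectly fair, so 20 games really don't settle it.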
I wonder if different "correct" play at the strongest levels would be learned with a 6.5 komi.
You can only change komi by full point increments. There is a .5 to break ties, but a komi of 7.5 is identical to one of 7.4.
From a theoretical standpoint (that is, with perfect play), any non-integer komi should lead to one player winning 100% of the time. So even if the actual win ratio is 14:6 at komi=7.5, that might still be the best value.
If you had an estimate of the real difference, you could switch to an integer komi and break ties randomly. For example, black wins 60% of the ties and white wins 40%. There will be a ratio at which each side should win 50% of the time.
I agree that with perfect play the game would end in a tie, with 50% of the ties going to each side. But it is still interesting to ask for a better estimate for practical play.
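The tie-splitting idea can be sketched as a one-line calculation. This assumes an integer komi where some fraction of games end exactly tied, and that you already have estimates of black's outright win rate and the tie rate; the function name and the numbers below are hypothetical, not measured data.

```python
def black_tie_share(p_black: float, p_tie: float) -> float:
    """Fraction of tied games to award to black so each side wins 50% overall.

    p_black: probability black wins outright at the integer komi.
    p_tie:   probability the game ends exactly tied at that komi.
    """
    if p_tie <= 0:
        raise ValueError("no ties to split")
    # Solve p_black + share * p_tie = 0.5 for share.
    share = (0.5 - p_black) / p_tie
    if not 0.0 <= share <= 1.0:
        raise ValueError("no tie split can balance these win rates")
    return share

# Hypothetical estimates: black wins 38% outright, 20% of games are tied
# (so white wins the remaining 42% outright).
share = black_tie_share(0.38, 0.20)
print(f"black should get {share:.0%} of the ties")  # prints "black should get 60% of the ties"
```

If the outright win rates are too far apart for any split of the ties to reach 50/50, the function raises instead, which would be a sign that the integer komi itself is off by a point.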
Michael Redmond mentions this in the AlphaGo vs AlphaGo review series he's doing with the AGA. AlphaGo self-play games are with 7.5 komi under Chinese rules, and apparently DeepMind has stated that black vs white wins is almost exactly 50/50. IIRC Redmond mentioned that white (?) only had some sub-1% advantage in the entire self-play corpus.
I don't remember where I read it, but in some earlier versions of AlphaGo they tried a komi of 6.5 and black ended up winning more often. That indicates the correct komi value is 7, but since Go doesn't have ties, you have to pick which side you want to favor to break the tie. (White seems reasonable.)
Well, in Japanese rules the komi is 6.5, so that's the alternative that tends to come up. With some quick searching I found a transcript from one of the games where DeepMind said 7.5 slightly favors white, but they didn't say anything about 6.5 or 5.5, while a random comment from r/baduk claims that pro game analysis shows 6.5 slightly favors black and 7.5 slightly favors white.
The correct komi number has puzzled Go players for centuries; now we might finally have a chance to figure out the right answer (although not without some reservations). Over the last 5 decades, komi has consistently been raised to keep the game more level between white and black (black makes the first move, so has the advantage). Historically, there was no komi, and people kept things even by always playing an even number of games, with the players switching sides after each game.
For whatever reason, that's no longer feasible in the modern pro game (not to mention that it could result in no winner if each player wins half the games), so komi was introduced. It started at 5.5 and steadily climbed to 7.5 at present. In the pro game, even a change of 1 is considered a big deal, so going from 5.5 to 7.5 is hardly trivial.
Now with AlphaGo playing "perfect" games against itself, we might finally be able to put to rest the debate over the correct komi (the Japanese Go associations have for decades kept meticulous records of every professional game in order to find the correct komi).
There is a big "but" though. The correct komi at AlphaGo Zero's level might not be the correct komi for human-level players (AlphaGo is estimated to be 2-3 handicap stones above human play; this is a bigger gap than the one between the average pro player and the best amateurs).
Indeed, the change from 5.5 komi to 7.5 komi also had a lot to do with the change in play style rather than simply zeroing in on the "correct" komi number. In the 70s and 80s, the predominant play style was more conservative, and 5.5 might well have been the correct komi for the time (defined as resulting in a 50:50 chance of winning for either side). As play style shifted to become more aggressive and confrontational (actually fueled somewhat by the introduction of komi), it was discovered that komi needed to be raised to keep the chances of winning at 50:50.
To make an analogy, suppose one is playing a casino game of chance that gives the house a slight advantage (similar to the first-mover advantage for black in Go). If one only makes small bets, the house will end up winning only a small amount. In other words, the player needs to be compensated by a small amount to make the game "fair".
If, however, one makes big bets (i.e. more aggressive game play), then the compensation needs to be bigger too to make the game "fair", even if the underlying probabilities have not changed.
Following this logic, while 7.5 komi is fair for AlphaGo vs. AlphaGo games, it might not be the right number for human games. I suspect it might be smaller for humans... if only we could calibrate AlphaGo to the average human level and generate millions of self-play games...
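The arithmetic behind the analogy, as a tiny sketch: with a fixed house edge, the expected loss per bet (and so the compensation needed to make the game "fair") scales linearly with the bet size. The 2% edge and the bet sizes are made-up numbers for illustration.

```python
def fair_compensation(bet: float, house_edge: float) -> float:
    """Expected loss on one bet, i.e. what the player must be paid back
    per bet for the game to break even on average."""
    return bet * house_edge

HOUSE_EDGE = 0.02  # hypothetical 2% edge, identical for both players below

small = fair_compensation(10, HOUSE_EDGE)    # cautious player
large = fair_compensation(1000, HOUSE_EDGE)  # aggressive player
print(f"small bets: {small:.2f}, big bets: {large:.2f}")
```

Same edge, but the big bettor needs 100 times the compensation, which is the sense in which a more aggressive style could call for a larger komi.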