Peer reviewing in ML

My notes from the amazing “What to do about NeurIPS Reviewer 2” talk:

  • Junior reviewers are as good as senior ones
    • Their review quality declines over the years
    • Reviewer training also loses its positive effect over the years
  • If a reviewer knows a paper was previously rejected, their score drops significantly.

  • Single- vs. double-blind experiments: there is a bias in favor of top authors and universities, but no bias by country, gender, or academia vs. industry

  • Only 10-20% of review scores change after rebuttal

  • After seeing other reviewers’ scores, reviewers tend to lower their score rather than raise it.

  • Reviewers who are friends with the authors may tell them who is rejecting their paper. In turn, senior authors may pressure those reviewers.

  • Examples of reviewer biases
    • Dr. Fox effect: “if you can’t convince reviewers, confuse them”.
    • Surprisingness bias: reviewers tend to find results “expected” and reject. Hence, authors (should?) stress the unpredictability of their results and make the reader think about the counterfactual.
    • Confirmation bias: reviewers favor papers whose main claim is aligned with their own views.
    • Positive-outcome bias: reviewers are much more likely to reject papers that report negative findings.
    • Citation bias: reviewers who are cited in the paper tend to give higher scores.
  • Meta-reviewers’ and ACs’ judgment of whether a review is good or bad is independent of the review’s score

  • Meta-reviewers find unnecessarily long reviews more useful

  • Reviewers tend to reject when rejecting other papers seems to increase the chances of their own being accepted

  • Some PC members and authors try, in bad faith, to get assigned to one another’s papers and proposals.

  • On average, reviewers in other disciplines catch about 30% of errors. In one experiment at a major ML conference, only 1 of 80 reviewers was suspicious of the error.

  • [NeurIPS 2014 experiment] 57% of the papers accepted by one committee would have been rejected by the other (see the back-of-the-envelope calculation after this list).

  • [NeurIPS 2021 experiment] More than half of the spotlight papers would have been rejected by another committee.

  • Reviewer scores are uncorrelated with citations.

  • Reviewers’ “impact” scores do not correlate with citation counts, but they do correlate with social media attention.

  • Even co-authors don’t always agree on how impactful their shared papers are.
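
A back-of-the-envelope reconstruction of the 2014 figure (my own sketch, not from the talk; the inputs are the publicly reported numbers from the NeurIPS 2014 experiment: roughly a 22.5% acceptance rate and a ~26% disagreement rate between the two committees on the ~166 doubly-reviewed papers). If both committees accept at rate $a$, disagree on a fraction $d$ of papers, and the disagreements split symmetrically between the two directions, then

$$P(\text{rejected by committee 2} \mid \text{accepted by committee 1}) = \frac{d/2}{a} \approx \frac{0.259/2}{0.225} \approx 0.57,$$

which matches the quoted 57%.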