Peer reviewing in ML

My notes from the amazing “What to do about NeurIPS Reviewer 2” talk:

  • Junior reviewers are as good as senior ones
    • Their review quality declines over the years
    • Reviewer training also loses its positive effect over the years
  • If a reviewer knows a paper was previously rejected, their score drops significantly.

  • Single- vs. double-blind experiments: there is a bias in favor of top authors and universities, but no bias by country, gender, or academia vs. industry

  • Only 10-20% of review scores change after rebuttal

  • After seeing other reviewers’ scores, reviewers tend to lower their score rather than raise it.

  • Reviewers who are friends with the authors may tell them who is rejecting their paper. In turn, senior authors may pressure those reviewers.

  • Examples of reviewer biases
    • Dr. Fox effect: “if you can’t convince reviewers, confuse them”.
    • Surprisingness bias: reviewers tend to find results “expected” and reject. Hence, authors (should?) stress the unpredictability of their results and make the reader think about the counterfactual.
    • Confirmation bias: reviewers favor papers whose main claim is aligned with their own views.
    • Positive-outcome bias: reviewers are much more likely to reject papers that report negative findings.
    • Citation bias: reviewers who are cited in the paper tend to give higher scores.
  • Meta-reviewers’ and ACs’ judgment of whether a review is good or bad is independent of the review’s score

  • Meta-reviewers find unnecessarily long reviews more useful

  • Reviewers tend to reject when rejecting other papers seems to increase the chances of their own being accepted

  • Some PC members and authors try, in bad faith, to get assigned to one another’s papers and proposals.

  • On average, reviewers in other disciplines catch about 30% of errors. In one experiment at a major ML conference, only 1 of 80 reviewers was suspicious of the error.

  • [NeurIPS 2014 experiment] 57% of the papers accepted by one committee would have been rejected by the other (see the back-of-the-envelope calculation after this list).

  • [NeurIPS 2021 experiment] More than half of the spotlight papers would have been rejected by another committee.

  • Reviewer scores are uncorrelated with citations.

  • Reviewers’ “impact” scores do not correlate with citation counts, but they do correlate with social media attention.

  • Even co-authors don’t always agree on how impactful their shared papers are.
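
A back-of-the-envelope reconstruction of the 2014 figure (my own sketch, not from the talk; the inputs are the publicly reported numbers from the NeurIPS 2014 experiment: roughly a 22.5% acceptance rate and a ~26% disagreement rate between the two committees on the ~166 doubly-reviewed papers). If both committees accept at rate $a$, disagree on a fraction $d$ of papers, and the disagreements split symmetrically between the two directions, then

$$P(\text{rejected by committee 2} \mid \text{accepted by committee 1}) = \frac{d/2}{a} \approx \frac{0.259/2}{0.225} \approx 0.57,$$

which matches the quoted 57%.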