Standard ordinary least squares regression assumes that the error term has the same variance across all observations. When the units are polls, we know immediately that this assumption will be violated. The sampling error in a poll is inversely proportional to the square root of its sample size. The “margin of error” that pollsters routinely report is roughly twice the standard error of the estimate evaluated at 50%, the worst case with the largest possible variance. That comes from the well-known statistical formula

SE(p) = sqrt(p[1-p]/N)

where p is the estimated proportion and N is the sample size. This formula reaches its maximum at p=0.5 (50%), making the worst-case standard error 0.5/sqrt(N).
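
To make the arithmetic concrete, here is a minimal Python sketch of the formula above. The function names and the three sample sizes are illustrative choices, not taken from any particular poll, and the margin of error uses the "twice the standard error" convention described here rather than an exact 1.96 multiplier.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion p from a poll of n respondents."""
    return math.sqrt(p * (1.0 - p) / n)

def margin_of_error(n):
    """Conventional margin of error: twice the worst-case (p = 0.5) standard error."""
    return 2.0 * standard_error(0.5, n)

# Illustrative sample sizes (assumed, not from real polls).
for n in (400, 1000, 1600):
    print(n, round(100 * margin_of_error(n), 2), "percentage points")
```

Under these assumptions the margins come out to about 5, 3.2, and 2.5 percentage points. Quadrupling the sample from 400 to 1,600 only halves the margin.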

Weighted least squares adjusts for situations where the error term has a non-constant variance (technically called “heteroskedasticity”). To even out the variance across observations, each one is weighted by the reciprocal of its estimated standard error. For polling data, then, the weights should be proportional to the reciprocal of 1/sqrt(N), or just sqrt(N) itself. I thus weight each observation by the square root of its sample size.
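
One common way to apply such weights is to multiply each observation's row (the response and every regressor, including the constant) by sqrt(N) and then run ordinary least squares on the scaled data. The sketch below does this with NumPy. It is an assumption about the mechanics rather than a record of the exact procedure used here, and the function name `wls_by_row_scaling` and the four data points are invented for illustration.

```python
import numpy as np

def wls_by_row_scaling(X, y, n):
    """Least squares fit of y on X after scaling each row by sqrt(n_i)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # sqrt(N) is the reciprocal of the standard error, up to a constant factor.
    w = np.sqrt(np.asarray(n, dtype=float))
    # Scale every column of X and the response by the same row weight.
    beta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta

# Hypothetical polls: sample sizes, a predictor, and a candidate's share (%).
sizes = np.array([400, 900, 1600, 2500])
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])  # constant term plus predictor
y = np.array([48.0, 50.5, 51.0, 53.5])

print(wls_by_row_scaling(X, y, sizes))  # intercept and slope
```

Scaling rows by sqrt(N) minimizes the sum of N times each squared residual, so in software that expects inverse-variance weights the equivalent setting is weights proportional to N.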

More intuitively, we are weighting polls based on their sample sizes. However, because we first take the square roots of the sample sizes, the weights grow more slowly as samples increase in size, just as the accuracy of the estimate does. A poll of 1,600 respondents, for example, receives twice the weight of a poll of 400, not four times.