Thoughts/etc:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3982162/ talks about this in a temporal sense, since using the average of the minimum and maximum daily temperatures at a given location isn't really a great way of determining average temperature. However, it doesn't discuss geographical random sampling/gridding.
http://onlinelibrary.wiley.com/doi/10.1002/joc.4580/full notes this problem exists, and takes uniformly gridded measurements, but it's limited to a specific region and time period (1979-2012).
I know climate scientists slice the Earth up into grids to avoid clustering bias, but that's not the same thing, and isn't useful if the original readings don't accurately represent the slice/region.
I also realize that climate scientists have other measures of global warming, but linear regression of actual temperature measurements seems to be the most used to convince the public, so their accuracy seems important.
As a skeptic, I'd also like to know whether, in general, most of the arguments for global warming are statistical in nature (i.e., linear regression on measured variables), or whether the statistical ones are just the most "photogenic" for public consumption. In other words, is the whole non-randomly-sampled/gridded temperature argument a red herring?
EDIT (to clarify question):
To determine the Earth's mean surface temperature, we can employ one of these methods:
Measure the Earth's temperature at every point and average. Of course, this is physically impossible, since a point is a 0-dimensional mathematical abstraction, but we can do something close with satellites.
Select a large number of random points on the Earth's surface (this random distribution is uniform in longitude but not in latitude; the density of points in latitude would follow a cosine curve), measure the temperature, and average. In addition to giving us a mean, it would give us a standard deviation so we can say "we are 95% confident that the Earth's true mean temperature is X plus or minus Y".
Take a uniformly spaced grid (non-trivial, since the distance between lines of longitude varies with latitude), measure the temperature at those points and average. This is similar to the first approach, but with fewer points. Unless we believe our grid points introduce a bias, this should be as accurate as random sampling.
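To make the second and third methods concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption, not real data: `toy_temperature` is a made-up field (warm equator, cold poles) chosen so that its true area-weighted mean is exactly 15 °C, and the function names are hypothetical. It draws latitudes with a cosine density for the random sample, and weights regular latitude/longitude grid cells by the cosine of latitude to compensate for cells shrinking toward the poles.

```python
import math
import random

def toy_temperature(lat_deg, lon_deg):
    """Made-up temperature field in deg C (an assumption, not real data).
    Its area-weighted global mean is 45 * (2/3) - 15 = 15 deg C."""
    return 45.0 * math.cos(math.radians(lat_deg)) ** 2 - 15.0

def random_point(rng):
    """Random (lat, lon) in degrees, uniform over the sphere's surface."""
    lon = rng.uniform(-180.0, 180.0)
    # Uniform in sin(lat) gives a latitude density proportional to cos(lat).
    lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
    return lat, lon

def random_sample_estimate(n, seed=0):
    """Mean of n random readings and the half-width of a ~95% interval."""
    rng = random.Random(seed)
    temps = [toy_temperature(*random_point(rng)) for _ in range(n)]
    mean = sum(temps) / n
    variance = sum((t - mean) ** 2 for t in temps) / (n - 1)
    return mean, 1.96 * math.sqrt(variance / n)

def grid_mean(step_deg=5.0, weighted=True):
    """Average over cell centres of a regular lat/lon grid.
    With weighted=True each cell counts in proportion to cos(lat)."""
    n_lat = int(180 / step_deg)
    n_lon = int(360 / step_deg)
    total = 0.0
    weight_sum = 0.0
    for i in range(n_lat):
        lat = -90.0 + (i + 0.5) * step_deg
        w = math.cos(math.radians(lat)) if weighted else 1.0
        for j in range(n_lon):
            lon = -180.0 + (j + 0.5) * step_deg
            total += w * toy_temperature(lat, lon)
            weight_sum += w
    return total / weight_sum

mean, half_width = random_sample_estimate(20000)
print(f"random sample: {mean:.2f} +/- {half_width:.2f} deg C")
print(f"grid, cos(lat) weights: {grid_mean(weighted=True):.2f} deg C")
print(f"grid, no weights:       {grid_mean(weighted=False):.2f} deg C")
```

In this toy field the unweighted grid average comes out well below 15 °C, because a regular latitude/longitude grid over-represents the cold polar regions; that is the "non-trivial" part of the gridded method.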
My problem: temperature measurements in the past were made using NONE of these methods. The points where temperature was measured were not chosen randomly or in a gridded fashion. Therefore, how can they be an accurate measurement of historical temperature, even if we only consider temperature changes?
NOTE: I realize surface temperature isn't the best measure of global warming, since water has a much higher specific heat than land (among other things), but that's my focus for this question.
The fact that the sampling points do not move is essential. We know temperature is affected by regional conditions, so if the sampling points were re-randomized (moved) with every measurement, the result would be less accurate, not more. Remember what is being measured: the change. Because the sampling points are not moved, the change will be accurate, since the data essentially become a stratified sample. If I am measuring changes in engine temperature, for instance, I do not want to measure at a different point each time; as long as the points (locations) are consistent, the sample will retain high accuracy. Random sampling would be LESS accurate because it would invite confounding: we know the distribution of temperature across the engine (or globe) is not random, so any shift in location between measurements would introduce confounding data. Almost no science uses a truly random sample; it's just not possible. Consider exhaustive sampling, cluster sampling, stratified sampling, and systematic sampling: all are used more often than true random sampling, and each is more accurate than random sampling in the right circumstances.
Consider an example: say you are trying to measure the temperature change in an engine over time. Where on the engine I attach my sensors does not matter as long as I do not move them, especially if I put many sensors on it. I could put thirty sensors all on the left side of the engine, and they would measure the change in temperature very accurately compared to moving the sensors between every measurement. Don't fall for the perfect solution fallacy. Also remember this is an observational/descriptive study by its very nature.
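The fixed-sensor argument can be checked with a small simulation. This is only a sketch under loudly stated assumptions: the climate pattern, the 0.2 °C reading noise, and the 1 °C of warming are all invented, and the warming is assumed to be the same at every location (if it varied by region, a clustered network could miss some of it). It compares thirty stations clustered between 30° and 60° N that never move against thirty stations re-drawn at random over the globe for each measurement.

```python
import math
import random

TRUE_WARMING = 1.0   # assumed uniform warming between the two epochs, deg C
READING_NOISE = 0.2  # assumed instrument/weather noise per reading, deg C

def local_climate(lat_deg):
    """Made-up baseline temperature: warm equator, cold poles."""
    return 45.0 * math.cos(math.radians(lat_deg)) ** 2 - 15.0

def reading(lat_deg, warming, rng):
    """One noisy temperature reading at a given latitude."""
    return local_climate(lat_deg) + warming + rng.gauss(0.0, READING_NOISE)

def random_lat(rng):
    """Latitude of a point drawn uniformly over the sphere's surface."""
    return math.degrees(math.asin(rng.uniform(-1.0, 1.0)))

def mean(values):
    return sum(values) / len(values)

def change_fixed_sites(rng, n):
    """Clustered, non-random stations that stay put between epochs."""
    sites = [rng.uniform(30.0, 60.0) for _ in range(n)]
    before = mean([reading(s, 0.0, rng) for s in sites])
    after = mean([reading(s, TRUE_WARMING, rng) for s in sites])
    return after - before

def change_moved_sites(rng, n):
    """Stations re-drawn at random over the globe for each epoch."""
    before = mean([reading(random_lat(rng), 0.0, rng) for _ in range(n)])
    after = mean([reading(random_lat(rng), TRUE_WARMING, rng) for _ in range(n)])
    return after - before

def rms_error(estimator, trials=500, n=30, seed=0):
    """Root-mean-square error of the estimated change over many trials."""
    rng = random.Random(seed)
    errors = [estimator(rng, n) - TRUE_WARMING for _ in range(trials)]
    return math.sqrt(mean([e * e for e in errors]))

print(f"fixed clustered sites: RMS error {rms_error(change_fixed_sites):.2f} deg C")
print(f"re-randomized sites:   RMS error {rms_error(change_moved_sites):.2f} deg C")
```

With fixed sites the local climate cancels when the two epochs are subtracted, so only the reading noise is left. With re-drawn sites the difference between two random sets of locations swamps the signal.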
Each point on the map is more like a repetition; the real independent variable is the time at which they are sampled, which is either stratified or clustered depending on which study you refer to. Note that multiple sets of data points are also compared: NOAA, BEST, etc. are each independent data sets that can be compared, and they show the same pattern.
High and low are used for measurements because that is all that was recorded in the oldest measurements, so changing the format would require throwing out all that data, drastically shortening the sample size (losing more than half the time span). In this case the accuracy gained from the much larger number of samples is more than would be gained from random or gridded locations. Random sampling is rarely possible with historical data, which is why the size and consistency of the data set are so important. The nice thing is that these are also compared to other sampling methods on other time scales to see if they show the same pattern. Scientists working with historical data are aware of the limitations of their data, which is why independent verification is so important.
Now consider ice core data. I was surprised when you said surface temperature was the most used; I see ice core data far more often, because it records a much longer span of time and records other things (like $CO_{2}$ content) as well. Again, each core is a repetition, and the core can be sampled in a random or stratified way; stratified is the most common because it is more exhaustive in a core sample. Ice cores are also compared to ice cores from other locations.
Another consideration is cross-comparison, that is, the use of multiple independent forms of measurement: ice core compared to satellite, compared to surface, etc. Dozens of different forms of measurements/experiments are compared and show the same pattern.
This is probably one of the best overviews of the science I have seen. It is a little old (2013), so if anyone has seen a more recent version I would love to use it instead.