I've got a phenomenon of interest, $y$. It is determined in large part by the weather. Naturally, one becomes interested in what climate change may do to $y$, so one fits a model of the form $y = f(\mathbf{X}) + \epsilon$, where $\mathbf{X}$ is a potentially high-dimensional representation of historical weather.
To get a sense of impacts into the future, one uses the fitted model $\hat{f}$ applied to (downscaled) GCM projections of weather. Given uncertainty in which model provides the most faithful representation of the distribution of weather given CO2, one should compute a distribution of $\hat{y}$ as
- $\hat{f}(GCM_1)$
- $\hat{f}(GCM_2)$
- $\hat{f}(GCM_3)$
- $\vdots$
- $\hat{f}(GCM_k)$
A few questions:
- GCMs get some things badly wrong, like convective precipitation. Is it good practice to make sure that your model $f$ and your training data $\mathbf{X}$ represent things like convective precip at an appropriate level of coarseness, such that the GCM can be roughly "correct"? For example, should I only use monthly summaries of precip from a GCM, even if the GCM's raw output gives it to me sub-daily? What are some other examples of things that GCMs don't do well, that one might consider coarsening?
- One shouldn't average GCM output across multiple models, obviously, but it makes more sense to average the projections $\hat{y}_{GCM_1}, \hat{y}_{GCM_2}, \hat{y}_{GCM_3}, \dots, \hat{y}_{GCM_k}$. Should this be a simple average, or should the average be weighted somehow to account for the fact that some models are more similar to one another than to others? And how would this be done in practice, given that GCMs are so high-dimensional that it becomes impossible to calculate distance metrics?
- Given that historical simulations exist, one can compute a "backcast" projection $\tilde{y} = \hat{f}(GCM_k^{historical})$, and then compute a "bias" term $B = E[\tilde{y}] - E[y]$. (Naturally one would use a difference in means, rather than a pointwise difference, because GCMs attempt to model the distribution of the weather, rather than realized weather.) If the estimated bias $B$ is nontrivial, should I subtract it from my projections, forming $\hat{y}_{BC} = \hat{y} - B$? Or is there a better way to achieve the same end?
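
To make the last two questions concrete, here is a minimal sketch of what I have in mind, assuming the outputs of $\hat{f}$ are already computed for each GCM. The function names (`mean_bias`, `bias_corrected_ensemble`), the dictionary layout, and the `weights` argument are hypothetical and only for illustration; how the weights should actually be chosen is exactly what I am asking about.

```python
import numpy as np


def mean_bias(y_backcast, y_observed):
    """B = E[y_tilde] - E[y]: a difference in means, not a pointwise difference."""
    return float(np.mean(y_backcast) - np.mean(y_observed))


def bias_corrected_ensemble(future, backcast, y_observed, weights=None):
    """Apply y_BC = y_hat - B per GCM, then average the per-model means.

    future, backcast: dicts mapping a GCM name to an array of f_hat outputs
        (hypothetical layout; future = projections, backcast = historical runs).
    weights: optional dict of non-negative per-GCM weights; None means a
        simple (equal-weight) average.
    """
    names = sorted(future)
    # Each model gets its own bias term, estimated from its own historical run.
    corrected = {
        k: np.asarray(future[k], dtype=float) - mean_bias(backcast[k], y_observed)
        for k in names
    }
    per_model_mean = np.array([corrected[k].mean() for k in names])
    w = None if weights is None else np.array([weights[k] for k in names], dtype=float)
    # np.average normalizes the weights; with weights=None it is a plain mean.
    return corrected, float(np.average(per_model_mean, weights=w))


# Toy numbers, made up purely to show the mechanics.
y_obs = np.array([1.0, 2.0, 3.0])
backcast = {"gcm_a": np.array([2.0, 3.0, 4.0]), "gcm_b": np.array([1.0, 2.0, 3.0])}
future = {"gcm_a": np.array([4.0, 5.0, 6.0]), "gcm_b": np.array([5.0, 6.0, 7.0])}

corrected, simple_avg = bias_corrected_ensemble(future, backcast, y_obs)
_, weighted_avg = bias_corrected_ensemble(
    future, backcast, y_obs, weights={"gcm_a": 3.0, "gcm_b": 1.0}
)
```

Note that this only shifts the mean of each model's projections; it does nothing about biases in variance or in the tails, which is part of why I am asking whether there is a better approach.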