Friday, May 30, 2008

Coming out of the closet?

There is surely less to this story than meets the eye. I reckon she just took a wrong turning on the way home one day and didn't realise she had ended up in someone else's walk-in wardrobe.

Thursday, May 29, 2008

Oops

This seems pretty embarrassing for all concerned. Remember that mid-century cooling that people have been desperately fiddling their models to reproduce for years? It now turns out that it was just an artefact of dubious assumptions about measurement bias (at least, a large chunk of it - I haven't seen any official revised global surface temperature data).

It seems like it won't make much difference to climate predictions, although maybe one should expect it to reduce our estimates of both aerosol cooling and climate sensitivity marginally (I haven't read that linked commentary yet, so don't know how much detail they go into). It will also make it easier for the models to simulate the observed climate history. In fact one could almost portray this as another victory for modelling over observations, since the models have always struggled to reproduce this rather surprising dip in temperatures (eg SPM Fig 4). I've asked people about this problem myself in various seminars, and never got much of an answer. It's pretty shocking that such a problem could have been overlooked for so long.

It wasn't overlooked by everyone, actually. But I anticipate that plenty of people will try their best to avoid looking and linking in that particular direction...

Sunday, May 25, 2008

Once more unto the breach dear friends, once more...

...or fill up the bin with rejected manuscripts.

I wasn't going to bother blogging this, as there is really not that much to be said that has not already been covered at length. But I sent it to someone for comments, and (along with replying) he sent it to a bunch of other people, so there is little point in trying to pretend it doesn't exist.

It's a re-writing of the uniform-prior-doesn't-work stuff, of course. Although I had given up on trying to get that published some time ago, the topic still seems to have plenty of relevance, and no-one else has written anything about it in the meantime. I also have a 500 quid bet to win with jules over its next rejection. So we decided to warm it over and try again. The moderately new angle this time is to add some simple economic analysis to show that these things really matter. In principle it is obvious that by changing the prior, we change the posterior and this will change results of an economic calculation, but I was a little surprised to find that swapping between U[0,20C] and U[0,10C] (both of which priors have been used in the literature, even by the same authors in consecutive papers) can change the expected cost of climate change by a factor of more than 2!
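
For anyone who wants to see the mechanism at work, here is a toy sketch in Python - emphatically not the calculation in the manuscript - which assumes (purely for illustration) that the observations constrain the feedback parameter roughly as a Gaussian, which is what gives the likelihood for sensitivity its heavy upper tail, and that the "cost" simply scales with the square of the sensitivity:

```python
# Toy illustration only (not the manuscript's calculation) of how the upper
# bound of a uniform prior on climate sensitivity S changes an expected cost.
# Assumptions (all mine, for illustration): the observations constrain the
# feedback parameter Y = F2x/S as a Gaussian, which gives the likelihood in S
# a heavy upper tail, and the "damage" is simply proportional to S^2.
import numpy as np

S = np.linspace(0.1, 20.0, 2000)        # sensitivity grid (C)
F2x = 3.7                               # forcing for 2xCO2 (W/m2)
Y = F2x / S                             # implied feedback parameter
likelihood = np.exp(-0.5 * ((Y - 1.23) / 0.5) ** 2)   # assumed constraint on Y
damage = S ** 2                         # arbitrary convex cost proxy

def expected_damage(prior_upper):
    """Posterior expected damage under a U[0, prior_upper] prior for S."""
    prior = (S <= prior_upper).astype(float)
    post = prior * likelihood           # unnormalised posterior
    return (post * damage).sum() / post.sum()

for upper in (10.0, 20.0):
    print(f"U[0,{upper:g}] prior: expected damage = {expected_damage(upper):.1f}")
```

With numbers anything like these, the heavy upper tail means the prior's upper bound has a substantial effect on the expected cost, which is the point of the exercise.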

We have also gone much further than before in looking at the robustness of results when any attempt at a reasonable prior is chosen. This was one of the more useful criticisms raised over the last iteration, and we no longer have the space limitations of previous manuscripts. The conclusion seems clear - one cannot generate these silly pdfs which assign high probability to very high sensitivity, other than by starting with a strong (IMO ridiculous) prior belief in high sensitivity, and then ignoring almost all evidence to the contrary. Whether such a statement is publishable (at least, publishable by us) remains to be seen. I'm not exactly holding my breath, but would be very happy to have my pessimism proved wrong.

Monday, May 19, 2008

Question: when is 23% equal to 5%?

Answer: when the 23% refers to the proportion of models that are rejected by Roger Pielke's definition of "consistent with the models at the 95% level".

By the normal definition, this simple null hypothesis significance test should reject roughly 5% of the models. Eg Wilks Ch 5 again "the null hypothesis is rejected if the probability (as represented by the null distribution) of the test statistic, and all other results at least as unfavourable to the null hypothesis, is less than or equal to the test level. [...] Commonly the 5% level is chosen" (which we are using here).

I asked Roger what range of observed trends would pass his consistency test and he replied with -0.05 to 0.45C/decade. I then asked Roger how many of the models would pass his proposed test, and instead of answering, he ducked the question, sarcastically accusing me of a "nice switch" because I asked him about the finite sample rather than the fitted Gaussian. They are the same thing, Roger (to within sampling error). Indeed, the Gaussian was specifically constructed to agree with the sample data (in mean and variance, and it visually matches the whole shape pretty well).

The answer that Roger refused to provide is that 13/55 = 24% of the discrete sample lies outside his "consistent with the models at the 95% level" interval (from the graph, you can read off directly that it is at least 10, which is 18% of the sample, and at most 19, or 35%).

But that's only with my sneaky dishonest switch to the finite sample of models. If we use the fitted Gaussian instead, then roughly 23% of it lies outside Roger's proposed "consistent at the 95% level" interval. So that's entirely different from the 24% of models that are outside his range, and supports his claims...
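
For anyone who wants to check the arithmetic, it only takes a couple of lines (using the N(0.19,0.21) fit to the model trends, in C/decade, from the posts below):

```python
# Fraction of the fitted N(0.19, 0.21) model trend distribution that lies
# outside Roger's proposed "consistent at the 95% level" interval of
# -0.05 to 0.45 C/decade.
from scipy.stats import norm

mean, sd = 0.19, 0.21
inside = norm.cdf(0.45, mean, sd) - norm.cdf(-0.05, mean, sd)
print(f"fraction outside [-0.05, 0.45]: {1 - inside:.1%}")   # roughly 23%
```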

I guess if you squint and wave your hands, 23% is pretty close to 5%. Close enough to justify a sequence of patronising posts accusing me and RC of being wrong, and all climate scientists of politicising climate science and trying to shut down the debate, anyway. These damn data and their pro-global-warming agenda. Life would be so much easier if we didn't have to do all these difficult sums.

I'm actually quite enjoying the debate over whether the temperature trend is inconsistent with the models at the "Pielke 5%" level :-) And so, apparently, are a large number of readers.

Sunday, May 18, 2008

Putting Roger out of his misery

OK, we've all had our fun, but perhaps it is time to put an end to it. There's obviously a simple conceptual misunderstanding underlying Roger's attempts at analysis, which some have spotted but others don't seem to have, so I will try to make it as clear as possible.

The models provided a distribution of predictions about the real-world trend over the 8 years 2000-2007 inclusive. However, we have only one realisation of the real-world trend, even though there are various observational analyses of it. The spread of observational analyses is dependent on observational error and their distribution is (one hopes) roughly centred on the specific instance of the true temperature trend over that one interval, whereas the spread of forecasts depends on the (much larger) natural variability of the system and this distribution is centred on the models' estimate of the underlying forced response. Of course these distributions aren't the same, even in mean let alone width. There is no way they could possibly be expected to be the same (excepting some implausible coincidences). So of course when Roger asks Megan if these distributions differ, it is easy to see that they do. But what is that supposed to show?

People tend to get unreasonably hot under the collar in discussions about climate science, so let's change the scenario to a less charged situation. Roger, please riddle me this:

I have an apple sitting in front of me, mass unknown. I use some complex numerical models (OK, I make a wild guess) and estimate its mass at 100±50g (Gaussian, 2sd). I also have several weighing scales, all of which have independent Gaussian measuring errors of ±5g. I have two questions:

1. If I weigh the apple once, what range of observed weights X is consistent with my estimate of 100±50g?

2. If I weigh the apple 100 times with 100 different sets of scales (each set of scales having independent errors of the same magnitude), what range of observed weight distributions is consistent with my estimate for the apple's mass of 100±50g? Hint: the distribution of observed weights can be approximated by the Gaussian form X±5g for some X. I am asking what values for X, the mean of the set of observations, would be consistent (at the 95% level) with my estimate for the true mass.

You can also ask Megan for help, if you like - but if so, please show her my exact words rather than trying to "interpret" them for her as you "interpreted" the question about climate models and observations. You can reassure her that I'm not looking for precise answers to N decimal places to a tricky mathematical problem so much as an understanding of the conceptual difference between the uncertainty in a prediction, and the uncertainty in the measurement of a single instance. It is not a trick question, merely a trivial one.

Or, dressing up the same issue in another format:

If the weather forecast for today says that the temperature should be 20±1C, and the thermometer in my garden says 19.4±0.1C, then I hope we would all agree that the observation is consistent with the forecast. Would that conclusion change if I had 10 thermometers, half of which said 19.4±0.1C and half 19.5±0.1C? Of course, in this case the distribution of observations is clearly seen to be markedly different from the distribution of the forecast. Nevertheless, the true temperature is just as predicted (within the forecast uncertainty). If there is anyone (not just Roger) who thinks that the mean observation of 19.45C is inconsistent with the forecast, please let me know what range of observed temperatures would be consistent.
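
For anyone who wants the sums spelled out, here is a quick sketch of that thermometer example (I'm assuming, since I didn't specify it above, that the quoted ± values are 2 standard deviations):

```python
# The observed mean is judged against the forecast uncertainty, not against
# the (much smaller) spread of the individual measurements.
# Assumption: the quoted +/- ranges are 2-sigma.
from scipy.stats import norm

forecast_mean, forecast_sd = 20.0, 1.0 / 2        # 20 +/- 1C read as 2 sd
obs = [19.4] * 5 + [19.5] * 5                     # the ten thermometer readings
obs_mean = sum(obs) / len(obs)                    # 19.45C
meas_sd = 0.1 / 2                                 # per-thermometer error (1 sd)
mean_sd = meas_sd / len(obs) ** 0.5               # error of the observed mean

# Sampling distribution of the observed mean, if the forecast is correct:
total_sd = (forecast_sd**2 + mean_sd**2) ** 0.5
z = (obs_mean - forecast_mean) / total_sd
p = 2 * (1 - norm.cdf(abs(z)))                    # two-sided p-value
print(f"z = {z:.2f}, p = {p:.2f} -> comfortably consistent with the forecast")
```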

Friday, May 16, 2008

Roger gets it right!

But only where he says "James is absolutely correct when he says that it would be incorrect to claim that the temperatures observed from 2000-2007 are inconsistent with the IPCC AR4 model predictions. In more direct language, any reasonable analysis would conclude that the observed and modeled temperature trends are consistent." (his bold)

Unfortunately, the bit where he tries cherry picking a shorter interval Jan 2001 - Mar 2008 and claims "there is a strong argument to be made that these distributions are inconsistent with one another" is just as wrong as the nonsense he came up with previously.

Really, I would have thought that if my previous post wasn't clear enough, he could have consulted a numerate undergraduate to explain it to him (or simply asked me about it) rather than just repeating the same stupid errors over and over and over and over again. This isn't politics where you can create your own reality, Roger.

So let's look at the interval Jan 2001-Mar 2008. I say (or rather, IDL's linfit procedure says) the trend for these monthly data from HadCRU is -0.1C/decade, which seems to agree with the value on Roger's graph.

The null distribution over this shorter interval of 7.25 years will be a little broader than the N(0.19,0.21) that I used previously, for exactly the same reason that the 20-year trend distribution is much tighter than the 8-year distribution (averaging out of short-term variability). I can't be bothered trying to calculate it from the data, but N(0.19,0.23) should be a reasonable estimate (based on an assumption of white noise spectrum, which isn't precisely correct but won't be horribly wrong). This adjustment doesn't actually matter for the overall conclusion, but it is important to be aware of it if Roger starts to cherry-pick even shorter intervals.

So, where does -0.1 lie in the null distribution? About 1.26 standard deviations from the mean, well within the 95% interval (which is numerically (-0.27,0.65) in this case). Even if the null hypothesis was true, there would be about a 21% probability of observing data this "extreme". There's nothing remotely unusual about it.
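
The sums, for anyone who wants to check them:

```python
# Where does the observed Jan 2001 - Mar 2008 trend of -0.1 C/decade fall in
# the (slightly broadened) null distribution N(0.19, 0.23)?
from scipy.stats import norm

mean, sd = 0.19, 0.23
obs_trend = -0.1
z = (obs_trend - mean) / sd                   # about -1.26
p = 2 * (1 - norm.cdf(abs(z)))                # probability of data this "extreme"
lo, hi = mean - 2 * sd, mean + 2 * sd         # the ~95% interval quoted above
print(f"z = {z:.2f}, p = {p:.2f}, 95% interval = ({lo:.2f}, {hi:.2f})")
```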

So no, Roger, you are wrong again.

Thursday, May 15, 2008

The consistently wrong chronicles

...or, how to perform the most elementary null hypothesis significance tests.

Roger Pielke has been saying some truly bizarre and nonsensical things recently. The pick of the bunch IMO is this post. The underlying question is: Are the models consistent with the observations over the last 8 years?

So Roger takes the ensemble of model outputs (8 year trend as analysed in this RC post), and then plots some observational estimates (about which more later), which clearly lie well inside the 95% range of the model predictions, and apparently without any shame or embarrassment adds the obviously wrong statement:
"one would conclude that UKMET, RSS, and UAH are inconsistent with the models".
Um....no, one would not:


Update: OK, there are a number of things wrong with this picture. First, these "Observed temperature trends" stated on the left, calculated by Lucia, are actually per century not per decade, although I think they have been plotted in the right place. When OLS is used on the 8-year trends (to be consistent with the model analysis), the various obs give results of around 0.13 - 0.26C/decade, with my HadCRU analysis actually being at the lower end of this range. Second, the pale blue lines purporting to show "95% spread across model realizations" are in the wrong place. Roger seems to have done a 90% spread (5-95% coverage) which is about 20% too narrow, in terms of the range it implies.

I challenged this obvious absurdity and repeatedly asked him to back it up with a calculation. After a lot of ducking and weaving, about the 30th comment under the post, he eventually admits "I honestly don't know what the proper test is". Isn't thinking about the proper test a prerequisite for confidently asserting that the models fail it? Anyway, I'll walk through it here very slowly for the hard of understanding. I'll use Wilks "Statistical methods in the atmospheric sciences" (I have the 1st edition), and in particular Chapter 5: "Hypothesis testing". It opens:

Formal testing of hypotheses, also known as significance testing, is generally covered extensively in introductory courses in statistics. Accordingly, this chapter will review only the basic concepts behind formal hypothesis tests...[cut]

and then continues with:
5.1.3 The elements of any hypothesis test

Any hypothesis test proceeds according to the following five steps:

1. Identify a test statistic that is appropriate to the data and question at hand.

This is a gimme. Obviously, the question that Roger has posed is about the 8-year trend of observed mean surface temperature. I'm going to use an ordinary least squares (OLS) fit because that is what is already available for the models, and it is also by far the most commonly used method for trend estimation and has well understood properties. For some unstated reason, Roger chose to use Cochrane-Orcutt estimates for the observed data that he plotted in his picture, but I do not know how well that method performs for such a short time series or how it compares to OLS. Anyone who wishes to repeat the analysis using C-O should find it easy enough in principle, they will need to get the raw model output (freely available) and analyse it in that manner. I would bet a large sum of money that this will not change the results qualitatively.
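
For concreteness, here is a minimal sketch of the test statistic calculation (the anomaly values below are synthetic stand-ins, not the HadCRU numbers):

```python
# An ordinary least squares (OLS) trend fitted to 8 annual mean temperature
# anomalies, expressed in C/decade.  The data here are made up purely to show
# the calculation; substitute the real annual anomalies to reproduce the test.
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(2000, 2008)
anoms = 0.02 * (years - 2000) + rng.normal(0.4, 0.1, size=years.size)  # synthetic

slope_per_year, intercept = np.polyfit(years, anoms, 1)
print(f"OLS trend: {10 * slope_per_year:.2f} C/decade")
```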

2. Define a null hypothesis.

Easy enough, the null hypothesis H0 here that I wish to test is that the models correctly predict the planetary temperature trend over 2000-2007. If anyone has any other suggestion for what null hypothesis makes sense in this situation, I'm all ears.

3. Define an alternative hypothesis.

"H0 is false". This all seems too easy so far....there must be something scary around the corner.

4. Obtain the null distribution, which is simply the sampling distribution of the test statistic given that the null hypothesis is true.

OK, now the real work - such as it is - starts. First, we have the distribution of trends predicted by the models. As RC have shown, this is well approximated by a Gaussian N(0.19,0.21). (I am going to stick with decadal trends throughout rather than using a mix of time scales to give me less chance of embarrassingly dropping factors of 10 as Roger has done in several places in his post. He has also plotted his blue "95%" lines in the wrong place, but I've got bigger fish to fry.) There are firm theoretical reasons why we should expect a Gaussian to provide a good fit (basically the Central Limit Theorem). This distribution isn't quite what we need, however. The model output (as analysed) uses perfect knowledge of the model temperature, whereas the observed estimate for the planet is calculated from limited observational coverage. In fact, CRU estimate their observational errors at about 0.025 for each year's mean (at one standard deviation). This introduces a small additional uncertainty of about 0.04 on the decadal trend. That is, if the true planetary trend is X, say, then the observational analysis will give us a number in the range [X-0.08,X+0.08] with 95% probability.

Putting that together with the model output, we get the result that if the null hypothesis is true and the models' prediction of N(0.19,0.21) for the true planetary trend is correct, then the sampling distribution for the observed trend should also be N(0.19,0.21). I calculated 0.21 for the standard deviation there by adding the two uncertainties of 0.21 and 0.04 in quadrature (ie squaring, adding, taking the square root). This is the correct formula under the assumption that the observational error is independent of the true planetary temperature, which seems natural enough.

So, as I had guessed in my comments to Roger's post, considering observational uncertainty here has a negligible effect (is rounded off completely), so we could have simply used the existing model spread as the null distribution. Using this approach generally makes such tests stiffer than they should be, but it is often a small effect.

5. Compare the observed test statistic to the null distribution. If the test statistic falls in a sufficiently improbable region of the null distribution, H0 is rejected as too unlikely to have been true given the observed evidence. If the test statistic falls within the range of "ordinary" values described by the null distribution, the test statistic is seen as consistent with H0 which is then not rejected. [my emphasis]

OK, let's have a look at the test statistic. For HADCRU, the least squares trend is....0.11C/decade. That is a simple least squares fit to the last 8 full year values of these data. (I generally use the variance-adjusted version, on the grounds that if they think there is a reason to adjust the variance, I see no reason to presume that this harms their analysis. It doesn't affect the conclusions of course.)

So, where does 0.11 lie in the null distribution N(0.19,0.21)? Just about slap bang in the middle, that's where. OK, it is marginally lower than the mean (by a whole 0.38 standard deviations), but actually closer to the mean than one could generally hope for, even if the null is true. In fact the probability of a sample statistic from the null distribution being worse than the observed test statistic is a whopping 70% (this value being 1 minus the integral of a Gaussian from -0.38 to +0.38 standard deviations)!
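
The whole test fits in a few lines, for anyone who wants to reproduce it:

```python
# Combine the model spread and the observational uncertainty in quadrature,
# then see where the observed trend falls in the resulting null distribution.
from scipy.stats import norm

model_mean, model_sd = 0.19, 0.21   # model trend distribution (C/decade)
obs_sd = 0.04                       # obs uncertainty on the decadal trend (1 sd)
obs_trend = 0.11                    # HadCRU OLS trend over 2000-2007 (C/decade)

null_sd = (model_sd**2 + obs_sd**2) ** 0.5    # still ~0.21 after rounding
z = (obs_trend - model_mean) / null_sd
p = 2 * (1 - norm.cdf(abs(z)))                # chance of something "worse"
print(f"null sd = {null_sd:.2f}, z = {z:.2f}, p = {p:.2f}")
```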

So what do we conclude from this?

First, that the data are obviously not inconsistent with the models at the 5% level.

Second...well I leave readers to draw their own conclusions about Roger "I honestly don't know what the proper test is" Pielke.

Monday, May 12, 2008

Are you avin a laff?

Round and round the mulberry bush...

Roger Pielke, 30 April:

there is in fact nothing that can be observed in the climate system that would be inconsistent with climate model predictions. If global cooling over the next few decades is consistent with model predictions, then so too is pretty much anything and everything under the sun.
Me (in the comments):
over the 30 year time frame there will be strong warming
Roger:

I see that you neglected to address my central question

Me:
I explicitly wrote "over the 30 year time frame there will be strong warming" - and actually 20y would be a safe bet too.

Roger:

I see that you have once again avoided addressing this question.

Me here in more detail:
Warming over 30 years is assured, 20 years must be "very likely", 10 years I would certainly say "likely" but that is a bit of a rough estimate.

I could do a detailed calculation about the probability of different trends over the next 30 years, but that's already been done.

Roger:

James, when you write, "Warming over 30 years is assured, 20 years must be "very likely", 10 years I would certainly say "likely" but that is a bit of a rough estimate" you are much closer to what I am looking for. I am asking for somewhat less "roughness" in these estimates, and grounding them more quantitatively than this sort of hand-waving which is a common response.
Me:
you asked for more quantitative estimates, but did you read the link I provided, where such quantitative estimates were explicitly presented 6 years ago?

If, after reading that (and the two papers it refers to) you still have a question then feel free to follow up.


Roger:

If you think that I'm focused on 2020-2030 (the subject of the essay in Nature that you linked to) then you are not really paying attention.


Me:

Roger, you started off with "if global cooling over the next few decades..." (my emphasis) which remains on your blog even after several people have pointed out that it is a gross mischaracterisation of the Keenlyside paper. So I pointed you to explicit probabilistic predictions about the next few decades which are as clear as day about the probability of cooling over that time frame.

Now you say you are not focussed on the next few decades...

If you want shorter term, I'm sure you have already seen the Smith et al Science paper, within which 50% of years post 2009 are predicted to beat the 1998 record. But as you can see, this is still a rather young area of science, and Keenlyside disagree to some extent (although not as strongly as some have portrayed it - I think their 10y mean forecast could still validate even if we see some new records).

Roger again (not in response to me, but bringing up the topic again on his blog):


You can just clear all of this up by answering my original question:

What observations of the climate system to 2020 would be inconsistent (lets say at the 95% level of certainty) with the climate model projections of the IPCC AR4? It is a simple question. use global average surface temps from UKMET as the variable of interest if you'd like, since that is what we've been discussing, or use a different one.


Me, hopefully for the last time:
Stott and Kettleborough estimate that the global mean temperature in the decade 2020–30 will be 0.3–1.3 K greater than in 1990–2000 (5–95% likelihood range)
and
Knutti et al. find that the projected distribution of likely surface warming is independent of the choice of emission scenario for the next several decades; that the probable warming for 2020–30 relative to 1990–2000 is about 0.5–1.1 K (5–95% likelihood range)
(these being direct quotes from the paper I linked to earlier).

As I mentioned back then, I think these forecasts do have some limitations, but since I pointed them out to Roger 10 days ago it is more than a little tendentious of him to repeatedly insist that they do not exist, and furthermore to pretend that he's been met with nothing but dodging and evasion in response to his question about which observations over the next few decades would be inconsistent (at the 5% level) with the model forecasts. The reason why that paper specifically looked at decadal averages over a 30 year interval is because on this time frame the GW signal is clearly visible above natural variability, but its magnitude is not very sensitive to emissions scenarios (within reason). But it is a simple matter of reading off the graphs for anyone who wants a different forecast interval. However, it seems quite clear that Roger is more interested in pretending that the answer has not been provided, than in actually looking at it. He's avin a laff.

Also relevant: Eli Rabett and RC.

Sunday, May 11, 2008

This is a local hospital for local people - there's nothing for you here.

This is an ugly story which I heard on the grapevine, and which is unlikely to feature in the Japanese press (unlikely to be made into a bizarre comedy either). Recently someone here in Japan needed some medical treatment, so they found an official web-site listing local hospitals with English-speaking staff, and when they tried to go there, they were refused on the grounds of nationality - the head doctor had simply decided they were going to stop taking any foreigners! It was not an urgent case, and they found treatment elsewhere, but it's still a rather shocking reminder of how this sort of bigotry is casually accepted at all levels in society here. They may be desperate for foreign tourists to come and spend their money here, and for "guest workers" to come to prop up their economy (so long as they don't get big ideas about settling here, and go home after a few years), but a large proportion don't actually think foreigners are human, and a "foreigners not welcome" attitude, although thankfully rare (except when renting accommodation, where it is the rule rather than the exception), is still considered quite acceptable.

Kerosene-soaked man catches fire after trying to smoke at Nagoya police station

Friday, May 09, 2008

Comments about comments

So our comment, Nicola Scafetta's comment, Reto Knutti's comment, and Steve Schwartz' reply (combined to all of us) are all on line and some people seem to be getting very excited by it all. In his reply, Schwartz was quick to jump at Scafetta's suggestion that the "pertinent time constant" can actually be diagnosed as about 8y, or maybe 12y, and seems happy to admit that his original analysis (5y) was wrong. Unfortunately, the reviewer(s?) and Editor gave him free rein to present a completely new analysis, based on a new model - hardly the point of a Reply, I thought - which is pretty much just as bogus as the original although the numbers don't turn out quite as absurd. The basic point, that we made in our comment, is that it is trivial to check whether such proposed "novel" analyses actually diagnose something useful for systems where the answer is known in advance (and furthermore, whether such extreme simplifications have some chance of capturing any useful information about complex systems), and such testing is normally something that a researcher should have a go at themselves, before claiming to have overthrown a few decades of climate science. It will be no surprise to anyone who actually works in the area that the new analysis fails just as dismally as the last one, and for much the same reasons.

I can't be bothered with a detailed analysis so will just highlight what appears to be the most immediately fatal flaw in the whole idea. The analysis rests on the claim that the climate system can in fact be characterised by two well-separated time constants, and further that these can be diagnosed from the time series of global mean surface temperature. The foundation for this second claim seems to be based on a test case where Scafetta attempts to fit his curve to the output of a synthetic time series with known properties. He used a time scale of 12y for the test, and gets an estimate of about 8y out (which matches his analysis of the real climate data), and thus claims that the real answer is either 8 ± 2, or 12 ± 3 - and apparently sees no irony in the fact that these two answers don't even overlap. There is no indication of what these ± values are supposed to indicate, and it is apparent to the naked eye that his lines (in all of his figures) are not actually best fits to the data. Despite this evidence of bias, he finally presents the lower value of 8 as the "observed value" in his conclusions, only mentioning 12 as a "hypothetical" possibility even though his own analysis, limited as it is, indicates that higher value as a best estimate (after correcting for his estimate of the bias) and admits some additional uncertainty above that value.

So his test is kind of like what we did in our comment, except we did it properly, using a wide range of time scales (5-30y) in the synthetic time series and looking not only at the mean bias but also the uncertainty of a single replicate, in order to get a good handle on how the analysis performs. Of course, we found that both the bias and the uncertainty grow for the larger time scales, which follows directly from known properties of the (Bartlett's rule) method. In fact at the highest-valued time scale of 30y (corresponding to a sensitivity of 6C in this simple model) the analysis generates an estimate that is typically less than half the true value and quite possibly as low as 5y. The obvious conclusion is that you can't reliably diagnose the time constant of an AR(1) time series by this method unless the length of data set available is much much longer (by a surprising margin) than the characteristic time scale of the system. Adding in another two free parameters, as Scafetta does, can hardly improve matters, and neither does using the monthly data (even though Schwartz says this adds "many more independent data points", it clearly does nothing of the sort given a decorrelation time scale of several years).
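
For anyone who wants to try this at home, here is a sketch of the basic check (not our actual analysis code - just annual AR(1) series driven by unit white noise, with the time constant diagnosed from the lag-1 autocorrelation):

```python
# Generate AR(1) series with a known time constant tau, estimate tau back from
# the sample lag-1 autocorrelation, and look at the bias and spread when the
# series length is not much longer than tau.
import numpy as np

def simulate_ar1(n_years, tau, rng):
    """Annual AR(1) series with decorrelation time scale tau (years)."""
    phi = np.exp(-1.0 / tau)
    x = np.zeros(n_years)
    for t in range(1, n_years):
        x[t] = phi * x[t - 1] + rng.normal(0, 1)
    return x

def estimate_tau(x):
    """Time constant diagnosed from the (biased) sample lag-1 autocorrelation."""
    x = x - x.mean()
    r1 = (x[:-1] * x[1:]).sum() / (x * x).sum()
    return -1.0 / np.log(r1) if 0 < r1 < 1 else np.nan

rng = np.random.default_rng(1)
n_years = 125                         # roughly the length of the observed record
for tau in (5, 10, 20, 30):
    est = np.array([estimate_tau(simulate_ar1(n_years, tau, rng))
                    for _ in range(500)])
    ok = est[np.isfinite(est)]
    print(f"true tau = {tau:2d}y: median estimate = {np.median(ok):5.1f}y, "
          f"5-95% range = {np.percentile(ok, 5):.1f} to {np.percentile(ok, 95):.1f}y")
```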

Lucia in the posts linked above has taken it upon herself to check the accuracy of the method. Her latest post indicates at least that she is starting to look on the right lines, although she's not quite got there yet and her belief that the uncertainty "should drop dramatically" when she uses monthly data is mildly amusing. Eventually, if and when she checks the method against synthetic data with a really long time constant, she will probably realise that the output of the analysis is so inaccurate that it doesn't tell us much at all. And remember that this is in the best possible case where the data actually are generated by a system which exactly satisfies the hypothesis of a simple autoregressive system with white noise forcing. Once you consider that the real world is rather more complicated (both in terms of multiple time scales of response, and the strong but non-linear external forcing) it is a bit of a lost cause.

So, if Lucia does her homework properly, she ought to get there in the end. Whether or not she will have the manners to retract her stupid statement that "two of the criticisms are flat out wrong" (in our comment) remains to be seen.

Steve Schwartz also looked up a method (from Quenouille, 1949) that reduces the bias of Bartlett's Rule in estimating the autocorrelation coefficients. What he failed to observe in his reply is that although the mean bias is indeed reduced by this approach, the uncertainty gets larger, which (depending on the specifics) roughly compensates. That is, one can still get estimated time constants that are much smaller than the true value, but the modified method can also generate estimates that are much larger, with the estimated correlation often exceeding 1. But instead of any meaningful analysis about how this impacts his results, Schwartz prefers instead to waffle on about Einstein and electrical circuits, presenting yet another simple model (different from Scafetta's) without making any attempt to explore how well it can be identified from the data (it can't) or how well it matches any plausible model of the climate system (it doesn't). I would have hoped that any competent scientist could have worked that out for themselves prior to even bothering to submit this sort of stuff for publication, but climate science works in mysterious ways sometimes.

It is curious how sceptics are quick to dismiss numerical models that actually represent the broad details of the atmospheric and oceanic circulations reasonably well, in favour of some simple approximations that make no attempt to do so. The issue here of course is not whether the models are "correct", but whether a method that makes no detailed assumptions about the behaviour of the climate system (only really requiring that it conserves energy) can actually diagnose the behaviour of any system, simple or complex. Schwartz' and Scafetta's various methods fail dismally on all counts.

Thursday, May 08, 2008

Jumping on the betting bandwagon

I have been waiting for the RC take on that Keenlyside et al paper. To be honest I had been wondering if they were going to duck the debate, having quickly decided that they couldn't find anything good to say about it. So I'm pleased to see that in fact they were doing some behind-the-scenes checking with the authors (to make sure that the media coverage was accurate) and have now issued a bold challenge offering to bet against the prediction of "slight cooling relative to 1994-2004 conditions".

No-one who has read my comments will be surprised to hear that I strongly favour the RC side. Indeed I just recently made another bet that is rather more confident of a more significant warming by 2011 (which I don't consider to be a sure thing, but do consider to be in my favour). I see that even William Connolley has been tempted to come back from retirement to get a piece of the action!

Keenlyside and his colleagues can hardly refuse the offer given their confirmation of the reported statements (at least it would be a humiliating climb-down for them to do so). I hope they will learn a useful lesson - and that other scientists who are tempted to make extravagant claims (in order to get their papers into Nature?) may also think twice about the risk of having their bluff called so publicly. It's one thing making essentially unfalsifiable claims about 100 years of change (since we won't be around to see the results) but quite another to say something meaningful about the next few years!

Wednesday, May 07, 2008

o noes!

All I do is place a modest bet for some warm temperatures and a socking great big volcano goes and erupts. Well at least I didn't put much money on it.

Actually, it's not all doom and gloom. There are several factors that weigh against it really making too much difference to my probability of winning. First, it is not really that big yet, although it may get worse before it gets better. Second, it is at a fairly high latitude (42S) so the plume may not spread over the tropics where it would have most effect. Third, it is going to be winter down there for the next few months so there isn't much sun to reflect anyway. Fourth, these things usually don't have much effect past the first year, and I'd already basically written off 2008 due to the coolish start it's had. By 2009, let alone 2011, it may well be ancient history.

Probably someone is already running some predictions of the effect it is likely to have. I'd try it myself if I had the necessary tools. Of course it's having plenty of effects right now for the people who actually live there.

There are some impressive pictures here and here.

Saturday, May 03, 2008

Train wreck on Wikipedia: Confidence interval

There was I, minding my own business as usual, when I chanced upon the Talk page for Confidence interval - Wikipedia. There's some odd stuff going on there...

I freely admit that I was confused about Bayesian and frequentist probability a few years ago. In fact I wince whenever I re-read a particular statement I made in a paper published as recently as 2005 - no, I'm not telling you where it is. In my defence, a lot of stuff I had read concerning probability in climate science (and beyond) is at best misleading and sometimes badly wrong - and hey, the referees didn't pick up on it either! But really, given some time to think and some clear descriptions (of which there are plenty on the web) it is really not that difficult to get a handle on it.

A confidence interval is a frequentist concept, based on repeated sampling from a distribution. Perhaps it is best illustrated with a simple example. Say X is an unknown but fixed parameter (eg the speed of light, or amount of money in my wallet), and we can sample xi = X+ei where ei is a random draw from the distribution N(0,1) - that is, xi is an observation of X with that given uncertainty. Then there is a 25% probability that ei will lie in the interval [-0.32,0.32] and therefore 25% of the intervals [xi-0.32,xi+0.32] will contain the unknown X. Or to put it another way, P(xi-0.32 lt X lt xi+0.32)=25% (and incidentally, I hate that Blogger can't even cope with a less than sign without swallowing text).

Note that nothing in the above depends on anything at all about the value of X. The statements are true whatever value X takes, and are just as true if we actually know X as if we don't.
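
A few lines of Python make that frequency statement concrete (the particular value of X is arbitrary, which is exactly the point):

```python
# 25% of the intervals [xi - 0.32, xi + 0.32] constructed from repeated
# observations contain X, whatever fixed value X happens to take.
import numpy as np

rng = np.random.default_rng(42)
X = 17.3                                   # any fixed value will do
xi = X + rng.normal(0, 1, size=100_000)    # observations with N(0,1) error
covered = (xi - 0.32 <= X) & (X <= xi + 0.32)
print(f"coverage: {covered.mean():.3f}")   # close to 0.25
```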

The confusion comes in once we have a specific observation xi = 25.55 (say) and construct the appropriate 25% CI [25.23,25.87]. Does it follow that [25.23,25.87] contains X with probability 25%? Well, apparently some people on Wikipedia who call themselves professional statisticians (including a university lecturer) think it does. And there are some apparently authoritative references (listed on that page) which are sufficiently vague and/or poorly worded that such an idea is perhaps excusable at first. But what is the repeated sample here for which the 25% statistic applies? We originally considered repeatedly drawing the xi from their sampling distribution and creating the appropriate CIs. 25% of these CIs will contain X, but they will have different endpoints. If we only keep the xi which happen to take the value 25.55, then all the resulting CIs will be the same [25.23,25.87], but (obviously) either all of them will contain X, or none of them will! So neither of these approaches can help to define P(25.23 lt X lt 25.87) in a nontrivial frequentist sense.

In fact in order for it to make sense to talk of P(25.23 lt X lt 25.87) we have to consider X in some probabilistic way (since the other values in that expression are just constants). If X is some real-world parameter like the speed of light, that requires a Bayesian interpretation of probability as a degree of belief. Effectively, by considering the range of different width confidence intervals, we are making a statement of the type P(X|xi=25.55) (where this is now a distribution for X). The probability axioms tell us that

P(X|xi=25.55)= P(xi=25.55|X)P(X)/P(xi=25.55)

(which is Bayes Theorem of course) and you can see that on the right hand side we have P(X), which is a prior distribution for X. [As for the other terms; the likelihood P(xi=25.55|X) is trivial to calculate, as we have already said that xi is an observation of X with Gaussian uncertainty, and the denominator P(xi=25.55) is a normalisation constant that makes the probabilities integrate to 1.] So not only do we need to consider X probabilistically, but its prior distribution will affect the posterior P(X|xi=25.55). Therefore, before one has started to consider that, it is clearly untenable to simply assert that P(25.23 lt X lt 25.87) = 25%. If I told you that X was an integer uniformly chosen from [0,100], you would immediately assign zero probability to it being in that short confidence interval! (That's not a wholly nonsensical example - eg I could place a bag-full of precise 1g masses on a mass balance that has error given by the standard normal distribution, and ask you how many were in the bag.) And probably you would think it was most likely to be 25 or 26, and less likely to be more distant values. But maybe I thought of an integer, and squared it...in which case the answer is almost certainly 25. Maybe I thought of an integer and cubed it... In all these cases, I'm describing an experiment where the prior has a direct intuitive frequentist interpretation (we can repeat the experiment with different X sampled from its prior). That's not so clear (to put it mildly) when X is a physical parameter like the speed of light, or climate sensitivity.
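
Here is that integer example worked through numerically (a sketch, using the same observation of 25.55 with unit Gaussian error):

```python
# With a prior saying X is an integer drawn uniformly from 0..100, the
# posterior probability that X lies in the "25%" confidence interval
# [25.23, 25.87] is exactly zero; the mass piles up on the nearby integers.
import numpy as np
from scipy.stats import norm

x_obs = 25.55
candidates = np.arange(0, 101)                   # the prior support (integers)
likelihood = norm.pdf(x_obs, loc=candidates, scale=1.0)
posterior = likelihood / likelihood.sum()        # uniform prior cancels out

in_ci = (candidates >= 25.23) & (candidates <= 25.87)
print(f"P(X in [25.23, 25.87]) = {posterior[in_ci].sum():.2f}")
for k in (24, 25, 26, 27):
    print(f"P(X = {k}) = {posterior[candidates == k][0]:.2f}")
```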

But anyway, the important point is, the answer necessarily depends on the prior. And once you've observed the data and calculated the end-points of your confidence interval, your selected confidence level no longer automatically gives you the probability that your particular interval contains the parameter in question. That predicate P(xi-0.32 lt X lt xi+0.32) is fundamentally different from P(25.23 lt X lt 25.87) - the former has a straightforward frequency interpretation irrespective of anything we know about X, but the latter requires a Bayesian approach to probability, and a prior for X (and will vary depending on what prior is used).

The way people routinely come unstuck is that for simple examples, those two probabilities actually can be numerically the same, if we use a uniform prior for X. Moreover, the Bayesian version (probability of X given the data) is what people actually want in practical applications, and so the statement routinely gets turned round in peoples' heads. But there are less trivial examples where this equivalence comes badly unstuck, and of course there are also numerous cases where a uniform prior is hardly reasonable in the first place. [In fact I would argue that a uniform prior is rarely reasonable (eg at a minimum, the real world is physically bounded in various ways, and many parameters are defined as to be non-negative), but sometimes the results are fairly insensitive to a wide range of choices.]

Fortunately a number of people who do seem to know what they are talking about have weighed in on the Wikipedia page...

Friday, May 02, 2008

Woohoo! Corbyn nails it!

Well, he got it right one month in a row. Honestly, I'm impressed. I'll be even more impressed if he gets it right for the next few months, though. For the record, April was close to average temperature and just above average rainfall, both comfortably within his predicted ranges. So now his score is up to 4 out of 8 for the year, which is still not close to his claimed 80% success rate, but slightly better than it was last month. Since it is late on a Friday night, I'll leave the binomial probability thing as an exercise for my readers :-)
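
(For anyone who wants to cheat on the exercise, one line of scipy will do it:)

```python
# If the claimed 80% success rate were true, how likely is a score of 4 or
# fewer hits out of 8 forecasts?
from scipy.stats import binom

print(f"P(4 or fewer hits out of 8 | p = 0.8) = {binom.cdf(4, 8, 0.8):.3f}")
```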

Corbyn thinks that May should be about average for rain (90-115% of normal) but rather chilly, at 0.5 - 1C below average for the month. There aren't any exciting weather events forecast (you can't go far wrong with "variable: some showers" in May).

"What observations would be inconsistent with climate model predictions?"

Roger Pielke keeps on posting the same question, and keeps on ignoring the replies, so rather than just posting it again in his comments I'll write it out in full here.

First, I should make it clear that I don't accept the true/false dichotomy implicit in his question. Is the Newtonian view of gravity "falsified" by relativity? I suppose so, but in practical terms for everyday applications, it does just fine. Is relativity "falsified" by the Pioneer anomaly? Maybe, at some level. I wouldn't like to bet on that one. No-one is going to "falsify" the fact that CO2 absorbs LW radiation - that doesn't make this statement an act of "faith", it simply makes it true.

That said, some observations would strongly modify our views on the impact of this on the Earth's climate. Most obviously, sustained cooling on the multidecadal time scale would greatly change our estimates. (I'm ignoring the theoretical possibility of major external shocks such as meteors, volcanoes, or nuclear winter). Warming over 30 years is assured, 20 years must be "very likely", 10 years I would certainly say "likely" but that is a bit of a rough estimate. Note that despite the press coverage of the Keenlyside et al work, they don't actually predict any cooling in the future (although they do seem to think it cooled from 1990 to 1998)! Of course there are also upper limits to the expected warming trend (of roughly double the central model projections, at the same level of confidence as I've given the warming/cooling threshold). But I've seen presentations which explicitly draw attention to the possibility of ~5y cooling trends even with a strong background warming, so obviously on the very short time scale we can't expect global mean temperatures to tell us anything conclusive or even highly informative. After all, global temperatures decreased by a whopping 0.22C/year only 10 years ago (1998 to 1999) - that's a rate of 22C per century!! Scary ice age is going to kill us all!!! Not.

I'm sure Roger will find some way of not reading or understanding this. After all, I wrote it in the first comment to his post, and then posted it again in direct answer to his question, and he still ignored it (and according to a comment, Gavin said the same thing previously). I could do a detailed calculation about the probability of different trends over the next 30 years, but that's already been done. FWIW, I think there are some problems with the work described in that article, and intend to have a go myself shortly, but I don't expect to see any really large changes - the issues are similar to those in the climate sensitivity stuff I've talked about before, but I expect the impacts to be smaller in the transient case.

Interesting fact in passing: if the 2008 anomaly is merely +0.3C or greater, then the newest 10y trend 1999-2008 will actually get steeper compared to the current last 10y trend (which is already positive, despite starting in 1998). I predict that were this to occur, some people will start looking hard at 11 or 9 year trends :-) Currently 2008 is running a little colder than 0.3, as Jan and Feb were very cold (but still warmer than the 1961-1990 baseline). But the March anomaly is up above 0.4....

Comment on Schwartz: final version

It's been finally accepted, after a rather Byzantine review process, but not such a long delay as seemed possible at one time. There were no real substantive changes to the original version, but we took out the reference to monthly data as Schwartz hadn't actually made any claims about it. The pdf is now up on my home page. I've no idea how long it will take to appear in print.

Was it worth it? Well, on the plus side I get Mann and Schmidt numbers of 1 (and jules gets a 2!), but I doubt there were any scientists who have been on the edge of their seats for the last 6 months wondering if the whole theory of climate change was about to come tumbling down :-)

Thursday, May 01, 2008

Another decadal prediction...

...is appearing in Nature tomorrow, accompanied by some rather odd press (and blog) coverage. The paper itself also seems a little odd to me (but not as odd as the coverage). The authors have nudged sea surface temperatures to observations, which is probably the simplest plausible first step in coupled model initialisation (and I think has been used in seasonal prediction for some time). Mostly, they are looking at regional results (and predictions), especially the North Atlantic, but they also include some global analysis. For this, they are predicting very little change in mean temperature for the next few years, after which the trend will revert strongly upwards. Roger Pielke somehow saw "the world may cool over the next 20 years" which seems comprehensively contradicted by what the paper actually shows in their Fig 4. I've posted a comment on his blog.

Curiously, the global temperature prediction is contained in a graph which also appears to show (confirmed in the supplementary info) that the free-running model integrations actually tracked historical temperatures better than the nudged ones. This does not look like a good sign to me! In fact they only seem to show that their system has some skill in representing the phase of natural oscillations in some areas, and (unless I've missed something) never actually claim that it has any skill at predicting mean global temperatures (hence their use of correlations rather than the more usual RMS errors). They also have a rather odd graph of the IPCC results, which seems to imply that these models predicted a ~0.3C mean rise over the current decade (and the lead author quotes that value in the Telegraph article). I have not read the paper carefully enough to work out how they managed that, since as I just showed yesterday, the IPCC models on average generate a very linear response (and it's just under 0.2C/decade currently). It may be something to do with how they splice the 20th century simulation on to the A1B scenario projection - eg there could be an abrupt change in forcings, since the scenarios originally started in 1990. But anyway, that doesn't seem like a very fair comparison.

I suppose I should interpret this paper as implying that I'm less likely to win my bet than I thought yesterday. It would be biased of me to cling to Smith et al. as support and simply choose to ignore this less convenient one. But I honestly think Smith et al did a better job at demonstrating the value of their method, especially for predicting global temperatures over the 5 year time scale which is most directly relevant to me. It does seem that there is (almost certainly) a strong contradiction between the results, so one of them has to be wrong!