The former TES journalist writes for NAHT on current education issues. The views expressed do not necessarily reflect those of NAHT
Digging beneath the surface of Ofsted’s “record rise” claim
What did the “record improvement” in schools’ Ofsted judgements, announced by chief inspector Sir Michael Wilshaw earlier this month, really mean?
It may sound world-weary, but my first thought these days, on hearing of a rapid rise in results of any sort, is to be sceptical and to question whether the change is a function of the measurement system itself, rather than simply reflecting underlying improvements in what is being measured.
The news that the proportion of schools judged good or outstanding had risen by nine percentage points, to 78 per cent overall as measured by each school’s last inspection – though undoubtedly evidence of a very welcome trend for readers of this blog – was another such occasion.
With such a sudden rise since a new inspection system was introduced a year ago, I wondered, in particular, if the statistics might simply be a product of the categories of schools which Ofsted chooses to prioritise for its inspections, rather than of any of the various reasons - to do with Ofsted having galvanised the system in the past year - which Sir Michael put forward in the inspectorate’s press release.
Was I right to be so cynical? Well, having looked at the inspection figures, I am unsure. For, while it looks as if these improved figures may be in large part a statistical quirk – a reflection of the fact that Ofsted is choosing to focus its inspections on schools which previously were not adjudged good, and thus, in a sense, had more room for improvement than those already seen as good or outstanding – it is also true that there does seem to be a genuine movement of schools from satisfactory/requires improvement to good.
But do the inspection judgement figures support, on their own, the assertions made for them by the chief inspector? With Ofsted making seemingly major claims on the direction of travel of school quality and leadership, and the impact of the inspectorate itself, it is disappointing that the evidence base in terms of causality seems so limited.
So, now to the detail. Ofsted’s press release (http://bit.ly/17RJzYT) centred on that welcome rapid rise in the proportion of schools now judged good or outstanding.
The release said: “78 per cent of schools are now judged good or outstanding – compared to less than [should it be fewer than?] 70 per cent a year ago.”
Sir Michael was then quoted as saying: “The unprecedented rate of national improvement that this new data shows is cause for celebration.
“Thanks to the work of dedicated teachers and outstanding headteachers up and down the country, England’s school system is making some genuine and radical advances. It means that thousands more children are getting at least a good standard of education. I am delighted to be able to come here and deliver the good news.”
The press release continued: “Sir Michael said he believed [my italics] changes to Ofsted’s school inspection framework that came into force 12 months ago was [were, surely?] clearly having a galvanising effect on England’s schools system.”
Sir Michael’s quotes continued: “Headteachers are using the ‘requires improvement’ judgement as a way of bringing about rapid improvement in their schools, especially in the quality of teaching. And the national improvement we are seeing is all the better for taking place under the terms of a more rigorous school inspection framework.
“I am determined to use the power and influence of inspection to improve our education system. The message from Ofsted is unequivocal – the acceptable standard of education in this country now starts at ‘good’.”
So, statistically, the press release hinged on that large improvement in the proportion of schools now rated good or outstanding at their last inspection. But my hunch was that the category of schools being selected for inspection was making this outcome more likely. And the inspection statistics, I think, show that this is right.
So Ofsted’s current philosophy is to concentrate inspection resources on schools which are felt to be in need of inspection. This is fair enough as a strategy, but it could have the perhaps counter-intuitive effect of biasing figures towards a sense of improvement (note 1). To put it very simplistically, if you concentrate inspection resources on schools nearer the bottom in terms of previous inspection judgements, there may be a sense in which the only way is up: there is greater room for improvement than in schools which were already doing well, in Ofsted’s terms. In addition, if the corollary of this is that previously good or outstanding schools are not being inspected in such numbers, so that their largely positive previous inspection result is allowed to stand for another year, it is possible to see how the overall total of good and outstanding schools may be likely to nudge up.
To put it another way, using an exaggerated example, if Ofsted had chosen only to inspect, in this past academic year, schools which previously had been adjudged inadequate or satisfactory, while leaving the stock of good or outstanding schools uninspected, then the overall number of “good” or outstanding schools would be almost guaranteed to rise, as some of the previously less than good schools would become judged as good or better, and none of the good or outstanding schools could lose their previous rating.
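The exaggerated example above can be put as a toy calculation. Every number here is made up purely for illustration – none of it is Ofsted data – but it shows why the total can only move one way under this sampling policy:

```python
# Toy illustration of the exaggerated example: only schools previously
# judged below "good" are inspected, so the stock of good or outstanding
# schools is left untouched and can only grow.
good_or_outstanding = 14000   # hypothetical national stock
below_good = 6000             # hypothetical schools rated below good

inspected = below_good                # only below-good schools are visited
promoted = int(0.3 * inspected)       # suppose 30% are now judged good or better

new_total = good_or_outstanding + promoted
print(new_total)  # can never fall below the original 14,000
```

Whatever improvement rate you plug in, `new_total` never drops below the starting stock, because no good or outstanding school is ever re-inspected and so none can lose its rating.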
This is not quite what happened last year. But Ofsted’s figures do show that many more schools, among those inspected in the past year (note 2), came into these inspections with either inadequate or satisfactory/requires improvement previous judgements than had been adjudged good or outstanding.
The breakdown is as follows: 2,857 of those schools inspected in 2012-13 had previous judgements of outstanding or good, compared to 4,051 which had pre-existing judgements of satisfactory/requires improvement or inadequate.
This represents an oversampling of schools from Ofsted’s bottom two categories at last inspection, compared to the overall national total, and a corresponding undersampling of those in the top two categories.
To look in more detail, there were 2,014 schools previously adjudged good compared to 3,803 going into the inspection having previously been rated satisfactory/requires improvement. Indeed, two thirds of schools with a pre-existing satisfactory/requires improvement judgement were inspected last year, compared to only one in five among previously good schools.
Now consider what has to happen for the existing stock of good or outstanding schools to go up, or down. Previously good or outstanding schools either have to fall to a satisfactory/requires improvement or inadequate judgement, in which case the total number of good or outstanding schools will fall; or previously inadequate or satisfactory/requires improvement schools have to improve to good or outstanding.
In practice, the statistics show that relatively few schools move dramatically in inspection judgements from one inspection to the next, so the figures will hinge on how many good schools move down a notch to requires improvement, and how many previously “satisfactory” schools are becoming good.
The figures above for the types of schools inspected – 2,014 previously adjudged good versus 3,803 previously seen as satisfactory/requires improvement - made it much more likely that Sir Michael’s statistics for the total stock of schools judged good or outstanding at their last inspection would nudge up than down.
To show how this works, imagine that 50 per cent of the previously satisfactory/requires improvement schools were found, at their latest inspection, to be good or outstanding, with the rest either standing still or going backwards. 50 per cent of 3,803 schools is 1,902 (rounding up). If we think only about what is happening with previously satisfactory and good schools, this will mean a rise in the overall stock of good or outstanding schools so long as fewer than 1,902 previously good schools slide back to requires improvement or worse.
But 1,902 would be virtually all – 94 per cent – of the 2,014 schools previously adjudged good which were inspected in 2012-13. So the only way for the total of good or outstanding schools to fall, in the situation where half of those previously adjudged as satisfactory/requires improvement improve to good, would be for 94 per cent or more of previously good schools to fall back.
Indeed, if more than 2,015 of the 3,803 – or 53 per cent – of the previously satisfactory schools improved to good last year, then the stock of good or outstanding schools would rise, even if 100 per cent of the previously good schools declined in rating.
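The break-even arithmetic in the last three paragraphs can be checked in a few lines. The two counts are the ones quoted above for schools inspected in 2012-13; everything else follows from them:

```python
# Break-even arithmetic for the 2012-13 inspection sample, using the
# counts quoted in this post (not Ofsted's raw data).
prev_good = 2014           # inspected schools previously judged good
prev_satisfactory = 3803   # inspected schools previously satisfactory/RI

# Suppose half of the previously satisfactory schools improve to good
# or outstanding at their latest inspection.
improvers = round(0.50 * prev_satisfactory)   # 1,902, with rounding

# The stock of good/outstanding schools falls only if at least that many
# previously good schools slip back -- i.e. this share of them:
breakeven_share = improvers / prev_good
print(f"{improvers} improvers = {breakeven_share:.0%} of previously good schools")

# And once more than 2,014 satisfactory schools improve, the stock rises
# even if every single previously good school declines.
guaranteed_rise_share = (prev_good + 1) / prev_satisfactory
print(f"rise guaranteed once {prev_good + 1} improve ({guaranteed_rise_share:.0%})")
```

The two printed percentages are the 94 per cent and 53 per cent figures in the text.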
So the types of schools inspected last academic year – as measured by the judgement of their previous inspection – made it more likely that the stock of good or outstanding schools would rise.
In that sense, then, my hunch was right.
Indeed, I’ve also tried to do a rough-and-ready calculation as to what would have happened to the total stock of good and outstanding schools if the schools chosen for inspection last year had not been so heavily weighted towards the less than good schools, but instead were chosen at random, ie with previously outstanding and good schools just as likely to be inspected as those rated less than good. By my calculations, instead of rising by nine percentage points, the proportion of good or outstanding schools would have remained static. (note 3)
However, it’s now important to look at a further break-down of the figures. Forget, now, about the imaginary scenarios above. What proportion of previously good schools actually did fall back to less than good? And what percentage of previously satisfactory institutions ended up at good or better?
Well, the most striking thing is that more than half – 58 per cent – of previously satisfactory/requires improvement schools improved to good (55 per cent) or outstanding (two per cent). By contrast, only 26 per cent of those previously adjudged good fell back to requires improvement (22 per cent) or inadequate (three per cent).
So, although only a relatively small proportion of schools previously adjudged satisfactory/requires improvement would have needed to have improved for the total stock of good or outstanding schools to rise, in fact the proportion, as well as the overall numbers, of such schools improving was much higher than the proportion of previously good schools declining.
So my hunch was only half right. Ofsted’s systems for selecting which schools to inspect did indeed make it more likely that Sir Michael’s figures would rise. Indeed, the stock of good or outstanding schools rose more during the last academic year than it would have done had the same numbers of previously good and satisfactory schools been inspected, so the selection of schools seems to have contributed a lot to the scale of the increase in good or outstanding schools overall. But, nevertheless, the proportion of previously satisfactory schools improving was very high, and certainly much higher than the number of good schools going the other way, so some genuine improving trend in the judgements does seem to have been in evidence.
So what has been going on? What has been driving it? Here, we are unfortunately in the realm of conjecture.
Sir Michael seems to be offering two possible explanations in the press release. First, he suggests that the improvements are down to “the [presumably, more effective than before] work of dedicated teachers and outstanding headteachers up and down the country”. Second, his own changes to the inspection regime, replacing satisfactory with requires improvement as a judgement, were having an effect, he suggests, saying: “Headteachers are using the ‘requires improvement’ judgement as a way of bringing about rapid improvement in their schools, especially in the quality of teaching”.
On the other hand, another plausible explanation, which I have heard suggested elsewhere (see http://bit.ly/15FqWpu), is that inspectors have perhaps rightly viewed requires improvement as a judgement with bigger negative implications for schools than the old satisfactory verdict, and thus that they have been more reluctant to use the current judgement than they were in relation to its predecessor. Sorry, school leaders reading this, but the change in statistics, then, might again be a product of the changed measurement system, rather than any underlying change in the quality of what is being inspected.
So, in summary, having ruled out statistical effects as the explanation for all of the change, the reality is that it is very difficult to be sure exactly what has driven the major move in the data which Sir Michael has described.
This surely suggests caution in terms of interpretation, especially following a major change in the inspection regime. A well-known saying in assessment circles is: “If you want to measure change, don’t change the measure”. That is exactly what has happened in this case, and yet big conclusions have been drawn by those presiding over the measurement system.
For a system presided over by a chief inspector who has a key purpose of telling it how it is in our schools system – spelling out whether provision is improving or not, and if it is, why it is – this is both surprising and disappointing.
It may be that Sir Michael has a huge array of anecdotal evidence suggesting both that Ofsted’s changes have spurred teachers on to greater efforts, and thus provoked an improvement in teaching quality; and that heads have responded as he says they have. If so, it is a shame that more information on this evidence was not put out as part of the press release, and that there is no note on the seemingly large effects that statistical sampling – the choice of schools to be inspected – could have on the overall figures.
As it is, this press release just looks very unscientific, with Sir Michael’s statements as to what has driven a major change in the data coming across as unsubstantiated assertions, at best, and cheerleading for his own changes – he is hardly an impartial observer in all of this, of course – at worst.
Ofsted is being seen by ministers as the model for a new inspection system for hospitals. But I don’t think doctors, supposedly steeped in the use of evidence in medical studies, would put up with this level of analysis from a chief inspector. And – I may have said this in a blog here before – this lack of caution with regard to evidence is strange coming from someone in Sir Michael’s position, both because he is a supporter of supposed academic rigour – I am afraid it is largely absent here – and, in general, because he is someone heading a schools system which surely should regard being careful with the interpretation of statistics as one of the key qualities it should promote in pupils.
Note 1: It could also have the effect, when we are not talking about improvement from inadequate/satisfactory to good or outstanding but about the overall proportion of school inspection judgements as a whole in any one year, of biasing findings as to overall school quality downwards, of course. In other words, if the chief inspector is choosing to focus most of his inspectors’ efforts in any one year on schools which, before the inspector’s visit, were seen as less than good, and then publishing an annual report on the findings of inspection reports on these schools, the danger is that it does not fully reflect the good work being done in schools as a whole, which will tend to have better pre-existing Ofsted reports than those sampled.
Note 2: I’ve based my analysis on Ofsted’s latest spreadsheet of inspection judgements, which gives the latest judgement for every school inspected (http://bit.ly/16fQrme). My stats for schools inspected in the last academic year cover inspections from September 2012 to the end of June 2013 that feature in these spreadsheets.
Note 3: I assumed that the same number of schools were inspected last year, and that the success rates of schools in each of the four categories were the same as actually happened last year: ie the same proportion of previously outstanding schools went on to be adjudged outstanding; good; requires improvement; and inadequate, the same for previously good schools, and so on. But I changed the numbers of each school inspected so that this sample was not weighted towards those schools previously rated less than good, but rather was in proportion to the existing “stock” of schools in each Ofsted category. With around 70 per cent of schools already rated good or outstanding, my sample for inspection reflected this. My calculations would have seen the proportion of outstanding schools fall – compared to the situation at the start of September 2012 - from 21 to 19 per cent; good schools rising only marginally, from 50 to 51 per cent; satisfactory/requires improvement falling from 27 to 26 per cent; and the proportion of inadequate schools rising from 2 to 3 per cent.
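The mechanics of that note 3 reweighting can be sketched as follows. The "good" and "satisfactory" transition rows below use percentages quoted in this post; the "outstanding" and "inadequate" rows, the inspected fraction, and the exact splits are placeholder assumptions of mine, not Ofsted figures, so the printed numbers illustrate the method rather than reproduce my results above:

```python
# Sketch of the note 3 counterfactual: inspect each category in
# proportion to its share of the national stock, rather than
# oversampling the less-than-good schools.
stock = {"outstanding": 0.21, "good": 0.50, "satisfactory": 0.27, "inadequate": 0.02}

# rows: previous judgement -> distribution of latest judgements (each row sums to 1)
transitions = {
    "outstanding":  {"outstanding": 0.60, "good": 0.30, "satisfactory": 0.09, "inadequate": 0.01},
    "good":         {"outstanding": 0.03, "good": 0.72, "satisfactory": 0.22, "inadequate": 0.03},
    "satisfactory": {"outstanding": 0.02, "good": 0.56, "satisfactory": 0.38, "inadequate": 0.04},
    "inadequate":   {"outstanding": 0.00, "good": 0.30, "satisfactory": 0.60, "inadequate": 0.10},
}

def reweighted_stock(stock, transitions, inspected_fraction):
    """New national stock when every category is inspected in proportion
    to its share: inspected schools take on the transition rates, while
    uninspected schools keep their previous judgement."""
    new = {}
    for latest in stock:
        moved_in = sum(stock[prev] * transitions[prev][latest] for prev in stock)
        new[latest] = (1 - inspected_fraction) * stock[latest] + inspected_fraction * moved_in
    return new

# Roughly a third of schools inspected in a year (again, an assumption)
after = reweighted_stock(stock, transitions, 1 / 3)
print({k: round(v, 3) for k, v in after.items()})
```

With proportional sampling, the good schools that slip back offset most of the satisfactory schools that improve, which is why the counterfactual stock of good or outstanding schools barely moves.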
Note 4: I probably should have put the Ofsted terms “outstanding”; “good”; “satisfactory/requires improvement” and “inadequate” in quotation marks in this piece. Given that they are used so many times in the text, however, I decided not to.