Friday, November 05, 2010

State of the Blogosphere - needs a lesson in stats

Technorati's annual State of the Blogosphere report used to be good when David Sifry was producing it. However, that hasn't happened since 2007, and the quality of the 2010 report is not very good at all. In fact it includes some very basic mistakes - and this post is by way of a health warning!

Back in October I highlighted that there were problems with this year's survey - see Making a Mark: Problems with the 2010 State of the Blogosphere Survey. The main problem was that it contained the sort of mistakes which get made by undergraduates the first time they are let loose on a proper survey. The software was c**p and the questions were faulty and did not appear to have been tested. I never finished it because it was so bad - and I heard from others who made a similar decision for similar reasons.

However Technorati - minus David Sifry - this week published the results of the survey - and guess what, this year they've decided to focus on women.  Apparently we hold sway and have influence when it comes to marketing products and brands!

These are the links to the reports
Count the mistakes!

The author, Jon Sobel, reveals no experience of stats in his bio. I'm not surprised.

Here are the mistakes I've spotted so far:
  • The author fails to understand the difference between a "population", as in "all bloggers", and a representative sampling frame. He also doesn't appear to understand that it's actually quite important to know something about (a) who gets sent a survey, (b) how many of them read the language it was written in and (c) who responds to it. Statistics are representative when the sampling frame is sound. This one fails to persuade. The very fact that there is no reference anywhere in the report as to how they ensured the sampling frame was either representative, or at the very least random in the statistical sense, makes me think it never occurred to them that this might be a good idea.
  • There's a vast amount of difference between me reporting that readers of this blog think..... and using a very similar approach (here's a survey, please fill it in!) but extrapolating to making statements about the state of the blogosphere as a whole.
  • Anybody spot a small problem with this pie chart (see below) which appears on Day 1 of the Technorati report?  I've given you a hint.....the question posed below is mine.  Plus whatever happened to China and India - did they drop off the planet?  Maybe somebody forgot to make the survey accessible in other languages.  The bottom line - the blogosphere is not all English-speaking bloggers!!!!

  • Plus we also get nonsense statements such as.....
As blogging is now firmly a part of the mainstream, we see that the average blogger has three or more blogs
  • I think he means that the very small percentage of bloggers who blog for money (corporate/self-employed) have an average of three or more blogs. The very fact that the table in the Day 1 report suggests that all bloggers have on average two blogs shows me just how biased the sample is. I spend a lot of time looking at other people's blogs and checking their bios and I can confidently report that the vast majority of women bloggers I encounter have precisely one blog!
  • There's a question which looks at reasons why people might be blogging less. It completely ignores the current state of the economy and the fact that for many people there are other things which are rather more important right now.
  • He confuses bloggers and blogging teams. Thus he reports that the Top 100 bloggers generate almost 500 times as many articles as all bloggers (ie 470 posts) while failing to realise that the top 100 blogs are most likely to be written by blogging teams. An average of 470 posts a month equates to roughly 15 posts a day. I've never ever seen a lone blogger produce 15 posts a day, except very short ones when there are specific events over a very short period.
  • He makes the statement that "33% of bloggers reported having worked for traditional media" without stopping to blink and think! This is just plain wrong - and indicates how biased his sample is. The conclusions drawn in the report start to look very weak when you realise some of the statements made might have been very self-serving.
That's just Day 1!  I've come to the same conclusion as I did when doing the survey - I can't go on......
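
The posts-a-day sum in the list above is easy to check for yourself. This is only a back-of-envelope sketch: the 470 posts a month is the figure quoted from the report, while the 30-day month and the team of five are my own assumptions for illustration, not numbers from Technorati.

```python
POSTS_PER_MONTH = 470  # figure quoted from the report for the Top 100


def posts_per_day(posts_per_month: float, days_in_month: int = 30) -> float:
    """Average posts a day; the 30-day month is an assumption."""
    return posts_per_month / days_in_month


def posts_per_author_per_day(posts_per_month: float, team_size: int,
                             days_in_month: int = 30) -> float:
    """Daily posts per author if the work is shared evenly across a team."""
    return posts_per_day(posts_per_month, days_in_month) / team_size


# Hypothetical team size, purely for illustration (not from the report).
HYPOTHETICAL_TEAM_SIZE = 5

print(f"Lone blogger: {posts_per_day(POSTS_PER_MONTH):.1f} posts a day")
print(
    f"Team of {HYPOTHETICAL_TEAM_SIZE}: "
    f"{posts_per_author_per_day(POSTS_PER_MONTH, HYPOTHETICAL_TEAM_SIZE):.1f} "
    "posts a day each"
)
```

Spread across a team the workload looks ordinary; pinned on one person it does not, which is why treating "blog" and "blogger" as the same thing matters.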

Looking beyond that, the stats and conclusions might be interesting but they will definitely be undermined by poor survey practice. For example - in his wrap-up conclusion he states:
The influence of women and mom bloggers on the blogosphere, mainstream media, and especially brands has never been higher.

Whereas the reality is that none of his stats establish this since all relate to Mom bloggers alone.  I think he may be in need of a bit of education from the feminist movement!

Basically the results might be representative of a group of people but we have no idea how that group is made up. The indicators are that it includes very many more professional/business-oriented bloggers than one would expect to see in a sample - as in the finding that WordPress is the most popular blog hosting service, used by 40% of all respondents, which is just not true of the majority of ordinary bloggers.
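
To see how a self-selected sample can distort an average such as "blogs per blogger", here is a minimal sketch. Every number in it is invented by me for illustration (the group shares, the blogs per blogger and the response rates); none of it comes from the Technorati data. It simply weights each group by how likely its members are to answer the survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    name: str
    share_of_population: float  # fraction of all bloggers (hypothetical)
    blogs_per_blogger: float    # average blogs in this group (hypothetical)
    response_rate: float        # chance a member answers (hypothetical)


# Made-up population: mostly hobbyists with one blog, a few professionals
# with three, and professionals ten times more likely to respond.
GROUPS = [
    Group("hobbyist", 0.90, 1, 0.02),
    Group("professional", 0.10, 3, 0.20),
]


def population_mean(groups):
    """True average blogs per blogger across the whole population."""
    return sum(g.share_of_population * g.blogs_per_blogger for g in groups)


def respondent_mean(groups):
    """Expected average among respondents, weighted by who answers."""
    weights = [g.share_of_population * g.response_rate for g in groups]
    weighted_blogs = sum(
        w * g.blogs_per_blogger for w, g in zip(weights, groups)
    )
    return weighted_blogs / sum(weights)


print(f"Whole population: {population_mean(GROUPS):.2f} blogs per blogger")
print(f"Survey respondents: {respondent_mean(GROUPS):.2f} blogs per blogger")
```

With these invented inputs the respondents report far more blogs each than the population actually has. Without knowing who was sent the survey and who answered, there is no way to tell how big that gap is in the real report.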

There are blokes out there chunking up the report and eagerly producing digests (eg see Essential Statistics from Technorati’s State of the Blogosphere 2010).

I wish they'd looked a bit harder at what was being said.  They might then have realised that the content is not that distant from hot air.

The original and still the best

If you want to read the original and the best, go to David Sifry's one-page blog which lists all the reports he produced.

Please come back David and do your report again - I want to read a report written intelligently about some good quality stats!
