CMIE Survey Limitations May Bias Unemployment Data, Says Pronab Sen
Amid the Covid-19 crisis, which led to significant job and income losses, researchers and policymakers turned to unemployment data compiled by the Centre for Monitoring Indian Economy to understand the extent of the damage. In the absence of official high-frequency data on unemployment, this became the go-to data set to assess the impact of the crisis, particularly on the informal sector.
But how reliable is that data?
A recent column by Jean Dreze and Anmol Somanchi asserted that CMIE's Consumer Pyramid Household Survey was biased towards better-off households and that the bias was rising over time. In particular, a concern was raised over whether sampling is too focused on 'main streets' within towns and villages, where the better-off may reside, while under-representing those who may live in the interiors.
CMIE's Mahesh Vyas defended the data saying that while there are limitations to the survey, there is no bias.
Still, the concerns raised have stirred up a debate on what this implies for the unemployment data, which is collected alongside the CPHS. Explaining the methodology of its unemployment data, CMIE, on its website, says, "The individuals surveyed are members of a panel of households included in CMIE's Consumer Pyramid survey." As such, any shortcomings with the CPHS data will impact the unemployment data as well.
In a conversation with BloombergQuint, former chief statistician of India Pronab Sen said the limitations of the CMIE data could mean that unemployment rates must be interpreted with caution.
Excerpts from the conversation:
What are your views on the concerns raised by Jean Dreze regarding the sampling of CMIE's Consumer Pyramid Household Survey?
The basic argument made by Dreze is that there is a bias towards higher income groups.
Now in the case of the pandemic, certain kinds of income earners were worst affected. If, in the sampling, the worst impacted are left out and these figures are then extrapolated for the country as a whole, then there is a bias.
The sample used by the National Sample Survey Office is not simple. It is a "stratified random sample". House listings are categorised into rich, middle and poor and samples are drawn from all of these categories.
CPHS draws a simple random sample. When you draw a simple random sample and select every seventh or the eighth house, for instance, it does not solve the problem [of covering all income groups].
Circular random sampling [which includes all households by going around a village, for instance] is a popular method as it covers all income types. If it is a linear random sample [where you may collect along a single street] then it won't. In case of the CPHS, I don't know what kind of sample is being collected.
Dreze says that households that are poor will be away from the central street and hence will not be adequately captured. The rebuttal by Mahesh Vyas does not quite answer this.
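The difference between a stratified sample and one drawn mostly along a main street can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption made for illustration: the village of 1,000 households, the income split, and the share of each group living on the main street. It does not describe how CPHS or the NSSO actually select households.

```python
import random

def build_village():
    """Hypothetical village; all counts below are made up for illustration."""
    layout = [
        # (income group, households on main street, households in the interior)
        ("rich", 100, 0),
        ("middle", 150, 150),
        ("poor", 60, 540),
    ]
    households = []
    for income, on_main, interior in layout:
        households += [{"income": income, "main_street": True} for _ in range(on_main)]
        households += [{"income": income, "main_street": False} for _ in range(interior)]
    return households

def stratified_sample(households, n, rng):
    """Group households by income, then draw from each group in proportion to its size."""
    strata = {}
    for h in households:
        strata.setdefault(h["income"], []).append(h)
    sample = []
    for group in strata.values():
        k = round(n * len(group) / len(households))
        sample.extend(rng.sample(group, k))
    return sample

def main_street_sample(households, n, rng):
    """Draw at random, but only from households located on the main street."""
    frame = [h for h in households if h["main_street"]]
    return rng.sample(frame, n)

def poor_share(sample):
    """Fraction of households in the sample that belong to the poor group."""
    return sum(h["income"] == "poor" for h in sample) / len(sample)

rng = random.Random(0)  # fixed seed so the run is repeatable
village = build_village()
print("poor share, population:        ", poor_share(village))
print("poor share, stratified sample: ", poor_share(stratified_sample(village, 100, rng)))
print("poor share, main-street sample:", poor_share(main_street_sample(village, 100, rng)))
```

Under these assumptions the stratified sample reproduces the population's share of poor households by construction, while the main-street sample can only reflect the mix of households that happen to live on that street.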
Does that mean that CMIE unemployment rate data also suffers from a similar bias?
It can. But one needs to be a little careful on that.
The question that needs to be asked is, during the pandemic, who are the ones most affected in terms of employment. If we look at urban India, people with salaried jobs were not affected as much. Some of them, of course, were affected, such as those who work in MSMEs. By and large, the salaried class was less affected. The casual workers and the daily contract labour were much more affected. In rural areas, in a similar trend, landowners were less affected, while landless labour would be more affected.
If the survey does not capture the different income categories adequately, then yes it would bias the results.
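The arithmetic behind this point can be sketched in a few lines of Python. The unemployment rates and workforce shares below are invented purely for illustration and are not estimates from CMIE, the PLFS or any other source. The overall rate is a weighted average of group rates, so a sample that over-represents the less affected group understates it.

```python
def unemployment_rate(shares, rates):
    """Overall rate as a weighted average of group rates; shares must sum to 1."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("group shares must sum to 1")
    return sum(shares[group] * rates[group] for group in shares)

# Hypothetical group unemployment rates during a shock (assumed, not measured).
rates = {"salaried": 0.05, "casual": 0.30}

# Hypothetical workforce composition versus the composition of a skewed sample.
true_shares = {"salaried": 0.4, "casual": 0.6}
sample_shares = {"salaried": 0.6, "casual": 0.4}

true_rate = unemployment_rate(true_shares, rates)      # 0.4*0.05 + 0.6*0.30
sample_rate = unemployment_rate(sample_shares, rates)  # 0.6*0.05 + 0.4*0.30

print(f"true rate:   {true_rate:.2%}")
print(f"sample rate: {sample_rate:.2%}")
```

With these made-up numbers the skewed sample reports about 15% unemployment against a true figure of about 20%. The size and even the direction of the gap depend entirely on which groups are under-represented and how hard each was hit.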
Is there any available data set that the government can put out at higher frequency, even if as preliminary data, to get a sense of the unemployment picture?
On employment — no.
Employment data can be collected in only two ways. Either from the responses of employers or from household surveys. In the U.S., the high-frequency unemployment data is collected from payrolls. Essentially, it is collected from the employer.
In India, where 85% of your workforce is in the informal sector, you are not going to get that information.
We prefer to do a household survey because it is actually more accurate, simply because when you do enterprise surveys you don't know how many enterprises have died and how many new enterprises have been born. Enterprise data in an informal economy like ours is problematic.
So we go by household data. The CPHS is household data. They track households. So in that sense the CPHS is less objectionable. Now, there may be problems with the sample selection. That's a different issue.
In the absence of government high-frequency data on unemployment, is there any option but to rely on these private indicators even if they aren’t perfect?
That's the problem. What's the alternative? The Periodic Labour Force Survey was supposed to fill that vacuum. But, for various reasons it has fallen behind.
For the country as a whole — rural and urban taken together — the PLFS will release only the 2020 data next and that will happen only sometime next month. For rural India, the only data you have is pre-Covid. For urban areas, the PLFS was supposed to be quarterly and the results were supposed to be released within two months of the end of the quarter. Due to various reasons that has not happened so even the urban data is lagging.
If you don't have that information, what do you rely upon?
The government, for instance the Finance Ministry, has been talking a lot about the EPFO registrations as a proxy indicator. I think that's very biased. You are again talking about the salaried only.
Alternatively, you look at something like the CPHS.
Do you believe the EPFO data is of any use?
Frankly, not much. Although the Ministry of Statistics is compiling that, statistically, there are a lot of question marks.
Every few years, we get a committee to give us recommendations for more robust employment data, but even they have not recommended monthly unemployment data. Is that simply because a household survey is just that much more complicated?
The reason for that is that monthly data can be given if and only if you use what is called a panel survey — you take a fixed set of households. You cannot do this kind of monthly data with a cross sectional survey by and large.
Now, the PLFS is a hybrid: it's partly cross sectional, partly panel. When you do that, getting that sort of high-frequency data is problematic.
Alternatively you can do exactly what CPHS has done, where it is a pure panel-based survey. The panels were selected five or six years ago when the survey was first started. The only question that they ask on a really high-frequency basis is on employment status and they do it telephonically, which is the only way they can do it. You can't send a guy to collect the data. You don't have time for that.
The problem is that our questionnaires are very complicated. Even in the PLFS, when they do the quarterly surveys, the questionnaire is fairly long. Now, these questions are very difficult to do telephonically.
CPHS focuses on a single data point. So in that sense it has been clever about it. It works if pure unemployment is your consideration.
The question about the representativeness of the sample is a different issue altogether.
Can the government's Periodic Labour Force Survey take a leaf out of the CMIE survey to give us an official data set?
It can, but again, to be able to do that it will have to be a panel. It cannot be anything other than that and panel data has its own problems.
The nature of the problem with panel data is that even if the panel is totally representative at the point at which the sample was selected, over the course of time it becomes less representative.
Which is why the PLFS does an occasional panel, which means the panel keeps getting updated.
It is theoretically possible for the ministry to do the equivalent of a CPHS, which is to ask short and simple questions to a panel by phone, and administer the full questionnaire maybe once a quarter.
However, telephonic interviews have their own problems and you have to be careful about it. Unless the data collector has established a relationship with the respondent, he is not going to get answers. You have to be able to actually visit and establish your relationship and build confidence in the mind of the respondent.
How useful would data of that kind have been in the current situation?
It depends. It's fairly useful for tracking what's happening to the economy.
The reason why I think it is important is that the employment data is the only data that is able to give a reasonable sense of what is happening in the informal sector. At the moment, for instance, we have no information on the informal sector at all. So if you find that workers in the informal sector are getting laid off, that's important news for policy.
Considering all of this, is our best bet still to tweak the existing PLFS?
Yes. Perhaps on a monthly basis we can do the existing panel, and ask a single question by phone.