Data Systems in India: The Case for an Audacious Overhaul
Adarsh Mathew

This blog is the outcome of a discussion with Mr TCA Srinivasa Raghavan on the current state of data systems in India and the case for an audacious overhaul in light of the 'Digital India' initiative. Mr Raghavan is a Senior Associate Editor at The Hindu Business Line. He has been a consultant to the Reserve Bank of India's history project and an advisor to the Director of ICRIER. He is also a Distinguished Fellow of the Institute of Peace and Conflict Studies.

Amidst the current data revolution and the popularity of terms like 'Big Data' and 'data-driven policymaking', reliable, robust data is often cited as a prerequisite for designing programs and policies that are both efficient and impactful. Data availability is a major concern for analysts and policymakers, especially in the social and human development sectors. Reports suggest that data availability for the Millennium Development Goals across the globe has never exceeded 70%. Within that 70%, close to 30 percentage points are modelled estimates, 10 are independently verified and validated data, and the remaining 30 are government data. With most data being self-reported and not sufficiently verified, the reliability of macro indicators is often called into question.

The Sustainable Development Goals aim to track 170 indicators over 15 years for 17 goals, starting from 2016. Much of this data is collected not at the national level but at sub-national and municipal levels, where the systems for tracking and validating data are not robust, to say the least. The problems of self-reporting and validation extend here too. The gradual accumulation of these errors and inefficiencies makes one question the reliability of this data and its utility for policymaking and evaluation.

Data and the development community have always shared a close relationship. The thrust on collecting human development indicators across the globe arose from the need of international aid agencies and governments to assess the impact of aid programs and investments. As Mr Raghavan pointed out, this was prompted by taxpayers in donor nations calling for greater accountability on all government spending, including international aid. He added that this insistence on data collection had a seismic impact on economic thought: the discipline moved from theorizing about the economy to measuring and modelling economic activity. This data-centric overhaul of economics and business at large was further amplified by strides in computing capacity, which made large and complex computations easier. But it was in the 1990s that the data-centric approach to analysis gained wider penetration and acceptance. The personal computer improved access to powerful computation. This, together with the increased digitization of transactions, which creates millions of data points a day, formed the critical elements that marked the onset of the data revolution as we now know it.

Harnessing these technological advancements and improvements in data-driven methodologies for the social sector requires answering a critical question: who is the user? Who would use this data, and how would it immediately benefit them? Mr Raghavan adds that it is important to identify the micro or even nano-level user of this data. For public activities in India, data is critical at the operational level: for the municipal workers who clean your streets, for the traffic controllers who look to optimize vehicular flow. They need localized, timely data to do their jobs efficiently. The MPLAD (Members of Parliament Local Area Development) scheme is another example highlighting the importance of data. District Collectors need to allocate limited resources among competing priorities, and they need robust data on where that money will be best used to justify expenditure estimates and generate savings. He suggests targeting and exploiting these constraints to initiate programs that implement reliable and robust data collection mechanisms.

Regarding data privacy, Mr Raghavan recommended avoiding over-regulation and instead anonymizing a precise, minimal set of data metrics to maintain data security and privacy. He invoked the Pareto principle: if the availability of a particular data point would result in a gain for one individual without any loss to another, then the privacy clause need not apply to it.

"Data wins elections," says Mr Raghavan. This, along with the associated political economy of data, points to an increased acceptance of the need for data, even if there is no consensus yet on what needs to be done to collect the kind of granular, timely data that would be of value to stakeholders. The private sector has set the benchmark on data collection standards, motivated primarily by the need to achieve greater sales and market efficiencies. While there might be significant gains from designating all data as a public good and treating it as a global commons, pricing datasets may direct greater resources towards validation efforts, and the market would be willing to pay for reliable data. Mr Raghavan believes that designing systems to ensure data standards at the governmental level is the critical challenge that developing countries face.

The design and creation of robust national data systems pose a systemic challenge to governments of developing countries. It is in this light that the Global Partnership for Sustainable Development Data was launched on the sidelines of the recently concluded UN General Assembly. With a self-professed goal centred on "data-driven decision-making by catalyzing more open, new, and usable data (to achieve the SDGs)," this partnership seeks to commit financial and technical resources to strengthen data and statistical systems in emerging economies by leveraging new-age technologies and data sciences. Building the capacity to generate, share, and use data at multiple levels of governance is important to fill these data gaps, and support for multi-stakeholder data initiatives that harness the data revolution will be required to achieve that.

India needs these systems too, as we move towards greater industrialization and seek to provide wider and more efficient services to our citizens. Mr Raghavan suggests that the primary task should be to raise the profile of data-driven decision-making, and awareness of its relevance, amongst the bureaucracy, elected officials and citizens. Once they are convinced of the need, the next step is to implement projects of mutual interest that put in place systems to collect and disseminate data effectively, following global standards and practices. This gives us a great opportunity to take stock, overhaul our systems and make them more relevant and responsive to domestic development goals. It allows us to enhance the data literacy of the entire government machinery by investing in data dissemination, visualization and decision support systems. We could make citizen engagement integral to policy and development programmes through timely and relevant distribution of data, and by allowing open access to data sets for anybody to analyze and make localized gains. Data allows us to do so much more.