Making Non-Personal Data Public Is An Idea Whose Time Has Come

  • Non-personal data sets have great economic value, both for existing and upcoming businesses.

  • There is an urgent need for regulation to be brought in on sharing non-personal data.

In a country of 1.3 billion people, with millions moving to the cities each year, there are innumerable opportunities for entrepreneurs. Already, the likes of Zomato and Swiggy have inspired chains of cloud-kitchens in cities like New Delhi, Pune and Bengaluru. The success of Uber in the Indian market has inspired other local delivery and transportation services.

However, most of these companies depend on the abstract and unquantified potential of the Indian market. Their investments are driven by the success of other first-movers, and not by elaborate data insights or their own research. “If that idea worked, ours can/should/will too,” is the go-to plan for most startups, as of right now, across industries.

This is where non-personal data can help.

Cab-hailing app Uber has hosted over a billion riders across the world. For each active rider, Uber stores data corresponding to the trips, like the frequency, the route, the traffic on the route, the most active spots within a city for drivers, and so on.

Now, suppose Uber were to take all the data it gathers across the world and segment it by city for students, policy experts, traffic management experts, and local administrations to study. This would not only enable better decision-making but also drive better urban development. That is exactly what the company did with Uber Movement.

Uber Movement, an online tool, allows users to access data about travel times across different cities and their localities. The data is gathered from more than 10 billion rides taken across the world and can be used for urban transportation and management solutions. As of today, the data is free and can be accessed via their website.

This is a basic application of non-personal data (NPD). Any data that does not trace back to or identify an individual is termed non-personal data. While the macro view of travel times does originate from every single rider ever registered within a given city, the data cannot be used to identify any single rider or Uber user.
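The idea of a macro view that hides individual riders can be illustrated with a minimal sketch. It does not describe how Uber Movement actually works: the record layout, the zone names, and the minimum-rider threshold are all assumptions made for illustration. Trips are averaged per pair of zones, rider IDs are dropped from the output, and any zone pair with too few distinct riders is withheld.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trip records: (rider_id, origin_zone, destination_zone, minutes).
TRIPS = [
    ("r1", "A", "B", 22.0),
    ("r2", "A", "B", 26.0),
    ("r3", "A", "B", 24.0),
    ("r1", "A", "C", 40.0),
    ("r4", "B", "C", 15.0),
    ("r5", "B", "C", 17.0),
]

# Assumed threshold; a real release would have to choose this with care.
MIN_RIDERS = 3


def aggregate_travel_times(trips, min_riders=MIN_RIDERS):
    """Return the mean travel time per zone pair.

    Rider IDs never appear in the output, and zone pairs travelled by
    fewer than `min_riders` distinct riders are left out entirely.
    """
    durations = defaultdict(list)
    riders = defaultdict(set)
    for rider_id, origin, destination, minutes in trips:
        key = (origin, destination)
        durations[key].append(minutes)
        riders[key].add(rider_id)
    return {
        key: round(mean(values), 1)
        for key, values in durations.items()
        if len(riders[key]) >= min_riders
    }


if __name__ == "__main__":
    print(aggregate_travel_times(TRIPS))
```

With the sample records, only the A to B pair has enough distinct riders to be published. The single A to C trip, which could point to one person, is suppressed.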

NPD is not about your cab or food delivery alone, for it can extend to the data from an e-commerce platform, public health systems, or any other digital activity where humans are receiving or offering services.

Another category of NPD is data collected from machines, for instance, on climate, agriculture, industrial machines, and so on. In both cases, relevant data is anonymised to ensure no individual can be tracked down or identified.

Then, there is the final category that contains both personal and non-personal data, known as mixed data sets. However, given that this category also has personal data, laws applicable to personal and sensitive data must cater to mixed data sets, irrespective of the significance of the NPD within these sets. This may well change in the late 2020s as data from smart devices becomes critical to decision-making on various fronts.

NPD sets have great economic value, both for existing and upcoming businesses. In a country like India, where the scale of business, in terms of people, socio-economic variations, and regional and geographic variations, enables the production of precise data sets with multiple perspectives and use-cases, there is an urgent need for regulation, for a number of reasons.

The data gathered by the likes of Amazon, Walmart, Uber, Google, Apple, and so on, must be available to Indian companies. This would not only enable them to serve local users but also ensure access for micro, small and medium enterprises (MSMEs) in various businesses.

Without access to NPD sets, it becomes hard for an outsider to venture into a business dominated by conglomerates. Opening up access would support both competitiveness and the development of MSMEs.

However, do these same companies have an obligation to share NPD sets? There are a number of unaddressed concerns here.

First, what if the data sets that are made public are reverse engineered to track or identify individuals? For instance, a travel-time NPD set may not amount to much on its own. But when supplemented by other NPD sets from a smart city initiative, e-commerce and food deliveries, and any independent public health system, it can be used to single out a community, an area, or even a locality, thus enlarging the prospects of identification.
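The concern about combining data sets can be made concrete with a small sketch. The records below are entirely made up and contain no names or IDs. The column names, and the assumption that several releases have already been linked on shared attributes, are illustrative only. The function reports the size of the smallest group of records that share the same values for a chosen set of attributes. A group of size one means some record has been singled out.

```python
from collections import Counter

# Hypothetical records after several "anonymised" releases were linked.
# Assumed sources: locality and commute_hour from a travel data set,
# cuisine from food-delivery data, clinic_visits from a health system.
LINKED = [
    {"locality": "North", "commute_hour": 8, "cuisine": "thai", "clinic_visits": 0},
    {"locality": "North", "commute_hour": 8, "cuisine": "thai", "clinic_visits": 4},
    {"locality": "North", "commute_hour": 9, "cuisine": "pizza", "clinic_visits": 0},
    {"locality": "North", "commute_hour": 9, "cuisine": "thai", "clinic_visits": 0},
    {"locality": "South", "commute_hour": 8, "cuisine": "pizza", "clinic_visits": 1},
    {"locality": "South", "commute_hour": 8, "cuisine": "pizza", "clinic_visits": 1},
]


def smallest_group(records, attributes):
    """Return the size of the smallest group sharing the given attribute values."""
    counts = Counter(tuple(record[name] for name in attributes) for record in records)
    return min(counts.values())


if __name__ == "__main__":
    for attributes in (
        ["locality"],
        ["locality", "commute_hour"],
        ["locality", "commute_hour", "cuisine"],
    ):
        print(attributes, smallest_group(LINKED, attributes))
```

In this toy data, the travel attributes alone leave every record hidden in a group of at least two. Adding the food-delivery attribute shrinks the smallest group to one, which is the kind of narrowing the paragraph above warns about.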

Second, given how critical data is going to be for numerous business interests going forward, there is also the question of intellectual property rights (IPR) and business secrets. The importance of IPR in NPD cannot be underplayed.

China already offers a towering example of how things can go wrong if enough emphasis is not laid on IPR. Therefore, businesses that choose to invest heavily in data research, analytics, processing, and collection may rightfully be apprehensive about making their data public. Also, with many players in the game, where does one draw the line when it comes to deciding the intricacies of data collection? In the end, the question that remains unanswered is: where do we stop?

Governments, however, are thinking beyond these risks.

In November 2018, the European Union introduced a new regulation that began to apply in May 2019. The aim of the regulation was to eliminate the obstacles in the storage of NPD by allowing companies to store data anywhere in the EU, enabling cross-border transfer and exchange. Seeing the entire EU as a single digital economy, the regulation allowed the use of NPD for business development.

In 2017, a telecom paper in India proposed the creation of a ‘sandbox’ that would gather anonymised data that could be used for the development of new products. In 2018, government think-tank NITI Aayog argued, in a paper, in favour of sharing NPD sets for good governance and planning.

The committee under Justice Srikrishna also highlighted the importance of NPD sets. In September 2019, a committee was constituted under K Gopalakrishnan to come up with a data governance framework for NPD.

The biggest challenge for the Gopalakrishnan committee would be to balance the naysayers and optimists when it comes to the accessibility of NPD. While there is obviously the risk of IPR violations and of NPD sets being reverse engineered to identify individuals, there is also tremendous economic value in sharing them.

What the committee and the government of India must understand is that NPD would require an evolving data framework.

In the 2020s, when the nation shall move from smartphones to smarter devices, and from 3G and 4G to 4G and 5G, the framework will have to evolve alongside.

The evolving framework must ensure that the data accessed is used for progressive means, that the users meet all the necessary security requirements (if any), and that the users also sign an undertaking stating the purpose for which the data accessed shall be used. The framework must advocate the creation of a data-sharing ecosystem where everyone can be held accountable.

For now, the committee can start working with the basics, with data from the likes of Uber Movement. However, going forward, and for the greater good, the focus must be on a framework that ushers in data accessibility with accountability on the part of the ones who choose to use it.

It’s an intricate balance, but one that must be achieved for the socio-economic potential it holds. NPD sharing is an idea whose time has come.