Issues - Open Data

Regarding the digitization of the administration, Romania consistently ranks among the last on Digital Economy and Society Index (DESI) indicators. According to a recent European Commission study (Digital Governance Fact Sheet), based on Eurostat data, Romania ranks last in Europe in the share of citizens who interact digitally with the state (9%).

According to the experts we interviewed, Romania went through a growth period between 2013 and 2015, when its open data infrastructure was built; this placed Romania among the top 15 of 90 states in the Global Open Data Index. By the end of 2018, however, Romania had fallen to 24th position, and in the DESI ranking our country is consistently in last or second-to-last place.

Problem #1

The existing data are of very low quality

The quality of the data is very low, and the minimum standards of formatting and coherence are not respected. The minimum standards for the collection and delivery of public data contain a series of rules on collecting, consolidating and maintaining information sets. The data should be presented in a way that is simple and easy to follow, the format of the information in the tables must be kept consistent throughout the document, and the document must be delivered as a .csv, .xml or .json file so that it can be processed by a computer. Non-compliance with the agreed format of a data set commonly alters the information in it. For example, if the standard agreed for the calendar date column is dd/mm/yy, any entry in another format (such as dd/mm/yyyy) will affect the automatic analysis of the information in the data set. All these inconsistent practices perpetuate a profoundly flawed collection mechanism and make it difficult to open the data and transform it into useful information.

The lack of effective digital collection tools, in turn, contributes to the reduced data quality. If the platforms were configured to restrict input fields according to the standards, errors could easily be prevented. Without real penalties, compliance with data quality standards is often ignored, and the lack of specialists able to evaluate and correct the errors perpetuates the unsatisfactory state of the collected data.
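The date example above can be sketched as a small check run before a file is accepted. This is only an illustration under our own assumptions: the column name "date", the sample rows and the function name find_invalid_dates are hypothetical, and the agreed standard is taken to be dd/mm/yy as in the example.

```python
import csv
import io
from datetime import datetime

# Assumed agreed standard for the date column: dd/mm/yy.
DATE_FORMAT = "%d/%m/%y"


def find_invalid_dates(csv_text, column="date"):
    """Return (line_number, value) pairs for entries that break the agreed format."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        value = row[column]
        try:
            datetime.strptime(value, DATE_FORMAT)
        except ValueError:
            # Wrong format (for example a four-digit year) or an impossible date.
            problems.append((line_number, value))
    return problems


# Made-up sample: one valid row, one row with a four-digit year, one impossible date.
sample = "date,cases\n03/05/18,12\n04/05/2018,9\n31/02/18,4\n"
print(find_invalid_dates(sample))
```

A collection platform that runs a check of this kind at the moment of entry would reject the second and third rows instead of storing them, which is the field limitation described above.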

Problem #2

There are no data analysis experts in the public administration

As in the case of experts in drafting normative acts, the lack of specialized human resources for data collection and analysis in the public administration is a reality faced by both the central apparatus and the local institutions, the latter being especially vulnerable. Because of wage ceilings, the Romanian state is not competitive on the labor market, so this type of specialist avoids administrative positions. This has the greatest impact on the quality of the data collected, on the coherence of collection and analysis methodologies, and on the capacity of institutions to publish data in open format. The interviews conducted during the research period also showed that there is a bottleneck in training existing staff as specialists: turnover is very high, which makes it very difficult to maintain continuity in data-related work processes.

Problem #3

Methodologies for data collection, processing, analysis and use are often not properly designed and implemented.

As mentioned above, one consequence of the lack of experts in the public administration is the poor quality of data collection and processing methodologies.

Problem #4

Lack of consistency in the data collection process

Looking at the situation more broadly, there is a lack of connectivity between institutions. With very few exceptions, each institution has its own taxonomy, which makes it almost impossible to correlate complex data sets and, consequently, to analyze them in context. One example is the data collected on domestic violence in Romania. At this time a case of domestic violence cannot be tracked across bodies such as the Romanian Police, the Forensic Institute or the General Directorate of Social Assistance and Child Protection, which are just three of the many points at which a victim may come into contact with the system. Because the information collection forms and the processing methodologies differ, the institutions cannot easily consolidate the information even when they share data with each other. Unfortunately, the situation is similar even within a single institution whose directorates collect data individually.
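The taxonomy problem described above can be illustrated with a minimal sketch of a crosswalk, a table that translates each institution's local codes into one shared vocabulary. Everything in it is hypothetical: the institution names, the local codes and the shared category are invented for illustration and do not describe how any Romanian institution actually records cases.

```python
# Hypothetical local codes that two institutions use for the same kind of case.
CROSSWALK = {
    "institution_a": {"DV-01": "domestic_violence"},
    "institution_b": {"family abuse": "domestic_violence"},
}


def to_shared_category(source, local_code):
    """Translate a local code into the shared vocabulary, or None if it is unmapped."""
    return CROSSWALK.get(source, {}).get(local_code)


def consolidate(records):
    """Count records per shared category; unmapped records are counted under None."""
    counts = {}
    for source, local_code in records:
        category = to_shared_category(source, local_code)
        counts[category] = counts.get(category, 0) + 1
    return counts


# Made-up records: (institution, local code).
records = [
    ("institution_a", "DV-01"),
    ("institution_b", "family abuse"),
    ("institution_b", "unlisted code"),
]
print(consolidate(records))
```

Without the shared mapping, the first two records look unrelated; with it, they are counted together, and the unmapped third record is surfaced rather than silently lost.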

Another phenomenon contributes to this major problem: the poor software services that institutions use to collect and process information. Because there are no standards for building digital solutions for the public sector, the tools generally available to the public administration are not built to support opening data, not even between institutions. Thus, even institutions that intend to automate these processes are trapped in development and maintenance contracts that make this extremely difficult, if not impossible. In addition, the absence of strategic thinking in building systems that use automatically or manually collected data sets deepens the problem in the long run.

Problem #5

Lack of standards for the delivery of public data and mechanisms to control their compliance

The publication of public data is regulated, in that there is an obligation to open them. However, data delivery standards are guides and recommendations, which makes compliance with them optional in some cases. The delivery standards describe three ways of publishing open data: on the platforms of those who collected them, on the platforms of third parties (for example, data that an institution transmits to a third party for publication), or through a digital API service. About the latter method, the Open Data Publishing Guide of 2015 states:

“For institutions that provide data sets through an online API service, the benefit is that they can always provide the latest data to users, in some cases even in real time. The disadvantage is that providing this online API service means material costs and requires more advanced technical expertise than providing files containing datasets. (...) On the other hand, publishing data only through an online API may not be sufficient in terms of wholesale access to data.”

Compounding this obstacle is the lack of control mechanisms. Even if such mechanisms were defined and implemented, they would not be effective because, as noted, the delivery standards are presented as recommendations, not norms. The same Guide also includes methodologies for opening data and reporting, created in accordance with the OGP rules and accompanied by examples of data that can be opened, but in reality they are not implemented, owing to the lack of know-how and the absence of major consequences for non-compliance.

Problem #6

Lack of a technical infrastructure for data delivery automation

Central and local administration lack the capacity to implement a technical infrastructure that automates data delivery. The third way to open data, mentioned above, is through online API services (sets of functions and procedures that allow connection to the data held in an application), which automatically transmit information in one or more directions. With such a service, and provided that a data set is standardized, three things can happen simultaneously and in real time: the data set is published in open format on the institution's own platform, it is transmitted automatically to the national open data portal, and it is fed to another system that can correlate it with other data sets for analysis and processing. Unfortunately, each of these small services has to be built individually for a specific data set and then needs at least minimal maintenance, two functions that, at least at the level of local administration, usually cannot be provided inside the institution. One solution would be to require APIs in any software product used by the administration.
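The idea of delivering one standardized data set to several destinations at once can be sketched as follows. This is a simplified illustration, not a description of any existing service: the function names, the destination names and the sample records are hypothetical, and plain Python lists stand in for the network calls a real API service would make.

```python
import csv
import io
import json


def to_json(records):
    """Serialize the records as a JSON array."""
    return json.dumps(records)


def to_csv(records):
    """Serialize the records as CSV, using the first record's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def publish(records, destinations):
    """Hand the same records to every destination; return the payloads by destination name."""
    delivered = {}
    for name, (serializer, send) in destinations.items():
        payload = serializer(records)
        send(payload)  # in a real service this would be an upload or an HTTP request
        delivered[name] = payload
    return delivered


# Made-up standardized records and stand-in destinations.
records = [{"county": "Cluj", "permits": 42}, {"county": "Arad", "permits": 17}]
own_platform, national_portal = [], []
destinations = {
    "own_platform": (to_csv, own_platform.append),
    "national_portal": (to_json, national_portal.append),
}
publish(records, destinations)
print(own_platform[0])
```

Because every destination is fed from the same standardized records, adding a new consumer means adding one entry to the destinations table rather than building a separate export by hand.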

Problem #7

Low usage of the available data compared to the potential

The result of all the bottlenecks reported so far is a minimal use of available data. This means that we do not have enough data to make a substantial analysis in many of the key areas in Romania, and where we have data, we do not have the certainty of their accuracy nor do we have the ability to analyze patterns or manifestations of phenomena in order to make the correct decisions for implementation or regulation.

Problem #8

The data are not transformed into useful information for the institutions, for actors in the private sector or for the general public.

In the fortunate cases where we do have private, semi-open or open data, there is another bottleneck: the data are not translated into a form that is useful to institutions and to the general public. Because of the same lack of specialists at the institutional level, this data cannot be analyzed to help any actor diagnose their activity and make informed decisions. For an untrained audience, a data set, even a complete and correct one, does not convey information until it is simplified and contextualized. An ordinary citizen cannot be expected to consult a table with multiple variables and understand what impact those data have on their life. In turn, a journalist who wants to report on a topic of community or broad interest needs a specialist to help analyze the data sets in order to draw the right conclusions. At this moment no body has the responsibility to transform existing data into useful information.

A simple way of "translating" data to the general public would be, for example, an interactive visualization application built to process data sets and render them in a graphically representative manner.

Finally, in a cultural space that does not encourage decision-making based on hard facts, and where decision-making power is almost always in the hands of people with a keen interest in maintaining their own public image, using data to their true potential and according to the correct methodologies is most often not a priority.

Problem #9

The low degree of confidence and the lack of collaboration between actors, both at institutional level and between the state and citizens, civil society or the private environment

Because the data sets available to the general public have repeatedly been of too low a quality to be useful, trust between the private sector and the state, and between the public or civil society and the state, has decreased over time. Even if data quality at local and national level improves with the help of technology and political will, a long period of adjustment will be needed to restore trust between the actors involved. The same phenomenon is happening internally, where we see a low degree of trust even among institutions; coupled with the lack of interest in opening data, this leads to situations where data are never transferred from one institutional actor to another.

Problem #10

Very low level of use of open data

If we analyze the data that some administrations collect and open, sometimes with great effort, we find that its level of use is very low, both because there is no confidence that the data is correct and usable, given the history of open data in Romania, and because it is very difficult to find. Another contributing factor is that those who maintain the data sets, and those who in theory should transform the data into useful information, lack the data analysis know-how needed to do so. One of the main categories of users of open data should be the media, which does not currently use existing data; as a result, other audiences that might be interested in these sets (companies, start-ups, the general public) are unaware of their existence and do not use them. The effect is that the institutions that do make efforts to open information to the general public are not encouraged to continue or to invest in expanding the number of data sets they open.

Problem #11

Lack of political will to increase the openness of public data in some sectors

Last but not least, the current situation of open data in Romania is directly linked to the interest of institutional actors in opening information to the public. Even though the main reason is the lack of capacity to open the data (in terms of specialized human resources and technical know-how), many situations indicate a fear of this practice, because the data could reveal poor performance or critical situations we are facing. Nor can we overlook the lack of understanding, or outright ignorance, of the benefits of open data. There is another factor, less frequent but relevant: the need for some institutions to finance themselves. One example is the Trade Register, which provides most of the data on legal entities in Romania for a fee.