Abstract
In today's data-driven world, data interchange plays a pivotal role in our daily lives. Every digital
transaction, from the simplest to the most complex, requires the exchange of data between the
parties involved. Individuals, small businesses, large corporations, organizations, and
governments all store, process, and exchange data. Over time, this has led to the
accumulation of large volumes of data, known as Big Data. With the emergence of Big Data, it
became apparent that analyzing such data offers numerous opportunities, and that the
resulting insights can be highly beneficial to the goals of data
processors. Enriching and correlating existing internal datasets with datasets
acquired from external sources was also found to improve the outcomes of such analyses
considerably. Obtaining third-party datasets used to entail approaching
specific data owners directly; however, with the emergence of digital data marketplaces in
recent years, this situation has begun to change.
Until recently, data exchanges were carried out with little to no regard for privacy or the
protection of personal data. Recent legislative developments, such as the European Union's
General Data Protection Regulation (GDPR), have prompted many data providers and consumers to seek solutions
for both protecting individuals' privacy and assessing the privacy risks of the datasets under their
management. Following these developments, any data disclosure has to employ some form of
data sanitization prior to release, in order to protect the privacy of individuals' sensitive
information. Anonymization is one such sanitization process: it involves
the reduction or transformation of data in a privacy-preserving manner in order to achieve a
certain level of anonymity. One of the most difficult aspects of any anonymization process is
striking a balance between data utility and privacy. In this context, risk analysis and
anonymization tools are required in order to increase awareness of privacy risks, aid in
regulatory compliance, and assist data processors with the anonymization process. Although
a few such tools are reported in the literature, they do not offer a wide range of options in terms of
the types of data that can be analyzed, the support of data multidimensionality, and visual
exploration of the risk analysis results.
Aside from the privacy issues surrounding the disclosure and exchange of Big Data, there are also
challenges in analyzing such data meaningfully. Visual analytics is a research area that focuses on
offering efficient and transparent methods of processing, visualizing, and analyzing large volumes
of data so that analysts may better understand them and extract insights that could support
data-driven decision making. In the literature, a variety of visual analytics applications are available.
Among the most common features of such applications is the ability to create dashboards in
order to support Big Data exploration. Dashboards are a collection of data visualizations and
filtering options designed to assist analysts and provide an interactive way for them to conduct
their analysis. However, most of the currently available solutions fall short when it comes to
dashboard-wide data exploration through drill-down or roll-up analysis. Drill-down refers to
the process by which an analyst moves from a grouping of data to a more detailed and granular
one, whereas roll-up refers to examining data at progressively less detailed levels.
The applications that do offer this functionality provide it only in a limited fashion and for specific
charts or graphs, and cannot propagate drill-down or roll-up actions to
the rest of the dashboard's visualizations.
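As a rough illustration of dashboard-wide drill-down and roll-up, the sketch below keeps a single shared granularity level and recomputes every chart from it. It is a minimal example under stated assumptions, not the framework's actual implementation: the `HIERARCHY` levels, the toy `RECORDS`, and the `aggregate` and `Dashboard` names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy, ordered from coarsest to finest level.
HIERARCHY = ("year", "quarter", "month")

# Toy records; all field names and values are made up for illustration.
RECORDS = [
    {"year": 2021, "quarter": "Q1", "month": "Jan", "sales": 10, "orders": 1},
    {"year": 2021, "quarter": "Q1", "month": "Feb", "sales": 20, "orders": 2},
    {"year": 2021, "quarter": "Q2", "month": "Apr", "sales": 5, "orders": 1},
    {"year": 2022, "quarter": "Q1", "month": "Jan", "sales": 7, "orders": 3},
]


def aggregate(records, depth, measure):
    """Sum `measure` grouped by the first `depth` levels of HIERARCHY."""
    keys = HIERARCHY[:depth]
    totals = defaultdict(int)
    for row in records:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)


class Dashboard:
    """Holds one shared depth so that every chart sees the same granularity."""

    def __init__(self, records, measures):
        self.records = records
        self.measures = measures  # one "chart" per measure
        self.depth = 1            # start at the coarsest level

    def drill_down(self):
        # Move one level finer, stopping at the finest level.
        self.depth = min(self.depth + 1, len(HIERARCHY))

    def roll_up(self):
        # Move one level coarser, stopping at the coarsest level.
        self.depth = max(self.depth - 1, 1)

    def render(self):
        # Every chart is recomputed at the shared depth, which is what
        # propagates a drill-down or roll-up to the whole dashboard.
        return {m: aggregate(self.records, self.depth, m) for m in self.measures}
```

Calling `drill_down()` once on a `Dashboard(RECORDS, ["sales", "orders"])` moves both charts from yearly to quarterly totals at the same time; `roll_up()` reverses it.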
Our proposed methodology for dealing with the aforementioned issues involves the design and
development of a unified framework of applications aimed at the analysis, visualization, and
exploration of Big Data while ensuring security and privacy. These applications make it possible
to assess the risk of leaking personal data that a dataset may contain, and also
to anonymize that data. Furthermore, they facilitate the visualization and exploration of large
datasets by combining existing internal datasets with those obtained from digital data
marketplaces and displaying them through interactive dashboards. These dashboards can be
adapted to the requirements of the user's analysis and provide data-drilling functionalities
based on the type of data under analysis, thus allowing users to gain insights that they could
not have gained otherwise.