In this blog post, we will look at whether objectivity can be ensured in the era of big data, at a time when the issues of privacy protection and data ownership remain unresolved.
People generate a huge amount of data every day without even being aware of it. On Facebook, a social networking service (SNS), hundreds of billions of posts accumulate every month, and hundreds of billions of messages are sent and received every day on messenger apps such as WhatsApp, Zalo, and Telegram. Beyond posts and messages, search histories, purchase records, web page visits, and blog posts all accumulate as data. According to the MIT Sloan School of Management in the United States, 2.5 quintillion bytes of information are generated per day through mobile devices, social media, and online commerce. We are creating more data than we can handle. This vast amount of data is called big data, and the term has gradually come to mean not only large volumes of data but also the process of analyzing and using them.
Big data is not a completely new concept. There are already many successful uses of big data, and Google, Netflix, and Apple are representative examples. Google was able to forecast flu outbreaks faster than the Centers for Disease Control and Prevention (CDC) by analyzing how often people searched for terms such as "fever" and "cough." It has also improved the accuracy of its machine translation by statistically comparing billions of documents. Netflix developed the Cinematch service, which analyzes members' tastes based on the movies they have watched and recommends movies accordingly. Apple's voice assistant, Siri, is another example. When a user asks Siri a question, the query is sent to Apple's servers, where Apple's AI algorithm analyzes it and sends an answer back to the user. The algorithm is built on vast amounts of data, and as questions keep accumulating on Apple's servers, the database grows, making Siri's responses increasingly sophisticated. In this way, big data can be used to discover new information, which can create enormous value depending on how it is processed. Does that mean big data analysis should simply be applied everywhere? Not quite: important challenges in big data analysis remain unsolved.
The first challenge of big data analysis is invasion of privacy. The ultimate goal of big data is to cover everything, that is, to treat everything that happens everywhere as data. Turning everything around us into data can indeed bring about great change and progress. Consider, for example, a service that measures a person's biological information and behavior patterns in real time and uses that data to sound an alarm before high blood pressure or a stroke occurs. The relevant data could be very diverse: the food the person has eaten, real-time blood pressure, the number of trips to the bathroom, gait, and sleep patterns. If a simple chip could be inserted into the body to collect such data in real time and send it to an analysis center that predicts diseases, would this service be viable? Being able to predict diseases in advance is certainly attractive, but most people would be reluctant to let their every action and physical detail be analyzed. In this example, individuals could choose whether or not to provide their information, but many people already provide information without being aware of it. People post on social media simply to communicate, yet those posts are analyzed to understand consumer needs, gauge reactions to products, and identify improvements. Such practices are now commonplace. However, just as portrait rights protect people from having photos of them used without their consent, the mere fact that analysts can access social media posts does not make it acceptable to collect and analyze them. Because privacy violations remain controversial, legal clarification, or agreement among individuals, websites, and analytics companies, is needed.
In addition to privacy invasion, the unclear ownership of data is a problem that cannot be ignored. Individuals generally do not know how long companies keep their personal information. It is also unclear to what extent companies may reprocess personal information and whether individuals have the right to have the personal information companies hold about them completely deleted. If personal data is transferred through cloud services such as those of Google and Apple to data centers in the United States, where these companies are headquartered, who owns it? The debate between individuals and companies over the ownership and use of data is unlikely to be settled soon. Moreover, because data crosses borders, questions such as who owns data and how much of it should be disclosed need to be discussed at a global level, not just within individual countries.
Finally, big data analysis can never be perfectly objective. In the past, analysts had to make assumptions to compensate for a lack of data; now that we are dealing with vast amounts of data, those assumptions are no longer necessary. This has reduced the role of subjectivity in data analysis compared with the past, but it does not make big data analysis purely quantitative and objective. Depending on the subject and purpose of the analysis, the analyst's subjectivity comes into play from the very first decision about which data to handle. Even if all the desired data is collected, raw data is invariably mixed with outliers and irrelevant values. Subjectivity also enters when the analyst judges which values these are and cleans the data to be used in the actual analysis. In particular, deciding which findings of the analysis matter most is inevitably subjective. Such subjectivity can distort what the original data actually says, defeating the fundamental purpose of big data analysis. In other words, analysts who set out to extract valuable information from the original data, classify it more accurately, and make predictions may instead end up with results that merely reflect their own judgment. Therefore, we should not blindly assume that big data analysis is quantitative and objective simply because it draws on a lot of data. We must recognize the subjectivity of analysis and seek ways to increase objectivity.
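To make the point about data cleaning concrete, here is a minimal Python sketch. The step counts are invented for illustration, and the two outlier rules (a z-score cutoff and Tukey's IQR fences) are common textbook choices assumed here, not a method taken from any particular analysis. Each rule is defensible on its own, yet they keep different subsets of the same raw data and therefore report different averages.

```python
import statistics

# Hypothetical daily step counts from a wearable device; the values are made up.
raw = [4200, 5100, 4800, 5300, 4950, 5200, 4700, 31000, 5050, 120]

def drop_by_zscore(values, cutoff):
    """Keep values within `cutoff` population standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= cutoff * sd]

def drop_by_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if q1 - k * iqr <= v <= q3 + k * iqr]

# Four analysts, four defensible cleaning choices, one raw dataset.
cleaned = {
    "no cleaning": raw,
    "z-score, cutoff 2": drop_by_zscore(raw, 2),
    "z-score, cutoff 3": drop_by_zscore(raw, 3),
    "IQR, k=1.5": drop_by_iqr(raw),
}

for rule, values in cleaned.items():
    print(f"{rule:18} n={len(values):2}  mean={statistics.mean(values):8.1f}")
```

With these made-up numbers, the cutoff-2 z-score rule drops only the 31000 reading, the cutoff-3 rule drops nothing because that single extreme value inflates the standard deviation enough to stay inside its own fence, and the IQR rule drops both 31000 and 120. The reported mean therefore ranges from 4380.0 to 7042.0 depending on a choice the analyst makes before any "objective" computation begins.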
Big data analysis is attracting attention as a powerful tool in various fields, and its success stories seem to promise a bright future. Companies fascinated by its advantages are racing to jump on the bandwagon. However, big data analysis also raises the issues discussed above: invasion of privacy, data ownership and usage rights, and the objectivity of analysis. Unless efforts to solve these problems proceed in parallel, big data analysis will inevitably hit a wall.