AT THE FOREFRONT OF LEGAL RESEARCH
BIG DATA FOR LAWYERS: HOW TO USE AND REGULATE?

This paper examines the use of big data analysis by lawyers, attorneys and law firms. The claim that big data is the new gold has been repeated in the media and at many conferences. Today big data is driven by digital transformation. It makes it possible to analyze huge amounts of data, creates opportunities to increase performance and gives the private and public sectors a competitive advantage. The legal regulation of big data is still an open question in many countries, and the main source of confusion is privacy. Meanwhile, big data analysis is actively used by governments and by private and public market players. The market offers various legal tech innovations, software for analyzing big data and methods, which we review in this article. There are different tools and applications of big data for lawyers, and every legal professional should know them. In this paper we observe some aspects of using big data analysis in the legal industry and discuss issues of legal regulation. We suggest some practical ways of applying big data analysis for legal professionals. Some insights in this article are adapted from the author's previous research conducted while studying at the Maastricht School of Management in the Netherlands. That research was partly published in the second edition of Global Legal Insights: AI, Machine Learning & Big Data in 2020. Collaborative approaches are needed to close the big data skills gap.


Introduction
Modern economic theory assumes that the economy develops in cycles. 2 Today we are living in a new cycle, the so-called Industry 4.0, which changes the process of creating added value through greater mechanization and automation. 3 We are observing deeper automation and the use of modern information and communication technologies, which together lead to so-called digitalization. 4 In particular, data is generated from different sources and technologies. Newly generated data creates an opportunity to analyze it by applying technology. The power of big data lies in its massive reserves, which are generated every day by individuals, organizations, government bodies, courts and so on.
Data analysis allows us to automate the gathering of big data and to analyze it in order to draw conclusions and make predictions. Big data analysis thus helps us to optimize our life and work processes. Lawyers are familiar with one side of this optimization through popular legal databases (e.g. Consultant Plus, Garant, LexisNexis, Westlaw etc.). The other side is the availability of algorithms, software and analytical techniques that can help to produce "intelligent" solutions. One such technique is text analytics. Combining text analytics algorithms with machine learning could greatly increase the effectiveness of a lawyer's work.
Technologies can increase productivity and quality of life in terms of working conditions and workflow. Big data provides new research methods that could be used for effective decision making. At the same time, there is the issue of legal regulation of such data and the protection of rights. Anticipation of technology by lawyers and legislators is costly for business and society. Meanwhile, the transition to a digital economy is connected with high demand for qualified lawyers who can help legislators and businesses. It therefore seems obvious that modern lawyers and legislators should understand the dynamics of Industry 4.0 and, in particular, use big data technologies and understand the scope of legal regulation.
In this paper we analyze the application of big data in the legal field by the private and government sectors. We define how big data models and algorithms could be used to improve the effectiveness and performance of an organisation.

Big Data is the new gold
We cannot ignore the fact that the idea that big data will change everything is hype. Statistical approaches were developed a long time ago and have been used by scientists to draw conclusions about their research subjects. Paraphrasing Marx, we could say that the statistical approach used by scientists was a way to interpret the world in different ways. The point, however, is to predict it. Big data makes it possible to predict customer behavior and market trends, to find discrepancies and similarities, and even to help with decision making. Studies of successful companies that use big data show new ways of gathering, storing and analyzing data. According to Marr, 5 it is predicted that every person will generate 1.7 megabytes of data every second by using devices, the internet, GPS systems, photos and videos. Completely new approaches to analyzing and storing data make this process even faster. Specifically, cloud technologies and distributed computing tools help to store and analyze the data. Modern algorithms, such as speech and photo recognition systems connected with artificial intelligence and machine learning, create new possibilities to analyze data and make predictions. Thus, digital development provides us with a myriad of tools to gather and analyze so-called big data.
Big data itself could be characterized as a set of extracted information that can be analyzed by special software (e.g. Hadoop, Tableau, Microsoft Azure etc.). Doug Laney 6 proposed associating big data with the following key concepts: (i) volume (big data contains a huge volume of information), (ii) variety (there are different data sets, with or without structure), and (iii) velocity (the high speed of gathering data, analyzing it and getting results from it). These key features, together with the possibility of using technology or software to analyze big data, make it possible to transform it into value. 7 We believe that value here is not just the result of big data analysis with its conclusions. It is more about evaluating patterns, for example for business intelligence, eDiscovery, text analytics, legal case predictions and so on. At first glance things may appear unconnected, but by analyzing them we can get an idea of a specific pattern. 8 Data visualization allows us to read and understand data that has been transformed from a long technical analysis into a simplified conclusion. There are many cases where big data is applicable for legal purposes. In addition, big data analysis is applicable in such fields as government and social security, education, banking, money laundering prevention, due diligence, customer relationships and billing for law firms etc.
Kutafin University Law Review, Volume 7, Issue 1, 2020. Rustam Rafikov. BIG DATA FOR LAWYERS: HOW TO USE AND REGULATE?

The law and Big Data
At first glance, big data is associated mainly with data analytics in STEM (science, technology, engineering and mathematics) disciplines. However, many unusual ways to use and apply big data are already known, and some companies have gained a competitive advantage by using it. Thus, we do not associate big data analysis exclusively with data analytics or engineering. 9 We view big data analysis as an opportunity for lawyers and as an undefined field for legal regulation.
The legal field is complicated because it mostly deals with government regulation. Governments actively use big data analytics for national security, economic research, urban planning and even tax policy. In 2015 the Russian tax authorities introduced the automated control system VAT-2, which analyzes information gathered from value-added tax declarations. The system made it possible to reveal tax evasion schemes and to create obstacles to breaches of law connected with false VAT declarations submitted by taxpayers. The Russian government also gathers data to analyze and assess the productivity of its bodies. The government's data is available on the Open Source website. 10 The AVATAR system used by US immigration and customs uses video and audio sensors to make probabilistic judgements about a person: whether the person is lying, behaving dangerously or hiding something. The United Nations Global Working Group on Big Data brings countries together to develop new solutions and share experience in big data analytics. The United Nations Global Pulse programme helps countries implement innovations connected with big data and develop policies. Governments actively use modern technological trends by cooperating with the private sector. The Russian tech startup FindFace could analyze a database of social networks and find any person by face recognition. In 2018 the project was closed to public users due to a government project on face recognition solutions. The ability of data analytics to analyze and recognize myriads of people who have personal profiles on social websites is a subject of long legal debates. In this respect, modern lawyers should understand the legal regulation of big data.
At the end of 2019, the Russian legislator took a first step toward legal regulation. In particular, the new Article 783.1 of the Civil Code provides that parties may enter into an agreement on the provision of information. Such an agreement can contain an obligation of a party not to disclose the information to third parties. Before this amendment, businesses used agreements on the provision of information and data based on the legal doctrine of "freedom of contract". Meanwhile, there was a risk of breaching the legislation on personal data because of legal uncertainty. Big data is impersonalized information; however, in order to gather it, the data sometimes has to be processed. After the data is gathered and processed, it is usually transmitted to another party for analysis. This raises a debate on whether we need to notify the person concerned or other third parties while processing and analyzing the data. One could argue that a non-disclosure agreement could be incorporated into the terms and conditions of the agreement on the provision of information (big data). However, although we can keep the information or the agreement confidential, statute and case law do not give us a definite position on this issue. In this respect, the new article in the Civil Code seems to be a kind of signal to the market that parties are protected by statute while exchanging big data or providing it to third parties.
Big Data is recognized by practitioners as impersonalized data. However, personal data regulation is very strict in Russia. At the end of 2019, penalties for breach of personal data law were increased. In particular, these penalties were connected with a breach of law on the localization of personal data. Russian law requires the localization of personal data on Russian citizens on servers situated in the Russian jurisdiction. Any breach of this provision may lead to a penalty and ban in Russia. LinkedIn failed to comply with the regulation and was banned upon a court decision. In 2019-2020, the Russian authority Roskomnadzor (Federal Supervision Agency for Information Technologies and Communications) started court proceedings against Twitter and Facebook. Upon the court decision, a penalty was imposed on both Twitter and Facebook. Twitter appealed to the court but was unsuccessful. Failure to comply with laws on personal data may lead to a ban of service in Russia. LinkedIn was banned and lost the entire Russian market and is today almost unknown in the region. It is advisable to have a legal compliance team and to cooperate with Russian authorities.
The main difficulty for big data deals is personal data regulation. Today, the regulation is very broad and personal data-processing rules apply. However, we have seen a common practice in which many market players ask customers to consent to the processing of personal data, including big data analytics (data anonymization and transfer of data). Gathering big data in compliance with personal data regulation is a good policy for companies. Due to legal uncertainties concerning the definition of big data, the best way to process and acquire personal data is to proceed in compliance with the personal data law in Russia. 11 The process of gathering data also raises questions about privacy. Is it good to gather information from devices and make predictions about people's lives? On the one hand, big data analysis is a blind analysis without specific personalization. On the other hand, by segmenting people we can predict the behavior of a particular group of people. People usually do not think about the flaws of big data and are ready to share it as long as such big data provides useful information. Should the law protect people? It seems reasonable that the law should require consent to the gathering and sharing of even impersonalized data. If so, what can we do when we perform ex-post analysis using existing data? Some authors argue that such an approach could be used to predict case outcomes by analyzing case law. Simply speaking, we could create a system for automatically rendered judgments. However, all analyses based on mathematical or statistical models are simplified analyses based on theory and the interpretation of results. 12 In this case we need human judgment on specific case details and evidence.
Overall technological improvements and the results of using big data create more and more questions for legislators. Meanwhile, big data is a tool for improving effectiveness and decision making. In this regard, modern lawyers must understand ways to apply big data analysis in their routine work.

Case analysis
Today most court decisions are published in government sources or are available through computer-assisted legal research software. Lawyers analyze court decisions in order to understand the possible outcome of a case, find the main rulings (precedent, case law), learn a judge's style or even find correlations in law enforcement. Law firms usually buy access to such software and delegate this work to paralegals and trainees. Legal automated systems that use big data analytics could perform text analytics and link analysis on court decisions in order to find correlations and patterns. This is more important for lawyers from common law countries, where the case law principle is a cornerstone. Meanwhile, understanding patterns in the practice of law enforcement could help law firms evaluate a case, its difficulty and the time and resources it is likely to require. Understanding the time and labor costs of a specific case could help a firm to offer the customer an adequate price.

eDiscovery
People communicate electronically. They use e-mails and chats, create and sign documents, and use databases. Sometimes lawyers need to find evidence that a contract was entered into, evidence of communication about a specific subject, or the fact that information was sent to someone. All actions in computer systems or on the web leave footprints, such as metadata. People change information on their devices in the course of their workflow or intentionally; they also remove and destroy it. However, metadata is a part of the information that is not displayed but can be retrieved from the file. 13 The process of retrieving this information is an analysis of big data in order to find a fact (pattern) or a document. When exploring a client's database during due diligence, lawyers may need to find specific metadata in billions of emails or documents. Clustering diverse information and presenting it visually could help to finalize the due diligence process and evaluate risks for the client. The same could be said about the law firm itself. Finding an essential document, e-mail, picture or any other file created by an employee in order to evaluate that employee's effectiveness or quality of work is a necessary part of management.
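The metadata search described above can be illustrated with a minimal sketch. It uses Python's standard `email` package to read header metadata (sender, recipients, date, subject) and to filter messages that involve a given party and subject keyword. The sample messages, the addresses and the helper names `extract_metadata` and `find_messages` are hypothetical; a real e-discovery review would work on exported mailboxes and much richer metadata.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical raw messages; in practice these would come from a mailbox export.
RAW_MESSAGES = [
    ("From: ceo@example.com\nTo: cfo@example.com\n"
     "Date: Mon, 02 Mar 2020 10:15:00 +0000\n"
     "Subject: Draft share purchase agreement\n\nPlease review."),
    ("From: hr@example.com\nTo: all@example.com\n"
     "Date: Tue, 03 Mar 2020 09:00:00 +0000\n"
     "Subject: Office closure\n\nThe office is closed on Friday."),
    ("From: cfo@example.com\nTo: ceo@example.com, counsel@example.com\n"
     "Date: Wed, 04 Mar 2020 17:30:00 +0000\n"
     "Subject: RE: Draft share purchase agreement\n\nSigned copy attached."),
]


def extract_metadata(raw):
    """Parse one raw message and return its header metadata as a dict."""
    msg = message_from_string(raw)
    return {
        "sender": getaddresses([msg.get("From", "")])[0][1].lower(),
        "recipients": [addr.lower() for _, addr in getaddresses(msg.get_all("To", []))],
        "sent": parsedate_to_datetime(msg["Date"]),
        "subject": msg.get("Subject", ""),
    }


def find_messages(raw_messages, party, keyword):
    """Return metadata of messages sent by or to `party` whose subject contains `keyword`."""
    hits = []
    for raw in raw_messages:
        meta = extract_metadata(raw)
        involved = party == meta["sender"] or party in meta["recipients"]
        if involved and keyword.lower() in meta["subject"].lower():
            hits.append(meta)
    # Chronological order makes the communication trail easier to read.
    return sorted(hits, key=lambda m: m["sent"])


hits = find_messages(RAW_MESSAGES, "cfo@example.com", "share purchase")
for m in hits:
    print(m["sent"].isoformat(), m["sender"], "->", m["recipients"], "|", m["subject"])
```

The same filtering idea extends to other metadata fields, such as file authors or modification dates, if the review tool exposes them.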

Text analytics
Analyzing huge numbers of standard documents in order to find discrepancies and patterns could be an interesting option for law firms. Today many technology companies are working on text recognition and text analysis, and law firms can use these technologies to analyze legal documents. Litman-Navarro 14 analyzed the length and readability of 150 privacy policies from popular tech companies. Using Lexile software, the researcher measured each text's complexity and how easy or hard it is to understand (see Figure 1).
13 S.C. Bennett & J. Cloud, Coping with metadata: Ten key steps. 61

Figure 1. Text analytics of different privacy policies (Litman-Navarro, 2019)
The linked study also analyzes all versions of Google's privacy policy. Litman-Navarro found that Google changed its policies over time, making them easier to understand (see Figure 2). This is a very practical study that shows how big data analysis applies to text analytics and could help law firms to succeed. Instead of having people compare a huge number of documents, it is easier to automate the whole process.
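To give a feel for what an automated readability measurement looks like, the sketch below computes the Flesch Reading Ease score, a publicly documented formula. This is a swapped-in stand-in: the Lexile measure used in the study is proprietary and is not reproduced here. The syllable counter is a rough vowel-group heuristic and the two sample texts are invented for illustration.

```python
import re


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, at least one per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate text that is easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Invented sample clauses: one plain, one dense.
plain = "We collect your name and email. We use them to run the site."
dense = ("Notwithstanding the foregoing, personally identifiable information "
         "may be disseminated to affiliated organizations for operational purposes.")

print("plain:", round(flesch_reading_ease(plain), 1))
print("dense:", round(flesch_reading_ease(dense), 1))
```

Run over every version of a policy, a score of this kind would show whether the text became easier or harder to read over time, which is the type of comparison the study reports.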
Another broad application of text analytics for law firms is the ability to search cases and legal acts. In order to perform big data text analytics, we need to gather documents, structure the matrix and apply data tools, such as pattern finding, classification and cluster analysis. 15 The analysis of previous cases could help to predict case outcomes and estimate litigation costs. Washington University professors tested an algorithm for forecasting Supreme Court decisions. 16 The research showed that the statistical method produced better results than lawyers. Similar algorithms predicted 79% of case outcomes of the European Court of Human Rights. 17 In particular, in some specific cases the accuracy of predictions made by algorithms was higher than that of legal experts. This shows that predictive analysis could be enhanced and developed to deliver better quality.
Additionally, text analysis at the e-discovery stage could be presented to a judge in order to prove evidence by applying mathematical models and the outcome of text analysis. However, this method should be carefully evaluated in order to be qualified and verified for a judge's decision. As Scholtes and van den Herik pointed out: "The first generation of e-discovery technology taught us how to deal with big data, the next generation will teach us how we can learn from big data and train our algorithms to provide better decision support and ultimately, on the condition it is properly fenced, defensible and understood, make better decisions by itself." 18 Text analytics can help with the due diligence process, where lawyers need to analyze a huge number of documents in order to find discrepancies or legal risks. In my personal practice I have done a huge amount of document review and analysis. We completed a checklist to discover possible risks for clients. However, some modern tools involving data analysis can automate this routine work. For example, Kira Systems software provides tools to analyze different legal documents and create patterns for future link analysis. It should be said that there is a lot of software that can help a law firm to use big data analysis and enhance productivity through automation. 19 Most software providers associated with legal technologies follow the trend toward automation of the legal field. Automation of the legal field mostly concerns the private sector, that is, lawyers and attorneys who can use big data and technology for better performance.
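The steps named above (gather documents, structure the matrix, apply classification) can be sketched in a few lines. The example builds a document-term matrix from four invented case summaries and classifies a new summary by its nearest neighbour under cosine similarity. The case texts, the outcome labels and the function names are hypothetical, and this toy approach is not the method used in the studies cited in this section.

```python
import math
import re
from collections import Counter

# Invented past cases: (summary, outcome). Real input would be full decisions.
PAST_CASES = [
    ("tenant failed to pay rent and the lease was terminated", "claim granted"),
    ("landlord did not repair the roof and the tenant withheld rent", "claim dismissed"),
    ("buyer did not pay for delivered goods under the supply contract", "claim granted"),
    ("seller delivered defective goods and the buyer refused payment", "claim dismissed"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_matrix(texts):
    """Return the vocabulary and a document-term matrix of raw term counts."""
    vocab = sorted({term for text in texts for term in tokenize(text)})
    rows = []
    for text in texts:
        counts = Counter(tokenize(text))
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(new_text):
    """Return the outcome of the most similar past case and the similarity score."""
    vocab, rows = build_matrix([text for text, _ in PAST_CASES])
    counts = Counter(tokenize(new_text))
    query = [counts[term] for term in vocab]
    scores = [cosine(query, row) for row in rows]
    best = max(range(len(rows)), key=scores.__getitem__)
    return PAST_CASES[best][1], scores[best]


outcome, score = predict("the tenant failed to pay rent")
print(outcome, round(score, 3))
```

Raw counts let common words such as "the" influence the score; a production system would typically weight or remove such terms and validate predictions against held-out decisions.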

Big Data for lawyers
Law firms have different sets of data that could be used to create a big data model. Some data is available in external sources, such as government sources and legal databases. If a law firm uses cloud technology, the documents uploaded there could be analyzed as well. Additional internal sources of data available for analysis are the firm's customer management systems, billing software, intranet etc. These sources allow firms to gather data about customer satisfaction and billing time, and provide ways to analyze customer segmentation, time spent on projects and employees' effectiveness.
In order to create algorithms for analyzing big data, a law firm should apply some theoretical approaches. By using these approaches, law firms can create a big data model.

Decision trees
A classical approach to discussing a legal problem with a client is to ask questions. A lawyer continues asking questions until a reasonable decision is found. A reasonable decision comes from past experience, educational background and practice. This is an algorithm in which the gathered data, in the form of answers to specific questions, is applied to rules and practical experience. Thus, we can choose some variables, that is, a number of questions that need to be answered. When analyzing documents during the due diligence process, we could input specific questions in order to find similarities or patterns. For example, there could be the following questions for company due diligence (see Figure 3). Having received the answers to these questions, we compare them with past experience and past decisions: is it acceptable to have unpaid charter capital or no evidence of its payment? We can also refine particular questions, e.g. how large is the tax debt: more or less than 5,000 EUR?
The issue here is to create the hierarchical structure of the tree, from less important questions to the most important. If the answers to all the less important questions are positive but the answer to the most important one is negative, then we have to decide whether this is a risk or not. In this case, the decision tree should be split again into different decisions. The result with less risk should be the right decision. Researchers use various decision tree algorithms. Some algorithms have additional splitting or stopping criteria or help to prune a decision tree at a specific moment. 20 The decision tree should reflect the desired outcome of the data analysis. Thus, when developing a decision tree model, we need to evaluate the questions that should be asked and the algorithms for finding the right decision.
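A hand-built decision tree of this kind can be sketched as follows. The two questions come from the example in the text (payment of charter capital and a tax debt threshold of 5,000 EUR); the order of the questions, the risk labels and the names `Node`, `evaluate` and `TREE` are assumptions made for illustration. The sketch encodes fixed rules and does not learn splits from data, as the tree-induction algorithms mentioned above would.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Node:
    """One question in the tree; each branch is another Node or a final label."""
    question: str
    test: Callable[[dict], bool]
    if_yes: Union["Node", str]
    if_no: Union["Node", str]


def evaluate(node, facts, path=None):
    """Walk the tree for one company and return (decision, answered questions)."""
    path = [] if path is None else path
    if isinstance(node, str):  # reached a leaf: the risk label
        return node, path
    answer = node.test(facts)
    path.append((node.question, answer))
    return evaluate(node.if_yes if answer else node.if_no, facts, path)


# Hypothetical due diligence tree; labels and ordering are illustrative only.
TREE = Node(
    question="Is the charter capital fully paid, with evidence of payment?",
    test=lambda f: f["charter_capital_paid"],
    if_yes=Node(
        question="Is the tax debt above 5,000 EUR?",
        test=lambda f: f["tax_debt_eur"] > 5000,
        if_yes="medium risk",
        if_no="low risk",
    ),
    if_no="high risk",
)

decision, path = evaluate(TREE, {"charter_capital_paid": True, "tax_debt_eur": 8000})
print(decision)
for question, answer in path:
    print("-", question, "yes" if answer else "no")
```

Returning the path of answered questions alongside the decision keeps the result explainable, which matters when a lawyer has to justify the risk assessment to a client.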

Cluster Analysis
Clustering helps to group and segment different things in the right order. In this method we do not look for the right decision; we just segment data in order to draw conclusions. The researcher only needs to understand how many clusters should be created. Law firms could easily adapt cluster analysis to segment types of projects and understand which practice is more profitable or where more time has been spent. Lawyers can cluster cases or legislation to find better presumptions. These results could be used by law firms to prepare analyses of specific situations (e.g. newsletters or legal alerts for clients). A broad application of the clustering technique is text analysis. Law firms could analyze documents in order to find similarities or discrepancies on specific topics. For example, clustering methods were applied in the above case of analyzing 150 privacy policies of technology firms. In order to create a cluster model, law firms should define the scope of each cluster. Documents are prepared by people, and lawyers use different approaches and techniques. In this case, we need to find patterns while taking into account the different techniques, legal formulas and expressions in the text. The clustering method could be strengthened by applying new machine learning models that find better ties and distances in the data.
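As an illustration of segmenting projects by time spent and revenue, the sketch below runs a plain k-means clustering on six invented matters. K-means is one common clustering algorithm chosen here as an assumption; the text does not prescribe a specific method. The matter names, hours and fees are hypothetical, and the number of clusters (two) is fixed in advance, as the paragraph above notes the researcher must do.

```python
import math

# Hypothetical matters: (name, hours billed, fee in thousand EUR).
MATTERS = [
    ("Lease dispute", 12, 3.0),
    ("M&A due diligence", 150, 45.0),
    ("Employment contract", 8, 1.5),
    ("Cross-border merger", 180, 60.0),
    ("Trademark filing", 10, 2.0),
    ("IPO support", 160, 52.0),
]


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as deterministic starting centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [(hours, fee) for _, hours, fee in MATTERS]
labels, centroids = kmeans(points, k=2)
for (name, hours, fee), label in zip(MATTERS, labels):
    print(f"cluster {label}: {name} ({hours} h, {fee}k EUR)")
```

With real billing data the features would usually be rescaled first, so that a variable with large numeric values does not dominate the distance calculation.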

Survival analysis
Survival analysis helps to find out where a risk event could occur. A law firm can create patterns that signal several risks. For example, by gathering data from the internal sources of a law firm, we can find grammar errors in our legal opinions and patterns that lead to a wrong model of a document. High-standard law firms have internal scripts for documents: their format, specific disclaimers etc. Survival analysis could help to find discrepancies in order to prevent risks for the firm.

The algorithm can also predict situations where no updates have been made for a specific case or project. This could be a signal of the potential loss of a client or of a missed deadline. Survival analysis should incorporate specific models and gather data from internal or cloud sources to help a law firm gain a competitive advantage in customer satisfaction.
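One standard survival-analysis tool is the Kaplan-Meier estimator, sketched below for the client-loss scenario. The choice of estimator is an assumption, since the text does not name a specific model, and the client data is invented. Each client has a duration in months and a flag saying whether the client was lost (event) or is still active at the end of observation (censored).

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each time where a loss occurred.

    durations: months each client was observed.
    events: 1 if the client was lost at that time, 0 if still active (censored).
    """
    survival = 1.0
    curve = []
    event_times = sorted({d for d, e in zip(durations, events) if e})
    for t in event_times:
        # Clients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        lost = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - lost / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical client relationships.
durations = [6, 12, 12, 18, 24, 30]
events = [1, 1, 0, 1, 0, 0]

curve = kaplan_meier(durations, events)
for months, probability in curve:
    print(f"after {months} months: {probability:.2f} of clients retained")
```

A curve like this shows when clients tend to leave, so a firm could schedule check-ins before the periods in which retention drops.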

Conclusion
In our research, we identified possible ways of using big data models and algorithms in the legal field, especially for lawyers and attorneys. We have concluded that using big data could increase the effectiveness of a law firm and enhance the performance of attorneys and lawyers. Using big data models could help law firms gain a competitive advantage. We advise using the software available on the market and researching additional ways of using big data analysis. As big data is the new gold, lawyers, governments and firms should actively engage in applying new technologies.
Our study shows interesting applications of big data not just as a competitive advantage, but also for organizational effectiveness and cost reduction. Another finding is that the use of big data models by law firms could improve customer relationships, provide tools for case analysis and enrich a law firm's market research (e.g. for case perspectives).
Surprisingly, there is a huge amount of data available for lawyers to analyse (e.g. government sources, databases etc.). This data could be used to make better decisions, make lawyers' work more effective and even reduce costs. In this connection, we recommend that any law firm, regardless of its size, and any practicing or researching lawyer review the data analysis tools and solutions available on the market for analysing big data.