30/6/2020
Summary
Introduction:
Demand for data-driven decision-making is growing, yet systems that can turn data into information, and information into insight, are still missing. Once such systems exist, people are likely to place more trust in information refined by AI than in their own limited views, which may suffer from insufficient expertise or a lack of data. It is therefore vital to develop ethical AI that delivers fact-checked information and high-quality news.
Awareness has recently grown of the influence that companies such as Twitter and Facebook have on society. Google plays a similar role, since it decides which links billions of people see when they search. Searching the internet, however, has limitations on both sides. On the one hand, search engines display low-quality or misleading links alongside factual information. On the other hand, individual users rely on their limited experience and their own search strategies, which steer the search toward results that match their expectations. Such bias can often lead to the spread of misinformation.
Our goal is to help people orient themselves in the vast amount of information on the internet. Specifically, we want to build an extension to search engines that can process and categorize relevant information, especially where there is a wide spectrum of views, as is typical for controversial topics (task 1).
Search engine platforms are not immune to the propagation of false or misleading information. Their ranking algorithms are almost blind to manipulation and misuse, such as the exploitation of information voids and data voids. We aim to build an application capable of improving search ranking by filtering out misinformation (task 2). Within this aim, a specific task is to categorize and filter links according to how they are characterized from different perspectives, such as credibility, factual accuracy, manipulation, propaganda, or extremism.
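The filtering and re-ranking idea in task 2 could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Link` class, the 0 to 1 score scale, the neutral default of 0.5 for missing scores, and the `min_quality` threshold are all hypothetical and not part of the project. How the per-perspective scores would be produced is left open here.

```python
from dataclasses import dataclass, field

# Perspectives named in the text. The 0..1 scale is an assumption.
POSITIVE = ("credibility", "factual_accuracy")          # higher is better
NEGATIVE = ("manipulation", "propaganda", "extremism")  # higher is worse


@dataclass
class Link:
    url: str
    engine_rank: int                      # 1 = top result from the search engine
    scores: dict = field(default_factory=dict)


def quality(link: Link) -> float:
    """Average all perspective scores into one value in [0, 1].

    Missing scores default to a neutral 0.5 (an assumption of this sketch).
    """
    good = [link.scores.get(name, 0.5) for name in POSITIVE]
    bad = [1.0 - link.scores.get(name, 0.5) for name in NEGATIVE]
    values = good + bad
    return sum(values) / len(values)


def rerank(links, min_quality=0.4):
    """Drop links below the threshold, then sort by quality, then engine rank."""
    kept = [link for link in links if quality(link) >= min_quality]
    return sorted(kept, key=lambda link: (-quality(link), link.engine_rank))


# Hypothetical scores for three search results.
results = [
    Link("https://example.org/a", 1, {
        "credibility": 0.2, "factual_accuracy": 0.1,
        "manipulation": 0.9, "propaganda": 0.8, "extremism": 0.7,
    }),
    Link("https://example.org/b", 2, {
        "credibility": 0.9, "factual_accuracy": 0.8,
        "manipulation": 0.1, "propaganda": 0.1, "extremism": 0.0,
    }),
    Link("https://example.org/c", 3, {}),  # unscored link, treated as neutral
]

reranked = rerank(results)
```

With these made-up scores, the first link falls below the threshold and is filtered out, while the remaining two are ordered by their aggregate quality. A plain average is the simplest possible aggregation; weighting the perspectives differently would be an equally valid choice.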
In recent years, Google introduced Knowledge Vault. This experimental feature was meant to deal with facts by automatically gathering and merging information from across the internet into a knowledge base capable of answering direct questions. This required selecting “confident facts”, that is, facts with a high probability of being true. However, the task is technically so demanding that it could not yet be applied in practice. We aim to build tools and measures that can contribute to such an assessment (task 3).
Methodology:
Our approach rests on the following premise: a rich set of descriptive features can reflect the factual status of a story and hence predict its factual accuracy. Text analytics makes it possible to evaluate basic attributes of any news item or document published on the web, including various aspects of text quality, references, author, and web reputation. Features such as sentiment, attitude, communication style, motivation, and emotions can also be extracted. Imitating the cognitive processes and strategies that humans use while searching may be helpful as well.
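To make the notion of descriptive features more concrete, the following minimal sketch computes a few surface-level text-quality features using only the Python standard library. The function name `extract_features` and the particular features chosen are assumptions for illustration. The richer features mentioned above (sentiment, attitude, author, web reputation) would need dedicated models or external data and are not covered.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def extract_features(text: str) -> dict:
    """Return a small dictionary of surface features for one document."""
    urls = URL_PATTERN.findall(text)
    body = URL_PATTERN.sub(" ", text)      # keep URLs out of the word counts
    words = WORD_PATTERN.findall(body)
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]

    n_words = len(words)
    n_sentences = len(sentences)
    all_caps = sum(1 for w in words if len(w) > 1 and w.isupper())
    unique = {w.lower() for w in words}

    return {
        "n_words": n_words,
        "n_references": len(urls),         # crude proxy: count of linked URLs
        "avg_sentence_len": n_words / n_sentences if n_sentences else 0.0,
        "type_token_ratio": len(unique) / n_words if n_words else 0.0,
        "exclamations_per_sentence": body.count("!") / n_sentences if n_sentences else 0.0,
        "all_caps_ratio": all_caps / n_words if n_words else 0.0,
    }


sample = "SHOCKING news! You will not believe this! See https://example.org/x"
features = extract_features(sample)
```

Each document then becomes one point in a feature space, which is what a downstream classifier or a feature-reduction step would operate on.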
The question is to what extent different characteristics of controversial topics can be identified automatically. The challenge is to distinguish facts from opinion, interpretation, and speculation; to find inconsistencies; and to estimate possible bias and misinformation (see Disinformation detection). It may also be possible to recognize logical fallacies and different types of bias.
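As a very naive baseline for separating opinion and speculation from other statements, sentences can be labeled by the presence of cue words. The cue lists and the function `label_sentence` below are hypothetical and deliberately tiny. This is not the project's method, and a real system would most likely rely on trained classifiers rather than fixed word lists.

```python
import re

# Hypothetical, deliberately small cue lists (assumptions of this sketch).
OPINION_CUES = {"think", "believe", "feel", "should", "best", "worst",
                "terrible", "wonderful"}
SPECULATION_CUES = {"may", "might", "could", "possibly", "perhaps",
                    "allegedly", "reportedly"}


def label_sentence(sentence: str) -> str:
    """Label a sentence as 'speculation', 'opinion', or 'unmarked'.

    'unmarked' only means that no cue word was found. It does not mean
    the sentence is a verified fact.
    """
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    if tokens & SPECULATION_CUES:
        return "speculation"
    if tokens & OPINION_CUES:
        return "opinion"
    return "unmarked"


labels = [
    label_sentence("The vaccine might cause side effects."),
    label_sentence("I think this policy is terrible."),
    label_sentence("The bill passed on 12 March."),
]
```

Such a baseline mainly shows where the difficulty lies: a sentence without cue words can still be false, and a hedged sentence can still be accurate, which is why the labels describe wording rather than truth.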
Keywords: crawl data, programmatic approach, text mining, natural language processing, semantic analysis, feature space, classification in multidimensional data, feature reduction, machine learning, advanced statistical methods, artificial intelligence, network analysis, singular value decomposition, simulation of cognitive processes.
Status:
This project is in its early phase. We are open to collaboration.