Datayes News Sentiment Data: Make Informed Decisions and Stay Ahead of the Curve

2023-12-14 14:44:41

1.Introduction to News Sentiment Data

News sentiment data reflects how investors at all levels of the market analyze and view a company. Compared with traditional price-volume data and financial data, it provides additional information for quantitative research.

To analyze the huge volume of daily news, Datayes applies natural language processing (NLP) models to run sentiment analysis on each article. Every company mentioned in an article is labeled as positive, negative, or neutral, based on the company events and operating conditions described in the text, the impact of industry conditions on upstream and downstream companies in the industry chain, and so on.

Since the first version of the Datayes news sentiment model launched in 2019, Datayes news sentiment data has been widely used by investors in the market. After further optimization and iteration, a new model was launched this year. It has a higher accuracy rate, and the product now covers stock opinion, bond opinion, fund opinion, and futures opinion, which makes it more useful for investors.

2.Data Characteristics

Datayes began accumulating news data in 2013, and the number of news sources has grown ever since. It now collects more than 80,000 articles per day.

News is not evenly distributed across sentiment categories. The chart below counts the positive and negative news stories per day and shows that there are roughly three times as many positive stories as negative ones each day, which indicates that the media is biased toward reporting good news about companies.

Thanks to the large news volume, the news sentiment data also has high coverage across different investment universes, as shown in the chart below. After 2019, daily news coverage exceeds 90% for both the CSI 300 and the CSI 800, and reaches 70% for All-A.
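The article does not define how coverage is measured. As a minimal sketch, assuming coverage means the share of a universe's constituents that are mentioned in at least one sentiment-labeled article on a given day, it could be computed as follows. The function name `daily_coverage`, the tickers, and the sample sets are hypothetical and used only for illustration.

```python
def daily_coverage(universe, mentioned):
    """Share of universe tickers that appear among the tickers mentioned that day.

    Assumption: a stock counts as covered if at least one labeled article
    mentions it on the day in question.
    """
    universe = set(universe)
    if not universe:
        return 0.0
    return len(universe & set(mentioned)) / len(universe)


# Hypothetical index constituents and the tickers mentioned in one day's news.
universe = ["000001", "000002", "600000", "600519"]
mentioned_today = {"000001", "600000", "600519", "300750"}

coverage = daily_coverage(universe, mentioned_today)
```

Tickers outside the universe (here `300750`) do not affect the result, so the same day of news can be scored against the CSI 300, the CSI 800, or All-A by changing only the `universe` argument.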

3.Introduction to News Sentiment Modeling

Datayes News Sentiment Data is built on a set of natural language processing (NLP) techniques: deep learning models are trained on labeled samples and then used to label the massive volume of news data. Each news label, e.g., the companies associated with an article or its sentiment classification, corresponds to one NLP learning task. Datayes handles these tasks with pre-trained language models that are fine-tuned on each downstream task.

First, a large unlabeled news corpus is used to pre-train a language model (a BERT model). Second, a small number of labeled training samples are used to fine-tune the model for each NLP learning task, which improves its accuracy. Using a pre-trained language model in this way greatly reduces the number of labeled training samples required and also reduces the workload of the various NLP learning tasks.

Company association analysis identifies the relevant listed companies in the news text using Named Entity Recognition (NER) and uses a self-attention network to classify the association between each company and the article as strong or weak. Sentiment analysis extracts the relevant text for each associated company in the news and uses a fine-tuned BERT model to classify its sentiment.

4.News Sentiment Factor Study

News Sentiment Factor Construction

Research has shown that short-term news sentiment is correlated with individual stock sentiment and long-term news sentiment is correlated with individual stock fundamentals. We examine here the long-term (90-day) news sentiment factor. Factor construction steps:

Step1: Eliminate market news by news title and news label, for example, news whose INDUSTRY_NAME_1ST label is none, others, or market, and news whose title contains “Dragon & Tiger List” (“LHL”), “Divergence”, and so on.

Step2: For each news article, take the company with the highest association level, and keep the article only if that association is strong.

Step3: For each stock, sum the sentiment scores of all its news over the past 90 days, and use the sum as the stock's sentiment factor value for that day.
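The three steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Datayes implementation: the record layout, the field names other than INDUSTRY_NAME_1ST, the numeric `relevance` used to rank companies within an article, the +1/0/-1 sentiment scores, and the reading of "past 90 days" as a 90-calendar-day window that includes the evaluation date are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

MARKET_LABELS = {"none", "others", "market"}
MARKET_TITLE_KEYWORDS = ("Dragon & Tiger List", "Divergence")


def is_market_news(article):
    """Step 1: flag market-wide news by industry label or title keyword."""
    if article["INDUSTRY_NAME_1ST"] in MARKET_LABELS:
        return True
    return any(k in article["title"] for k in MARKET_TITLE_KEYWORDS)


def top_strong_company(article):
    """Step 2: return (ticker, sentiment) for the most relevant company,
    or None if the article has no companies or the top one is only weakly associated."""
    if not article["companies"]:
        return None
    best = max(article["companies"], key=lambda c: c["relevance"])
    if best["association"] != "strong":
        return None
    return best["ticker"], best["sentiment"]


def sentiment_factor(articles, as_of, window_days=90):
    """Step 3: sum sentiment scores per stock over the window ending at as_of."""
    start = as_of - timedelta(days=window_days - 1)
    factor = defaultdict(int)
    for article in articles:
        if not (start <= article["date"] <= as_of):
            continue
        if is_market_news(article):
            continue
        hit = top_strong_company(article)
        if hit is None:
            continue
        ticker, score = hit
        factor[ticker] += score
    return dict(factor)


def mention(ticker, association, relevance, sentiment):
    """Build one hypothetical company mention record."""
    return {"ticker": ticker, "association": association,
            "relevance": relevance, "sentiment": sentiment}


def news(day, title, industry, companies):
    """Build one hypothetical article record."""
    return {"date": day, "title": title,
            "INDUSTRY_NAME_1ST": industry, "companies": companies}


# Made-up sample articles, for illustration only.
sample = [
    news(date(2023, 12, 1), "Company A wins major contract", "Machinery",
         [mention("000001", "strong", 0.9, 1), mention("000002", "weak", 0.3, 1)]),
    news(date(2023, 11, 15), "Dragon & Tiger List: most active seats", "Banks",
         [mention("000001", "strong", 0.8, 1)]),   # dropped in Step 1
    news(date(2023, 10, 20), "Company B cuts guidance", "Electronics",
         [mention("000002", "strong", 0.95, -1)]),
    news(date(2023, 6, 1), "Company A posts record profit", "Machinery",
         [mention("000001", "strong", 0.9, 1)]),   # outside the 90-day window
    news(date(2023, 12, 10), "Company A expands overseas", "Machinery",
         [mention("000001", "strong", 0.7, 1)]),
    news(date(2023, 12, 5), "Sector roundup", "Electronics",
         [mention("000002", "weak", 0.4, 1)]),     # dropped in Step 2
]

result = sentiment_factor(sample, as_of=date(2023, 12, 14))
```

Because each article contributes to at most one stock, a single widely syndicated story about one company is not spread across every firm it happens to mention. Calling `sentiment_factor` once per trading day with a moving `as_of` yields the daily factor series.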