Can we predict the macroeconomy by analyzing the narratives people share on social media? We dove deep into the world of Narrative Economics, using NLP models to analyze millions of viral tweets and see whether they could help us predict fluctuations in macroeconomic indicators. 🚨 Spoiler alert: it's not that easy! Join us as we explore the relationship between narratives, social media, and the macroeconomy, and uncover the challenges of turning narratives into treasure.
Narrative Economics is the study of how popular stories and ideas (a.k.a. narratives) about the state of the economy can have real effects in the world. In this context, a “narrative” is a belief about the state of the world that is shared by the population, regardless of the actual state of the economy. For example, a narrative might be the belief that housing prices are increasing, whereas in reality (according to economic indicators) they are stagnating.
The central idea is that the spread of viral narratives can influence individual and collective economic behavior, leading to fluctuations in markets, changes in investment patterns, and even broader economic shifts.
The term “Narrative Economics” is heavily attributed to Robert J. Shiller, a Nobel laureate economist and the founder of the field, who defined it as:
“Spread through the public in the form of popular stories, ideas can go viral and move markets - whether it’s the belief that tech stocks can only go up, that housing prices never fall, or that some firms are too big to fail. Whether true or false, stories like these - transmitted by word of mouth, by the news media, and increasingly by social media - drive the economy by driving our decisions about how and where to invest, how much to spend and save, and more.”
- Robert J. Shiller
The term “narrative” itself has different connotations in NLP compared to economics, which might lead to some confusion. 😵‍💫
In NLP, narrative commonly refers to the structure and elements of stories, encompassing aspects like plot, characters, setting, and theme. It involves understanding how these elements interrelate and contribute to the overall meaning and coherence of a narrative.
In Narrative Economics, as stated in the above section, narrative is a shared belief or idea that spreads through the population and potentially influences economic behavior.
Our research uses the term “narrative” in the economic sense. We’re interested in how shared beliefs about the economy can be used to predict market trends.
The economic term is broad, and it is not defined what requirements a story or idea must meet to be considered a “narrative”. Yet, we can look at some characteristics Shiller mentions
Taken together, capturing a narrative requires a good measure of what many people are discussing over time. Twitter (X), in this case, is an almost ideal source of information for capturing this distribution of opinions.
Aligning with Shiller’s arguments and with existing literature, the extraction might be (for example) a sentiment
We have collected
Both datasets were carefully curated using targeted queries that include specific keywords, and were analyzed to ensure their quality and relevance for capturing economic narratives.
Pre-Pandemic Twitter Dataset: Utilizing Twitter API
Post-2021 Twitter Dataset: We wanted to test the predictive power of our employed LLM (OpenAI’s Chat Completion API with GPT-3.5
An example tweet can be:
This is so silly. The deficit is decreasing because we’re not doing pandemic aid anymore, and federal receipts are up because of inflation. Congress, or the Biden administration, didn’t do anything to lower the deficit. https://t.co/kvlLPpsUO9
- New Liberals 🇺🇦 (@CNLiberalism) May 15, 2022
This tweet from influential user @CNLiberalism exemplifies the economic narratives captured in our dataset. Posted in May 2022, it captures real-time public concerns about inflation and its impact on the deficit. This type of narrative can influence consumer behavior and market trends, which is central to our research.
To confirm the presence of narratives within our Twitter dataset, we conducted an analysis using RELATIO
We can see how these narratives evolve over time, with their distribution tracking related real-world events.
A more advanced technique to extract and analyze narratives is using LLMs. Prompting OpenAI’s Chat Completion API, GPT-3.5
Here’s a snippet of such an LLM-based narrative analysis for inputs of dates 29/08/2022 to 28/09/2022. In this time period, the Federal Reserve raised interest rates in an effort to combat inflation and the US Supreme Court ruled that the Biden administration could not extend the pause on student loan payments:
This snippet demonstrates the LLM’s ability to aggregate information, condensing the tweets and distinguishing between the opinions and the occurrences they convey. Moreover, the LLM links its insights to potential future consequences for the financial indicator, a pivotal first step towards prediction.
Now that we are familiar with Narrative Economics and have established how we gathered these narratives from Twitter, let’s explore why we chose to focus on macroeconomics:
Macroeconomics studies the behavior of the economy as a whole, examining factors like inflation, unemployment, and economic growth. Microeconomics, on the other hand, is concerned with the decision-making of individuals and firms, examining indicators such as the price of a particular stock.
A core concept in Narrative Economics is that narratives can drive economic fluctuations. This is especially intriguing at the macroeconomic level, as the theory suggests that widely shared stories can influence the collective decisions of millions of individuals. Additionally, existing research focuses on microeconomic indicators within the context of Narrative Economics
However, studying this at the macroeconomic level is harder than at the microeconomic level due to the complex interplay of various factors, the need for narratives with broad coverage, and the inherent difficulty of isolating causal relationships.
We focus on predicting three key macroeconomic indicators:
Federal Funds Rate (FFR): The interest rate at which depository institutions, such as banks, lend reserve balances overnight to meet reserve requirements. The FFR serves as a Federal Reserve monetary policy tool, is influenced by public perception of economic stability, and its fluctuations impact various sectors, making it widely monitored.
S&P 500: A stock market index measuring the performance of the 500 largest publicly traded companies in the U.S. It reflects collective investor confidence in economic growth and risk appetite and is widely regarded as a barometer of the overall health of the US stock market.
CBOE Volatility Index (VIX): Measures market expectations of future volatility based on S&P 500 options prices, often referred to as the “fear gauge” as it tends to rise during market stress and fall during market stability.
These indicators are well-suited for testing the predictive power of narratives in macroeconomics due to their daily frequency, which aligns with the rapid pace of Twitter, and their sensitivity to public sentiment and behavior.
The previous two sections covered the theory of Narrative Economics and our curated Twitter datasets, which contain such narratives, as well as the distinction between macroeconomics and microeconomics, why it is interesting to study the theory at the macroeconomic level, and which macroeconomic indicators we chose.
We can now delve into the series of experiments we ran to assess the central question: can economic narratives provide valuable insights into future macroeconomic movements?
Each experiment tests the predictive power of narratives from the curated datasets, for macroeconomic prediction of one (or more) of the financial targets introduced before: FFR, S&P 500, and VIX.
We won’t be able to cover all the details of the experiments in this blog, but they are available in our paper
We test the predictive power of narratives on three tasks commonly used in macroeconomic literature: “next value”, “percentage change”, and “direction change” prediction.
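To make the prediction targets concrete, here is a minimal sketch of how the three task targets named later in this post (“next value”, “percentage change”, “direction change”) could be derived from a daily indicator series. The function name `build_targets`, the 1/0 encoding of direction, and the sample values are illustrative assumptions, not the exact definitions used in the paper.

```python
from typing import List, Tuple


def build_targets(series: List[float]) -> List[Tuple[float, float, int]]:
    """For each day t (except the last), build the targets for day t+1.

    Returns tuples of (next_value, percentage_change, direction), where
    direction is 1 if the indicator rises and 0 otherwise. This encoding is
    an assumption made for illustration.
    """
    targets = []
    for today, tomorrow in zip(series, series[1:]):
        pct_change = 100.0 * (tomorrow - today) / today
        direction = 1 if tomorrow > today else 0
        targets.append((tomorrow, pct_change, direction))
    return targets


# Made-up indicator values, for illustration only.
vix = [20.0, 22.0, 21.0, 21.0]
targets = build_targets(vix)
for next_value, pct_change, direction in targets:
    print(next_value, round(pct_change, 2), direction)
```

Under this reading, “next value” and “percentage change” are regression targets (scored with MSE), while “direction change” is a binary classification target (scored with accuracy or F1).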
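We divide our models into 3 categories based on their input signal: textual models (T), which use only tweets; financial models (F), which use only historical financial data; and combined models (TF), which use both.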
Our goal is to effectively leverage insights from both textual narratives and historical financial patterns to improve prediction accuracy. The added value of incorporating textual narratives can be demonstrated if a model that utilizes both text and financial data (TF model) outperforms a model that relies solely on financial data (F model).
Financial baselines:
Counterfactual textual baselines:
During our experiments, we encountered some intriguing results that warranted further investigation. To ensure the validity of our findings and rule out alternative explanations, we introduced counterfactual textual baselines. These baselines allowed us to rigorously test whether the observed improvements were truly due to the models’ capabilities or stemmed from other factors. Unfortunately, these baselines revealed that the promising results were more elusive than we hoped.
Our model selection progresses from simpler models, frequently employed in the financial literature
Financial models: These include traditional ML models (e.g., Linear Regression, SVM), DA-RNN
Textual models:
Fusing textual and financial models: We experiment with several strategies for combining the representations from the T and F models for unified prediction:
Prompt-based fusion: An LLM-based analysis of the given tweets and the historical financial values of the target indicator is fed, together with the raw historical values of the target, to a T5 model as separate segments.
Given a TF model, we can derive a T or an F model by omitting or zeroing the F or the T component, respectively.
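The zeroing idea can be sketched as follows. This assumes a simple concatenation-style fusion, which is only one possible strategy; the function name `fuse` and the vector sizes are hypothetical and chosen for illustration.

```python
from typing import List, Optional


def fuse(
    text_vec: Optional[List[float]],
    fin_vec: Optional[List[float]],
    text_dim: int,
    fin_dim: int,
) -> List[float]:
    """Concatenate the text (T) and financial (F) representations.

    Passing None for one component replaces it with zeros, so the same
    downstream predictor can be run as a TF, T-only, or F-only model
    without changing its input size.
    """
    t = list(text_vec) if text_vec is not None else [0.0] * text_dim
    f = list(fin_vec) if fin_vec is not None else [0.0] * fin_dim
    if len(t) != text_dim or len(f) != fin_dim:
        raise ValueError("component size does not match the declared dimension")
    return t + f


text_repr = [0.1, 0.2]        # stand-in for a tweet representation
fin_repr = [18.5, 19.0, 19.4]  # stand-in for recent indicator values

tf_input = fuse(text_repr, fin_repr, text_dim=2, fin_dim=3)
t_input = fuse(text_repr, None, text_dim=2, fin_dim=3)   # F zeroed -> T model
f_input = fuse(None, fin_repr, text_dim=2, fin_dim=3)    # T zeroed -> F model
```

Keeping the input size fixed means any difference in performance between the three variants comes from the information in the inputs rather than from a change in architecture.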
TL;DR:
We fed classic ML models with daily sentiments for FFR “next value” and “direction change” prediction (as separate tasks).
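As a rough illustration of what “daily sentiments” could look like as model input, the sketch below aggregates per-tweet sentiment scores into one feature row per day. The score range of [-1, 1], the choice of features (mean score, share of positive tweets, tweet count), and the function name `daily_sentiment` are assumptions for illustration, not the exact features from the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def daily_sentiment(
    tweets: Iterable[Tuple[str, float]],
) -> Dict[str, Tuple[float, float, int]]:
    """Aggregate (date, sentiment_score) pairs into daily features.

    Returns {date: (mean_score, share_of_positive_tweets, tweet_count)}.
    """
    by_day: Dict[str, List[float]] = defaultdict(list)
    for date, score in tweets:
        by_day[date].append(score)

    features = {}
    for date, scores in by_day.items():
        share_positive = sum(1 for s in scores if s > 0) / len(scores)
        features[date] = (mean(scores), share_positive, len(scores))
    return features


# Made-up scores, for illustration only.
sample = [("2022-05-15", 0.5), ("2022-05-15", -0.5), ("2022-05-16", 1.0)]
features = daily_sentiment(sample)
```

Each daily row can then be passed to a classic model (for example a Random Forest or an SVM), alone for a T model or alongside historical indicator values for a TF model.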
The results:
Type | Model | Accuracy (direction change) |
---|---|---|
F baseline | As-previous | 0.812 |
F | Random Forest Numeric | 0.936 |
TF | Random Forest Numeric | 0.939 |
T | Logistic Regression | 0.885 |
T | SVM | 0.885 |

Type | Model | MSE (next value) |
---|---|---|
F baseline | Train-mean | 15.661 |
F | SVM | 15.416 |
TF | SVM | 15.416 |
T | SVM | 15.36 |
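The two non-learned baselines in these tables are simple enough to sketch directly. The reading below is an assumption based on their names: “as-previous” repeats the most recent observed label, and “train-mean” always predicts the mean of the training targets. The sample data is made up.

```python
from typing import List


def as_previous_accuracy(directions: List[int]) -> float:
    """Accuracy of predicting that each day's direction repeats the previous day's."""
    pairs = list(zip(directions, directions[1:]))
    hits = sum(1 for previous, current in pairs if previous == current)
    return hits / len(pairs)


def train_mean_mse(train_values: List[float], test_values: List[float]) -> float:
    """MSE of always predicting the mean of the training targets."""
    prediction = sum(train_values) / len(train_values)
    squared_errors = [(value - prediction) ** 2 for value in test_values]
    return sum(squared_errors) / len(squared_errors)


# Made-up data, for illustration only.
baseline_acc = as_previous_accuracy([1, 1, 0, 0, 1])
baseline_mse = train_mean_mse([1.0, 2.0, 3.0], [2.0, 4.0])
```

Baselines like these set the bar: a learned model is only interesting if it clearly beats them, and adding text is only interesting if the TF model clearly beats the F model.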
What can we learn? 🤔 Sentiment analysis lacks the nuance necessary for accurate financial prediction, and traditional ML models have limitations in capturing complex market dynamics. ➡️ We need improved text representations and more advanced prediction models.
Here we turn to embedding-representations (as explained in the Experimental Setup) and to DA-RNN
We extensively evaluated various model configurations, target indicators (FFR and VIX), tasks (“next value”, “percentage change”, “direction change” and the last two together), prediction horizons (next-day, next-week), LLM architectures (see Experimental Setup), aggregation methods, and the daily number of tweets given as input. Additionally, we assessed the models’ reliance on temporal context and relevant narratives using the counterfactual textual baselines.
To keep it short, we present results only for predicting the VIX “next value” of the next-day and next-week (as separate tasks). Additional experiments showed the same pattern as the results presented here.
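As an example of what an aggregation method can look like, the sketch below mean-pools the embeddings of one day's tweets into a single daily vector. Mean pooling is used here only as an illustrative assumption; the function name `mean_pool` and the tiny 2-dimensional vectors are hypothetical.

```python
from typing import List


def mean_pool(embeddings: List[List[float]]) -> List[float]:
    """Average a day's tweet embeddings into one fixed-size daily vector."""
    if not embeddings:
        raise ValueError("need at least one embedding to pool")
    count = len(embeddings)
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in embeddings) / count for i in range(dim)]


# Two made-up tweet embeddings for a single day.
day_embeddings = [[1.0, 2.0], [3.0, 4.0]]
daily_vector = mean_pool(day_embeddings)
```

Whatever the aggregation, the result is one vector per day, which keeps the input size independent of how many tweets were posted that day.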
The results:
Next-day prediction: 💡 The non-learned “as-previous” F baseline outperforms all other models (3.079 MSE). This suggests that the input data may not be beneficial for such a short-term prediction.
Next-week prediction: Initially both TF models (13.148, 13.147) appeared to outperform the F model (13.463) and F baseline (16.172), implying a potential influence of the textual content.
💡 However, the “random texts” TF baseline (13.056), which replaced actual tweets with randomly generated text, outperformed all other models, indicating the observed improvement was not driven by meaningful textual content.
We hypothesize that the presence of text improves performance, even when random, due to spurious correlations or random noise aiding generalization, similar to regularization techniques. Contributing factors may be the difficulty of effectively capturing and representing aggregated tweet information for financial prediction, and the inherent challenge of predicting future values of a volatile financial indicator, characterized by frequent random movements and fluctuations, from its historical values.
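The logic of the counterfactual check can be sketched in a few lines. Here random vectors stand in for the randomly generated text (an assumption made to keep the example self-contained), and the helper names `randomize_text_features` and `text_adds_value` are hypothetical. The MSE values passed at the end are the next-week numbers reported above.

```python
import random
from typing import List


def randomize_text_features(
    text_features: List[List[float]], seed: int = 0
) -> List[List[float]]:
    """Replace every text feature with Gaussian noise of the same shape."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in row] for row in text_features]


def text_adds_value(mse_tf: float, mse_random_tf: float, mse_f: float) -> bool:
    """The text is credited only if the real TF model beats BOTH the F model
    and the TF model that was fed random text."""
    return mse_tf < mse_f and mse_tf < mse_random_tf


real_features = [[0.2, 0.7, 0.1], [0.9, 0.3, 0.4]]  # made-up text features
noise_features = randomize_text_features(real_features)

# Next-week VIX results reported above: best TF, random-texts TF, F model.
verdict = text_adds_value(mse_tf=13.147, mse_random_tf=13.056, mse_f=13.463)
print(verdict)
```

Comparing against the F model alone would have credited the tweets with the improvement; the random-text comparison is what shows that it does not come from the tweets' content.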
What can we learn? 🤔 Our models struggled to leverage tweets for the prediction, indicating that implicitly capturing and aggregating latent narratives within LLMs remains a challenge.
Can LLMs generate an accurate prediction? We first tried to directly predict the financial indicator (average weekly VIX or S&P 500) as a generative response of the web chat version of GPT
Repurposing the LLM analyses for a subsequent prediction model: The previous experiment revealed the LLM’s ability to generate insightful analyses of tweets and financial data. To leverage this capability, we utilize these analyses as inputs for a dedicated prediction model to predict the S&P 500 “direction change”.
This approach addresses limitations of the two previous experiments by:
The results:
Type | Model | Accuracy | F1-Score |
---|---|---|---|
F-baselines | Train-majority | 0.424 | 0.0 |
F-baselines | Week-majority | 0.484 | 0.598 |
F-baselines | As-previous | 0.484 | 0.552 |
F-baselines | Inverse-previous | 0.517 | 0.511 |
F-baselines | Up-predictor | 0.576 | 0.731 |
F-baselines | Down-predictor | 0.424 | 0.0 |
F | T5 Base | 0.604 | 0.723 |
F | T5 Large | 0.593 | 0.727 |
TF | T5 Base | 0.626 | 0.738 |
TF | T5 Large | 0.627 | 0.742 |
T | T5 Large | 0.587 | 0.726 |
T-baseline | Synthetic narratives | 0.489 | 0.254 |
Did it work? Unfortunately, not really. 💡 Results show that there is no significant difference between the best TF and F models, with a performance gap of ~2% on the limited test set of ~90 samples.
What can we learn? 🤔 While LLMs can analyze narratives, the TF model struggled to effectively leverage this analysis for improved prediction.
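A back-of-the-envelope calculation shows why a gap of about 2% on about 90 samples is hard to distinguish from noise. The sketch below uses a plain two-proportion z-test on the best TF (0.627) and best F (0.604) accuracies from the table. It treats the two models as if they were evaluated on independent samples, which is a simplifying assumption: both were scored on the same test set, so a paired test would be more appropriate. It is not necessarily the test used in the paper.

```python
import math


def two_proportion_z(acc_a: float, acc_b: float, n: int) -> float:
    """z-statistic for the difference between two accuracies, each measured
    on n samples, using the pooled standard error."""
    pooled = (acc_a + acc_b) / 2.0
    standard_error = math.sqrt(pooled * (1.0 - pooled) * 2.0 / n)
    return (acc_a - acc_b) / standard_error


# Best TF vs. best F accuracy from the table, on roughly 90 test samples.
z = two_proportion_z(0.627, 0.604, 90)
significant = abs(z) > 1.96  # conventional 5% two-sided threshold
print(round(z, 2), significant)
```

With so few test samples, the standard error of each accuracy is itself around five percentage points, so a two-point gap falls well inside the noise.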
Despite the presence of narratives in our curated datasets and the development of NLP tools for narrative extraction, evaluating their impact on macroeconomic prediction remains challenging. Our models incorporating narrative data showed limited improvement over those using only financial data, failing to consistently outperform baselines or financial models. Any observed improvements were marginal and statistically insignificant, so we regard this as a negative result.
The missing link between the successful narrative extraction demonstrated by the LLM’s analyses and the limited improvement in macroeconomic prediction raises a question about the extent to which narratives alone can truly drive and forecast economic fluctuations, at least at the macroeconomic level.
This study serves as a foundation for further exploration, highlighting the need for new macroeconomic models or tasks that can effectively assess the influence of extracted narratives on the economy.
Like any research, this project has potential limitations and faced several interesting challenges:
First, we focused on “nowcasting”: predicting indicators right now or very soon. Economic markets are inherently complex and involve randomness, making accurate short-term prediction a significant challenge. The Efficient Market Hypothesis suggests that the predictive power of nowcasting is limited as asset prices react instantly to public information. However, Narrative Economics theory proposes that narratives affect people’s decisions, potentially giving us an edge in predicting economic fluctuations
We only looked at a few specific economic variables (FFR, S&P 500, and VIX), which are influenced and shifted by diverse, external, and unobserved sources. Looking at different targets, or trying other tasks like detecting market anomalies or predicting profit, might have shown stronger evidence of the impact of narratives on the economy.
The challenge of timing: When does a narrative actually start to impact the market, and for how long? Does it take hours, days, weeks? Although comprehensive, our experiments only examined a limited set of time lags and prediction horizons.
What IS a Narrative?: While our datasets were carefully curated to capture potential narratives, definitively identifying them is challenging, especially when trying to aggregate multiple narratives for a holistic economic picture. The definition of “narrative” itself is broad and subjective, often only becoming clear in retrospect. This, combined with the inherent noise, biases, and misinformation common on social media, makes extracting clear, reliable narratives a complex task.
Our worldview: Our analysis relies on English-language data from Twitter and focuses on US-centric macroeconomic indicators. Narratives and economies work differently around the world and on different social media platforms or other outlets. This is definitely an area for future expansion.
Lastly, we were limited to publicly accessible LLMs with a known cutoff date to ensure the models couldn’t “cheat” by accessing future knowledge. However, utilizing different models might lead to better results.
A quick word on ethics: This research involves technology aimed at predicting human and economic behavior. It’s vital to state clearly that this kind of tech could be misused in harmful or unfair ways (e.g., for market manipulation or unfair practices). We strongly believe it needs to be developed and used with caution and constant awareness of its ethical implications.
This blogpost extends the technical experiments presented in our paper