Comprehensive report on Large-Cap Firms’ Political Attitude

Due Date 3/24/2023 @ 4 P.M.

Brooks Walsh (Brooks377)

Hypothesis: Firms that are more in tune with the “political climate” have better immediate returns after filing a 10-K.

Abstract

This project uses an objective metric for sentiment (sentiment score) to measure the apparent sentiment of 10-K’s, and then correlates that score with stock return data around the filing date of the 10-K. When finding sentiment scores, I take note of the difference between a traditional bag-of-words approach and the new machine learning approach. After thorough analysis, it is determined that the machine learning method is more effective for determining sentiment, but more data should be used to verify these results.
The second half of this project finds sentiment values based on special topics to attempt to correlate those sentiment scores with stock returns around the 10-K filing date, in a similar process to the one I just described. The findings of this analysis did not provide enough correlation data to make meaningful predictions about future stock returns. However, there were many conclusions to be drawn from the standard deviations, weak correlations, and means of the resulting scores. Overall, this report does not meaningfully support or rebut my hypothesis, as no strong correlations were found.

Data

The Sample

The sample consists of 488 of the 503 stocks in the current S&P 500
- There are few enough firms missing that this data will be a great indicator
  - The missing firms are listed below
Some of the firms in the study were not in the S&P 500 during the period for which we have returns
- About 20-25 firms are swapped each year
  - Best I could find for historical constituency data was this link (Only shows swaps through 2021)

Total Missing Values in Final Dataframe: 15

Note: It is definitely possible to replace these ticker names to get additional data, but I felt my sample sufficed
1. First Republic Bank (FRC); did not have any 10-K listed on SEC website
2. GE HealthCare (GEHC); Split from GE on January 4 2023, so there is no 2022 10-K
3. Brown-Forman Corporation Class B (BF.B); class B stock, dropped
4. Berkshire Hathaway Inc Class B (BRK.B); class B stock, dropped
5. Ball Corp (BALL); ticker changed from BLL
6. Elevance Health Inc (ELV); ticker changed from ANTM
7. Gen Digital Inc (GEN); ticker changed from NLOK
8. Meta Platforms Inc (META); ticker changed from FB
9. Paramount Global Class B (PARA); ticker changed from VIAC
10. Warner Bros Discovery Inc (WBD); ticker changed from DISCA
11. Welltower Inc (WELL); no discernible reason… best guess is date issue
12 to 15 are missing because I used the code found below to find cumulative returns (1+(t+3)) * (1+(t+4)) … (1+(t+10)) - 1:
- This code creates a rolling window, but if there is not a t+10, t+9, etc. return value, the cumulative return will not calculate
  1. A
  2. AMAT
  3. AVGO
  4. NDSN

Calculating Cumulative Return Values

To calculate return values, I used the df_returns.csv file in the build/inputs folder

This dataset contains 2587065 values originally

Saving the file this way created an additional index that I had to drop using the following code:

df_returns.drop('Unnamed: 0', axis=1, inplace=True)

Next, I filtered the data set to only include return values for the firms in the scope of this report

Originally it was at this point that I dropped duplicate dates to ensure one return value per day (Which results in losing 1331 returns (or .975% of the data))
- However, as the professor correctly noted, we did not have enough data to know which return to drop
  - Thankfully, the professor cleaned the data for us, and duplicate dates are no longer an issue; so I removed my attempted fix

df_returns_500_bc = df_returns[df_returns['ticker'].isin(sp500_wDate['Symbol'])]

Finally, I used the following lines of code (using groupby/rolling) to calculate the cumulative returns

(1+t) * (1+(t+1)) * (1+(t+2)) - 1

df_returns_500['return_t_t2'] = (
  df_returns_500.assign(R=1+df_returns_500['ret'])
  .groupby('ticker')['R']
  .rolling(3)
  .apply(np.prod, raw=True)
  .shift(-2)
  .reset_index(level=0, drop=True)
  .sub(1)
)

(1+(t+3)) * (1+(t+4)) … (1+(t+10)) - 1

df_returns_500['return_t3_t10'] = (
  df_returns_500.assign(R=1+df_returns_500['ret'])
  .groupby('ticker')['R']
  .rolling(11, min_periods=11)
  .apply(lambda x: np.prod(x[3:]), raw=True)
  .reset_index(level=0, drop=True)
  .sub(1)
  .shift(-10)
)

Sentiment Scores: Getting 10-K’s

To calculate general sentiment scores, I am looking to find the 10-K HTML files that correspond with the 488 firms in the sample

I grabbed the HTML files using the code contained in the build/download_text_files.ipynb notebook file
- These hmtl files are stored in the following directory as a .zip: build/10k_files/10k_files.zip
  - After fixes, 498 10-K’s are downloaded; the only 2 missing do not have a 10-K filed on the SEC website

To work around a major problem where many 10-K’s were not downloading, and to fix erroneous filings, I used CIK instead of ticker when looping into the SEC EDGAR downloader.

for example, this code produces the Heico 10-K, when it should produce one for Agilent Technologies:
```
dl.get("10-K", "A", amount=1, after="2022-01-01", before="2022-12-31")
```
Because of the switch to CIK values, I had to drop the duplicate stocks from the 503 in the original sample:
```
sp500.drop_duplicates(subset='CIK', keep='first', inplace=True)
```

The following is the code that grabs 10-K’s from SEC EDGAR and downloads them into folder trees, which I then zip into the desired zip folder

for firm in tqdm(sp500['CIK'].astype(str)):

        symbol = sp500.loc[sp500['CIK'] == int(firm), 'Symbol'].values[0]
        pattern = 'sec-edgar-filings/'+firm.zfill(10)+'/10-K/*/*.html'
        firm_files = fnmatch.filter(file_list, pattern)
                
        if len(firm_files) == 0:
            dl.get("10-K", firm, amount=1, after="2022-01-01", before="2022-12-31")

Sentiment Scores: Getting 10-K Filing Date + Adding Returns

In order to correctly relate 10-K filing’s to returns, I will need the filing date of each

I started by creating lists of Accession numbers and CIK’s by using re.search on the file paths:

# grab accession number from file paths
acc_pattern = r"\d{10}-\d{2}-\d{6}"
acc_num_list = [re.search(acc_pattern, file_name).group() for file_name in file_list]

# grab the CIK from file paths
CIK_pattern = r"\d{10}"
file_CIK_list = [re.search(CIK_pattern, file_name).group() for file_name in file_list]

I then used the following for loop to place the CIK and Accession number into the correct place in the url, and pulled 10-K dates using a CSS selector, adding the dates directly to the dataframe:

for index, row in tqdm(CIK_ACC.iterrows()):
        cik = row["CIK"]
        accession_number = row["Accession"]
        url = f'https://www.sec.gov/Archives/edgar/data/{cik}/{accession_number}-index.html'
        r = session.get(url)
        filing_date = r.html.find('div.formContent > div:nth-child(1) > div:nth-child(2)', first= True).text
        CIK_ACC.loc[index, 'Filing_Date'] = filing_date

        sleep(.15) # for SEC timeout

To ensure the data is consistent with my desired sample: I removed the 2 stocks with no 10-K filings, then added back the 3 duplicate stocks I dropped earlier using a merge:

# first remove the 2 stocks that did not have 10-K filings
sp500_m2 = sp500[["Symbol", "CIK"]]
sp500_m2 = sp500_m2.loc[~sp500_m2['Symbol'].isin(['FRC', 'GEHC'])]
sp500_m2.reset_index(drop=True, inplace=True)

# then merge it with DF that only has only CIK's with relevant 10-K's attached
CIK_ACC_noTic = CIK_ACC.drop("Symbol", axis=1)
sp500_wDate = sp500_m2.merge(CIK_ACC_noTic, on="CIK")

Once my dates are collected, and return variables are measured, I merge the 2 dataframes together:

# merge return data into dataframe with dates
sp500_ret = sp500_wDate.merge(df_returns_500_merge.rename(columns={'ticker':'Symbol', 'date':'Filing_Date'}),
                              how="left",
                              on=['Symbol','Filing_Date'],
                             validate="1:1")

Sentiment Scores: HTML Parsing and Variable Creation

Between the last step and this one, I used multiple tests to find the stocks that did not have enough data to perform analysis on, and I dropped them, resulting in 488 rows of cumulative return variables and dates. Now, I use the CIK’s of my desired 488 stocks to calculate different sentiment scores using re.findall and the NEAR_regex() function (scores are standardized by document length):

an example of what goes through the loop (finding positive sentiment score using LM lexicon):

LM = pd.read_csv('inputs/LM_MasterDictionary_1993-2021.csv')
LM_positive_U = LM.query('Positive > 0')['Word'].to_list()
LM_positive = [elem.lower() for elem in LM_positive_U]
LM_positive = ['(' + '|'.join(LM_positive) + ')']

sentiment_pos_LM = []

for firm in tqdm(sp500_ret_parse['CIK'].astype(str)):

        # get a list of possible files for this firm
        firm_folder = "sec-edgar-filings/" + firm.zfill(10) + '/10-K/*/*.html'
        possible_files = fnmatch.filter(file_list, firm_folder)
        if len(possible_files) == 0:
            continue

        fpath = possible_files[0]  # the first match is the path to the file
        with zipfolder.open(fpath) as report_file:
            html = report_file.read().decode(encoding="utf-8")

        # Cleaning the html

        soup = BeautifulSoup(html, "lxml-xml")

        # Delete the hidden XBRL
        for div in soup.find_all("div", {'style':'display:none'}): 
            div.decompose()

        # clean the data for parsing
        soup_txt = soup.get_text().lower()
        soup_txt = re.sub(r'\W',' ',soup_txt)
        soup_txt = re.sub(r'\s+',' ',soup_txt)
        doc_length = len(soup_txt.split())
        LM_pos_regex = len(re.findall(NEAR_regex(LM_positive, partial=False) ,soup_txt))
        sentiment_pos_LM.append((LM_pos_regex/doc_length) * 100)

There is only one difference to the code above when I am creating my sentiment scores on special topics; I add an additional element to the list for the regex to see topics in proximity to positive/negative words. Example code is below:

I will discuss the reasoning for my topics/words in the next section

diversity_var_str = ['(diversity|diverse|racial|races|ethnicity|equitable|inclusion|culture)']
diversity_Pvar_strlist = GHR_positive + diversity_var_str
DIV_pos_regex = len(re.findall(NEAR_regex(diversity_Pvar_strlist, max_words_between=12, partial=False) ,soup_txt))
DIV_pos_list.append((DIV_pos_regex/doc_length) * 10000)

To finish creating my output data, I created a new dataframe using the lists of all the different sentiment scores, merged it with the frame that has dates and returns, and saved it to the output folder for analysis:

# merge sentiment score data with sp500_ret_parse
sp500_nfinal = sp500_ret_parse.merge(sent_scores, how="left", on="Symbol", validate="1:1")
# drop unnecessary data
sp500_final = sp500_nfinal.drop(['CIK', 'Accession'], axis=1).copy()
# final result goes to outputs folder
sp500_final.to_csv('outputs/sp500_final.csv', index=False)

Of note: Throughout the mechanical description, I do not mention when I do simple tasks like reset the index, or change formatting; all the relevant code is included in “_build/build_sample.ipynb”

Sentiment Scores: Topic Discussion

The three topics I chose for analysis are diversity, politics, and the environment. These are broad terms with lots of connotations, so let me break it down.

Diversity
Totally contradicting what I just said, the topic of diversity in a business setting is actually quite simple to parse, and often has static meaning. For the most part, diversity is mentioned in relation to some kind of “commitment” that a firm makes in general terms to the community. On social media, people give very little credence to these posts, but I was curious if investor sentiment changes depending on the “official stance” of the company on diversity. In choosing the words for this topic, I used generally broad, legal-style language to invoke the most hits on the regex.

Politics
So when I say politics, I don’t mean who they vote for; instead I am referring to the sentiment surrounding political situations in the US and around the world. Political situations such as new regulations, laws, geopolitical strife, etc.. In the context of this report, I am focusing on whether or not high-cap firms display any generalized sentiment at all towards regulation, and whether investors respond to that sentiment.

Environment
Admittedly, there will be some overlap between Politics and Environment when it comes to Regex hits; it is due to regulation being a common factor. Because of this, I had to be careful to avoid so much overlap that the data is meaningless when selecting words for this list. I focused on clean energy topics and progressive environmental thinking. In the context of this report, I would like to gauge if firms are talking about the environment only in the context of regulations, or if they sometimes inspire hope in their 10-K’s, and how investors react to this.

Main Focus
The three topics I selected for this study are not randomly selected, as they all connect on the idea of being in tune with an ever-changing world with increased regulation. Diversity and the environment are becoming increasingly popular things to discuss, and politics is what ties it all together. Based on the results of this study, I am hoping to conclude whether having a positive sentiment towards “seemingly-regulated”, trendy but heated topics will have tangible benefits to a firm in the short run, even if the results of that sentiment are never shown.

Final Analysis Sample

# required imports
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# read dataframe from output folder
df = pd.read_csv("../_build/outputs/sp500_final.csv")
df.describe()

	return_t_t2	return_t3_t10	LM Positive	LM Negative	GHR Positive	GHR Negative	Diversity P-score	Diversity N-score	Politics P-score	Politics N-score	Environment P-score	Environment N-score
count	488.000000	488.000000	488.000000	488.000000	488.000000	488.000000	488.000000	488.000000	488.000000	488.000000	488.000000	488.000000
mean	0.004316	-0.008571	0.498610	1.588406	2.391322	2.591227	1.152977	0.739452	4.127687	9.261984	2.260266	3.751491
std	0.054755	0.064650	0.132205	0.367764	0.348096	0.338930	0.794027	0.535682	1.956397	3.671565	2.608593	4.085639
min	-0.447499	-0.288483	0.122637	0.660856	0.796599	0.895284	0.000000	0.000000	0.740764	1.878169	0.000000	0.000000
25%	-0.025323	-0.048139	0.409171	1.329653	2.185601	2.397706	0.611078	0.375162	2.682415	6.642683	0.779966	1.432961
50%	0.001155	-0.010189	0.490398	1.561861	2.410403	2.590050	1.003079	0.634550	3.765500	8.798603	1.394714	2.412533
75%	0.029103	0.029081	0.565090	1.782317	2.604553	2.780330	1.477999	0.961974	5.287384	11.119435	2.640966	4.124739
max	0.348567	0.332299	1.089875	3.018505	3.798221	3.802972	5.595524	3.618076	10.811932	24.741341	19.225879	28.177610

Summary Stats Discussion
Above is the descriptive summary statistics for the final data sample that I am using for analysis. The first four sentiment statistics (LM/GHR positive/negative) are calculated as a percent of the total words in the document that are positive or negative words, so they are a gauge of the overall tone of each 10-K. The remaining six sentiment scores are NOT calculated as a percent. Instead these variables are calculated as a score, standardized by document length, that represents the amount of times a topic was discussed positively or negatively (using BHR lexicon). As for the special topic sentiment scores, I am surprised by the magnitude, but not by the results. As an example for proof that these are logical scores: it makes sense that most firms only discuss diversity in a neutral tone, not being overly positive or negative. In comparison, it is extremely common to lose out on opportunities, or realize losses because of regulations (political or environmental), and so the sentiment being negative for both is expected. Additionally, the degree to which companies talk about the environment in general is higher than diversity (environment is a safer topic) but lower than politics/regulation (can’t avoid talking about regulation in a firm).

Results

Correlation Data for each Sentiment Score

# Figure 1
df.corr(numeric_only=True).iloc[:2].drop(columns=["return_t_t2","return_t3_t10"])

	LM Positive	LM Negative	GHR Positive	GHR Negative	Diversity P-score	Diversity N-score	Politics P-score	Politics N-score	Environment P-score	Environment N-score
return_t_t2	-0.081006	-0.004184	0.042264	0.076401	-0.030131	-0.002085	-0.033906	-0.004680	0.093036	0.105852
return_t3_t10	-0.044273	-0.127429	-0.030688	0.050327	-0.065398	-0.016134	0.140764	0.158657	0.274920	0.282851

Correlation Heatmap

# Figure 2
corr = df.drop(columns=['Symbol','Filing_Date']).corr()

fig, ax = plt.subplots(figsize=(9,9))
plt.title("Correlation between Sentiment and Returns")
ax = sns.heatmap(corr, 
                 center=0,square=True,
                 cmap=sns.diverging_palette(230, 20, as_cmap=True),
                 mask=np.triu(np.ones_like(corr, dtype=bool)),
                 cbar_kws={"shrink": .5},
                )

png

Interesting to note that each score is most strongly associated with the negative/positive counterpart to that score. This means that some significant overlap was occurring, which I expected would happen.

Comparing the Correlations of Environment Scores with t3-t10 returns

Strongest correlations to returns

# Figure 3
f3 = sns.jointplot(data=df,
                  x="Environment P-score", y="return_t3_t10", kind='reg') 
f3.fig.suptitle('Correlation Between Environment P-score and Return')
f3.fig.subplots_adjust(top=0.95)

png

# figure 4
f4 = sns.jointplot(data=df,
                  x="Environment N-score", y="return_t3_t10", kind='reg') 
f4.fig.suptitle('Correlation Between Environment N-score and Return')
f4.fig.subplots_adjust(top=0.95)

png

Discussion Topics

Compare / contrast the relationship between the returns variable and the two “LM Sentiment” variables (positive and negative) with the relationship between the returns variable and the two “ML Sentiment” variables (positive and negative). Focus on the patterns of the signs of the relationships and the magnitudes.

The LM sentiment variables are created using the list of positive/negative words that is 86,531 words long. Created as a comprehensive and all inclusive list for sentiment analysis, this list of words was created by people and based on a previous list that is criticized and debunked called H4N. ML (GHR) sentiment variables are created the same way as LM variables, except the list of positive/negative words is only 94 words long, and this list was created by a machine learning algorithm. Despite this difference in length, my analysis (Figure 1) shows that the ML (GHR) sentiment variables are better correlated with both windows of returns in nearly all categories. LM variables are using a much longer list of possible words (920 times longer), and yet they are receiving less hits when parsing the 10-K. This would indicate that, for modern 10-K’s, the ML (GHR) list of words is a better indicator of stock price.

If your comparison/contrast conflicts with Table 3 of the Garcia, Hu, and Rohrer paper (ML_JFE.pdf, in the repo), discuss and brainstorm possible reasons why you think the results may differ. If your patterns agree, discuss why you think they bothered to include so many more firms and years and additional controls in their study? (It was more work than we did on this midterm, so why do it to get to the same point?)

The patterns shown in my analysis are congruent with those that appeared in the GHR paper. To quote the paper, “LM sentiment scores are barely associated with the stock price reactions during the release of 10-K statements” (Page 534). All of the correlation data for LM sentiment scores are negative (Figure 1), which would indicate a negative correlation; however, the magnitude of the correlations are minimal, and few conclusions can be drawn besides comparison. For comparison, the ML lists showed a mostly positive correlation with returns. While the magnitude of this correlation is similarly insignificant, the positive correlation, and consistency across time frames makes ML (GHR) the preferred list of words for analysis. So why did they bother do include so many more firms and years and controls? The best answer here is: for safety. While some conclusions can surely be drawn from the data I collected for this report, overall, it requires a larger timeframe and more 10-K’s to prove that any significant relationships exist. If the correlation data showed stronger/weaker relationships with less points of data, we would have to question the validity of that report. For the ML (GHR) paper, it is worth noting that they are doing a similar analysis, but using it to predict future stock prices, and prediction is out of the scope of a project/sample the size of mine. The moral is, more data is always better.

Discuss your 3 “contextual” sentiment measures. Do they have a relationship with returns that looks “different enough” from zero to investigate further? If so, make an economic argument for why sentiment in that context can be value relevant.

Diversity & Politics
The diversity and political sentiment scores showed almost no correlation, or actually, a negative correlation. Unfortunately for my analysis, these relationships are essentially random. That being said, there is a significant outlier in terms of correlation and that is political sentiment score’s relationship with returns three to ten days out. Interestingly, it is not correlated with whether the firm talks positively or negatively about politics or regulation, but rather whether they mention the situations at all. As is shown in Figure 2 (heat map), it is both the positive and negative sentiment regarding the environment that have positive correlations with returns (t3-t10). One possibility is that firms who are spending more time learning and writing about regulations and the current political climate will be more suited to adapt when regulations arrive.

The Environment
The environment sentiment scores show a pretty distinct positive relationship (in comparison with relevant data) when plotted against returns for day 4 to day 11. There was a less distinct, but also notable positive relationship with immediate returns from day 1 to day 3 (Figure 2). One possible reason for these relationships is similar to the reason I stated for political climate; it could be that firms who update themselves regularly to adapt to shifting environmental regulation see better returns when investors see that they are continuing to commit themselves to environmental goals.

Overall
All 3 of my special topic sentiment variables do not have a very notable difference between their positive and negative scores. This is partly due to my choice of words in the regex, as part of my goal in this project was to determine whether there were significant differences between the way firms talk about politics/environment and the regulations surrounding those topics. As can be seen in the summary statistics above Figure 1, the negative sentiment scores for both environment and politics had significantly higher standard deviations than their positive counterpart. This is logical and telling, because we would expect firms to be more hesitant to be critical of regulation or say that it is hurting them. This hesitancy would lead to some firms mentioning regulation in negative light significantly more than others, who are more reserved in language (or less impacted). To drive the point home, take notice of the diversity sentiment score’s standard deviation. This standard deviation is so much smaller than the other two because firms are likely to say what is expected of them on the topic and nothing more, to avoid being controversial.

Is there a difference in the sign and magnitude? Speculate on why or why not.

The ML(GHR) sentiment measures are mostly correlated positively with the return values, but the correlation is very small, and nearly insignificant. The difference in magnitude is extremely insignificant between the two return measures, especially for the list of negative words. The negative word list includes words such as: impacts, affected, and happened. While this list has been proven to be effective, the ambiguity of words such as “happened” could lead to false positives, or overlap between positive and negative sentiment. Despite this lack of clear evidence for a trend, the difference between ML positive and negative scores is significant enough to warrant further exploration. While there isn’t time in this report to dive extremely deep on the topic, I will speculate as to why there is such a difference in the return correlation value of positive and negative sentiment, and why it is only for the t+3 7 day returns. One explanation for this difference is, in the perspective of firms, if your 10-K has lots of positive words in it, it doesn’t mean much because all firms try to show off a little in 10-K’s (at least language-wise). However, firms will tend to avoid saying anything negative about their status unless they absolutely must. So in this case, it is probable that firms who release lots of negative sentiment see their stock price go down, leading to the positive correlation. In contrast, when firms release lots of positive sentiment, investors see it as business as usual, and the stock price wouldn’t react.