Clustering and Analysis of Tweets Related to Petrobras

Authors

DOI:

https://doi.org/10.12957/cadinf.2024.82401

Abstract

This study clusters and analyzes tweets about Petrobras, exploring their meaning and the profiles of the users who post them in order to understand their impact on financial markets. The workflow comprised data collection from the Twitter API (now X), tweet preprocessing with Python libraries, text vectorization via Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), Principal Component Analysis (PCA) to reduce the dimensionality of the resulting matrix, and K-means clustering. A total of 840 preprocessed tweets were clustered and analyzed for patterns related to Petrobras. The initial analysis, run without dimensionality reduction, identified five clusters with differing characteristics, while the subsequent PCA-based analysis yielded three clusters with contrasting themes. In the PCA-based analysis, cluster 0 grouped tweets about the market and the economy, while cluster 1 was related to political concerns. Limitations include reliance on publicly available Twitter data, constraints imposed by the quantity and nature of the tweets, and potential biases in sentiment analysis caused by informal language and sarcasm. The research underscores the potential of unsupervised machine learning techniques for analyzing sentiment and user profiles related to financial markets. Insights derived from tweet clustering could help investors gauge market sentiment.
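The vectorization and clustering steps named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code: the four tweets are hypothetical toy data, the helper names (`tokenize`, `tfidf_matrix`, `kmeans`) are invented for illustration, the IDF formula is one common variant, and the PCA step is omitted to keep the example in the standard library. In practice, scikit-learn's `TfidfVectorizer`, `PCA`, and `KMeans` cover the same steps.

```python
import math
import re
from collections import Counter

# Hypothetical toy tweets; the study's 840 tweets are not reproduced here.
TWEETS = [
    "Petrobras shares rise as oil market rallies",
    "Oil market rally lifts Petrobras shares",
    "Government policy on Petrobras sparks political debate",
    "Political debate over government control of Petrobras",
]


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a stand-in for real preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs):
    """Return (vocabulary, rows); each row is an L2-normalised TF-IDF vector."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokens for t in set(doc))
    # Assumed IDF variant: ln(N / df) + 1.
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}
    rows = []
    for doc in tokens:
        counts = Counter(doc)  # Bag-of-Words term counts
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20):
    """Lloyd's algorithm with deterministic farthest-first seeding."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        # Next seed: the row farthest from its nearest existing centroid.
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vocab, rows = tfidf_matrix(TWEETS)
labels = kmeans(rows, k=2)
for label, tweet in zip(labels, TWEETS):
    print(label, tweet)
```

On this toy input the two market-related tweets receive one label and the two politics-related tweets the other, a small-scale analogue of the market and political themes reported for the PCA-based analysis.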

Author Biographies

Demetrius Milton Murato, Universidade Tecnológica Federal do Paraná - Campus Londrina

Holds a degree in Production Engineering and works as an Operations Analyst at BTG Pactual.

Bruno Samways dos Santos, Universidade Tecnológica Federal do Paraná - Campus Londrina

Associate Professor (Professor Adjunto) in the Academic Department of Production Engineering.

Rafael Henrique Palma Lima, Universidade Tecnológica Federal do Paraná - Campus Londrina

Associate Professor (Professor Adjunto) in the Academic Department of Production Engineering.

Published

2024-08-06

How to Cite

Murato, D. M., Samways dos Santos, B., & Palma Lima, R. H. (2024). Clustering and Analysis of Tweets Related to Petrobras. Cadernos Do IME - Série Informática, 49, 113–131. https://doi.org/10.12957/cadinf.2024.82401

Issue

Section

Articles