Publication Date

Spring 2021


Today, social media use has grown to the point where it is often deeply intertwined with life offline. People share their thoughts, passions, and lives online, and in many ways these social networks can be considered abstractions of real-world society. The idea behind this research is that, by modeling these social networks, the glimpses into people’s lives offered by their words and posts can reveal their current health situation and their susceptibility to outside influences affecting it. The goal of this research project is to design and implement unsupervised machine learning techniques that group together subnetworks of connected individuals, in the hope that this may benefit current disease surveillance systems. Using the Python programming language and its available libraries, data were collected from the social network platform Twitter and analyzed using three clustering and centrality measurements. Tweets were included in the data set if they contained symptomatic keywords, such as those describing symptoms experienced by people afflicted with the novel coronavirus disease (COVID-19). Our finding is that by simulating people’s real-world connections through their virtual connections, their surrounding cliques become discoverable, providing new possibilities for viral control and disease prevention using easily sourced and quickly gathered information.
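As a rough illustration of the pipeline described above, the following sketch filters posts by symptomatic keywords, links the flagged users through their connections, groups them into subnetworks, and scores each user's centrality. It is only a sketch under stated assumptions: the abstract does not name the three clustering and centrality measurements used, so connected components and degree centrality stand in for them here, and the keyword list, sample tweets, follow pairs, and all function names are hypothetical. It uses only the Python standard library and no Twitter API calls.

```python
# Hypothetical keyword list; the study's actual search terms are not given.
SYMPTOM_KEYWORDS = {"fever", "cough", "fatigue"}


def mentions_symptom(text):
    """Return True if any keyword appears as a whole word in the text."""
    words = {w.strip(".,!?#").lower() for w in text.split()}
    return bool(words & SYMPTOM_KEYWORDS)


def build_graph(tweets, follows):
    """Build an undirected graph over users with a symptomatic tweet."""
    flagged = {t["user"] for t in tweets if mentions_symptom(t["text"])}
    graph = {user: set() for user in flagged}
    for a, b in follows:
        # Keep only connections where both users were flagged.
        if a in flagged and b in flagged and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group users into subnetworks (stand-in clustering measure)."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            group.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(group)
    return groups


def degree_centrality(graph):
    """Fraction of other users each user is directly connected to."""
    n = len(graph)
    if n < 2:
        return {user: 0.0 for user in graph}
    return {user: len(nbrs) / (n - 1) for user, nbrs in graph.items()}


# Invented sample data, for illustration only.
tweets = [
    {"user": "ana", "text": "Woke up with a fever and a dry cough."},
    {"user": "ben", "text": "This cough will not go away"},
    {"user": "cy", "text": "Great weather today!"},
    {"user": "dee", "text": "So much fatigue this week..."},
    {"user": "eli", "text": "Fever again, staying home"},
]
follows = [("ana", "ben"), ("ben", "cy"), ("ana", "eli"), ("cy", "dee")]

graph = build_graph(tweets, follows)
groups = connected_components(graph)
centrality = degree_centrality(graph)
```

In this toy data, the user with no symptomatic tweet is dropped along with their connections, so the remaining users split into one connected group and one isolated user, and the most connected user in the group receives the highest centrality score.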