The Flint Water Crisis is a profound humanitarian disaster for the citizens of Flint, Michigan. It is also an event that has captured the attention of people throughout the United States and indeed the world through extensive media coverage. It is unthinkable to many that a developed country could have a city whose water supply is poisoning its citizens and a government that failed to respond to the contamination in an appropriate, timely manner. Given the increasing use of internet-based communication, this technological crisis generated a high volume of human communication in digital news and social media. Humans are evidently using social media as a new form of adaptation for dealing with extreme events and their challenges, such as the Flint water crisis (Bernabé-Moreno et al. 2014, Hossmann 2011, Saleem et al. 2014). To explore the possibilities and pitfalls of online communication during critical events, this chapter discusses the collective ability of social media users to communicate, reach out to others for collective action, and organize in response to the negative consequences of the Flint disaster through the lens of Twitter. Rather than focusing on the technical aspects of data collection and analysis, our goal is to reach a wide variety of audiences with a key message: social media has the capacity to transform the way the public and private sectors and civil society manage critical events in general and technological disasters in particular. The chapter starts by describing the event as observed on Twitter, followed by inferences from the data that build on prior theoretical and empirical work on social media and disasters.
The purpose of this retrospective study is to determine whether frailty is predictive of 30-day readmission in adults 50 years of age and older who were admitted with a psychiatric diagnosis to a behavioral health hospital between 2013 and 2017. A total of 1,063 patients were included, and there were 114 readmissions. A 26-item frailty risk score (FRS-26-ICD10) was constructed from electronic health record (EHR) data. Cox regression modeling of demographic characteristics, emergent admission, comorbidity, and the FRS-26-ICD showed modest prediction of time to readmission (iAUC = 0.671). The FRS-26-ICD was a significant predictor of readmission alone and in models with demographics and emergent admission; however, only the comorbidity index (ECI) remained significantly related to the hazard of readmission after adjusting for other factors (adj. HR = 1.26, 95% CI [1.17, 1.37], p < 0.001), while the FRS-26-ICD became non-significant. Frailty is a relevant syndrome in behavioral health that should be further studied for risk prediction and incorporated into care planning to prevent readmissions.
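The following is a minimal sketch of the kind of Cox proportional hazards modeling described above, using the lifelines library. The file name and column names (days_to_readmission, readmitted, frs_26_icd, eci, emergent_admission) are hypothetical placeholders, not the study's actual EHR fields.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical EHR extract with one row per index admission
df = pd.read_csv("behavioral_health_cohort.csv")
cols = ["days_to_readmission", "readmitted", "age", "sex_male",
        "emergent_admission", "eci", "frs_26_icd"]

cph = CoxPHFitter()
cph.fit(df[cols], duration_col="days_to_readmission", event_col="readmitted")
cph.print_summary()  # hazard ratios with 95% CIs for each covariate
```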
The purpose of the current study was to investigate the predictive properties of five definitions of a frailty risk score (FRS) and three comorbidity indices, using data from electronic health records (EHRs) of hospitalized adults aged ≥50 years, for 3-day, 7-day, and 30-day readmission, and to identify an optimal combined FRS and comorbidity model. Retrospective analysis of the EHR dataset was performed, and multivariable logistic regression and area under the curve (AUC) were used to examine readmission as a function of frailty and comorbidity. The sample (N = 55,778) was mostly female (53%), non-Hispanic White (73%), married (53%), and on Medicare (55%). Mean FRSs ranged from 1.3 (SD = 1.5) to 4.3 (SD = 2.1). FRS and comorbidity were independently associated with readmission. Predictive accuracy for FRS and comorbidity combinations ranged from an AUC of 0.75 to 0.77 for 30-day readmission to 0.84 to 0.85 for 3-day readmission. FRS and comorbidity combinations performed similarly well, whereas comorbidity was always independently associated with readmission. FRS measures were more strongly associated with 30-day readmission than with 7-day and 3-day readmission.
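A minimal sketch of the logistic-regression and AUC workflow used for the three readmission windows is shown below. The file name, feature columns, and outcome columns are illustrative, not the study's actual EHR fields.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

cohort = pd.read_csv("ehr_cohort.csv")  # assumed EHR extract
features = ["frs", "comorbidity_index", "age", "female", "medicare"]

# One model per readmission window, evaluated by AUC on a held-out split
for outcome in ["readmit_3d", "readmit_7d", "readmit_30d"]:
    X, y = cohort[features], cohort[outcome]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(outcome, round(auc, 3))
```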
Streaming social media provides a real-time glimpse of extreme weather impacts. However, the volume of streaming data makes mining information a challenge for emergency managers, policy makers, and disciplinary scientists. Here we explore the effectiveness of data-driven approaches to mine and filter information from streaming social media data from Hurricane Irma’s landfall in Florida, USA. We use 54,383 Twitter messages (out of 784,000 geolocated messages) from 16,598 users from Sept. 10–12, 2017 to develop four independent models to filter data for relevance: 1) a geospatial model based on forcing conditions at the place and time of each tweet, 2) an image classification model for tweets that include images, 3) a user model to predict the reliability of the tweeter, and 4) a text model to determine whether the text is related to Hurricane Irma. All four models are independently tested and can be combined to quickly filter and visualize tweets based on user-defined thresholds for each submodel. We envision that this type of filtering and visualization routine can be useful as a base model for data capture from noisy sources such as Twitter. The data can then be used by policy makers, environmental managers, emergency managers, and domain scientists interested in finding tweets with specific attributes to use during different stages of the disaster (e.g., preparedness, response, and recovery), or for detailed research.
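The sketch below illustrates the threshold-based combination step: each tweet receives four relevance scores (geospatial, image, user, text) from the independent submodels, and a tweet is kept only if every score meets a user-defined threshold. Field names, threshold values, and the scored_tweets input are illustrative assumptions, not the paper's implementation.

```python
def filter_tweets(tweets, thresholds):
    """tweets: iterable of dicts with 'geo', 'image', 'user', 'text' scores in [0, 1]."""
    keep = []
    for t in tweets:
        # Keep only tweets that pass every submodel threshold
        if all(t[k] >= thresholds[k] for k in ("geo", "image", "user", "text")):
            keep.append(t)
    return keep

# Hypothetical usage: scored_tweets is the output of the four submodels
relevant = filter_tweets(scored_tweets,
                         thresholds={"geo": 0.5, "image": 0.6, "user": 0.4, "text": 0.7})
```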
psi-collect is a command line tool for collecting post-storm imagery from the Remote Sensing Division of the National Geodetic Survey (NGS), part of the US National Oceanic and Atmospheric Administration (NOAA). The tool enables reproducible computational workflows in downstream learning and labeling tasks and uses parallel processing to capture over 100,000 images, each with an average size of 7.7 MB, from several different sources.
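As an illustration of the parallel-download pattern such a tool relies on, a sketch using a thread pool is given below. The URL manifest and output directory are placeholders and do not reflect psi-collect's actual interface.

```python
import pathlib
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch(url, out_dir=pathlib.Path("imagery")):
    """Download one image to out_dir and return its local path."""
    out_dir.mkdir(exist_ok=True)
    dest = out_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(dest))
    return dest

# Hypothetical manifest of post-storm image URLs
urls = open("ngs_image_urls.txt").read().split()
with ThreadPoolExecutor(max_workers=16) as pool:
    paths = list(pool.map(fetch, urls))
```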
Representing scientific knowledge using ontologies enables data integration, consistent machine-readable data representation, and large-scale computational analyses. Text mining approaches that can automatically process and annotate scientific literature with ontology concepts are necessary to keep up with the rapid pace of scientific publishing. Here, we present deep learning models (Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM)) combined with different input encoding formats for automated Named Entity Recognition (NER) of ontology concepts from text. The Colorado Richly Annotated Full Text (CRAFT) gold standard corpus was used to train and test our models. Precision, Recall, F1, and Jaccard semantic similarity were used to evaluate the performance of the models. We found that GRU-based models outperform LSTM models across all evaluation metrics. Surprisingly, considering the top two probabilistic predictions of the model for each instance instead of the top one resulted in a substantial increase in accuracy. Inclusion of ontology semantics via subsumption reasoning yielded a modest performance improvement.
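A minimal sketch of a bidirectional-GRU sequence tagger of the kind compared above is shown below, in Keras. Vocabulary size, tag count, sequence length, and layer widths are assumed placeholders, not the paper's settings.

```python
import tensorflow as tf

vocab_size, n_tags, max_len = 20000, 25, 100  # assumed sizes

# Token-level tagger: embed tokens, run a bidirectional GRU, predict a tag per token
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 128),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(128, return_sequences=True)),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_tags, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.build(input_shape=(None, max_len))
model.summary()
```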
We present an active learning pipeline to identify hurricane impacts on coastal landscapes. Previously unlabeled post-storm images are used in a three-component workflow: first, an online interface is used to crowd-source labels for imagery; second, a convolutional neural network is trained using the labeled images; third, model predictions are displayed on an interactive map. Both the labeler and the interactive map allow coastal scientists to provide additional labels that will be used to develop a large labeled dataset, a refined model, and improved hurricane impact assessments.
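The sketch below shows one pass of the retraining step in such a loop: newly crowd-sourced labels are read from a directory and a convolutional classifier is fine-tuned on them. The directory layout, backbone choice, number of impact classes, and epochs are assumptions for illustration, not the pipeline's actual configuration.

```python
import tensorflow as tf

# Crowd-sourced labels, one subdirectory per impact class (assumed layout)
train = tf.keras.utils.image_dataset_from_directory(
    "labeled_patches", image_size=(224, 224), batch_size=32)

base = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                         input_shape=(224, 224, 3))
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects inputs in [-1, 1]
    base,
    tf.keras.layers.Dense(4, activation="softmax"),     # 4 impact classes (assumed)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train, epochs=3)  # predictions would then be written out for the interactive map
```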
Healthcare costs attributable to unplanned readmissions are staggeringly high and negatively impact the health and wellness of patients. In the United States, hospital systems and care providers have strong financial motivations to reduce readmissions in accordance with several government guidelines. One of the critical steps toward reducing readmissions is to recognize the factors that lead to readmission and, correspondingly, to identify at-risk patients based on those factors. The availability of large volumes of electronic health records makes it possible to develop and deploy automated machine learning models that can predict unplanned readmissions and pinpoint the most important factors of readmission risk. While hospital readmission is an undesirable outcome for any patient, it is even more so for medically frail patients. Here, we develop and compare four machine learning models (Random Forest, XGBoost, CatBoost, and Logistic Regression) for predicting 30-day unplanned readmission in patients aged 50 and older deemed frail. Variables indicating frailty, comorbidities, high-risk medication use, and demographic, hospital, and insurance factors were incorporated in the models to predict unplanned 30-day readmission. Our findings indicate that CatBoost outperforms the other three models (AUC 0.80) as well as prior work in this area. We find that constructs of frailty, certain categories of high-risk medications, and comorbidity are all strong predictors of readmission for elderly patients.
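A minimal sketch of the CatBoost comparison is given below. The file name, outcome column, and categorical feature names are placeholders, not the study's variables, and no hyperparameter tuning is shown.

```python
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

cohort = pd.read_csv("frail_cohort.csv")  # assumed EHR extract for frail patients
X, y = cohort.drop(columns="readmit_30d"), cohort["readmit_30d"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = CatBoostClassifier(iterations=500, verbose=False,
                           cat_features=["insurance", "admission_type"])  # assumed categoricals
model.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```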
This study investigates Twitter usage during Hurricane Sandy, combining a survey of the general population with an exploration of communication dynamics on Twitter through different modalities. The results suggest that Twitter is a highly valuable source of disaster-related information, particularly during power outages. With a substantial increase in the number of tweets and unique users during Hurricane Sandy, a large number of posts contained firsthand information about the hurricane, showing the intensity of the event in real time. More specifically, many images of damage and flooding were shared on Twitter, from which researchers and emergency managers can retrieve valuable information to help identify storm damage and plan relief efforts. The social media analysis revealed the most important information that can be derived from Twitter during disasters so that authorities can successfully utilize such data. The findings provide insights into the choice of keywords and sentiments and into identifying the influential actors at different stages of disasters. A number of key influencers and their followers from different domains, including political, news, weather, and relief organizations, participated in Twitter-based discussions related to Hurricane Sandy. The connectivity of the influencers and their followers on Twitter plays a vital role in information sharing and dissemination throughout a hurricane. These connections can provide an effective vehicle for emergency managers to establish better bi-directional communication during disasters. However, while government agencies were among the prominent Twitter users during Hurricane Sandy, they primarily relied on one-way communication rather than engaging with their audiences, a challenge that needs to be addressed in future research.
Background: Diabetes and cardiovascular disease are two of the main causes of death in the United States. Identifying and predicting these diseases in patients is the first step towards stopping their progression. We evaluate the capabilities of machine learning models in detecting at-risk patients using survey data (and laboratory results), and identify key variables within the data contributing to these diseases among the patients.
Methods: Our research explores data-driven approaches that use supervised machine learning models to identify patients with these diseases. Using the National Health and Nutrition Examination Survey (NHANES) dataset, we conduct an exhaustive search of all available feature variables within the data to develop models for cardiovascular disease, prediabetes, and diabetes detection. Using different time frames and feature sets (with and without laboratory data), multiple machine learning models (logistic regression, support vector machines, random forest, and gradient boosting) were evaluated on their classification performance. The models were then combined to develop a weighted ensemble model capable of leveraging the performance of the disparate models to improve detection accuracy. Information gain of tree-based models was used to identify the key variables within the patient data that contributed to the detection of at-risk patients in each of the disease classes by the data-learned models.
Results: The developed ensemble model for cardiovascular disease (based on 131 variables) achieved an Area Under the Receiver Operating Characteristic curve (AU-ROC) score of 83.1% without laboratory results and 83.9% with laboratory results. In diabetes classification (based on 123 variables), the eXtreme Gradient Boosting (XGBoost) model achieved an AU-ROC score of 86.2% (without laboratory data) and 95.7% (with laboratory data). For prediabetic patients, the ensemble model had the top AU-ROC score of 73.7% (without laboratory data), and with laboratory-based data XGBoost performed best at 84.4%. The top five predictors for diabetes patients were 1) waist size, 2) age, 3) self-reported weight, 4) leg length, and 5) sodium intake. For cardiovascular disease, the models identified 1) age, 2) systolic blood pressure, 3) self-reported weight, 4) occurrence of chest pain, and 5) diastolic blood pressure as key contributors.
Conclusion: We conclude that machine-learned models based on survey questionnaires can provide an automated identification mechanism for patients at risk of diabetes and cardiovascular disease. We also identify key contributors to the prediction, which can be further explored for their implications on electronic health records.
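A sketch of the weighted soft-voting ensemble described in the Methods is shown below. The synthetic feature matrix stands in for the NHANES-derived variables, and the weights are arbitrary illustrations rather than the study's fitted values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for the NHANES feature matrix (imbalanced binary outcome)
X, y = make_classification(n_samples=2000, n_features=30, weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier()),
                ("gb", GradientBoostingClassifier())],
    voting="soft",
    weights=[1, 1, 2, 2],  # assumed weights; the study tunes these per disease class
)
ensemble.fit(X_tr, y_tr)
print("AU-ROC:", roc_auc_score(y_te, ensemble.predict_proba(X_te)[:, 1]))
```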
With the growing number of botnet attacks, computer networks are constantly under threat from attacks that cripple cyber-infrastructure. Detecting these attacks in real time is a difficult and resource-intensive task. One of the pertinent methods for detecting such attacks is signature-based detection using machine learning models. This paper explores the efficacy of these models at detecting botnet attacks, using data captured from large-scale network attacks. Our study provides a comprehensive overview of the performance characteristics of two machine learning models, Random Forest and Multi-Layer Perceptron (deep learning), in such attack scenarios. Using big data analytics, the study explores the advantages, limitations, model/feature parameters, and overall performance of applying machine learning to botnet attacks and communication. With insights gained from the analysis, this work recommends algorithms/models for specific botnet attacks instead of a generalized model.
Information propagation in online social networks has drawn considerable attention from researchers in different fields. While prior works have studied the impact and speed of information propagation in various networks, we focus on the potential interactions of two hypothetically opposite pieces of information, negative and positive. We examine the amount of time available for the positive information to be distributed with wide enough impact after the negative information, under different selection strategies for positive source nodes. Our results enable the selection of a set of users, based on a limited operating budget, to start the spread of positive information as a measure to counteract the spread of negative information. Among different methods, we identify that both eigenvector and betweenness centrality are effective selection metrics. Furthermore, we quantitatively demonstrate that choosing a larger set of nodes for the spread of positive information allows a wider window of time to respond in order to limit the propagation of negative information to a certain threshold.
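The seed-selection step can be sketched as follows: rank nodes by eigenvector or betweenness centrality and take the top k as positive-information sources, where k reflects the operating budget. The random graph and budget below are stand-ins for the networks studied.

```python
import networkx as nx

G = nx.barabasi_albert_graph(1000, 3)  # placeholder social graph
k = 20                                 # operating budget (assumed)

eig = nx.eigenvector_centrality(G, max_iter=1000)
btw = nx.betweenness_centrality(G)

# Top-k nodes under each selection metric
seeds_eig = sorted(eig, key=eig.get, reverse=True)[:k]
seeds_btw = sorted(btw, key=btw.get, reverse=True)[:k]
```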
Text mining approaches for automated ontology-based curation of biological and biomedical literature have largely focused on syntactic and lexical analysis along with machine learning. Recent advances in deep learning have shown increased accuracy for textual data annotation. However, the application of deep learning for ontology-based curation is a relatively new area and prior work has focused on a limited set of models.
Here, we introduce a new deep learning model/architecture based on combining multiple Gated Recurrent Units (GRU) with a character+word based input. We use data from five ontologies in the CRAFT corpus as a Gold Standard to evaluate our model's performance. We also compare our model to seven models from prior work. We use four metrics (Precision, Recall, F1 score, and a semantic similarity metric, Jaccard similarity) to compare our model's output to the Gold Standard. Our model achieved 84% Precision, 84% Recall, 83% F1, and 84% Jaccard similarity. Results show that our GRU-based model outperforms prior models across all five ontologies. We also observed that character+word inputs result in higher performance across models compared to word-only inputs.
These findings indicate that deep learning algorithms are a promising avenue to be explored for automated ontology-based curation of data. This study also serves as a formal comparison and guideline for building and selecting deep learning models and architectures for ontology-based curation.
As Internet-based communications have expanded, online debating has become a significant form of political participation. This work examines online discussions around health care in the United States by analysing tweets about Obamacare and then assessing the degrees of polarisation in social media. The results indicate that highly influential entities in social media have an important capacity to polarise the public. Another relevant finding is that ideology is a powerful mechanism for framing online discussions, relegating policy arguments to the margins of online debates. Finally, this work shows that social media can easily promote negative sentiments towards ‘the other’, confirming group homogeneity in online communities.
Predicting human mobility within cities is an important task in urban and transportation planning. With the vast amount of digital traces available through social media platforms, we investigate the potential application of such data in predicting commuter trip distribution at small spatial scale. We develop back propagation (BP) neural network and gravity models using both traditional and Twitter data in New York City to explore their performance and compare the results. Our results suggest the potential of using social media data in transportation modeling to improve the prediction accuracy. Adding Twitter data to both models improved the performance with a slight decrease in root mean square error (RMSE) and an increase in R-squared (R2) value. The findings indicate that the traditional gravity models outperform neural networks in terms of having lower RMSE. However, the R2 results show higher values for neural networks suggesting a better fit between the real and predicted outputs. Given the complex nature of transportation networks and different reasons for limited performance of neural networks with the data, we conclude that more research is needed to explore the performance of such models with additional inputs.
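For readers unfamiliar with the baseline, the following worked form of an unconstrained gravity model makes the comparison concrete: predicted trips are T_ij = k * P_i * A_j / d_ij^beta, with k and beta calibrated to observed trips. The zone counts, distances, and parameter values below are illustrative, not the New York City estimates.

```python
import numpy as np

P = np.array([1200.0, 800.0, 500.0])   # trip production per origin zone (assumed)
A = np.array([900.0, 600.0, 1000.0])   # trip attraction per destination zone (assumed)
d = np.array([[1.0, 4.0, 7.0],         # zone-to-zone distances, e.g. km (assumed)
              [4.0, 1.0, 3.0],
              [7.0, 3.0, 1.0]])

k, beta = 0.01, 2.0                    # assumed calibrated parameters
T = k * np.outer(P, A) / d**beta       # predicted origin-destination trip matrix
print(T.round(1))
```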
Student loans occupy a significant portion of the federal budget and represent the largest debt burden for graduates. This paper explores data-driven approaches to understanding the repayment of such loans. Using statistical and machine learning models on the College Scorecard Data, this research focuses on extracting and identifying key factors affecting the repayment of a student loan. These factors can be used to develop models that provide predictive capability for repayment rates, detect irregularities and non-repayment, and help in understanding the intricacies of student loans.
Scientists exploring a new area of research want to know the “hot” topics in that area in order to make informed choices. With the exponential growth of scientific literature, identifying such trends manually is not easy. Topic modeling has emerged as an effective approach to analyzing large volumes of text. While this approach has been applied to literature in other scientific areas, there has been no formal analysis of the bioinformatics literature.
Here, we conduct keyword- and topic model-based analyses of the bioinformatics literature from 1998 to 2016. We identify top keywords and topics per year and explore temporal popularity trends of those keywords/areas. Network analysis was conducted to identify clusters of sub-areas/topics in bioinformatics. We found that “big-data”, “next generation sequencing”, and “cancer” all experienced exponential increases in popularity over the years. On the other hand, interest in drug discovery has plateaued since the early 2000s.
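A minimal sketch of the topic-model step is shown below: fit latent Dirichlet allocation (LDA) to a corpus of abstracts and print the top words per topic. The corpus file, vocabulary thresholds, and topic count are placeholders, not the study's settings.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical corpus: one abstract per line
abstracts = open("bioinformatics_abstracts.txt").read().splitlines()

vec = CountVectorizer(stop_words="english", max_df=0.9, min_df=5)
X = vec.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=20, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-8:][::-1]]  # eight highest-weight words
    print(f"topic {i}: {', '.join(top)}")
```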
We introduce a family of authenticated data structures, Ordered Merkle Trees (OMT), and illustrate their utility in security kernels for a wide variety of sub-systems. Specifically, two types of OMTs, a) the index ordered Merkle tree (IOMT) and b) the range ordered Merkle tree (ROMT), are investigated for their suitability in security kernels for various sub-systems of the Border Gateway Protocol (BGP), the Internet’s inter-autonomous system routing infrastructure. We outline simple generic security kernel functions to maintain OMTs, and sub-system specific security kernel functionality for BGP sub-systems (such as registries, autonomous system owners, and BGP speakers/routers) that take advantage of OMTs.
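To make the underlying primitive concrete, the sketch below computes a plain Merkle tree root over a set of leaves. It shows only the basic hash-tree construction that the OMT variants build on; the index and range ordering, and the non-existence proofs they enable, are not shown.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest used for both leaves and internal nodes."""
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Compute the Merkle root of a list of byte-string leaves."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"record-1", b"record-2", b"record-3"])
print(root.hex())
```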
South African students across numerous university campuses joined together in the second half of 2015 to protest the rising cost of higher education. In addition to on-campus protesting, activists used Twitter to mobilize and communicate with each other, and, as the protests drew national attention, the hashtag #FeesMustFall began trending on Twitter. What began as a localized movement against tuition increases became a global issue when a South African court granted an interdict against the use of the #FeesMustFall hashtag. This paper traces the global spread of the #FeesMustFall hashtag on Twitter as a response to this extraordinary attempt to limit online free speech. We analyze the global flow and geographic spread of the hashtag, and our evidence supports the argument that the attempt to censor and curtail the protestors’ right to organize and share the hashtag in fact propelled the #FeesMustFall movement onto the international stage.
The assurances provided by an assurance protocol for any information system (IS) extend only as far as the integrity of the assurance protocol itself. The integrity of the assurance protocol is negatively influenced by a) the complexity of the assurance protocol and b) the complexity of the platform on which the assurance protocol is executed. This paper outlines a holistic Mirror Network (MN) framework for assuring information systems that seeks to minimize both complexities. The MN framework is illustrated using a generic cloud file storage system as an example IS.
A cloud storage assurance architecture (CSAA) for providing integrity, privacy, and availability assurances regarding any cloud storage service is presented. CSAA is motivated by the fact that the complexity of the components (software, hardware, and personnel) that compose such a service, and the lack of transparency regarding the policies followed by the service, make conventional security mechanisms insufficient to provide convincing assurances to users. As it is impractical to rule out hidden undesired functionality in every component of the service, CSAA bootstraps all desired assurances from simple transformation procedures executed inside a low-complexity trustworthy module; no component of the cloud storage service is trusted.
As social media tools become more popular at all levels of government, more research is needed to determine how the platforms can be used to create meaningful citizen–government collaboration. Many entities use the tools in one-way, push manners. The aim of this research is to determine whether sentiment (tone) can positively influence citizen participation with government via social media. Using a systematic random sample of 125 U.S. cities, we found that positive sentiment is more likely to engender digital participation, although this was not a perfect one-to-one relationship. Some cities that had an overall positive sentiment score and displayed a participatory style of social media use did not have positive citizen sentiment scores. We argue that positive tone is only one part of a successful social media interaction plan, and encourage social media managers to actively manage platforms and use activities that spur participation.
Devices participating in mobile ad hoc networks (MANETs) are expected to strictly adhere to a uniform routing protocol to route data packets among themselves. Unfortunately, MANET devices, composed of untrustworthy software and hardware components, expose a large attack surface. This can be exploited by attackers to gain control over one or more devices and wreak havoc on the MANET subnet. The approach presented in this paper to secure MANETs restricts the attack surface to a single module in MANET devices: a trusted MANET module (TMM). TMMs are deliberately constrained to demand only modest memory and computational resources in the interest of further reducing the attack surface. The specific contribution of this paper is a precise characterization of simple TMM functionality suitable for any distance-vector-based routing protocol, to realize the broad assurance that “any node that fails to abide by the routing protocol will not be able to participate in the MANET”.
Increasing complexity and inter-dependency of information systems (IS), and the lack of transparency regarding system components and policies, have rendered traditional security mechanisms (applied at different OSI levels) inadequate to provide convincing confidentiality-integrity-availability (CIA) assurances regarding any IS. We present an architecture for a generic, trustworthy assurance-as-a-service IS, which can actively monitor the integrity of any IS and provide convincing system-specific CIA assurances to users of that IS. More importantly, no component of the monitored IS itself is trusted in order to provide assurances regarding the monitored IS.
Several applications fall under the broad umbrella of data dissemination systems (DDS), where providers and consumers of information rely on untrusted, or even unknown, middle-men to disseminate and acquire data. This paper proposes a security architecture for a generic DDS by identifying a minimal trusted computing base (TCB) for middle-men and leveraging the TCB to provide useful assurances regarding the operation of the DDS. A precise characterization of the TCB is provided as a set of simple functions that can be executed even inside a severely resource-limited trustworthy boundary. A core feature of the proposed approach is the ability of even resource-limited modules to maintain an index ordered Merkle tree (IOMT).
We consider the security requirements for a broad class of content distribution systems where the content distribution infrastructure is required to strictly abide by access control policies prescribed by owners of content. We propose a security solution that identifies a minimal trusted computing base (TCB) for a content distribution infrastructure, and leverages the TCB to provide all desired assurances regarding the operation of the infrastructure. It is assumed that the contents and access control policies associated with contents are dynamic.
In applications such as remote file storage systems, an essential component of cloud computing systems, users are required to rely on untrustworthy servers. We outline an approach to secure such file storage systems by relying only on a resource-limited trusted module available at the server, and more specifically, without the need to trust any component of the server or its operator(s). The proposed approach to realize a trusted file storage system (TFSS) addresses some shortcomings of a prior effort (Sarmenta et al., 2006), which employs a Merkle hash tree to guarantee freshness. We argue that the shortcomings stem from the inability to verify non-existence. The TFSS described in this paper relies on index ordered Merkle trees (IOMT) to gain the ability to verify non-existence.