Conference paper

A multi-modal machine learning approach towards predicting patient readmission.

Conference paper

Somya D. Mohanty, Thomas P. McCoy, Prashanti Manda, Deborah A. Lekan, Marjorie Jenkins

2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2027-2035, June, 2020.

Publication year: 2020

Healthcare costs attributable to unplanned readmissions are staggeringly high, and readmissions negatively impact the health and wellness of patients. In the United States, hospital systems and care providers have strong financial motivations to reduce readmissions in accordance with several government guidelines. A critical step toward reducing readmissions is to recognize the factors that lead to readmission and, correspondingly, to identify at-risk patients based on these factors. The availability of large volumes of electronic health care records makes it possible to develop and deploy automated machine learning models that can predict unplanned readmissions and pinpoint the most important factors of readmission risk. While hospital readmission is an undesirable outcome for any patient, it is especially so for medically frail patients. Here, we develop and compare four machine learning models (Random Forest, XGBoost, CatBoost, and Logistic Regression) for predicting 30-day unplanned readmission for patients deemed frail (Age ≥ 50). Variables indicating frailty, comorbidities, high-risk medication use, demographics, hospital, and insurance were incorporated in the models for prediction of unplanned 30-day readmission. Our findings indicate that CatBoost outperforms the other three models (AUC 0.80) as well as prior work in this area. We find that constructs of frailty, certain categories of high-risk medications, and comorbidity are all strong predictors of readmission for elderly patients.
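The abstract reports model performance as AUC (area under the ROC curve). As a minimal sketch, not the authors' evaluation code, AUC can be computed directly from its probabilistic definition: the chance that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted patient, with ties counting as half. The function name and the labels and scores below are hypothetical toy values, not study data.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive case
    (label 1, e.g. readmitted) scores higher than a randomly chosen
    negative case (label 0). Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):  # compare every positive/negative pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = readmitted within 30 days, 0 = not readmitted.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.4]
print(roc_auc(labels, scores))  # 0.7777777777777778
```

The pairwise loop is O(n·m) and only makes the definition concrete; scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity efficiently for real datasets.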

What we learn from learning - Understanding capabilities and limitations of machine learning in botnet attacks

Conference paper

David Santana, Shan Suthaharan, and Somya D Mohanty.

In: Proceedings of the 2018 International Conference on Security & Management - SAM’18, 2018 World Congress in Computer Science, Computer Engineering, & Applied Computing (June 2018).

Publication year: 2018

With botnet attacks on the rise, computer networks are constantly under threat from attacks that cripple cyber-infrastructure. Detecting these attacks in real time is a difficult and resource-intensive task. One pertinent method for detecting such attacks is signature-based detection using machine learning models. This paper explores the efficacy of these models at detecting botnet attacks, using data captured from large-scale network attacks. Our study provides a comprehensive overview of the performance characteristics of two machine learning models, Random Forest and Multi-Layer Perceptron (Deep Learning), in such attack scenarios. Using Big Data analytics, the study explores the advantages, limitations, model/feature parameters, and overall performance of using machine learning against botnet attacks and communication. With insights gained from the analysis, this work recommends specific algorithms/models for specific types of botnet attacks instead of a generalized model.

The Propagation of Counteracting Information in Online Social Networks: A Case Study.

Conference paper

Logan Rohde, Somya Mohanty, Jing Deng, Fereidoon Sadri

2018 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE. http://dx.doi.org/10.1109/icdmw.2018.00168

Publication year: 2018

Information propagation in online social networks has drawn considerable attention from researchers in different fields. While prior works have studied the impact and speed of propagation for different kinds of information in various networks, we focus on the potential interactions of two hypothetically opposite pieces of information, one negative and one positive. We experiment with how much time can elapse after the negative information appears while still allowing the positive information to be distributed with wide enough impact, and with different strategies for selecting positive source nodes. Our results enable the selection of a set of users, based on a limited operating budget, to start the spread of positive information as a measure to counteract the spread of negative information. Among the methods tested, we identify both eigenvector and betweenness centrality as effective selection metrics. Furthermore, we quantitatively demonstrate that choosing a larger set of nodes to spread positive information allows a wider window of time to respond in order to limit the propagation of negative information to a certain threshold.
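Selecting seed nodes by centrality under a fixed budget can be sketched as ranking nodes by a centrality score and taking the top k. The snippet below is a hypothetical illustration, not the paper's procedure: it computes eigenvector centrality by power iteration on a small made-up graph and returns the `budget` highest-ranked nodes. The function names and the graph are assumptions; the simulation of competing positive and negative information is not shown, and betweenness centrality is omitted.

```python
def eigenvector_centrality(adj, iters=200, tol=1e-10):
    """Eigenvector centrality of an undirected graph via power iteration.

    `adj` maps each node to the set of its neighbours. Iterating with
    (A + I) instead of A leaves the leading eigenvector unchanged but
    prevents oscillation on bipartite graphs such as trees and stars.
    """
    nodes = list(adj)
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        top = max(new.values())
        new = {v: val / top for v, val in new.items()}  # rescale so max is 1
        converged = max(abs(new[v] - x[v]) for v in nodes) < tol
        x = new
        if converged:
            break
    return x


def select_seeds(adj, budget):
    """Hypothetical seed selection: the `budget` most central nodes
    (ties broken by node name so the result is deterministic)."""
    c = eigenvector_centrality(adj)
    return sorted(adj, key=lambda v: (-c[v], str(v)))[:budget]


# Made-up toy network: hub "a" with leaves "b", "c", and a short chain a-d-e.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}
print(select_seeds(graph, 2))  # ['a', 'd']
```

Node "d" outranks the leaves "b" and "c" because it is adjacent to the hub and to another node. For real networks, NetworkX provides `eigenvector_centrality` and `betweenness_centrality`, which could replace the hand-written function.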

Taking a Dive: Experiments in Deep Learning for Automatic Ontology-based Annotation of Scientific Literature

Conference paper

Prashanti Manda, Lucas Beasley, and Somya D Mohanty

In: International Conference on Biological Ontology 2018 (Aug. 2018).

Publication year: 2018

Text mining approaches for automated ontology-based curation of biological and biomedical literature have largely focused on syntactic and lexical analysis along with machine learning. Recent advances in deep learning have shown increased accuracy for textual data annotation. However, the application of deep learning for ontology-based curation is a relatively new area and prior work has focused on a limited set of models.

Here, we introduce a new deep learning model/architecture that combines multiple Gated Recurrent Units (GRU) with a character+word based input. We use data from five ontologies in the CRAFT corpus as a Gold Standard to evaluate our model’s performance, and we compare our model to seven models from prior work. We use four metrics (Precision, Recall, F1 score, and a semantic similarity metric, Jaccard similarity) to compare our model’s output to the Gold Standard. Our model achieved 84% Precision, 84% Recall, 83% F1, and 84% Jaccard similarity. Results show that our GRU-based model outperforms prior models across all five ontologies. We also observed that character+word inputs result in higher performance across models compared to word-only inputs.
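The four evaluation metrics can be made concrete for a single annotated text span by treating the predicted and gold-standard ontology concepts as sets. This is a minimal sketch under that assumption: the paper's aggregation across spans (for example, micro- versus macro-averaging) is not described here, the function name is hypothetical, and the concept IDs are placeholders. Semantic variants of Jaccard similarity often apply the same formula to the concepts' ontology ancestor sets rather than to the raw IDs; this sketch uses plain set overlap.

```python
def annotation_scores(predicted, gold):
    """Set-based precision, recall, F1, and Jaccard similarity between
    predicted and gold-standard concept annotations for one text span."""
    pred, ref = set(predicted), set(gold)
    overlap = len(pred & ref)  # true positives
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    union = pred | ref
    jaccard = overlap / len(union) if union else 1.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


# Placeholder concept IDs, not real ontology terms.
scores = annotation_scores(["C1", "C2", "C3"], ["C2", "C3", "C4", "C5"])
print({k: round(v, 3) for k, v in scores.items()})
# {'precision': 0.667, 'recall': 0.5, 'f1': 0.571, 'jaccard': 0.4}
```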

These findings indicate that deep learning algorithms are a promising avenue to be explored for automated ontology-based curation of data. This study also serves as a formal comparison and guideline for building and selecting deep learning models and architectures for ontology-based curation.

Enhancing Trip Distribution Prediction with Twitter Data: Comparison of Gravity and Neural Networks

Conference paper

Nastaran Pourebrahim, Selima Sultana, Jean-Claude Thill, and Somya D Mohanty.

In: 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems - ACM SIGSPATIAL 2018 (Sept. 2018).

Publication year: 2018

Predicting human mobility within cities is an important task in urban and transportation planning. Given the vast amount of digital traces available through social media platforms, we investigate the potential of such data for predicting commuter trip distribution at a small spatial scale. We develop back propagation (BP) neural network and gravity models using both traditional and Twitter data in New York City to explore their performance and compare the results. Our results suggest the potential of using social media data in transportation modeling to improve prediction accuracy. Adding Twitter data to both models improved performance, with a slight decrease in root mean square error (RMSE) and an increase in R-squared (R2) value. The findings indicate that the traditional gravity models outperform neural networks in terms of RMSE, achieving lower error. However, the neural networks show higher R2 values, suggesting a better fit between the real and predicted outputs. Given the complex nature of transportation networks and the various possible reasons for the limited performance of neural networks with these data, we conclude that more research is needed to explore the performance of such models with additional inputs.
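A gravity model of trip distribution is commonly written as T_ij = k · O_i · D_j · d_ij^(-β), where O_i and D_j are the trip-generating masses of the origin and destination zones and d_ij is the distance between them. The sketch below assumes that power-law form (the abstract does not state the exact specification or how Twitter-derived variables enter the model) and fits k and β by least squares on the log-linearized equation. It also computes RMSE and R² as 1 - SS_res/SS_tot. With that definition and the same observations, a lower RMSE always implies a higher R², so a disagreement between the two metrics indicates that R² was computed differently or on different data, for example as the squared correlation between observed and predicted values, which ignores systematic scaling bias; the hypothetical `squared_corr` helper shows that alternative. All function names and data are illustrative and synthetic.

```python
import math


def fit_gravity(origins, dests, dists, trips):
    """Fit T_ij = k * O_i * D_j * d_ij**(-beta) by ordinary least squares on
    log(T_ij / (O_i * D_j)) = log(k) - beta * log(d_ij).
    Inputs are parallel lists over observed origin-destination pairs."""
    xs = [math.log(d) for d in dists]
    ys = [math.log(t / (o * m)) for o, m, t in zip(origins, dests, trips)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    k = math.exp(my - slope * mx)
    return k, -slope  # (k, beta)


def predict(k, beta, origins, dests, dists):
    return [k * o * m * d ** (-beta) for o, m, d in zip(origins, dests, dists)]


def rmse(obs, pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((a - b) ** 2 for a, b in zip(obs, pred))
    ss_tot = sum((a - mean) ** 2 for a in obs)
    return 1 - ss_res / ss_tot


def squared_corr(obs, pred):
    """Squared Pearson correlation; insensitive to scale or offset errors."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((a - mo) * (b - mp) for a, b in zip(obs, pred))
    var_o = sum((a - mo) ** 2 for a in obs)
    var_p = sum((b - mp) ** 2 for b in pred)
    return cov * cov / (var_o * var_p)


# Synthetic flows generated with k = 0.5 and beta = 2.
O = [100, 200, 150, 120]
D = [80, 50, 90, 60]
dist = [1.0, 2.0, 3.0, 4.0]
T = [0.5 * o * m * d ** -2 for o, m, d in zip(O, D, dist)]
k, beta = fit_gravity(O, D, dist, T)
print(round(k, 6), round(beta, 6))  # 0.5 2.0

# A prediction that is perfectly correlated but twice too large:
obs, biased = [1, 2, 3, 4], [2, 4, 6, 8]
print(squared_corr(obs, biased), r_squared(obs, biased))  # 1.0 -5.0
```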

Data-Driven Exploration of Factors Affecting Federal Student Loan Repayment

Conference paper

Bin Luo, Qi Zhang, and Somya D Mohanty.

In: Proceedings of the 2018 International Conference on Data Science ICDATA’18, 2018 World Congress in Computer Science, Computer Engineering, & Applied Computing (June 2018).

Publication year: 2018

Student loans occupy a significant portion of the federal budget and represent the largest debt burden for graduates. This paper explores data-driven approaches toward understanding the repayment of such loans. Using statistical and machine learning models on the College Scorecard Data, this research focuses on extracting and identifying key factors affecting student loan repayment. These factors can be used to develop models that predict repayment rates, detect irregularities/non-repayment, and help explain the intricacies of student loans.

What’s hot and what’s not? - Exploring trends in bioinformatics literature using topic modeling and keyword analysis

Conference paper

Alex Haan, Somya D Mohanty, and Prashanti Manda

The International Symposium on Bioinformatics Research and Applications - 2017. Mar. 2017

Publication year: 2017

Scientists exploring a new area of research want to know the “hot” topics in that area in order to make informed choices. With the exponential growth of scientific literature, identifying such trends manually is not easy. Topic modeling has emerged as an effective approach for analyzing large volumes of text. While this approach has been applied to literature in other scientific areas, there has been no formal analysis of bioinformatics literature.

Here, we conduct keyword and topic model-based analysis on bioinformatics literature from 1998 to 2016. We identify top keywords and topics per year and explore temporal popularity trends of those keywords/areas. We also conduct network analysis to identify clusters of sub-areas/topics in bioinformatics. We found that “big-data”, “next generation sequencing”, and “cancer” all experienced an exponential increase in popularity over the years. On the other hand, interest in drug discovery has plateaued since the early 2000s.

Reliable Assurance Protocols for Information Systems

Conference paper

Mahalingam Ramkumar and Somya D Mohanty

International Conference on the Evolving Internet.

Publication year: 2015

The assurances provided by an assurance protocol for any information system (IS) extend only as far as the integrity of the assurance protocol itself. The integrity of the assurance protocol is negatively influenced by a) the complexity of the assurance protocol, and b) the complexity of the platform on which the assurance protocol is executed. This paper outlines a holistic Mirror Network (MN) framework for assuring information systems that seeks to minimize both complexities. The MN framework is illustrated using a generic cloud file storage system as an example IS.

An Efficient TCB for a Generic Data Dissemination System

Conference paper

Arun Velagapalli, Somya D Mohanty, and Mahalingam Ramkumar

IEEE International Conference on Communications in China, Communication Theory and Security Symposium. IEEE. 2012.

Publication year: 2012

Several applications fall under the broad umbrella of data dissemination systems (DDS), where providers and consumers of information rely on untrusted, or even unknown, middle-men to disseminate and acquire data. This paper proposes a security architecture for a generic DDS by identifying a minimal trusted computing base (TCB) for middle-men and leveraging the TCB to provide useful assurances regarding the operation of the DDS. A precise characterization of the TCB is provided as a set of simple functions that can be executed even inside a severely resource-limited trustworthy boundary. A core feature of the proposed approach is the ability of even resource-limited modules to maintain an index ordered Merkle tree (IOMT).

An Efficient TCB for a Generic Content Distribution System

Conference paper

Somya D Mohanty, Arun Velagapalli, and Mahalingam Ramkumar

Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2012 International Conference on. IEEE. 2012, pp. 5–12

Publication year: 2012

We consider the security requirements for a broad class of content distribution systems where the content distribution infrastructure is required to strictly abide by access control policies prescribed by owners of content. We propose a security solution that identifies a minimal trusted computing base (TCB) for a content distribution infrastructure, and leverages the TCB to provide all desired assurances regarding the operation of the infrastructure. It is assumed that the contents and access control policies associated with contents are dynamic.

Securing File Storage in an Untrusted Server - Using a Minimal Trusted Computing Base.

Conference paper

Somya D Mohanty and Mahalingam Ramkumar.

International Conference on Cloud Computing and Services Science (CLOSER), 2011, pp. 460–470

Publication year: 2011

In applications such as remote file storage systems, an essential component of cloud computing systems, users are required to rely on untrustworthy servers. We outline an approach to secure such file storage systems by relying only on a resource-limited trusted module available at the server, and more specifically, without the need to trust any component of the server or its operator(s). The proposed approach to realize a trusted file storage system (TFSS) addresses some shortcomings of a prior effort (Sarmenta et al., 2006) that employs a Merkle hash tree to guarantee freshness. We argue that these shortcomings stem from the inability to verify non-existence. The TFSS described in this paper relies on index ordered Merkle trees (IOMT) to gain the ability to verify non-existence.
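Why an index ordered Merkle tree (IOMT) can prove non-existence can be sketched as follows, under assumptions that are ours rather than the papers': each leaf stores a record (index, next index, value), where the next index is the next larger index present in the tree (the largest index wraps around to the smallest), and leaves are hashed with SHA-256 into an ordinary binary Merkle tree. An index x is then provably absent if some authenticated leaf (i, n) satisfies i < x < n or, for the wrap-around leaf, x > i or x < n. A plain Merkle tree over unordered records cannot give such a proof, because no single leaf says anything about the indices it does not contain. All function names and the leaf encoding are hypothetical, and the trusted-module interface described in the papers is not modelled.

```python
import hashlib


def leaf_hash(index, nxt, value):
    """Hash of an IOMT leaf record (index, next index, value)."""
    return hashlib.sha256(
        b"\x00" + index.to_bytes(8, "big") + nxt.to_bytes(8, "big") + value
    ).digest()


def node_hash(left, right):
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_iomt(items):
    """items: {index: value}. Each leaf points to the next larger index,
    and the largest index wraps around to the smallest."""
    keys = sorted(items)
    return [(k, keys[(i + 1) % len(keys)], items[k]) for i, k in enumerate(keys)]


def merkle_levels(leaf_hashes):
    """All tree levels, leaves first; odd levels duplicate their last node."""
    levels = [leaf_hashes]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]
        levels.append([node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def merkle_path(levels, pos):
    """Sibling hashes from leaf `pos` up to (not including) the root."""
    path = []
    for level in levels[:-1]:
        sib = pos ^ 1
        sibling = level[sib] if sib < len(level) else level[pos]
        path.append((sibling, pos % 2 == 1))  # True: current node is a right child
        pos //= 2
    return path


def root_from_path(leaf, path):
    cur = leaf
    for sibling, is_right in path:
        cur = node_hash(sibling, cur) if is_right else node_hash(cur, sibling)
    return cur


def covers(index, nxt, x):
    """True if the leaf (index -> nxt) shows that x is not in the tree."""
    if index < nxt:
        return index < x < nxt
    return x > index or x < nxt  # wrap-around leaf (or a single-leaf tree)


def prove_absent(leaves, levels, x):
    for pos, leaf in enumerate(leaves):
        if covers(leaf[0], leaf[1], x):
            return leaf, merkle_path(levels, pos)
    return None  # x is present, so no non-existence proof exists


def verify_absent(root, x, leaf, path):
    index, nxt, value = leaf
    return covers(index, nxt, x) and root_from_path(leaf_hash(index, nxt, value), path) == root


items = {10: b"a", 20: b"b", 30: b"c"}
leaves = build_iomt(items)
levels = merkle_levels([leaf_hash(*leaf) for leaf in leaves])
root = levels[-1][0]
leaf, path = prove_absent(leaves, levels, 25)
print(leaf, verify_absent(root, 25, leaf, path))  # (20, 30, b'b') True
```

A verifier that trusts only the root can check the proof without seeing the other leaves. The fixed 8-byte encoding assumes non-negative indices below 2**64.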

Somya D. Mohanty

University of North Carolina - Greensboro
