Teaching

Teaching Philosophy

Fostering creativity in the minds of students is one of the most rewarding experiences in academia. Toward this end, my aim is to support students by: (1) providing them with the basic foundations of concepts, methods, and techniques; (2) encouraging them to be creative in applying those foundations to unique problems; and (3) motivating them to challenge the foundations to develop novel approaches for new problems. I believe such a model will enable our future graduates to tackle a variety of scientific, social, and environmental challenges.

Present Teaching

Fall 2017 - Present

CSC 405/605 - Data Science
The course is highly interactive and explores the theories, techniques, and tools necessary to gain insights from real-world datasets. Using a problem-based learning philosophy, students are expected to use these technologies to design data solutions that can process and analyze real-world datasets for a variety of scientific, social, and environmental challenges. Syllabus

Department of Computer Science, University of North Carolina – Greensboro

Fall 2018 - Present

CSC 462/662 - Operating Systems
The aim of this course is to teach the concepts and principles of modern operating systems, and to provide opportunities to relate theoretical principles to operating system implementation. Specifically, students learn about processes and process management, concurrency and synchronization, memory management schemes, file systems, and secondary storage management.
Spring 2017

CSC 490 - Senior Capstone
Application of fundamental knowledge and skills in computer science to develop software applications/projects capable of solving real-world problems. Syllabus

Department of Computer Science, University of North Carolina – Greensboro.
Spring 2017

CSC 495/663 - Network Security
The course explores the topics of securing computer networks with the utilization of cryptographic authentication, communication, and transmission protocols. Syllabus

Department of Computer Science, University of North Carolina – Greensboro.
Fall 2016

CSC 390 - Programming Languages
The course teaches students the concepts of block-structured, object-oriented, functional, logic, and concurrent programming languages. It includes a comparative study of the syntactic and semantic features of these languages, and students write programs using them. Syllabus

Department of Computer Science, University of North Carolina – Greensboro

Past Teaching

Fall 2015

CSE 4990/6990: Big Data and Data Science
Department of Computer Science and Engineering, Mississippi State University
Spring 2012

CSE 1384 - Intermediate Computer Programming - Python and C++
Lab Instructor, Department of Computer Science and Engineering, Mississippi State University.
Fall 2011

CSE 1284 - Introduction to Programming Languages - Python
Lab Instructor, Department of Computer Science and Engineering, Mississippi State University.

Student Projects

CSC 405/605 - Data Science

Microsoft Academic Graph - Author Citation
Predict an author’s involvement in scholarly articles and journals

In the world of modern scientific scholarship, authorship attribution is vital to a text’s factual validity and a writer’s credibility. The ability to participate anonymously in the development of a field of study, regardless of criteria and experience, raises questions of academic reliability. To determine the identity or characteristics of writers of scholarly articles and journals, textual analysts use a variety of methods, ranging from natural language processing to mathematical and algorithmic applications. With the development of “big data”, computer and data science, and machine learning, predicting the identity of published writers from linguistic patterns and statistics, alongside information gained from other datasets produced using textual analysis, has become increasingly attainable.

Using these computational techniques and analytical methods, the goal of this project is to determine the applicability and accuracy of certain authorship prediction techniques on a large academic dataset. The data used in this project is provided by the Microsoft Academic Graph (MAG), a source of information that has been utilized in projects such as Bing, Cortana, and Microsoft Word. Metadata in the MAG data files will assist in extracting different features of academic publications, such as names, titles, numerical identifiers, and abstracts. After extracting information from the MAG files, the data will be processed via the previously mentioned techniques, methods, and technologies in order to predict an author’s involvement in scholarly articles and journals.

Group Members

Matt Smitherman https://github.com/m-t-smith

Bhavana Yennam https://github.com/BhavanaYennam

Hao Zhang https://github.com/haozhang96

Justin Oakley https://github.com/jmoakle2

Eury Kim https://github.com/EuryKim1

https://github.com/UNCG-CSE/MAG_Author
Bioinformatics Trends
Predict the citation count of scientific papers
Centered on Scopus data about bioinformatics research papers, our goal was to predict the citation count of a paper. This project involved four separate phases, each focused on Machine Learning and its improvement, because Machine Learning is so closely tied to our prediction success.

The first phase of the project consisted of orienting ourselves with our data and what we wanted to do with it. We asked questions such as: what is the nature and origin of our data, what is the size of our data, and what is our goal?

The second phase involved converting, organizing, and cleaning our data. Our data was originally in JSON format, so we had to convert it from JSON to CSV, handle null fields, and handle list-valued fields. Ultimately, we prepared our data for visualization and machine learning.
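
The conversion step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the team's actual code: the sample records, the field names, and the choices to join lists with ";" and write nulls as empty cells are all assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical sample records; the field names are illustrative only.
SAMPLE_JSON = """[
  {"title": "Paper A", "authors": ["Lee", "Kim"], "citations": 12},
  {"title": "Paper B", "authors": [], "citations": null}
]"""


def flatten_record(record, fields, list_sep=";", null_value=""):
    """Return one flat CSV row: lists are joined, nulls get a placeholder."""
    row = {}
    for field in fields:
        value = record.get(field)
        if value is None:
            row[field] = null_value
        elif isinstance(value, list):
            row[field] = list_sep.join(str(item) for item in value)
        else:
            row[field] = value
    return row


def json_to_csv(json_text, fields):
    """Convert a JSON array of objects into CSV text with a header row."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten_record(record, fields))
    return buffer.getvalue()


CSV_TEXT = json_to_csv(SAMPLE_JSON, ["title", "authors", "citations"])
print(CSV_TEXT)
```

Writing nulls as empty cells keeps them distinguishable from a real zero, which matters when the column is later used as a prediction target.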

The third phase consisted of the visualization and understanding of our data, essentially educating ourselves on our data to make sound decisions for phase four (Machine Learning). We visualized and gained a big picture understanding of our target column, ‘Citations’. Then we went through each feature, one by one, trying to understand what relationship they may or may not have with ‘Citations’.

The fourth phase involved natural language processing (NLP) and Machine Learning. The NLP techniques helped us create new features for Machine Learning to increase our accuracy. They included removing special characters and stopwords, stemming, LDA (TFIDF), and encoding. For Machine Learning, we applied Random Forest and XGB. The best accuracy achieved was 74% using XGB with Log Transformation.
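
Two of the steps named above, removing special characters and stopwords and applying a log transformation to the target, can be sketched with the standard library alone. The stopword list, the function names, and the use of log(1 + x) are assumptions for illustration; stemming, LDA, encoding, and model fitting are not shown.

```python
import math
import re

# A tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "and", "of", "the", "in", "for", "to"}


def clean_text(text):
    """Lowercase, replace special characters with spaces, drop stopwords."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [token for token in text.split() if token not in STOPWORDS]


def log_transform(count):
    """Compress a skewed count with log(1 + x); defined for count >= 0."""
    return math.log1p(count)


def inverse_log_transform(value):
    """Map a value on the log scale back to the original count scale."""
    return math.expm1(value)


tokens = clean_text("An Analysis of Gene-Expression Data (2018)!")
print(tokens)
print(log_transform(120))
```

Citation counts tend to be heavily skewed, so training on the log scale and inverting predictions afterward is a common way to keep a few highly cited papers from dominating the error.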

Tasks were divided/completed in two ways, depending on the context:

a) Group collaboration: we would reserve a room in the library and work on the project in unison.

b) Phase-by-phase division: we would divide up specific tasks for that particular phase, and then combine our efforts the next time we met. This division style only happened on occasion.

Members and Related Tasks:
- Mouna https://github.com/mounakalidindi
- Logan https://github.com/LoganHornbuckle
- Darpan https://github.com/djhawar
- Steve https://github.com/stevieclean
- Luke https://github.com/lukeusername
https://github.com/UNCG-CSE/Bioinformatics_Trends
Bat Echolocation
Identify and classify real bat calls according to the purpose of that call

This project aims to identify and classify real bat calls according to the purpose of that call, ranging from echolocation to mating. The calls are stored in Zero Crossing format; the data will have to be cleaned up as it contains a significant amount of noise. Once the data is cleaned, the bat calls will be clustered according to their shapes, and then classified for future scientific research. If all goes well, we will also be able to predict the nature of the calls based on metadata such as the time, location, and season that the calls were recorded in. The project is written in Python.

Members

Hadi Soufi

Yang Peng

Bety Rostandy

Thien Le

Kevin Keomalaythong

https://github.com/UNCG-CSE/Bat_Echolocation
Guilford County Financial Analysis
Anomaly detection in Guilford County budget and transaction data

Introduction

The idea for our project is financial modeling for Guilford County. We have Guilford County’s approved budgets from 2013-2018 and transactions from 2007-2018. The dataset consists mainly of adopted/amended budget data and historical transaction data for Guilford County. The idea is to help the Financial and Budget Department of Guilford County maintain its expected spending and transactions.

Gregory Purvine (gnpurvin) https://github.com/gnpurvin

Evan Crabtree (Crabtr) https://github.com/Crabtr

Rohit Gulia (rohit-gulia) https://github.com/rohit-gulia

Cody Cothern (Mask487) https://github.com/Mask487

Vincent Xiao (MrVinegar) https://github.com/MrVinegar

Goals

Our goal is to discover anomalies in the given financial data and to catch new anomalies as new data is added. Eventually, we would also like to make financial forecasts based on the given data and, if time allows, implement a method for predicting which transactions are likely to happen in the future. In the interest of keeping our project to a reasonable scope, we are currently focusing on anomaly detection.
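
The description above does not say which anomaly detection method the team used. As one possible illustration, the sketch below flags transaction amounts with a modified z-score based on the median and the median absolute deviation (MAD). The function name, the threshold of 3.5, and the sample amounts are assumptions, not details from the project.

```python
from statistics import median


def find_anomalies(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds threshold.

    The score uses the median and the median absolute deviation (MAD),
    which are less distorted by outliers than the mean and standard
    deviation.
    """
    center = median(amounts)
    mad = median(abs(x - center) for x in amounts)
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, x in enumerate(amounts)
        if 0.6745 * abs(x - center) / mad > threshold
    ]


# Hypothetical transaction amounts for one department, in dollars.
transactions = [120.0, 135.5, 110.0, 128.0, 9800.0, 131.25, 117.0]
flagged = find_anomalies(transactions)
print(flagged)
```

Because the rule depends only on the amounts seen so far, it can be rerun as new transactions arrive, which matches the stated aim of catching new anomalies as data is added.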

Objectives

To achieve these goals efficiently, we have divided the following tasks among our team members: maintaining the data dictionary and documentation, finding relationships in the data, finding patterns in the data across departments, studying how spending trends change over time, and detecting anomalies in transactions and/or total actual spending as we move through the year.

https://github.com/UNCG-CSE/Guilford_Financial
MIMIC-III: Electronic Health Records
Predicting readmission in EHR data

MIMIC-III (Medical Information Mart for Intensive Care III) is an Electronic Health Records database containing the details of about 40,000 patients who were admitted as critical care patients. This database contains information on admission length, diagnoses, labs, medications, locations, chart notes, and more. The data spans a long range of time, as it was collected between the years of 2001 and 2012 at Beth Israel Deaconess Medical Center, located in Boston, MA.

MIMIC-III’s dataset is particularly interesting because of the unique quantity of information available as vitals were recorded about once an hour. This dataset is provided free from the MIT Lab for Computational Physiology.

Meet the Team

We are a team of five students from the University of North Carolina at Greensboro located in Greensboro, NC. We have one graduate student and four undergraduate students in the group formed in the Computer Science Data Science course offered in the Fall 2018 semester.

Graduate Student: Richard Powell
Undergraduate Students: Blair Gentry, Evan Rhoades, Maclean Fraizer, Spencer Whyatt

https://github.com/UNCG-CSE/MIMIC-EHR
Guilford County EMS
Analysis of Guilford County Emergency Medical Services
In this project, we are working with the Guilford County Emergency Medical Services (EMS) datasets, which include call records from the following agencies:
- GCSD: Guilford County Sheriff Department
- GCF: Guilford County Fire Department
- EMS: Emergency Medical Service
- ACO: Animal Control
Upon completion of the project, we should be able to:
- Identify relationships between trending call natures and each agency, find patterns between geographic area and the leading call natures, and, later on if time permits, identify predictors of an emergency call being categorized as “sick person”.
- Correlate the time of a call with its nature, and understand the patterns in which calls happen by hour, day, and month.
- Measure response time based on the nature of the call.
- Compare call nature with its assigned priority (priority is always assigned by the operator; does it change over time?).
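
Relating the time of a call to its nature can start with a simple count per hour of day, sketched below. The record layout, the sample values, and the function names are hypothetical; the actual EMS dataset schema is not described here.

```python
from collections import Counter
from datetime import datetime

# Hypothetical call records: (ISO timestamp, nature of the call).
calls = [
    ("2018-03-01T08:15:00", "sick person"),
    ("2018-03-01T08:40:00", "traffic accident"),
    ("2018-03-02T08:05:00", "sick person"),
    ("2018-03-02T23:30:00", "sick person"),
]


def count_by_hour_and_nature(records):
    """Count calls for each (hour of day, nature) pair."""
    counts = Counter()
    for timestamp, nature in records:
        hour = datetime.fromisoformat(timestamp).hour
        counts[(hour, nature)] += 1
    return counts


def busiest_hour(records, nature):
    """Return the hour with the most calls of this nature, or None."""
    counts = count_by_hour_and_nature(records)
    by_hour = {h: c for (h, n), c in counts.items() if n == nature}
    if not by_hour:
        return None
    return max(by_hour, key=by_hour.get)


hourly = count_by_hour_and_nature(calls)
print(busiest_hour(calls, "sick person"))
```

The same grouping works for days and months by swapping `.hour` for `.weekday()` or `.month`.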
Enterprise Asset Management Analysis
City of Greensboro Enterprise Asset Management Analysis

Enterprise asset management (EAM) is the process of managing the lifecycle of physical assets to help save money and improve quality and efficiency. EAM systems are mostly used as desktop software (e.g., spreadsheets) in large industries. The project covers over 900,000 assets, systems, and positions created by the City of Greensboro over a span of 10 years. The dataset contains various tables in SQL Server. Some of the important tables are:
- R5EVENTS: events that took place, with details such as location, date created, event description, and unique event codes.
- R5BOOHOURS: hours worked on an event, cost, and information about the person who worked on that event.
- R5OBJECT: data about the assets, which can be found in more detail in the R5CLASSES table.

Members:
Syed Shah: shshah2
Garret Mostella: Awesomesauce1256
Jonathan Langston: JTLangston96
Phillip Jones: FiliusRomanus

https://github.com/UNCG-CSE/EAM-COG

CSC 490 - Senior Capstone

NostraDomicile
Predictive model for home sales
The goal of the NostraDomicile Project is to create a web application with two main functions: to predict whether a house will sell in a specific area based on the home’s attributes, and, given a zip code, to identify the most important factors leading to a sale in that area.

NostraDomicile will accomplish this goal by retrieving and storing housing market information using a Zillow API and MySQL database, using machine learning to evaluate housing data and determine factors influencing home sales in a particular area, and creating a user-friendly interface for users to view data about factors influencing home sales and create data visualizations about houses on the market based on user preferences.

Team:
- Richard Andrews
- Ochaun Marshall
- Christian Simaan
- Jeremy Hutton
Link: https://github.com/nreader72/NostraDomicile
Prioritize
Priority-based personal assistant application
Prioritize is a priority-based personal assistant application created to help keep track of and organize daily, weekly, and monthly tasks. It has been designed specifically with ease of use in mind, to simplify and streamline daily tasks. Prioritize differentiates itself from other calendar-based applications with its signature priority system, which determines when Prioritize will notify users of their events.

Prioritize will utilize the Google Drive API, which will allow syncing of files between systems, as well as both online and offline functionality. Reminders will be stored locally in a SQLite database, as well as in JSON format for sharing between devices. Lookups into the database won’t need to happen often; the database simply stores a list of reminder objects. Google Drive will handle the syncing aspect for us. If the user deletes the application, their data will remain linked to their Google Drive account unless deleted from there as well. This is beneficial: if a catastrophe occurs, their data will remain safe, provided they have synced with the Drive. In addition to data storage, Google Drive is beneficial for security, as it handles the security and encryption of data while it is being synced.
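
The local storage scheme described above, a SQLite table of reminders plus a JSON copy for sharing, can be sketched with Python's standard `sqlite3` and `json` modules. The table schema, the integer priority scale, and the function names are assumptions for illustration, and syncing through Google Drive is not shown.

```python
import json
import sqlite3


def create_store(path=":memory:"):
    """Open a SQLite database and create the reminders table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reminders ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, "
        "due TEXT NOT NULL, priority INTEGER NOT NULL)"
    )
    return conn


def add_reminder(conn, title, due, priority):
    """Insert one reminder using bound parameters."""
    conn.execute(
        "INSERT INTO reminders (title, due, priority) VALUES (?, ?, ?)",
        (title, due, priority),
    )
    conn.commit()


def export_json(conn):
    """Serialize all reminders, highest priority first, as a JSON string."""
    rows = conn.execute(
        "SELECT title, due, priority FROM reminders "
        "ORDER BY priority DESC, due ASC"
    ).fetchall()
    keys = ("title", "due", "priority")
    return json.dumps([dict(zip(keys, row)) for row in rows])


conn = create_store()
add_reminder(conn, "Submit report", "2018-04-02", 3)
add_reminder(conn, "Buy groceries", "2018-04-01", 1)
exported = json.loads(export_json(conn))
print(exported)
```

The JSON string returned by `export_json` is the kind of file that could be placed in a synced folder, while the SQLite database remains the local source of truth.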

Team:
- Joel Wilhelm
- Shahrukh
- Wajahat
- Cody Jones
Link: https://github.com/cejones9/Prioritize
Ketch
Facilitate transactions between two local parties for trading goods and services
Ketch is an iOS mobile app designed to facilitate transactions between two local parties for trading goods and services. It will utilize geolocation to create a local market derived from nearby users and allow them to buy and sell goods in a local marketplace. The program will emphasize security foremost by requiring users to validate their identity and by providing a secure transaction system, user reviews, and a safe meeting space for the actual transaction. The end result will be a safer, more secure way to buy and sell goods, with far fewer fraudulent transactions.

Team:
- Sawyer Beaton
- Connor Butler
- Patrick Carder
- Keaton Greason
Link: https://github.com/connorbutler44/Ketch
Citizen
To inform and educate users about new bills, bills that have been made into law, and the legislative branch of government.
To reach the most users possible, this project aims to be platform agnostic and consists of both a website and a mobile application. The mobile application and website will mirror each other very closely, each providing the same functionality and information. Users will be able to view information about bills, laws, and legislators, filter information based on their location, and log in via Facebook. A supporting service, the “backend”, will manage data acquisition from API endpoints and data queries from the mobile application and website. This backend will also manage user interactivity, including user preferences, accounts, and an internal “voting” system that allows users to express support for various pieces of legislation.

Team:
- Charles Mayse
- Matthew Marley
- Matthew Rubacky
- Muhammed Akbay
Link: https://github.com/cnmayse/citizenapp
Dungeon of the Mighty Titan - Vanquish (DMT-V)
A multiplayer video game built using Unreal Engine 4 that allows for 3 to 5 players to fight AI controlled monsters in a dungeon-like setting.
Over the years, there has been a surge in popularity of massively multiplayer online games. One prominent feature of these games is the dungeon raiding mechanics they employ. Players form groups to fight hordes of enemies and powerful bosses as they travel through dungeons in order to obtain rare items. However, players will usually have to play for many hours before they can experience these adventures, as they require high-leveled and well-equipped characters to survive. With our game, players will be able to form a group and raid dungeons right from the beginning.

Team:
- David Bond
- Matthew Yengle
- Timothy Canipe
- Vishal Bhatt
Link: https://github.com/myengle/DMT-V
StudyBuddy
The Study Buddy application will allow students in a particular class to communicate with other students in the same class.
When it comes to classes and absences, some people would rather be without the day’s information than find someone willing to share their notes. This application aims to aid in that process by eliminating the face-to-face interaction, while making it easier to find those who were in class and are willing to share notes. Users will be able to go to the website and download the software. Once the software has been installed, the user will be able to log into the server and enter codes corresponding to the classes in which they are enrolled. The user will have the option to join a specific class chatroom and see all of the other students in the class. They will be able to speak to everyone publicly or send private messages. The user will also be able to view pictures provided by other students containing notes from the day’s lecture or personal notes designed to aid remote study sessions.

Team:
- Jaee Carr
- Tony Ratliff
- Trayvon McKnight
Link: https://github.com/Metalaxe1/StudyBuddy

CSE 4990/6990: Big Data and Data Science

Film Analytics
Predicting Movie Ratings Using IMDb Data
Film Analytics is a project aimed at predicting IMDb movie ratings using covariates like genre(s), actor(s), director(s), and plot keyword(s). The implemented predictor system uses and compares two regression algorithms, linear regression and ordered logistic regression (also known as ordinal logit), for predicting the outcome rating and the predictive improvements, bringing insights about the data. We have also analyzed the trends occurring in the IMDb movie data (obtained from the OMDb web API) as well as the web-scraped box office data from IMDb itself. By incorporating the naive Bayes classification method, the machine learning model can also include certain plot keywords as features for predictive analysis.

Team:
Link: https://github.com/somyamohanty/Data_Dawgs
Pixel Dawgs
Image Classifier
The main idea behind this study is to create an image classification algorithm that will read an image and produce several tags. These tags will be generated using image-processing tools like SLIC (Simple Linear Iterative Clustering), Sobel filtering, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Initially, ~2000 images will be used to train the code using a machine learning algorithm. Later, the classifier will predict tags on the test data. Each tag will then represent a certain layer of the image. Out of all the tags created, the top 5 tags will be used to define the image, along with a percentage accuracy.

Team:
- Manish Borse
- Nick Rosetti
- Veera Karri
- Shreya Gupta
Link: https://github.com/somyamohanty/Pixel_Dawgs
Clouded Minds
Twitter Event Detector
Due to the nature of Twitter, tweets are able to provide real-time data for various world occurrences. We decided to focus on sports, since we believed that the tweets posted by users watching sporting events can provide valuable insight into key events. Due to the wide range of sports that exist in the modern world, we decided to limit our research to the most commonly followed game, soccer. To accomplish this, tweets were collected from Twitter’s streaming API and placed into a MongoDB database, in which tweets were filtered by official game hashtags and team names. By performing a post-hoc analysis of spikes in the volume of tweets, we were able to detect match events such as goals scored, the scorer’s name, the scorer’s team, and penalties throughout the game. Likewise, we tested the use of sentiment analysis to further correlate the tweets with overall match progress.
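
The exact spike detection rule used in the project is not given above. As a minimal sketch under that caveat, the example below flags a minute as a spike when its tweet count exceeds the mean of the preceding window by a multiple of that window's standard deviation. The window size, the factor, the floor of 1.0 on the spread, and the sample counts are all assumptions.

```python
from statistics import mean, stdev


def detect_spikes(counts, window=5, factor=3.0):
    """Return indices whose count exceeds the trailing-window mean by
    `factor` standard deviations (with the spread floored at 1.0)."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        baseline = mean(history)
        spread = max(stdev(history), 1.0)  # avoid a zero-width band
        if counts[i] > baseline + factor * spread:
            spikes.append(i)
    return spikes


# Hypothetical tweets per minute, with a burst starting at index 7.
tweets_per_minute = [40, 42, 38, 41, 39, 43, 40, 180, 150, 60, 45]
spikes = detect_spikes(tweets_per_minute)
print(spikes)
```

Only the onset of the burst is flagged here, because the burst itself inflates the trailing window that later minutes are compared against; text from the tweets around a flagged minute could then be examined for the scorer's name or team.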

Team:
- Deepak Gautam
- Sandra Lee Gibson
- Michael S Nichols
- William Wheeler Carter
Link: https://github.com/somyamohanty/Clouded_Minds
Predicting PlayStore Rating for Apps
Machine learning for understanding app store utilization
Google Play Store is the most popular hub for Android applications. With the increasing number of applications being uploaded to the store every day, users prefer to download apps based upon their previous ‘Ratings’. The higher the ratings, the more reliable and popular the application can be assumed to be. In the competitive world of building Android applications, it is the goal of all developers to maximize the ratings of their applications. We have developed a system that helps developers predict the rating of their newly uploaded applications based on their ‘Attributes’. We retrieved a public dataset of existing Play Store applications through August 2015. Several attributes of these applications, such as application size, name, category, description, and price, have been used to build a training model that can predict ratings for new applications. To build this training model, we used standard machine learning regression and classification methods. The learning performance of the model has been tested on existing applications that already have ratings. This model will be useful for making continuous improvements to new applications to boost their popularity.

Team:
- Naresh Adhikari
- Lucas Andrade Ribeiro
- Ayush Raj Aryal
- Daniel John Sween
- Naila Bushra
Link: https://github.com/somyamohanty/5DBMinds

Somya D. Mohanty

University of North Carolina - Greensboro
