The Fordham Educational Data Mining Laboratory uses data mining and other analytical techniques to study student academic performance and the educational process. Thus far, most of our research has been driven by two data sets: 1) undergraduate student course-grade enrollment data, where each record corresponds to one student in one course section along with the earned grade, and 2) graduate admissions data for our department’s MS in Data Science and MS in Computer Science programs. Our research includes descriptive data mining on course sequencing and the relationships between courses, and predictive data mining that predicts instructor effectiveness, future student performance, and suitable disciplines for a student to major in. We have published nine research articles since 2019. Students who are interested in this research should email one or more of the faculty co-directors of the lab.
Using student grade and enrollment records, we study the relationships among groups of courses within departments (e.g., Computer Science), within broad disciplines (e.g., STEM), and across the whole undergraduate program. We use graph analysis to elucidate course pairs and groups of courses in which student performance is most highly correlated, and to highlight course groups that are most commonly taken together. We also identify single courses that are connected to a wide variety of other courses (“hub nodes”). Finally, we identify common course-enrollment sequences and study the effects of delays between one course and the next in these sequences.
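To make the graph-analysis idea concrete, here is a minimal illustrative sketch (not the lab’s actual pipeline) that builds a course graph whose edges connect courses with highly correlated grades, then ranks courses by connectivity to surface “hub” candidates. The column names and tiny synthetic data set are assumptions for illustration only.

```python
# Illustrative sketch: correlate per-student grades across courses,
# connect highly correlated course pairs, and rank courses by degree.
import pandas as pd
import networkx as nx

# Toy enrollment records: one row per (student, course) with the earned grade.
records = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "course":     ["CS1", "CS2", "Calc"] * 3,
    "grade":      [4.0, 3.7, 3.3, 2.7, 2.3, 3.0, 3.3, 3.0, 2.0],
})

# Pivot to a student-by-course grade matrix, then correlate courses pairwise.
matrix = records.pivot(index="student_id", columns="course", values="grade")
corr = matrix.corr()

# Add an edge for each course pair whose grade correlation exceeds a threshold.
G = nx.Graph()
threshold = 0.5
for a in corr.columns:
    for b in corr.columns:
        if a < b and corr.loc[a, b] > threshold:
            G.add_edge(a, b, weight=corr.loc[a, b])

# "Hub" candidates are the most connected courses in the graph.
hubs = sorted(G.degree, key=lambda node_deg: node_deg[1], reverse=True)
print(hubs)
```

With real data, the threshold and correlation measure would be tuned, and hub nodes could be identified with richer centrality measures than raw degree.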
Measuring instructor effectiveness is important: it can be used to optimize instructor assignments to courses, inform promotion and tenure decisions, and determine when to provide additional teaching resources and mentoring. Most existing instructor assessment methods rely on student surveys or peer assessments, which can easily be biased. Our research explored how student performance in subsequent courses can be used to measure instructor effectiveness. In another study, we used the same data set to build a recommender system for undergraduate majors, which bases its recommendations on how the student is likely to perform in the major, as well as how likely students with similar characteristics are to select that major. One project that did not use the course-grade enrollment data instead surveyed undergraduate Computer Science majors about their lifestyle and study habits and used that information to predict their academic performance as measured by grade point average.
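The core idea behind measuring effectiveness through downstream performance can be sketched as follows. This is a simplified, hypothetical illustration (synthetic data, assumed column names), not the published method: each instructor of an introductory course is scored by how their students later perform in the follow-on course, relative to the overall average.

```python
# Hypothetical sketch: score instructors of an intro course by their
# students' average grade in the follow-on course, relative to the mean.
import pandas as pd

df = pd.DataFrame({
    "student_id":       [1, 2, 3, 4],
    "intro_instructor": ["A", "A", "B", "B"],
    "followup_grade":   [3.7, 3.3, 2.7, 3.0],
})

overall = df["followup_grade"].mean()
# Positive values suggest an instructor's students outperform in the
# next course; negative values suggest the opposite.
effect = df.groupby("intro_instructor")["followup_grade"].mean() - overall
print(effect)
```

A real analysis would have to control for confounders such as student ability and course difficulty before attributing any of this difference to the instructor.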
Predictive Modelling Using Admissions Data:
Predicting student success in a data science degree program is challenging due to the interdisciplinary nature of the field and the diverse backgrounds of the students. Our research applied machine learning models to assess applicants’ future academic performance in a Master’s program in Data Science using information from the admission applications. In a parallel study, we examined the viability of machine learning models for predicting applicants’ GRE scores using diverse information from the application materials, including undergraduate GPA, undergraduate major, and resume. For future work, we plan to develop predictive models and a free software tool to combat systemic racism, gender bias, and cultural bias in academic letters of recommendation (LORs). To this end, we will leverage natural language processing (NLP) and machine learning methods to identify language in LORs that may be associated with bias. The software tool will aid recommendation writers by highlighting potential instances of bias and suggesting alternatives.
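As a minimal sketch of this kind of admissions-based prediction, the example below fits a regression model to synthetic applicant data. The feature names (undergraduate GPA, quantitative test score), the data, and the model choice are illustrative assumptions, not the models used in the published study.

```python
# Minimal sketch: predict graduate GPA from two admissions features
# using a linear regression model on synthetic applicant data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
undergrad_gpa = rng.uniform(2.5, 4.0, n)
quant_score = rng.uniform(140, 170, n)
# Synthetic target: graduate GPA loosely driven by the two features plus noise.
grad_gpa = 0.6 * undergrad_gpa + 0.01 * quant_score + rng.normal(0, 0.1, n)

X = np.column_stack([undergrad_gpa, quant_score])
X_train, X_test, y_train, y_test = train_test_split(X, grad_gpa, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print(f"R^2 on held-out applicants: {model.score(X_test, y_test):.2f}")
```

Real admissions data would add categorical features (e.g., undergraduate major) and text from application materials, and the held-out evaluation would use cross-validation rather than a single split.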
Y. Zhao, Q. Xu, M. Chen, and G. Weiss, “Predicting Student Performance in a Master’s Program in Data Science using Admissions Data,” International Conference on Educational Data Mining (EDM), 2020.
Y. Zhao, Z. Qi, S. Do, J. Grossi, J. Kang, and G. Weiss, “Addressing Disparity in GRE-optional Admissions by Predicting GRE Performance Using Application Materials,” under review.