CIS Department Talk - March 22, 2007

The Department of Computer and Information Science & The Society of Computer Science Present

Speaker: Yanjun Li, Wright State University
Topic: High Performance Text Clustering Algorithms
Date: Thursday, March 22, 2007 at 1:00pm
Place: John Mulcahy Hall, Room 342


Text mining generally refers to the process of deriving high-quality information from text. Because most information (over 80%) is currently stored as text, text mining is believed to have high commercial value. Text clustering, one of the most important text mining methods, is the unsupervised, automatic grouping of text documents into conceptually meaningful clusters, such that documents within a cluster are highly similar to one another but dissimilar to documents in other clusters. Text clustering can be used to improve the performance of search engines and to provide business intelligence solutions, among other applications.
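To make the definition concrete, the following is a minimal sketch of one common textbook approach, not the speaker's algorithms: each document is represented as a bag-of-words term-count vector, closeness is measured by cosine similarity, and documents are grouped with a plain k-means loop. The function names (`tf_vector`, `cosine`, `kmeans_cosine`), the whitespace tokenization, and the choice of the first k documents as initial centroids are all illustrative assumptions.

```python
from collections import Counter
import math


def tf_vector(text):
    """Hypothetical helper: bag-of-words term counts (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term->weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean vector of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return {t: c / n for t, c in total.items()}


def kmeans_cosine(docs, k, iterations=10):
    """Plain k-means with cosine similarity; seeds are the first k documents (an assumption)."""
    vectors = [tf_vector(d) for d in docs]
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its member documents.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels


docs = [
    "cats purr and cats sleep",
    "stock prices rise as markets rally",
    "kittens and cats purr softly",
    "markets fall as stock prices drop",
]
print(kmeans_cosine(docs, 2))
```

On this toy input the two pet-related documents share one label and the two market-related documents share the other. Real text clustering systems typically add term weighting (such as TF-IDF), stop-word removal, and dimensionality reduction, which are among the aspects the talk addresses.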

In this talk, I will first give an overview of text clustering. I will then present my sequential and parallel high-performance text clustering algorithms, which target the unique characteristics of unstructured text databases. My research focuses on improving the performance of text clustering, and I investigated text clustering algorithms from four aspects: document representation, measurement of document closeness, reduction of high dimensionality, and parallelization. Experimental results show that the proposed algorithms achieve better clustering accuracy than existing algorithms. Finally, I will present conclusions and discuss my future work.


Yanjun (Lisa) Li received a B.S. degree in Economics from the University of International Business and Economics, Beijing, China; a B.S. degree (Summa Cum Laude) in Computer Science from Franklin University, Columbus, OH; and an M.S. degree in Computer Science from Wright State University, Dayton, OH. She is currently a Ph.D. candidate in the Department of Computer Science and Engineering at Wright State University, where she is a DAGSI scholarship student, and she expects to graduate in June 2007. Her research interests include data mining and knowledge discovery, text mining, information retrieval, ontologies, bioinformatics analysis, and parallel and distributed computing.

For more information, contact:
Ms. Diane Roche (718) 817-4480

