CIS Department Talk - March 22, 2007
The Department of Computer and Information Science & The Society of
Computer Science Present
Speaker: Yanjun Li, Wright State University
Topic: High Performance Text Clustering Algorithms
Date: Thursday, March 22, 2007 at 1:00pm
Place: John Mulcahy Hall, Room 342
Abstract:
Text mining generally refers to the process of deriving high quality
information from text. As most information (over 80%) is currently stored
as text, text mining is believed to have high commercial value. As one of
the important text mining methods, text clustering is the unsupervised,
automatic grouping of text documents into conceptually meaningful clusters,
so that documents within a cluster are highly similar to one another but
dissimilar to documents in other clusters. Text clustering could
be used to improve the performance of search engines, provide business
intelligence solutions, etc.
In this talk, first I will give an overview of text clustering. Then, I
will present my sequential and parallel high performance text clustering
algorithms, which target the unique characteristics of unstructured text
databases. My research focuses on improving the performance of text
clustering. I investigated text clustering algorithms in four aspects:
document representation, document closeness measurement, reduction of high
dimensionality, and parallelization. The experimental results show that our
proposed algorithms outperform existing algorithms in terms of clustering
accuracy. Finally, I will conclude and talk about my future work.
Bio:
Yanjun (Lisa) Li received a B.S. degree in Economics from the University of
International Business and Economics, Beijing, China; a B.S. degree (Summa
Cum Laude) in Computer Science from Franklin University, Columbus, OH; and
an M.S. degree in Computer Science from Wright State University, Dayton,
OH. Currently, she is a Ph.D. candidate in the Department of Computer
Science and Engineering at Wright State University as a DAGSI scholarship
student and expects to graduate in June 2007. Her research interests
include data mining and knowledge discovery, text mining, information
retrieval, ontology, bioinformatics analysis, and parallel and distributed
computing.
For more information, contact:
Ms. Diane Roche, (718) 817-4480, roche@cis.fordham.edu