Thesis Defense Announcement for Archit Harsh
06/23/16 at 1:00 PM

June 16, 2016

Dear Faculty, graduate and undergraduate students,

You are cordially invited to my Master's thesis defense.

Title: Automatic K-Expectation-Maximization (K-EM) clustering algorithm for data-mining applications

When: Thursday, June 23, 2016 at 1:00 PM

Where: Simrall Hall, Room 228

Candidate: Archit Harsh

Degree: Master's, Electrical and Computer Engineering

 

Committee:

 

Dr. John E. Ball

Assistant Professor of Electrical and Computer Engineering (Major Professor)

 

Dr. Nicolas H. Younan

Professor of Electrical and Computer Engineering (Committee Member)

 

Dr. Mahalingam Ramkumar

Associate Professor of Computer Science and Engineering (Committee Member)

 

Abstract:

This thesis presents a non-parametric data-clustering technique that clusters data efficiently and improves the choice of the number of clusters. Specifically, two methods are proposed: Automatic K-Means and K-Expectation-Maximization (K-EM). The computational task of partitioning a data set into k clusters is often referred to as k-clustering. The K-Means and Expectation-Maximization algorithms have been widely deployed in data-clustering applications in relational databases. Related work in the literature has shown that both algorithms have shortcomings. K-Means does not guarantee convergence, and the choice of clusters heavily influences the results. Expectation-Maximization can converge prematurely, which does not assure the optimality of the results, and, as with K-Means, the choice of clusters influences the results. To overcome these shortcomings, a fast automatic K-EM algorithm is developed and implemented that can guarantee both convergence and optimality of results. As an advantage of a non-parametric clustering technique, the proposed method provides the optimal number of clusters by utilizing various internal cluster validity metrics, thereby making it independent of the choice of clusters and providing unbiased results. The algorithm is applied to a wide array of real and synthetic data sets to validate the accuracy of the results and the efficiency of the algorithm.

 

Best Regards,

 

Archit Harsh