Personalized Information Recommendation Based on Web User Profiles
Abstract
The amount of electronic information available on the World Wide Web is growing rapidly. Although this information is essential for many of our daily activities, its sheer volume makes retrieving specific data challenging and time-consuming. Recently, much research has focused on personalized applications that tailor the information presented to individual users. The main goal of such personalized systems is to learn users' needs without asking for them explicitly; this research area is widely known as user profiling. In this thesis, we present a novel web personalization system that constructs such user profiles in order to recommend previously unseen information likely to be of interest to particular users. Our recommendation system is expected to provide valuable information automatically, based on previously learned user interests. Several Data Mining (DM) and Information Retrieval (IR) techniques, such as content-based clustering and text pre-processing methods, have been applied to build the initial user profile. Because this approach focuses mainly on short-term interests, the user model is continuously updated to reflect changes in interests over time. For the recommendation task, the common Vector Space Model (VSM) and the WordNet dictionary are employed to compare the established user model with the new, unseen document collection, so the comparison is carried out at both the lexical and the semantic level. To evaluate our approach, we implemented a prototype system in the Java programming language. Various experiments demonstrate that the newly developed personalized recommendation system performs satisfactorily under certain conditions. However, further research is necessary to improve the recommendation of user-related documents composed of untrained vocabulary, and the suggested WordNet-based semantic analysis still offers various unexploited possibilities for improvement.
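To illustrate the lexical side of the comparison described above, the following is a minimal sketch of Vector Space Model matching: texts are turned into term-frequency vectors and a user profile is scored against a document by cosine similarity. It is not the thesis prototype. The class name `VsmRecommenderSketch`, the tiny stopword list, the raw term-frequency weighting, and the decay-based `updateProfile` method are all hypothetical assumptions. The abstract states only that the profile is updated over time, not how. The WordNet-based semantic comparison and the content-based clustering are not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Illustrative sketch of VSM matching between a user profile and a document.
 * Assumptions (not from the thesis): raw term-frequency weights, a toy
 * stopword list, and an exponential-decay profile update.
 */
public class VsmRecommenderSketch {

    // Hypothetical, deliberately tiny stopword list used only for illustration.
    private static final Set<String> STOPWORDS =
            Set.of("the", "a", "an", "of", "and", "in", "is", "to");

    /** Pre-processing: lowercase, split on non-letters, drop stopwords, count terms. */
    public static Map<String, Double> termVector(String text) {
        Map<String, Double> vec = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (token.isEmpty() || STOPWORDS.contains(token)) {
                continue;
            }
            vec.merge(token, 1.0, Double::sum);
        }
        return vec;
    }

    /** Cosine similarity of two sparse term vectors; returns 0 if either is empty. */
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    /**
     * Hypothetical short-term-interest update: existing weights are multiplied
     * by decay (0 < decay < 1) before the new document's terms are added, so
     * recently seen terms dominate the profile.
     */
    public static void updateProfile(Map<String, Double> profile,
                                     Map<String, Double> doc, double decay) {
        profile.replaceAll((term, w) -> w * decay);
        doc.forEach((term, w) -> profile.merge(term, w, Double::sum));
    }

    public static void main(String[] args) {
        Map<String, Double> profile = termVector("java programming and java libraries");
        Map<String, Double> related = termVector("a new java programming tutorial");
        Map<String, Double> unrelated = termVector("gardening tips for the spring");
        System.out.printf("related:   %.3f%n", cosine(profile, related));
        System.out.printf("unrelated: %.3f%n", cosine(profile, unrelated));
    }
}
```

Documents whose score exceeds some threshold would then be recommended. Any such threshold, like the decay factor, is a tuning choice that this sketch does not claim to reproduce.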