UMAP2016S

Analyzing Aggregated Semantics-enabled User Modeling on Google+ and Twitter for Personalized Link Recommendations


About


This post provides supplemental material and information about the paper "Analyzing Aggregated Semantics-enabled User Modeling on Google+ and Twitter for Personalized Link Recommendations".


Abstract


In this paper, we study whether reusing Google+ profiles can provide reliable recommendations on Twitter to resolve the cold start problem. Next, we investigate the impact of giving different weights when aggregating user profiles from two OSNs, and show that giving a higher weight to the profiles from the targeted OSN yields the best performance in the context of a personalized link recommender system. Finally, we propose a user modeling strategy which combines entity- and category-based user profiles with a discounting strategy. Results show that our proposed strategy improves the quality of user modeling significantly compared to the baseline method.
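The weighted aggregation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each profile is a mapping from a concept (e.g., a DBpedia entity) to a non-negative weight, that both profiles are normalized before mixing, and that the mixing is a simple convex combination. The function names, the example concepts, and the default weight of 0.7 are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def normalize(profile):
    """Scale a concept -> weight mapping so that its weights sum to 1."""
    total = sum(profile.values())
    if total == 0:
        return {}
    return {concept: weight / total for concept, weight in profile.items()}


def aggregate_profiles(target, other, target_weight=0.7):
    """Mix two profiles, giving `target_weight` to the targeted OSN.

    Assumption: a plain convex combination of normalized profiles.
    The paper's exact aggregation scheme may differ.
    """
    if not 0.0 <= target_weight <= 1.0:
        raise ValueError("target_weight must be in [0, 1]")
    combined = defaultdict(float)
    for concept, weight in normalize(target).items():
        combined[concept] += target_weight * weight
    for concept, weight in normalize(other).items():
        combined[concept] += (1.0 - target_weight) * weight
    return dict(combined)


# Toy example with made-up counts (not data from the study).
twitter_profile = {"dbpedia:Semantic_Web": 3, "dbpedia:Dublin": 1}
gplus_profile = {"dbpedia:Semantic_Web": 1, "dbpedia:Photography": 1}
aggregated = aggregate_profiles(twitter_profile, gplus_profile, target_weight=0.7)
```

With `target_weight` above 0.5, concepts that are prominent in the targeted OSN dominate the aggregated profile, which mirrors the setting the abstract reports as performing best.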


About.me Dataset


Users tend to have multiple social identities in different OSNs [1]. To retrieve the ground truth data (i.e., users who are using both Google+ and Twitter), we obtained OSN accounts of users from about.me. About.me is a personal web hosting service that offers registered users a simple platform from which to link multiple online identities, relevant external sites (e.g., a personal homepage), and popular OSNs such as Facebook, Twitter, and Google+. We started from a set of randomly returned about.me accounts retrieved from the about.me API and then gradually extended this set in a snowball manner. In total, we crawled 247,630 public profile pages from about.me during December 2014 that have at least two external links. Two kinds of external links that are irrelevant to OSN identities (i.e., relevant external sites and RSS feeds that users added) were removed.
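The snowball-style extension described above can be sketched as a breadth-first expansion from a seed set. This is only an illustration of the general technique: `get_linked_accounts` is a hypothetical callable standing in for whatever lookup returns further accounts reachable from a given one, and `max_accounts` is an assumed stopping criterion. Neither reflects the actual about.me API or the crawler used for the dataset.

```python
from collections import deque


def snowball_crawl(seeds, get_linked_accounts, max_accounts=1000):
    """Expand a seed set breadth-first until no new accounts are found
    or `max_accounts` accounts have been visited.

    `get_linked_accounts` is a hypothetical callable: account -> iterable
    of further accounts discovered from that account.
    """
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicates, keep order
    seen = set(unique_seeds)
    queue = deque(unique_seeds)
    visited = []
    while queue and len(visited) < max_accounts:
        account = queue.popleft()
        visited.append(account)
        for neighbor in get_linked_accounts(account):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return visited


# Toy link structure standing in for real crawl results.
toy_links = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": [],
    "dave": ["erin"],
    "erin": [],
}
crawled = snowball_crawl(["alice"], lambda a: toy_links.get(a, []))
```

Tracking `seen` separately from `visited` keeps an account from being queued twice when several profiles point to it.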

Figure 1. OSN co-occurring network in about.me dataset
As a result, our dataset contains 29 different OSNs (see Figure 1). In Figure 1, the ties between OSNs show how frequently two social networks co-occur on the profile pages of users.
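The co-occurrence frequencies behind such a network can be computed by counting, for every pair of OSNs, the number of profile pages that link to both. The sketch below assumes each user is represented as a list of OSN names. The function name and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(user_osns):
    """Count how many users list each unordered pair of OSNs.

    `user_osns` is an iterable of per-user OSN name collections.
    Pairs are stored as alphabetically sorted tuples so that
    (a, b) and (b, a) are counted together.
    """
    counts = Counter()
    for osns in user_osns:
        for pair in combinations(sorted(set(osns)), 2):
            counts[pair] += 1
    return counts


# Toy data (not taken from the about.me dataset).
toy_users = [
    ["twitter", "google+", "facebook"],
    ["twitter", "google+"],
    ["facebook"],
]
edges = cooccurrence_counts(toy_users)
```

Each key of `edges` corresponds to a tie in the network and its count to the tie's weight.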

Users with three OSNs form the largest group (22%), followed by those with four (20%) and two (18%) social networks. Over half (60%) of users have 2-4 social networks, and each user participates in 4.48 OSNs on average. In our dataset, both the number of different OSNs (29) and the average number of OSNs that each person participates in (4.48) are higher than those reported in the previous study [14], which are 15 and 3.92, respectively.
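Statistics of this kind (the share of users per number of OSNs and the average number of OSNs per user) can be derived from the same per-user lists. The following is a minimal sketch with a hypothetical function name and toy input. It does not reproduce the figures reported above.

```python
from collections import Counter


def osn_count_stats(user_osns):
    """Return (shares, mean): the fraction of users per number of
    distinct OSNs, and the average number of OSNs per user."""
    sizes = [len(set(osns)) for osns in user_osns]
    if not sizes:
        return {}, 0.0
    distribution = Counter(sizes)
    shares = {k: distribution[k] / len(sizes) for k in sorted(distribution)}
    mean = sum(sizes) / len(sizes)
    return shares, mean


# Toy data: five users with 2, 3, 3, 4 and 5 OSNs.
toy_users = [
    ["twitter", "google+"],
    ["twitter", "google+", "facebook"],
    ["twitter", "linkedin", "flickr"],
    ["twitter", "google+", "facebook", "linkedin"],
    ["twitter", "google+", "facebook", "linkedin", "flickr"],
]
shares, mean_osns = osn_count_stats(toy_users)
```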


Dataset for our study 


As we were interested in analyzing aggregated user profiles from Twitter and Google+, we randomly selected 480 active users from the about.me dataset who had been using both OSNs. We extracted their user-generated content (UGC) from Twitter and Google+, as well as all links shared within that UGC, using our user modeling framework. All DBpedia entities within the UGC and within the content of each link were retrieved using the framework. The numbers of entities extracted from the Twitter and Google+ profiles of users are displayed in Figure 2. As the figure shows, a greater number of entities can be extracted from Google+ activities.

Figure 2. The number of entities extracted from Twitter and Google+ profiles of users



References


[1] J. Liu, F. Zhang, X. Song, Y.-I. Song, C.-Y. Lin, and H.-W. Hon. What's in a name?: an unsupervised approach to link users across communities. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pages 495-504. ACM, 2013.