Guangyuan's Research and Development Blog: UMAP


UMAP2016 Travel Report

This week, I attended the 24th Conference on User Modeling, Adaptation and Personalization (UMAP 2016). This year, it was held in conjunction with the Hypertext conference, and the two shared some sessions (e.g., the Doctoral Consortium and the keynote talks). Overall, there were around 130 participants. The conference received 123 submissions, with a 28% acceptance rate.

#umap2016 review process and stats by @laroyo and @julitav pic.twitter.com/N4ZWFhaXBA
— UMAP (@UMAPconf) July 13, 2016

A major change this year was the presentation format. Unlike in previous years, long papers were presented in 13 minutes and short papers in 8 minutes, and every paper also had a slot in a poster session to reach a larger audience and encourage more discussion.

Keynote Speakers:

The first keynote speaker was Hossein Derakhshan: Killing the Hyperlink, Killing the Web: the Shift from Library-Internet to Television-Internet.

The speaker is an Iranian-Canadian blogger who was imprisoned in Tehran from November 2008 to November 2014. He is credited with starting the blogging revolution in Iran and is called the father of Persian blogging by many journalists.

“The Web We Have to Save” by @h0d3r https://t.co/Nk5wCYHHAm #umap2016
— Guangyuan Piao (@parklize) July 13, 2016

Some striking statements from the talk:

- Many internet users in Brazil and India think Facebook is the Internet
- With 150 "likes", Facebook can know you better than your parents do; with 300 likes, it can know you better than your spouse does

The second speaker was Lada Adamic, who leads the Product Science group within Facebook's Data Science Team.

The speaker described three large-scale analyses of re-share cascades on Facebook, which were performed in aggregate using de-identified data.

Now @ladamic telling as about dynamics of information sharing as meme mutation/variation #umap2016 pic.twitter.com/gwRrsEKO99
— UMAP (@UMAPconf) July 14, 2016

Summary of the talk:

- Cascades grow

- Cascades recur

- Cascades evolve

The third speaker, Sandra Carberry, who was one of the founders of the User Modeling research area at the first workshop in Maria Laach in 1986, gave a talk on "User Modeling: the Past, the Present and the Future".

Prof. Sandra Carberry' keynote "User Modeling: the Past, the Present and the Future" #umap2016 pic.twitter.com/lxiaJXVsYC
— Julita Vassileva (@julitav) July 15, 2016

--------------------------------------------------------------------------------------------------------------------------

I was there to present a short paper, a doctoral consortium paper and an extended abstract.

Short Paper

UMAP2016 - Analyzing Aggregated Semantics-enabled User Modeling on Google+ and Twitter for Personalized Link Recommendations from GUANGYUAN PIAO

Doctoral Consortium

Several Twitter & RecSys papers at the HT+UMAP DC session #umap2016 #acmht16 room 140 pic.twitter.com/PiC67Mirfd
— UMAP (@UMAPconf) July 13, 2016

In the Doctoral Consortium, each student was assigned an expert in their topic as a mentor. Tsvika Kuflik, who is on the editorial board of UMUAI, was my mentor during the conference and offered a lot of constructive feedback on my thesis.

Extended Abstract

This preliminary work is a first step towards user modeling with LinkedIn profiles: it investigates which fields of a LinkedIn profile are helpful for user modeling in the context of MOOC recommendations.

UMAP2016EA - Analyzing MOOC Entries of Professionals on LinkedIn for User Modeling and Personalized MOOC Recommendations from GUANGYUAN PIAO

Many attendees asked about data collection. We used Google Custom Search Engine to search the LinkedIn website with a specific keyword such as "coursera" in order to find LinkedIn profiles that list Coursera courses. For details about the dataset, you can check the post here.

-----------------------------------------------------------------------------------------------------------------------

Impressively, the proceedings of UMAP 2016 were already available during the conference.

UMAP2016EA

Analyzing MOOC Entries of Professionals on LinkedIn for User Modeling and Personalized MOOC Recommendations

[UMAP2016 submission by Guangyuan Piao and John G. Breslin]

About

This post provides supplemental material and information about the poster "Analyzing MOOC Entries of Professionals on LinkedIn for User Modeling and Personalized MOOC Recommendations: a first look". Available online:

Poster:
UMAP2016EA - Analyzing MOOC Entries of Professionals on LinkedIn for User Modeling and Personalized MOOC Recommendations from GUANGYUAN PIAO

Dataset

name	number of records	description
users.sql	5668	LinkedIn profiles of learners who have taken at least one Coursera MOOC
coruseRecordsV1.sql	15744	course records extracted from user profiles
eduExperience.sql	11085	educational experience of learners
workExperience.sql	32801	work experience of learners
skills.sql	159291	skills of learners

Descriptive statistics: the dataset consists of MOOC learner profiles from LinkedIn, with 15,744 MOOC entries from 5,668 professionals. Each professional took 3 courses on average, and the majority of learners (87%) listed 5 or fewer MOOCs. Interestingly, the learner with the largest number of MOOCs had 114 of them. The distribution of genders and degrees of learners is as below:

If we assume that course entries in LinkedIn are courses that have been completed by users, the distribution of degrees is similar to the one in the study [1], which reports the distribution for learners who completed their course.

Verified certifications: Instead of just taking MOOCs on Coursera and getting statements of accomplishment, learners can also purchase verified certifications for some courses that meet certain criteria. A verified certification provides proof that a learner has completed the online course. In such cases, verified certifications can also be added to LinkedIn profiles together with their serial numbers. We found that around 26% of the certifications in our collected profiles are verified, while 74% are unverified.

Course tracks: We found that course tracks can be identified by exploring the learning activities of users in the online social network (OSN). Formally, we define a course track as a set of courses that were taken together more than n times, where n is a threshold. The course relationships can be represented by a weighted undirected network, as in the figure below.

Nodes denote courses, and the weight of a tie between two courses denotes how often the two courses were taken together. In this context, a course track is a clique (a complete subgraph, i.e., one with an edge joining each pair of nodes) within the course relationship network in which the weight of every tie is higher than the threshold n. Course tracks can therefore be constructed from the cliques within the network. As one might expect, the higher the value of n, the stronger the relationships a course track must hold, and the fewer cliques meet the criterion. Indeed, 60 maximal cliques (a clique is maximal if it cannot be extended to a larger clique) can be found with a threshold of 10, while only 16 maximal cliques can be found with a threshold of 20.
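To make the definition above concrete, here is a minimal sketch in Python. It is not the code used for the poster: the function names, the toy learner data and the choice of a plain Bron-Kerbosch enumeration are all assumptions made for illustration. It counts how often two courses appear in the same profile, keeps only the ties whose weight is higher than n, and lists the maximal cliques of what remains.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(learners):
    """Count, for each unordered pair of courses, how many learners list both."""
    weights = Counter()
    for courses in learners:
        for a, b in combinations(sorted(set(courses)), 2):
            weights[(a, b)] += 1
    return weights


def maximal_cliques(weights, n):
    """Maximal cliques of the network that keeps only ties with weight > n."""
    adj = {}
    for (a, b), w in weights.items():
        if w > n:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)

    found = []

    def expand(r, p, x):
        # Basic Bron-Kerbosch recursion (no pivoting): r is the clique built
        # so far, p the candidates that can extend it, x the excluded nodes.
        if not p and not x:
            if r:
                found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


# Toy data (hypothetical): each inner list is the set of courses in one profile.
learners = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["C", "D"],
    ["C", "D"],
    ["A", "D"],
]
weights = co_occurrence(learners)
tracks = maximal_cliques(weights, n=1)
```

With n=1 the toy network yields two candidate tracks, {A, B, C} and {C, D}; raising n to 2 leaves only {A, B, C}, which mirrors the observation that a higher threshold leaves fewer cliques. Note that in this sketch a single tie above the threshold already counts as a clique of size two.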

We evaluated these tracks and found that two of the course tracks provided by Coursera can be identified among those cliques through this approach. A course track, called a specialization in Coursera, is a targeted sequence of courses from an institution taken together to earn a specialization certificate. The first course track from Coursera is the "Data Science" specialization, which consists of 9 courses from Johns Hopkins University, and the second one is the "Business Foundations" specialization provided by the University of Pennsylvania. In practice, these ground truth course tracks can also be used for choosing the threshold n, namely as the highest value that does not break the ground truth course tracks. In our case, with a threshold of 13, 27 maximal cliques can be found, including the two ground truth course tracks. Interestingly, when we look at the maximal clique that contains the "Data Science" course track (Figure 2), we find that "Machine Learning", "Introduction to Data Science" and "Computing for Data Analysis" are also frequently taken together with the 9 courses of the "Data Science" track in practice. This indicates that new course tracks can be constructed on top of existing tracks by exploring the learning activities of users from the OSN.
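The threshold selection described above can be sketched as follows. This is only an illustration under two assumptions: tie weights are integer co-occurrence counts, and a track "breaks" as soon as one of its ties no longer has a weight higher than n. The function name and the toy weights are hypothetical and are not taken from our dataset.

```python
from itertools import combinations


def highest_threshold(weights, tracks):
    """Largest integer n such that every tie inside every given track
    still has a weight higher than n (so no track is broken).

    Raises ValueError if no track contains at least two courses.
    """
    weakest = min(
        weights.get(tuple(sorted(pair)), 0)
        for track in tracks
        for pair in combinations(track, 2)
    )
    return weakest - 1


# Toy weights (hypothetical); keys are alphabetically sorted course pairs.
toy_weights = {("A", "B"): 7, ("A", "C"): 5, ("B", "C"): 9, ("C", "D"): 30}
n = highest_threshold(toy_weights, [["A", "B", "C"]])
```

Here the weakest tie inside the track {A, B, C} has weight 5, so the track survives for any n up to 4 and breaks at n = 5. The strong tie between C and D plays no role because D is not part of the ground truth track.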

[1]. T. Balch. MOOC student demographics. Retrieved April 28, 2013.

UMAP2016S

Analyzing Aggregated Semantics-enabled User Modeling on Google+ and Twitter for Personalized Link Recommendations

[UMAP2016 submission by Guangyuan Piao and John G. Breslin]

About

This post provides supplemental material and information about the paper "Analyzing Aggregated Semantics-enabled User Modeling on Google+ and Twitter for Personalized Link Recommendations".

Abstract

In this paper, we study whether reusing Google+ profiles can provide reliable recommendations on Twitter to resolve the cold start problem. Next, we investigate the impact of giving different weights when aggregating user profiles from two OSNs, and show that giving a higher weight to the profile from the targeted OSN yields the best performance in the context of a personalized link recommender system. Finally, we propose a user modeling strategy which combines entity- and category-based user profiles using a discounting strategy. Results show that our proposed strategy improves the quality of user modeling significantly compared to the baseline method.
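The abstract does not spell out the aggregation formula, so the following is only a rough sketch of what weighting two profiles can look like, not the method from the paper. It assumes that a user profile is a mapping from concepts (e.g., DBpedia entities) to weights, that each profile is normalized to sum to 1, and that the two profiles are combined linearly; the parameter name alpha, its default value and the example profiles are all hypothetical.

```python
def normalize(profile):
    """Scale the weights of a profile so that they sum to 1."""
    total = sum(profile.values())
    return {k: v / total for k, v in profile.items()} if total else {}


def aggregate(target, other, alpha=0.7):
    """Linear combination of two profiles; alpha is the share given to the
    profile from the targeted OSN (alpha > 0.5 favours the targeted OSN)."""
    t, o = normalize(target), normalize(other)
    return {
        k: alpha * t.get(k, 0.0) + (1 - alpha) * o.get(k, 0.0)
        for k in t.keys() | o.keys()
    }


# Hypothetical entity counts for one user on the two OSNs.
twitter = {"dbpedia:Semantic_Web": 3, "dbpedia:Ireland": 1}
gplus = {"dbpedia:Semantic_Web": 1, "dbpedia:Android": 1}
merged = aggregate(target=twitter, other=gplus, alpha=0.7)
```

In this toy example the entity shared by both profiles ends up with the largest weight, and an entity that only appears on the targeted OSN (Twitter) outweighs one that only appears on Google+, even though the latter has a larger share of its own profile.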

Slides:
UMAP2016 - Analyzing Aggregated Semantics-enabled User Modeling on Google+ and Twitter for Personalized Link Recommendations from GUANGYUAN PIAO

About.me Dataset

Users tend to have multiple social identities in different OSNs [1]. To retrieve the ground truth data (i.e., users who are using both Google+ and Twitter), we obtained OSN accounts of users from about.me. About.me is a personal web hosting service that offers registered users a simple platform from which to link multiple online identities, relevant external sites (e.g., a personal homepage), and popular OSNs such as Facebook, Twitter and Google+. We started from a set of randomly returned about.me accounts retrieved from the about.me API and then gradually extended this set in a snowball manner. In total, we crawled 247,630 public profile pages from about.me during December 2014 that have at least two external links. Two kinds of external links that are irrelevant to OSN identities (i.e., relevant external sites and RSS feeds that users added) were removed.

Figure 1. OSN co-occurring network in about.me dataset

As a result, there are 29 different communities in our dataset (see Figure 1). In Figure 1, the ties between OSNs show the co-occurrence frequency of two social networks in the profile pages of users.

The portion of users having three OSNs is the highest (22%) followed by 20% and 18% for those having four and two social networks, respectively. Over half (60%) of people have 2-4 social networks and each person participates in 4.48 OSNs on average. In our dataset, the number of different OSNs (29) and the average number (4.48) that each person participates in are both higher than the numbers from the previous study [14], which are 15 and 3.92 respectively.

Dataset for our study

As we were interested in analyzing aggregated user profiles from Twitter and Google+, we randomly selected 480 active users from the about.me dataset who had been using both OSNs. We extracted their user-generated content (UGC) from Twitter and Google+, as well as all links shared in that UGC, using our user modeling framework. All DBpedia entities within the UGC and within the content of each link were retrieved using the framework. The numbers of entities extracted from the Twitter and Google+ profiles of users are displayed in Figure 2. As we can see from the figure, a greater number of entities can be extracted from Google+ activities.

Figure 2. The number of entities extracted from Twitter and Google+ profiles of users

References

[1]. J. Liu, F. Zhang, X. Song, Y.-I. Song, C.-Y. Lin, and H.-W. Hon. What's in a name?: an unsupervised approach to link users across communities. In Proceedings of the sixth ACM international conference on Web search and data mining, pages 495-504. ACM, 2013.

Review of 2015

Research

The year 2015 is over, and I have now been in Ireland and at Insight for 1 year and 6 months. Doing research here is quite different from my previous research experience, and it brings good opportunities as well as challenges.

Independent:

You have to grow up and be able to do your own research (not projects) with your own ideas and opinions, and conduct experiments on your own. I remember a seminar I attended at the beginning of my PhD journey in which the speaker described the academic supervisor as an advisor, since that word is more appropriate. It means that your advisor is someone who gives advice on your research, not someone who tells you every step you should take, and supervisors are usually too busy to do so anyway.

At first, I could not start my own research or conduct experiments by myself. I was always uncertain about myself, and I realized that the way I had been trained was always "supervised" by others. It reminds me of my time in South Korea, when I was a master's student as well as an employee in a company and received a long list of things to do every day from senior members. In contrast, I have not received a single call here, and all communication has been done through emails, which still surprises me. Thank God, even though I still have a lot to improve, I have started my own research with advice from my supervisor.

I have come to appreciate the statement from "So you want to do a PhD" (Open University) that a PhD confirms your "research independence", i.e., you have to demonstrate:

- Ability to do research by yourself, rather than simply doing what your supervisor tells you
- Awareness of where your work fits in relation to the discipline, and what it contributes to the discipline
- Mature overview of the discipline

Insight Centre:

There have been many changes at Insight@Galway, which was formerly well known as DERI. Our former director, Prof. Stefan Decker, moved to Germany, and we have a new director, Prof. Dietrich Rebholz-Schuhmann. Interestingly, many researchers, including PhD students and postdocs, moved to Germany as well. Graduates from here follow many career paths, including academic positions as well as industry ones, and some of them even run their own startups.

Conference

After several attempts, I have published two full papers, at JIST2015 and SAC2016. I found that it is really important to publish, or at least try to publish, your results at a conference or in a journal in order to get started and to get feedback from experts. During my previous studies, I was advised not to read and present conference papers in seminars. Here, however, one thing I love is that top conferences are considered as important as top journals. There is an interesting article to read if you have wondered about the same thing: https://homes.cs.washington.edu/~mernst/advice/conferences-vs-journals.html
At the end of the year, I submitted a paper to ESWC2016, which has tutorials that are very interesting to me (http://2016.eswc-conferences.org/program/workshops-tutorials), and I hope I will have an opportunity to attend it :). Another conference I would like to participate in is UMAP2016, which is also highly related to my research. So... fingers crossed for the upcoming new year.
