Guangyuan's Research and Development Blog: Semantic Web

SPARQL: FILTER NOT EXISTS and MINUS

I was wondering about the difference between FILTER NOT EXISTS and MINUS, and found a great answer, quoted here:

The difference between FILTER NOT EXISTS and MINUS is related to the two styles of negation used by SPARQL. According to the specification:

The SPARQL query language incorporates two styles of negation, one based on filtering results depending on whether a graph pattern does or does not match in the context of the query solution being filtered, and one based on removing solutions related to another pattern.

Still according to the specification:

NOT EXISTS and MINUS represent two ways of thinking about negation, one based on testing whether a pattern exists in the data, given the bindings already determined by the query pattern, and one based on removing matches based on the evaluation of two patterns. In some cases they can produce different answers.

The two requests of your question are cited in the specification and the results are explained in the following way:

SELECT * {
  ?s ?p ?o .
  FILTER NOT EXISTS { ?x ?y ?z } .
}

This request evaluates to a result set with no solutions because { ?x ?y ?z } matches given any ?s ?p ?o, so NOT EXISTS { ?x ?y ?z } eliminates any solutions.

SELECT * {
  ?s ?p ?o .
  MINUS { ?x ?y ?z } .
}

In the request with MINUS, there is no shared variable between the first part (?s ?p ?o) and the second (?x ?y ?z) so no bindings are eliminated.
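To make the two behaviours concrete, here is a minimal sketch in Python that mimics both operators over a toy set of triples. It is not a SPARQL engine: the data, the `ex:` names and the helper functions are all assumptions made for illustration, and patterns are restricted to three distinct variables.

```python
# Toy model of SPARQL solution mappings as dicts (variable -> value).
# Illustrative only: the triples and helper names are hypothetical.
triples = [
    ("ex:a", "ex:p", "ex:b"),
    ("ex:b", "ex:q", "ex:c"),
]

def match_all(names):
    """Solutions of a pattern of three distinct variables: one per triple."""
    return [dict(zip(names, t)) for t in triples]

def compatible(mu1, mu2):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())

def filter_not_exists(left, inner_names):
    """Keep a left solution only if the inner pattern has no match
    once the left solution's bindings are taken into account."""
    return [
        mu for mu in left
        if not any(compatible(mu, nu) for nu in match_all(inner_names))
    ]

def minus(left, right):
    """Drop a left solution only if some right solution is compatible
    with it AND the two share at least one variable."""
    return [
        mu for mu in left
        if not any(
            compatible(mu, nu) and bool(mu.keys() & nu.keys())
            for nu in right
        )
    ]

left = match_all(("s", "p", "o"))
not_exists_result = filter_not_exists(left, ("x", "y", "z"))
minus_result = minus(left, match_all(("x", "y", "z")))

print(len(not_exists_result))  # 0: the inner pattern always matches
print(len(minus_result))       # 2: no shared variable, nothing removed
```

If the inner pattern reused ?s ?p ?o instead of ?x ?y ?z, both functions would return an empty list, which is the case where the two styles of negation agree.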

Differences between Knowledge Base, Knowledge Graph, and Ontology

The terms "knowledge base" and "knowledge graph" have gained a lot of popularity in recent years, especially after Google introduced the Google Knowledge Graph. However, the terms have been used interchangeably, and a good definition for distinguishing them has been lacking.

Recently, the SEMANTiCS paper Towards a Definition of Knowledge Graphs by Ehrlinger and Wöß [2] provided a quite comprehensive analysis of these terms, together with a definition of knowledge graph that clarifies the distinctions and relationships between them, which I found quite useful.

"The knowledge base is a dataset with formal semantics that can contain different kinds of knowledge, for example, rules, facts, axioms, definitions, statements, and primitives" [1]

"A knowledge graph acquires and integrates information into an ontology (or knowledge base) and applies a reasoner to derive new knowledge."

More recently, Auer et al. [3] relaxed the definition slightly by allowing any computational method, not only a reasoner, for deriving new knowledge. That is,

"A knowledge graph acquires and integrates information into an ontology (or knowledge base) and applies a reasoner or other computational methods to derive new knowledge."

This definition aligns with the assumption that a knowledge graph is somehow superior and more complex than a knowledge base (e.g., an ontology) because it applies a reasoning engine to generate new knowledge and integrates one or more information sources. Consequently, a manually created knowledge graph that does not support integration aspects is a plain knowledge base or knowledge-based system if it provides reasoning capabilities.

I also found it interesting that an ontology consists not only of classes and properties (e.g., owl:ObjectProperty and owl:DatatypeProperty), but can also hold instances (i.e., the population of the ontology).
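As a rough illustration of these definitions, the sketch below holds a tiny ontology (classes and a property) together with its instances as plain Python tuples, and applies a single RDFS-style subclass rule as a stand-in for "a reasoner". All `ex:` names are hypothetical, and a real reasoner supports far more than this one rule.

```python
# Toy triples: a tiny ontology (classes, a property) plus its instances.
# All ex: names are hypothetical and only serve the illustration.
ontology = {
    ("ex:Professor", "rdfs:subClassOf", "ex:Researcher"),
    ("ex:Researcher", "rdfs:subClassOf", "ex:Person"),
    ("ex:worksAt", "rdf:type", "owl:ObjectProperty"),
}
instances = {
    ("ex:alice", "rdf:type", "ex:Professor"),
    ("ex:alice", "ex:worksAt", "ex:someUniversity"),
}

def infer_types(triples):
    """Forward-chain one rule until nothing new is derived:
    (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)."""
    known = set(triples)
    while True:
        new = {
            (x, "rdf:type", d)
            for (x, p1, c) in known if p1 == "rdf:type"
            for (c2, p2, d) in known
            if p2 == "rdfs:subClassOf" and c2 == c
        } - known
        if not new:
            return known
        known |= new

asserted = ontology | instances
derived = infer_types(asserted) - asserted
for triple in sorted(derived):
    print(triple)
```

The derived set contains only the facts that were not asserted, here the two extra types of ex:alice, which is the "new knowledge" the definitions above refer to.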

[1]. J. Davies, R. Studer, and P. Warren. Semantic Web Technologies: Trends and Research in Ontology-based Systems. John Wiley & Sons, 2006.

[2]. Ehrlinger, Lisa and Wolfram Wöß. Towards a Definition of Knowledge Graphs. SEMANTiCS conference, 2016.

[3]. Auer, Sören et al. Towards a knowledge graph for science, International Conference on Web Intelligence, Mining and Semantics, 2018.

Hypertext2017 Travel Report

I participated in the 28th ACM Conference on Hypertext and Social Media (HT), which was held in Prague, Czech Republic, from 4 to 7 July. HT is a top-tier ACM conference in the areas of Hypertext and Social Media. This was my first time attending HT, and it was interesting to learn that Tim Berners-Lee (TBL) demonstrated the WWW at the 1991 Hypertext conference (see https://home.cern/images/2014/01/tim-berners-lee-demonstrates-world-wide-web and https://www.quora.com/Why-is-Sir-Tim-Berners-Lee-unnoticed-when-his-contribution-is-comparable-to-Jobs-and-Gates). This year, HT had 69 regular paper submissions with a 27% acceptance rate, and 12 short presentations. As I had attended the UMAP conference twice before, and HT has been held in close proximity to UMAP with a similar program committee, I was wondering what the difference between the two conferences is. After attending, my guess is that UMAP is more focused on the context of e-learning, such as user modeling and RecSys in educational systems, whereas HT is more focused on linking data and resources and on Social Media. Although HT's acceptance rate has varied widely, overall it has a good average citation count according to the ACM DL.

Day-1:

Keynote: Peter Mika, Schibsted (previously at Yahoo)

It was interesting to see a keynote on the Semantic Web at HT. The talk looked back at the history of the Semantic Web. The speaker discussed the original aspirations of its creators and what has been achieved in practice over these two decades, including some achievements in search engines in particular. He also covered some failures, i.e., parts of the original vision that have not been achieved.

What happened to the Semantic Web? from Peter Mika

Most of the presentations today were about studying problems on Social Media, such as hate speech:

Mainack Mondal, Leandro Augusto de Araújo Silva and Fabrício Benevenuto: A Measurement Study of Hate Speech in Social Media
Stringhini and Athena Vakali: Hate is not binary: Studying abusive behavior of #GamerGate on Twitter

These talks were interesting to me, as I was interested in computational social science when I first started my PhD. For example, the first one above discussed how to measure hate speech, whether anonymity plays a role in it, and how these phenomena differ across countries. The results, based on a Twitter dataset, were interesting. The authors found that accounts posting hate speech are more often anonymous than a random baseline, i.e., anonymous users post more hate speech.

Day-2:

Keynote: "A MEME IS NOT A VIRUS: THE ROLE OF COGNITIVE HEURISTICS IN INFORMATION DIFFUSION" by Kristina Lerman

Kristina Lerman is Research Team Lead at the University of Southern California Information Sciences Institute and holds a joint appointment as a Research Associate Professor in the USC Computer Science Department. She talked about position bias in Social Media: posts in lower positions are less likely to be seen, and as newer tweets arrive, older tweets move down and become less likely to be seen. The phenomenon is more serious for well-connected users. It is also interesting that well-connected hubs are less likely to retweet older posts: retweet probability decreases with connectivity, so highly connected people are less susceptible to infection, due to their increased cognitive load.

The presentations on day-2 were diverse, covering linking content, crowdsourcing, storytelling... The following paper, which tackles the problem of understanding task clarity in crowdsourcing platforms, especially CrowedFlow…, and how to measure it, won the best paper award at HT2017.

Ujwal Gadiraju, Jie Yang and Alessandro Bozzon: Clarity is a Worthwhile Quality - On the Role of Task Clarity in Microtask Crowdsourcing

Day-3:

The presentations on day-3 were about location-based social networks, user modeling, ratings/reviews and visualizations. One of the interesting papers was the following one; I had already read about the earlier happy map work by Daniele Quercia (Bell Labs Cambridge). The paper discussed various factors that might affect people's perceptions of places (such as safety).

David Candeia, Flávio Figueiredo, Nazareno Andrade and Daniele Quercia: Multiple Images of the City: Unveiling Group-Specific Urban Perceptions through a Crowdsourcing Game

My presentation was about "Leveraging Followee List Memberships for Inferring User Interests for Passive Users on Twitter", which extends our previous work presented at ECIR2017.

Leveraging Followee List Memberships for Inferring User Interests for Passive Users on Twitter from GUANGYUAN PIAO

Overall, the conference had 70+ participants. What impressed me was that the audience actively asked questions and took part in discussions. In addition, the organizers made the proceedings available before the conference, along with the conference navigator developed by the University of Pittsburgh: http://halley.exp.sis.pitt.edu/cn3/portalindex.php

Proceedings: http://dl.acm.org/citation.cfm?id=3078714&picked=prox&cfid=782021270&cftoken=32813465

Next year, HT2018 will be in Baltimore, USA. It is a good conference, and I hope I will have the chance to attend it again in the future.

EKAW2016 Travel Report

From 19 to 24 November, I attended the 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW) in Bologna, Italy. It is a biennial conference on Knowledge Engineering, along with the K-CAP conference.

There were around 150 participants from around the world. Regarding submissions, 226 abstracts resulted in 171 final submissions in total. 539 reviews were submitted for those papers, and 42 out of 142 research papers were accepted. Based on further quality assessment, the organizers also divided the 42 accepted papers into long presentations (17.3%) and short presentations for the conference program.

Keynotes:

The first keynote was given by Chris Welty from Google Research. He talked about how current AI systems lose information when they are trained on single-label ground truth (e.g., a song might belong to several genres, or to none of the options offered in the survey used to collect the ground truth data). He pointed out that the world is currently simplified for AI into black and white, while reality is much more complex. To achieve better ground truth labeling, he also introduced solutions such as diversity-enabled labeling by a wise crowd for training AI systems.

The second keynote was given by Francesca Rossi from IBM Research. She argued that AI has the capability to make sense of the huge volume of data (text, images, videos, etc.) that surrounds us in our everyday private and professional life, and to transform it into knowledge that can be exploited to make better and more informed decisions, which could help solve global societal problems such as those in healthcare, transportation, and climate. To achieve these goals, and to fully exploit the potential of AI, we need to build intelligent machines that behave ethically and create symbiotic partnerships with humans. So rather than building AI as Decision Making Systems, we need to build it as Decision Support Systems.

The conference sessions were very diverse, ranging from data management to NLP, Entity Recognition, Crowdsourcing, and ontology-related topics.

My presentation:

I presented a User Modeling study that considers different dimensions studied in the literature and investigates their synergetic effect on User Modeling.

EKAW2016 - Interest Representation, Enrichment, Dynamics, and Propagation: A Study of the Synergetic Effect of Different User Modeling Dimensions for Personalized Recommendations on Twitter from GUANGYUAN PIAO

The organizers also did a good job on proceedings, which were available before the conference:

http://link.springer.com/book/10.1007/978-3-319-49004-5

SEMANTiCS 2016 Travel Report

Day-1: Tutorials & Workshops

I attended the afternoon session of the Knowledge Engineering track, which used PoolParty from Semantic Web Company. As a researcher working closely on Semantic Technologies, I'm interested in how those technologies are being used in different enterprises, and what kinds of solutions they need for solving what kinds of problems. There were many industrial participants from Europe, including Springer. Some said they were already using PoolParty, and some were attending to better understand the use of Semantic Technologies in enterprise scenarios. Most of them were wondering about the integration of heterogeneous data sources, taxonomies and ontologies.

Day-2: Main conference

Stats: This year's conference received 85 submissions and accepted 18 full papers (21.2%) and 8 short papers.

The first keynote: "Linked data experience at Springer Nature" by Michele Pasin

Dr. Pasin summarized Springer's experience with Linked Data and Semantic Technologies for enterprise metadata management at large scale. He also introduced scigraph.com, an upcoming LD platform: one place for all their linked data efforts towards linked science data.

The second keynote: "The semantics of human network" by Marie Wallace, IBM

Marie from IBM shared their experience of using the human network generated by their enterprise social network, IBM Connections, for different applications and services. She stressed that capturing human context at a global level, which is happening thanks to social networks and the IoT-enabled world, is really important for improving the human digital experience.

The social dashboard for each employee shows different factors of their personal social status, such as activity and reaction, and can also provide recommendations for improvement in different aspects.

I presented my full paper, "Exploring Dynamics and Semantics of User Interests for User Modeling on Twitter for Link Recommendations", in the Knowledge Discovery session. It was impressive to see the room full, and I had interesting discussions with some members of the audience. This work also won the best paper award at #semanticsconf.

Day-3: Main conference

The first keynote: "Learning with Memory Embeddings and its Application in the Digitalization of Healthcare" by Volker Tresp from SIEMENS

He talked about mapping of the knowledge graph to a tensor representation whose entries are predicted by models using latent representations of generalized entities, and extension of this approach for medical decision processes.
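The tensor view can be pictured with a small sketch. The code below only shows the general idea as I understood it: triples become entries of a relation x entity x entity tensor, and an entry can be scored from latent vectors. The entities, relations and the bilinear scoring function are my own assumptions for illustration, not the model from the talk.

```python
# Hypothetical toy knowledge graph; names are made up for illustration.
entities = ["ex:patient1", "ex:diabetes", "ex:metformin"]
relations = ["ex:hasDiagnosis", "ex:prescribed"]
triples = [
    ("ex:patient1", "ex:hasDiagnosis", "ex:diabetes"),
    ("ex:patient1", "ex:prescribed", "ex:metformin"),
]

e_idx = {e: i for i, e in enumerate(entities)}
r_idx = {r: k for k, r in enumerate(relations)}

# X[k][i][j] == 1 iff the triple (entity i, relation k, entity j) is known.
n = len(entities)
X = [[[0] * n for _ in range(n)] for _ in relations]
for s, p, o in triples:
    X[r_idx[p]][e_idx[s]][e_idx[o]] = 1

def score(a_s, R_p, a_o):
    """Assumed bilinear score a_s^T R_p a_o over latent vectors.
    In a trained model, a_s, R_p and a_o would be learned so that
    the score approximates the tensor entry for (s, p, o)."""
    return sum(
        a_s[i] * R_p[i][j] * a_o[j]
        for i in range(len(a_s))
        for j in range(len(a_o))
    )

print(X[r_idx["ex:hasDiagnosis"]][e_idx["ex:patient1"]][e_idx["ex:diabetes"]])
```

Unknown triples correspond to zero entries of X, and predicting them amounts to computing their score from the latent representations.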

The second keynote: "Enriching Content with User Data and Semantic Information" by Cathy Dolbear from Oxford University Press

She talked about combining human-authored semantic information with semantic tags and taxonomy classifications automatically extracted from their content. She also introduced the Oxford Global Languages project, which links lexical information from multiple global and also digitally under-represented languages, such as isiZulu and Urdu, in a triple store.

It was a wonderful event for meeting both industry people who are tackling real-world problems with Semantic Technologies and academic researchers. I hope to attend the conference again in the future:)

UMAP2016S

Analyzing Aggregated Semantics-enabled User Modeling on Google+ and Twitter for Personalized Link Recommendations

[UMAP2016 submission by Guangyuan Piao and John G. Breslin]

About

This post provides supplemental material and information about the paper "Analyzing Aggregated Semantics-enabled User Modeling on Google+ and Twitter for Personalized Link Recommendations".

Abstract

In this paper, we study whether reusing Google+ profiles can provide reliable recommendations on Twitter to resolve the cold start problem. Next, we investigate the impact of giving different weights when aggregating user profiles from two OSNs, and show that giving a higher weight to the targeted OSN profile during aggregation yields the best performance in the context of a personalized link recommender system. Finally, we propose a user modeling strategy which combines entity- and category-based user profiles with a discounting strategy. Results show that our proposed strategy improves the quality of user modeling significantly compared to the baseline method.
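The weighting idea can be sketched as follows. This is a minimal illustration that assumes profiles are maps from concepts to weights and that aggregation is a simple linear combination; the function name, the parameter `w_target` and the example values are hypothetical and not taken from the paper.

```python
def aggregate_profiles(target_profile, other_profile, w_target=0.75):
    """Linearly combine two interest profiles (concept -> weight).

    w_target is the weight given to the targeted OSN profile; the other
    profile receives 1 - w_target. This scheme is an assumed
    simplification for illustration only.
    """
    if not 0.0 <= w_target <= 1.0:
        raise ValueError("w_target must be between 0 and 1")
    concepts = set(target_profile) | set(other_profile)
    return {
        c: w_target * target_profile.get(c, 0.0)
        + (1.0 - w_target) * other_profile.get(c, 0.0)
        for c in concepts
    }

# Hypothetical profiles with made-up interest weights.
twitter_profile = {"dbpedia:Semantic_Web": 1.0}
gplus_profile = {"dbpedia:Semantic_Web": 1.0, "dbpedia:Photography": 2.0}

aggregated = aggregate_profiles(twitter_profile, gplus_profile, w_target=0.75)
print(aggregated)
```

With `w_target` above 0.5, interests that appear only in the non-targeted profile still contribute, but with a reduced weight.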

Slides:
UMAP2016 - Analyzing Aggregated Semantics-enabled User Modeling on Google+ and Twitter for Personalized Link Recommendations from GUANGYUAN PIAO

About.me Dataset

Users tend to have multiple social identities in different OSNs [1]. To retrieve the ground truth data (i.e., users who are using both Google+ and Twitter), we obtained the OSN accounts of users from about.me. About.me is a personal web hosting service, which offers registered users a simple platform from which to link multiple online identities, relevant external sites (e.g., a personal homepage), and popular OSNs such as Facebook, Twitter, Google+, etc. We started from a set of randomly returned about.me accounts retrieved from the about.me API and then gradually extended this set in a snowball manner. In total, we crawled 247,630 public profile pages that have at least two external links from about.me during December 2014. Two kinds of external links that are irrelevant to OSN identities (i.e., relevant external sites and RSS feeds that users added) were removed.

Figure 1. OSN co-occurring network in about.me dataset

As a result, there are 29 different communities in our dataset (see Figure 1). In Figure 1, the ties between OSNs show the co-occurrence frequency of two social networks in the profile pages of users.

The portion of users having three OSNs is the highest (22%) followed by 20% and 18% for those having four and two social networks, respectively. Over half (60%) of people have 2-4 social networks and each person participates in 4.48 OSNs on average. In our dataset, the number of different OSNs (29) and the average number (4.48) that each person participates in are both higher than the numbers from the previous study [14], which are 15 and 3.92 respectively.

Dataset for our study

As we were interested in analyzing aggregated user profiles from Twitter and Google+, we randomly selected 480 active users from the about.me dataset who had been using both OSNs. We extracted their UGC from Twitter and Google+, as well as all links shared with that UGC, using our user modeling framework. All DBpedia entities within the UGC and within the content of each link were retrieved using the framework. The numbers of entities extracted from the Twitter and Google+ profiles of users are displayed in Figure 2. As we can see from the figure, a greater number of entities can be extracted from Google+ activities.

Figure 2. The number of entities extracted from Twitter and Google+ profiles of users

References

[1]. J. Liu, F. Zhang, X. Song, Y.-I. Song, C.-Y. Lin, and H.-W. Hon. What's in a name?: an unsupervised approach to link users across communities. In Proceedings of the sixth ACM international conference on Web search and data mining, pages 495-504. ACM, 2013.

ACM SAC 2016 Travel Report

From the 4th to the 8th of April I had the pleasure of participating in the 31st ACM Symposium on Applied Computing (ACM SAC), which was held in the beautiful city of Pisa, Italy. I was there to present my full paper "Measuring Semantic Distance for Linked Open Data-enabled Recommender Systems" and to participate in the Student Research Competition sponsored by Microsoft.

This year, there were over 500 registrations from 59 countries at this conference. There were 37 tracks, and the overall acceptance rate for this year was 24%.

Keynotes:

There were two keynotes, given by John Mylopoulos and Marco Conti, respectively. The first keynote was about the requirements problem in Software Engineering, and the second keynote was titled "From MANET to people-centric computing and communications".

Semantic Web Track:

There were two sessions with eight papers in the Semantic Web Track, and some of the participants were from our institute. Pasquale Minervini presented "Leveraging the Schema in Latent Factor Models for Knowledge Graph Completion", and another colleague, Feng Gao, presented "QoS-Aware Adaptation for Complex Event Service" in another (SOA) track.

Measuring Semantic Distance for Linked Open Data-enabled Recommender Systems from GUANGYUAN PIAO

Social Network and Media Analysis Track (SONAMA):

One of the papers in this track that I was interested in was "Inferring Semantic Interest Profiles from Twitter Followees: Does Twitter Know Better than Your Friends?" by Christoph Besel, University of Passau, Germany, which is related to my work. Although many previous works focused on using tweets for inferring user interest profiles, the authors used an alternative source (followees) to build user interest profiles, motivated by the tendency that more and more users are consuming feeds instead of producing content on social networks.

Student Research Competition (SRC):

I also participated in the SAC SRC and made it to the 2nd round (top-5 list). It was a good opportunity to compete across different disciplines. Congrats to all top-3 winners!

What would make the conference better?

It would be better to have a Twitter channel to communicate and disseminate activities during the conference. Next year, it will be in Morocco, and I hope I can attend again:).

---------------------------------------------------------------------------------------------------------
Update after the conference:

The proceedings have been available since June 2016.

JIST2015 Travel Report

This is my first travel report, written after the first conference at which I presented a paper. The Joint International Semantic Technology Conference (JIST) is a merged conference of ASWC (Asian Semantic Web Conference) and CSWC (Chinese Semantic Web Conference). This year, the conference consisted of the main conference as well as an industrial forum and a data challenge. The majority of the participants came from China and Japan, and the committee expects more participants from the USA and Europe. Overall, there were around 100 participants at the conference and the industrial forum.

Keynote speakers:
There were two keynote speakers and many invited talks during the conference and the industrial forum. Speakers include Prof. Frank van Harmelen, Hong-gee Kim from SNU Korea, Ming Zhou from Microsoft Asia, Prof. Jeff Pan and many industrial participants from giant companies such as Baidu, Alibaba, China Mobile etc.

Prof. Frank van Harmelen talked about the research efforts on the Semantic Web so far and in the near future, and about several projects they are currently working on for understanding the Semantic Web.

2000-2010: how to build it
2010-2015: how to use it
next decades: how to understand it

To this end, he talked about how to build a WoD observatory and their solution, LOD Laundromat ("clean your dirty triples"), which crawls data from registries (CKAN) and contains 38 billion triples and counting!

Looking back at the research papers in the Semantic Web community, the majority of papers in the last 5 years were optimized for DBpedia. This is an especially bad idea since Semantic Web != DBpedia... They are using analytical tools on top of the LOD Laundromat (LODLab, LOTUS) to re-run experiments on a larger number of datasets (100+) collected by the WoD observatory.

Even at an early stage, there are some interesting findings, such as "Hotspots: Most of the dataset is not used in many queries (in terms of datasets)"; if we can identify the hotspots, we can optimize for these queries. Another finding is that a local sub-graph works well for answering queries (there is no explanation or theory for this yet).

So the agenda should change: not just building things, but understanding why those things work so well!

Industry forum:

It was very impressive that around 10 companies participated in the forum to present their work on using Semantic Technologies. The most impressive talk for me was given by Jie Bao from Memect Inc., who previously worked at RPI and MIT. He pointed out the difference between research in Europe and in the USA:

based on project vs. business
conference: ISWC vs. semantics

He stressed that even though the Semantic Web community often cites schema.org as a good example of SW, it is a project and not a business. Few start-ups have survived so far, even though we expected new semantic search engines, semantic agents, etc.

And other slides on start-up companies with Semantic Web...

Data Challenge:

I think this session was one of the most beneficial ones for me. I participated in the data challenge on entity type prediction and placed 4th among 13 teams. The other top-5 participants came from Peking University, Chinese Academy of Sciences, Fujitsu R&D and Dongnan University. I learnt a lot about NLP and Machine Learning, and I need to improve my NLP and ML skills in the near future.

JIST2015 data challenge from GUANGYUAN PIAO

Editors' session:

Since my paper "Computing the Semantic Similarity of Resources in DBpedia for Recommendation Purposes" for the main conference was one of the Best Paper Candidates, I had the chance to meet journal editors and get some feedback on the work. I received many useful comments from them and have a lot of work to do to improve and strengthen it.

Computing the Semantic Similarity of Resources in DBpedia for Recommendation Purposes from GUANGYUAN PIAO

It was a good experience to attend the conference. JIST2016 will be in Singapore:)

---------------------------------------------------------------------------------------------------------
Update after the conference:

The proceedings have been available since March 2016. This is a very slow publication of post-proceedings compared to other conferences such as UMAP. The UMAP proceedings were available during the conference, while the JIST2015 proceedings were not available until March 2016. Both were published by Springer, so what's the difference??

Previous JIST conferences had external proceedings for posters and workshops in CEUR, but not this time. Maybe it depends on the organizing committee, so it is better to check before you submit a workshop or poster paper to a conference.

Although the conference had a line-up of great speakers and programs, it would be better to disseminate the activities of the conference on a global channel like Twitter (instead of using a localized social network), as it is an "international" conference.