The dataset consists of data collected from Facebook profiles about users' personal preferences (“likes”) for items in three domains: movies, books and music. After user anonymization, the items in the dataset were mapped to their corresponding DBpedia URIs. These mappings can be used to extract semantic features from DBpedia or other LOD repositories, to be exploited by the recommendation approaches proposed in the challenge. The dataset is split into a training set and an evaluation set.
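As a minimal sketch of that idea, suppose the DBpedia categories (`dct:subject`) of each item are used as its semantic features. The challenge leaves the choice of features open, so this is only one option. The sketch builds a single batch query from the mapped URIs and folds results in the standard SPARQL 1.1 JSON results format into a per-item feature set. The function names are hypothetical, no network call is made, and the sample result document below is hand-written rather than real DBpedia output.

```python
from collections import defaultdict


def build_subject_query(uris: list[str]) -> str:
    """Build one SPARQL query returning the dct:subject categories of several DBpedia URIs."""
    values = " ".join(f"<{uri}>" for uri in uris)
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?item ?subject WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  ?item dct:subject ?subject .\n"
        "}"
    )


def subjects_by_item(results: dict) -> dict[str, set[str]]:
    """Fold SPARQL 1.1 JSON results (?item, ?subject rows) into item URI -> set of category URIs."""
    features: dict[str, set[str]] = defaultdict(set)
    for row in results["results"]["bindings"]:
        features[row["item"]["value"]].add(row["subject"]["value"])
    return dict(features)


query = build_subject_query(["http://dbpedia.org/resource/Radiohead"])
print(query)

# Hand-written stand-in for an endpoint response (category names are made up).
sample = {
    "head": {"vars": ["item", "subject"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://dbpedia.org/resource/Radiohead"},
         "subject": {"type": "uri", "value": "http://dbpedia.org/resource/Category:Example_A"}},
        {"item": {"type": "uri", "value": "http://dbpedia.org/resource/Radiohead"},
         "subject": {"type": "uri", "value": "http://dbpedia.org/resource/Category:Example_B"}},
    ]},
}
print(subjects_by_item(sample))
```

Batching many URIs into one `VALUES` clause keeps the number of endpoint requests low. Any SPARQL endpoint that hosts DBpedia could run the query string.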
The training set, the test set and the mapping files are available at eswc2015-lod-recsys-challenge-v1.0.zip.
For each domain (movies, books and music) this archive contains:
- training set file, with each line composed of: userID \t itemID
- mapping file, with each line composed of: itemID \t type \t DBpediaURI
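The following is a minimal reading sketch. It assumes tab-separated files with no header line. Training lines are assumed to hold userID and itemID, and mapping lines itemID, type and DBpediaURI, as described above. The helper names are hypothetical, and the example reads small in-memory strings with made-up IDs instead of the real files. For the real files, `open(path, encoding="utf-8")` could be passed in place of the `StringIO` objects.

```python
import io


def read_training(f) -> dict[str, set[str]]:
    """Parse 'userID<TAB>itemID' lines into userID -> set of liked itemIDs."""
    likes: dict[str, set[str]] = {}
    for line in f:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        user_id, item_id = fields[0], fields[1]
        likes.setdefault(user_id, set()).add(item_id)
    return likes


def read_mapping(f) -> dict[str, tuple[str, str]]:
    """Parse 'itemID<TAB>type<TAB>DBpediaURI' lines into itemID -> (type, DBpediaURI)."""
    mapping: dict[str, tuple[str, str]] = {}
    for line in f:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mapping[fields[0]] = (fields[1], fields[2])
    return mapping


# In-memory stand-ins for the real files (IDs and the type value are made up).
train = read_training(io.StringIO("u1\ti1\nu1\ti2\nu2\ti1\n"))
mapping = read_mapping(io.StringIO("i1\tsomeType\thttp://dbpedia.org/resource/Radiohead\n"))
print(train["u1"] == {"i1", "i2"}, mapping["i1"][1])
```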
The dataset contains 3225 items for the book domain, 6372 items for the music domain and 5389 items for the movie domain.
The training set contains 11600 ratings for the book domain, 854016 ratings for the music domain and 638268 ratings for the movie domain.
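To recompute figures like these from a parsed training file, the sketch below counts users, distinct items and likes (ratings). It assumes the data is held as a mapping from userID to the set of liked itemIDs. The function name is hypothetical. Its "items" only counts items that occur in the training file, which may differ from the mapping-file item counts above, and duplicate lines are counted once. The last line is a plain arithmetic check on the music-domain figures in the notes below: 1,093,851 likes over 52,072 users is about 21 likes per user.

```python
def summarize(likes: dict[str, set[str]]) -> dict[str, float]:
    """Count users, distinct items and (user, item) likes, plus the mean likes per user."""
    n_users = len(likes)
    n_likes = sum(len(items) for items in likes.values())
    n_items = len(set().union(*likes.values()))  # distinct items seen in the training data
    return {
        "users": n_users,
        "items": n_items,
        "likes": n_likes,
        "avg_likes_per_user": n_likes / n_users if n_users else 0.0,
    }


print(summarize({"u1": {"i1", "i2"}, "u2": {"i1"}}))

# Average likes per user for the music-domain figures.
print(round(1_093_851 / 52_072, 2))
```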
- Further investigation of the dataset:
  - Checking the music-domain dataset shows 1,093,851 likes (liked music artists or bands).
  - There are 52,072 users, with about 21 liked items per user on average.
  - For each user, 5 liked items can be held out as test cases and the others used as training cases.
  - Some URIs contain bad characters, such as http://dbpedia.org/resource/S�bastien_Tellier. Such URIs can be found with a SPARQL query like the one below:
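```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>

# Musical artists or bands whose URI is "S", then any characters, then "bastien_Tellier".
# The ".*" stands in for the character that shows up garbled in the dataset.
SELECT DISTINCT ?s WHERE {
  VALUES ?o1 { dbo:MusicalArtist dbo:Band }
  ?s rdf:type ?o1 .
  FILTER regex(str(?s), "http://dbpedia.org/resource/S.*bastien_Tellier")
}
LIMIT 100
```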
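The per-user hold-out mentioned in the notes above (5 liked items per user as test cases, the rest for training) can be sketched as follows. Treating the 5 items as a per-user count is an interpretation. Keeping users with 5 or fewer likes entirely in training is an assumption, as are the random sampling, the seed and the function name.

```python
import random


def hold_out_split(likes: dict[str, set[str]], n_test: int = 5, seed: int = 42):
    """For each user with more than n_test likes, move n_test randomly chosen
    items into a test set; all other likes stay in the training set."""
    rng = random.Random(seed)
    train: dict[str, set[str]] = {}
    test: dict[str, set[str]] = {}
    for user, items in likes.items():
        ordered = sorted(items)  # fixed order so a given seed gives the same split
        if len(ordered) <= n_test:
            train[user] = set(ordered)  # too few likes to hold any out (assumption)
            continue
        held_out = set(rng.sample(ordered, n_test))
        test[user] = held_out
        train[user] = set(ordered) - held_out
    return train, test


likes = {"u1": {f"i{k}" for k in range(21)}, "u2": {"i1", "i2"}}
train, test = hold_out_split(likes)
print(len(train["u1"]), len(test["u1"]), "u2" in test)
```

Sorting before sampling makes the split reproducible for a fixed seed, because set iteration order for strings can change between Python runs.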
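The query above can be generalized as a sketch. The approach assumes the bad characters appear as the Unicode replacement character U+FFFD ("�"). Garbled URIs can then be spotted locally and a FILTER built for each one that uses the same trick of turning every bad character into ".*". Unlike the hand-written query, this version escapes the other regex metacharacters (such as the dots in the host name) and anchors the pattern. The helper names are hypothetical.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, rendered as "�" where a character failed to decode

# Regex metacharacters escaped with a backslash (valid single-character
# escapes in the XPath-style regular expressions used by SPARQL's regex()).
REGEX_META = set("\\|.-^$?*+{}()[]")


def find_garbled(uris: list[str]) -> list[str]:
    """Return the URIs that contain at least one replacement character."""
    return [u for u in uris if REPLACEMENT_CHAR in u]


def regex_escape(text: str) -> str:
    return "".join("\\" + c if c in REGEX_META else c for c in text)


def garbled_uri_filter(uri: str, var: str = "?s") -> str:
    """Build a FILTER matching `uri` with every bad character replaced by '.*'."""
    pattern = ".*".join(regex_escape(part) for part in uri.split(REPLACEMENT_CHAR))
    # Escape backslashes and double quotes so the pattern fits in a SPARQL string literal.
    literal = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return f'FILTER regex(str({var}), "^{literal}$")'


uris = ["http://dbpedia.org/resource/S\ufffdbastien_Tellier",
        "http://dbpedia.org/resource/Radiohead"]
for uri in find_garbled(uris):
    print(garbled_uri_filter(uri))
```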