Guangyuan's Research and Development Blog: SPARQL


Setting up Jena Fuseki using Docker image

Apache Jena is a free and open source Java framework for building Semantic Web and Linked Data applications. The framework is composed of different APIs interacting together to process RDF data, as we can see from the figure below.

This post focuses on setting up Fuseki using Docker. In short, Fuseki is a SPARQL server which can present RDF data and answer SPARQL queries over HTTP.

The content of this post is as follows:

Run Fuseki server with Docker image
Configuration of Fuseki data service
Test SPARQL queries on the server using cURL

Run Fuseki server with Docker image

If you are familiar with Docker, it is very convenient to set up a Fuseki server using a Fuseki Docker image from Docker Hub, a cloud-based repository service provided by Docker for finding, storing, and sharing container images.

I have Docker Desktop on my laptop, which provides an integrated environment for building, shipping, and running containerized applications with Docker.

After pulling the Fuseki Docker image, we can follow its instructions to run a Fuseki server right away:


docker run --rm -it -p 3030:3030 --name fuseki -e ADMIN_PASSWORD=[PASSWORD] -e ENABLE_DATA_WRITE=[true|false] -e ENABLE_UPDATE=[true|false] -e QUERY_TIMEOUT=[number in milliseconds] --mount type=bind,source="$(pwd)"/fuseki-data,target=/fuseki-base/databases secoresearch/fuseki

The server should be accessible at http://localhost:3030.

Configuration of Fuseki data service

So far so good. Note that the Fuseki server is now running with its default configuration. The instructions on the Fuseki Docker image page also mention that, for a custom configuration, we need to add a configuration file, assembler.ttl, under the fuseki-configuration/ folder. You can find the ttl file in the GitHub repo of the Fuseki Docker image provider. The instructions for running the Fuseki server with a custom configuration are as follows:


mkdir fuseki-data
mkdir fuseki-configuration
cp -p assembler.ttl fuseki-configuration/
# edit fuseki-configuration/assembler.ttl to enable the endpoints you wish
docker run --rm -it -p 3030:3030 --name fuseki -e ADMIN_PASSWORD=[PASSWORD] -e QUERY_TIMEOUT=[number in milliseconds] --mount type=bind,source="$(pwd)"/fuseki-data,target=/fuseki-base/databases --mount type=bind,source="$(pwd)"/fuseki-configuration,target=/fuseki-base/configuration secoresearch/fuseki

Alternatively, we can write our own configuration file based on the Fuseki Data Service Configuration Syntax. In my case, I simply want to test with some dummy RDF data loaded when the server starts up, so I set up a MemoryModel for a .ttl file containing the RDF data I'm interested in. Every time the server starts, it contains an RDF dataset that I can play around with and run SPARQL queries over.


<#service> rdf:type fuseki:Service ;
    fuseki:name              "ds" ;   # http://host:port/ds
    fuseki:dataset           <#tdb> ;
    fuseki:endpoint [ 
         # SPARQL query service
        fuseki:operation fuseki:query ; 
        fuseki:name "sparql"
    ] ;
    
    ... ...
        
<#tdb>    rdf:type ja:RDFDataset ;
    rdfs:label "EnergyConsumption" ;
    ja:defaultGraph
      [ rdfs:label "DAYTON.ttl" ;
        a ja:MemoryModel ;
        ja:content [ja:externalContent <ttl file location>] ;
      ] ;
    .

Test SPARQL queries on the server

If you plan to interact with the Fuseki server from a program, for example in Python, you might want to test SPARQL queries over HTTP first. One way is to use the browser directly and type the endpoint URL with your query as a parameter:


http://localhost:3030/ds/sparql?query=SELECT%20*%20WHERE%20{?s%20?p%20?o}%20limit%203

If you are using curl to test a SPARQL query, you can submit a URL-encoded SPARQL query:


curl "http://localhost:3030/ds/sparql?query=SELECT%20*%20WHERE%20\{?s%20?p%20?o\}%20limit%203"

The opening and closing braces need to be escaped because curl otherwise treats { } as URL globbing syntax; passing -g (--globoff) to curl is an alternative. For more details on using curl for SPARQL queries, one can refer to cURLing SPARQL, which covers the topic in much more depth.
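For the Python case mentioned above, here is a minimal sketch using only the standard library. It assumes the server from this post is running at http://localhost:3030 with the "ds" service and its "sparql" endpoint; the helper names build_query_url and run_select are my own, not part of any Fuseki API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Fuseki service "ds" with endpoint "sparql" configured earlier in this post.
ENDPOINT = "http://localhost:3030/ds/sparql"


def build_query_url(endpoint, query):
    """Return a GET URL with the SPARQL query percent-encoded in the `query` parameter."""
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    return endpoint + "?" + params


def run_select(endpoint, query, timeout=10.0):
    """Send a SELECT query over HTTP GET and return the list of result bindings."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


# Building the URL needs no running server; braces and spaces are percent-encoded.
print(build_query_url(ENDPOINT, "SELECT * WHERE {?s ?p ?o} LIMIT 3"))
```

With the server running, run_select(ENDPOINT, "SELECT * WHERE {?s ?p ?o} LIMIT 3") should return up to three bindings. Because the query is percent-encoded by urlencode, no manual escaping of braces is needed here.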

SPARQL: FILTER NOT EXISTS and MINUS

I was wondering about the difference between FILTER NOT EXISTS and MINUS, and found a great answer here:

The difference between FILTER NOT EXISTS and MINUS is related to the two styles of negation used by SPARQL. According to the specification:

The SPARQL query language incorporates two styles of negation, one based on filtering results depending on whether a graph pattern does or does not match in the context of the query solution being filtered, and one based on removing solutions related to another pattern.

Still according to the specification:

NOT EXISTS and MINUS represent two ways of thinking about negation, one based on testing whether a pattern exists in the data, given the bindings already determined by the query pattern, and one based on removing matches based on the evaluation of two patterns. In some cases they can produce different answers.

The two queries in question appear in the specification, which explains their results as follows:

SELECT * {
  ?s ?p ?o .
  FILTER NOT EXISTS { ?x ?y ?z } .
}

This request evaluates to a result set with no solutions because { ?x ?y ?z } matches given any ?s ?p ?o, so NOT EXISTS { ?x ?y ?z } eliminates any solutions.

SELECT * {
  ?s ?p ?o .
  MINUS { ?x ?y ?z } .
}

In the request with MINUS, there is no shared variable between the first part (?s ?p ?o) and the second (?x ?y ?z) so no bindings are eliminated.
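To make the difference concrete, here is a minimal Python sketch that models query solutions as dictionaries. It is not how a SPARQL engine is implemented: it assumes the inner pattern is a plain triple pattern, so NOT EXISTS can be approximated by looking for any compatible inner solution. The function names and the two sample triples are made up for illustration.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every variable they share."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def minus(left, right):
    """MINUS: drop a left solution only if a right solution is compatible
    with it AND the two share at least one variable."""
    return [m for m in left
            if not any(compatible(m, r) and (m.keys() & r.keys()) for r in right)]


def filter_not_exists(left, right):
    """NOT EXISTS (simplified): drop a left solution if the inner pattern
    still has a match under the bindings of that solution."""
    return [m for m in left if not any(compatible(m, r) for r in right)]


# Hypothetical data: two triples, matched once as ?s ?p ?o and once as ?x ?y ?z.
triples = [("a", "p", "b"), ("c", "q", "d")]
left = [{"s": s, "p": p, "o": o} for s, p, o in triples]
right = [{"x": s, "y": p, "z": o} for s, p, o in triples]

print(len(filter_not_exists(left, right)))  # 0: the inner pattern always matches
print(len(minus(left, right)))              # 2: no shared variables, nothing removed
```

The only difference between the two functions is the shared-variable check in minus, which is exactly what makes the two queries above return different answers.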

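Setting up a local DBpedia mirror using the Virtuoso database

The DBpedia endpoint goes down in the early hours of the morning, perhaps for maintenance, which makes continuous querying impossible, so I decided to set up a local mirror. Original article: Setting up local DBpedia with virtuoso

Version information:

DBpedia 3.9 EN (to match the version of the HDT file)
Virtuoso Opensource 7.1.0
Ubuntu

The original article used a VM with 4 cores and 32 GB of RAM, and recommends more RAM (at least 64 GB) if you also want to host other knowledge bases such as Freebase.

The default port is 8890, but I changed it to one I prefer (8081) for convenience.

Connecting to server-IP:8081 brings up the Virtuoso administration page. The default account is dba / dba (I changed the password to one I commonly use).

Converting the compression format from bz2 to gz seemed to take about an hour, and running rdf_loader_run(); seemed to take more than an hour.

Commands for stopping and starting the server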

/etc/init.d/virtuoso-opensource stop
/etc/init.d/virtuoso-opensource start

Deleting unused data

Enter isql
Run the delete command: SPARQL DELETE FROM <http://dbpedia/org> { ?x ?y ?z .} WHERE {?x ?y ?z . FILTER isLiteral(?z) }

SPARQL query involving wikiPageWikiLink not working

OntologyProperty:WikiPageWikiLink

I just found that this property is included in the dataset downloaded from http://www.rdfhdt.org/datasets/, although I could not find it earlier via the DBpedia SPARQL endpoint.

This property denotes a "Link from a Wikipage to another Wikipage" and DBpedia extracts the Wikipedia page links and offers them for download, but does not add them to the public SPARQL endpoint. There are simply too many of them - they would overwhelm the SPARQL server.

This property might be useful for extracting latent relationships between two resources :)

How to query SPARQL with spaces?

RDF itself uses RDF URI references. The SPARQL query language, on the other hand, uses IRIs (the reason is that RDF predates IRIs and the notion of RDF URI references was developed in anticipation of what IRIs were expected to eventually look like. They almost got it right :)).

RDF URI references allow whitespace while IRIs do not.

Another way of doing this is to compare the string form of the value using FILTER (STR(?x) = "your url"), as below:


PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?y
WHERE {
   ?y owl:sameAs ?x .
   FILTER (str(?x) = "http://website.com/urls/playing fun")
}

[1]. http://stackoverflow.com/questions/9055146/querying-with-spaces-sparql
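If the URL comes from a program, the FILTER query can be assembled with a small helper. The following is a minimal Python sketch; the function names are hypothetical, and the escaping only handles backslashes and double quotes, which I assume is enough for typical URLs.

```python
def sparql_string_literal(text):
    """Quote text as a SPARQL double-quoted string literal
    (escapes backslashes and double quotes only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def same_as_query(url):
    """Build a query that matches owl:sameAs objects by their string form,
    so the URL may contain spaces."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT ?y WHERE {\n"
        "  ?y owl:sameAs ?x .\n"
        "  FILTER (STR(?x) = " + sparql_string_literal(url) + ")\n"
        "}"
    )


print(same_as_query("http://website.com/urls/playing fun"))
```

The URL is passed as a string literal rather than inside angle brackets, which is why the space does not break the query syntax.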

BaseKB for Freebase data

Freebase will soon discontinue the API that allows queries through its proprietary MQL interface. The effort below converts the Freebase data dump into a standards-compliant product of 1.2 billion triples.

http://basekb.com/gold/

Select all MusicalArtist and Bands from DBpedia Endpoint

 select (COUNT(distinct ?artist) AS ?count) where {  
 ?artist <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>}

Result: 45107

The following query returns only the first 10000 resources:

 select distinct ?artist where {   
  ?artist <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>}  
 ORDER BY ?artist

Get further pages of 10000 results with LIMIT and OFFSET (OFFSET n skips the first n rows):

 select distinct ?artist where {   
  ?artist <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>}  
 ORDER BY ?artist  
 LIMIT 10000  
 OFFSET 20001

However, when the offset comes to 30001, there's an error:

Virtuoso 22023 Error SR353: Sorted TOP clause specifies more then 40001 rows to sort. Only 40000 are allowed. Either decrease the offset and/or row count or use a scrollable cursor

So I reversed the sort order, fetched another 20000 resources, and removed the duplicates:

 select distinct ?artist where {   
  ?artist <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>}  
 ORDER BY DESC(?artist)  
 LIMIT 10000

Obviously, if you want to get more than 60000 resources, the problem becomes more complex...
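The paging described above can be sketched in Python. This is only an illustration under two assumptions: that the 40000-row limit from the error message applies to OFFSET plus LIMIT, and that pages start at offset 0. The names page_offsets and page_queries are my own.

```python
MAX_SORTED_ROWS = 40000  # limit quoted in the Virtuoso error message above

QUERY_TEMPLATE = """SELECT DISTINCT ?artist WHERE {{
  ?artist <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>
}}
ORDER BY {order}
LIMIT {limit}
OFFSET {offset}"""


def page_offsets(page_size, max_rows=MAX_SORTED_ROWS):
    """Offsets 0, page_size, ... for which offset + page_size stays within max_rows."""
    return list(range(0, max_rows - page_size + 1, page_size))


def page_queries(page_size=10000, descending=False):
    """One query per page; descending=True walks the list from the other end."""
    order = "DESC(?artist)" if descending else "?artist"
    return [QUERY_TEMPLATE.format(order=order, limit=page_size, offset=offset)
            for offset in page_offsets(page_size)]


print(page_offsets(10000))  # [0, 10000, 20000, 30000]
```

Running the ascending and the descending pages and collecting the results into a set removes the overlap between the two directions.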

Sesame 2 install

Requirements:

- Java 1.7
- Tomcat 8

Install Sesame 2:

1. Download Sesame.

2. Copy the two war files from the war folder into the webapps folder of Tomcat.

3. Visit http://localhost:8080/openrdf-workbench to check whether it is working.

4. Create a new repository using the Native Java Store (which is stored on disk).

5. Add data files in your format to the repository; you can then use SPARQL to query large triple data as well.

I tried to load an .nq file of over 1 GB through the web interface and it did not work well. Using the API (Sesame as a library) also ran into problems with the .nq file.

----------------

View local store from web interface:

Virtuoso triple store database for Windows

1. Download Virtuoso for Windows.

http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSUsageWindows

2. Create a system environment variable VIRTUOSO_HOME with the value C:/Program Files/OpenLink Software/VOS7/virtuoso-opensource/

3. Append to the PATH value: ;%VIRTUOSO_HOME%/bin;%VIRTUOSO_HOME%/lib

4. Create a service:

cd %VIRTUOSO_HOME%/database

virtuoso-t +service create +instance "New Instance Name" +configfile virtuoso.ini

5. After you create the service, you can list / start / stop / delete it with the commands below:

Action	Command
List all Virtuoso services	`virtuoso-t +service list`
Start a Virtuoso service	`virtuoso-t +instance "Instance Name" +service start`
Stop a Virtuoso service	`virtuoso-t +instance "Instance Name" +service stop`
Delete a Virtuoso service	`virtuoso-t +instance "Instance Name" +service delete`

---------------------Error Resolution--------------------

Error 1 : Unable to open the service control manager > (5).

Solution : Run cmd as Administrator.

Error 2 : While executing iSQL, a file directory cannot be accessed because of the ini file settings.

Solution : Modify (or add) the allowed-directory settings in the virtuoso.ini file.

References:

Using Virtuoso : http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSUsageWindows

Configuration :
http://docs.openlinksw.com/virtuoso/dbadm.html#fp_acliniallowed

Semantic Web Practice - Other SPARQL Query Forms (CONSTRUCT, ASK, DESCRIBE)

SPARQL has four query forms (SELECT, CONSTRUCT, ASK, and DESCRIBE), each with a different kind of return value.

SELECT : Returns all, or a subset of, the variables bound in a query pattern match.
CONSTRUCT : Returns an RDF graph constructed by substituting variables in a set of triple templates.
ASK : Returns a Boolean indicating whether a query pattern matches or not.
DESCRIBE : Returns an RDF graph that describes the resources found.

Since the SELECT form is already familiar, I'd like to give examples of the other forms here. Suppose we have the ontology below, in which four books each have their own price.

Practice#1 : Get the book(Subject)→price(Predicate)→price value(Object) triples.

PREFIX : <http://boobks.example#>
CONSTRUCT { ?book :price ?price }
WHERE {
  ?book :price ?price .
}
ORDER BY ?book

Query Result :

Practice#2 : Use the ASK form to test whether the Subject→affiliates(Predicate)→Object pattern exists.

PREFIX : <http://boobks.example#>
ASK { ?org :affiliates ?author }

Practice#3 : Get an RDF graph describing the subjects that match Subject→writesBook(Predicate)→book3(Object).

PREFIX : <http://boobks.example#>
DESCRIBE ?x
WHERE { ?x :writesBook :book3 }

Query Result : An RDF graph describing the resources matched by the query pattern.
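When sending such queries from a program, it helps to know which kind of result to expect. The following Python sketch guesses the query form with a simple regular expression and maps it to the return value listed above. The detection is deliberately naive (it strips IRIs in angle brackets and takes the first form keyword), and the function name is hypothetical.

```python
import re

# Kind of result for each query form, as listed above.
RESULT_KIND = {
    "SELECT": "variable bindings",
    "CONSTRUCT": "RDF graph",
    "ASK": "boolean",
    "DESCRIBE": "RDF graph",
}


def query_form(query):
    """Return the first query-form keyword in the query (naive detection)."""
    without_iris = re.sub(r"<[^>]*>", "", query)  # ignore keywords inside IRIs
    match = re.search(r"\b(SELECT|CONSTRUCT|ASK|DESCRIBE)\b", without_iris, re.IGNORECASE)
    if match is None:
        raise ValueError("no SPARQL query form found")
    return match.group(1).upper()


ask = "PREFIX : <http://boobks.example#> ASK { ?org :affiliates ?author }"
print(query_form(ask), "->", RESULT_KIND[query_form(ask)])  # ASK -> boolean
```

A client could use this to decide, for example, whether to parse the response as a result table or as an RDF graph.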

Semantic Web Practice - simple SPARQL example in Protege

Suppose that you have a personal profile ontology as displayed below, in which the individual guangyuan-piao has the following datatype properties:

guangyuan-piao→foaf:age→28
guangyuan-piao→foaf:familyName→"Piao"
guangyuan-piao→foaf:givenName→"Guangyuan"

Practice#1 : select the organizations and persons connected by the relationship "member".

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?organization ?person
WHERE { ?organization foaf:member ?person }

This should return all organization-person pairs with the relationship "member". The query result is as below:

DERI - guangyuan-piao
Yonsei_Univeristy - guangyuan-piao
Jilin_University - guangyuan-piao

Practice#2 : select each person, with his/her age, whose family name is "Piao".

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?age
WHERE { ?person foaf:age ?age .
        ?person foaf:familyName "Piao"^^xsd:string
}

Practice#3 : select the full name of everyone who is 28 years old.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ( CONCAT(?G," ",?F) AS ?name)
WHERE { ?P foaf:givenName ?G; foaf:familyName ?F .
        ?P foaf:age 28
}

Practice#4 : select each person whose family name starts with "Pi".

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person
WHERE { ?person foaf:familyName ?fn .
        FILTER regex(?fn, "^Pi")
}

Q: What if we change regex(?fn, "^Pi") to regex(?fn, "^pi")?
A: It will not return any results, since regex is case-sensitive by default. You can pass the case-insensitive flag "i" to get the result.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person
WHERE { ?person foaf:familyName ?fn .
        FILTER regex(?fn, "^pi", "i")
}
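The same case-sensitivity behaviour can be checked quickly in Python. This is only an analogy: SPARQL's regex follows the XPath/XQuery regular expression syntax, which differs from Python's re module in some details, but for a simple anchored pattern like this one the two should behave alike.

```python
import re

family_name = "Piao"

# Case-sensitive match, like regex(?fn, "^Pi")
print(bool(re.search(r"^Pi", family_name)))                 # True

# Lower-case pattern without a flag, like regex(?fn, "^pi")
print(bool(re.search(r"^pi", family_name)))                 # False

# Case-insensitive flag, like regex(?fn, "^pi", "i")
print(bool(re.search(r"^pi", family_name, re.IGNORECASE)))  # True
```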

Practice#5 : select each person whose age is over 29.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person
WHERE { ?person foaf:age ?age .
        FILTER (?age > 29)
}

There are no results, since guangyuan-piao's age is 28.

Practice#6 : select the individuals that belong to the Person class.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person
WHERE { ?person rdf:type foaf:Person }

This is a simple query. SPARQL also provides the keyword "a" (case-sensitive) as a shorthand for the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type, which makes the query neater:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person
WHERE { ?person a foaf:Person }

Other Examples:

Example: Select all MusicalArtist and Bands from DBpedia
Example: How to query SPARQL with spaces?