Import DBpedia to Neo4j Database




dbpedia {

  # The location of the database directory for Neo4j
  # does not necessarily need to end in .db
  db-dir = "dbpedia.db"

  # The size of one transaction; this has no meaning for the batch importer.
  # The size does not denote the number of statements in one transaction,
  # but the number of distinct subjects.
  tx-size = 10000

  # an estimate (the more accurate, the better) of how many resources
  # will be created during the process
  approx-resources = 50000000

  # if true, create a schema index during the shutdown phase
  # this will be costly at the end, but has no effect on the actual import performance
  deferred-index = true
}


  • Run the importer
java -server -Xmx32g -Ddbpedia.db-dir=dbpedia.db -jar dbpedia-neo4j.jar {article-categories_en,dbpedia_2015-04,infobox-property-definitions_bg,instance-types_en,instance_types_lhd_dbo_en,instance_types_lhd_ext_en,instance_types_sdtyped-dbo_en,instance-types-transitive_en,mappingbased-properties_en,out-degree_bg,page-links_en,persondata_en,redirects_bg,skos-categories_en,specific-mappingbased-properties_en,transitive-redirects_bg}.nt.bz2
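Brace expansion hands every name to the importer whether or not the file exists, and the import can run for days, so it may be worth checking the dump files first. The following is a minimal sketch, not part of dbpedia-neo4j: count_missing is a hypothetical helper, and the three names passed to it are only a subset of the list above.

```shell
# Hypothetical helper (not part of dbpedia-neo4j): counts how many of the
# given dump names have no matching <name>.nt.bz2 file in the current directory.
count_missing() {
  missing=0
  for name in "$@"; do
    if [ ! -f "$name.nt.bz2" ]; then
      echo "missing: $name.nt.bz2" >&2   # report each missing file on stderr
      missing=$((missing + 1))
    fi
  done
  echo "$missing"                        # the count itself goes to stdout
}

# Example: check a few of the dumps used in the command above.
count_missing article-categories_en page-links_en redirects_bg
```

If the printed count is not 0, download the missing dumps before starting the importer.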


  • Import without index (without configuring reference.conf)
    • Time: 2 days
    • DB size: 44 GB
    • Drawback: looking up a node by its uri is too slow (e.g., the query MATCH (n {uri:'http://dbpedia.org/resource/Steve_Jobs'}) RETURN n). To address this, you can add an index to Neo4j later, based on your requirements.
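Adding the index afterwards could look like the sketch below. It rests on several assumptions: the label Resource is hypothetical (check which labels the importer actually assigned to your nodes), the file name create-uri-index.cypher is arbitrary, and the CREATE INDEX ON syntax is the Neo4j 2.x/3.x form (newer releases use CREATE INDEX FOR (n:Label) ON (n.property) instead). Schema indexes in Neo4j are tied to a label, so the lookup query has to name the label as well to benefit from the index.

```shell
# Write the Cypher statements to a file, then run the file with whichever
# Cypher client ships with your Neo4j version.
cat > create-uri-index.cypher <<'EOF'
// Assumption: resource nodes carry the label Resource.
CREATE INDEX ON :Resource(uri);
// The index is populated in the background; on a database of this size
// that can take a while, so run the lookup only once the index is online.
MATCH (n:Resource {uri: 'http://dbpedia.org/resource/Steve_Jobs'}) RETURN n;
EOF
```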

Other problems that might occur:
- Cannot start the Neo4j server (permission denied): run chown -R neo4j /var/lib/neo4j (or the path of your own Neo4j directory)
