Connecting Apache NiFi with Jena Fuseki in Kubernetes

Background

Previously, both Apache NiFi and Apache Jena Fuseki ran as Docker containers on a single local machine, and I wanted to move the Fuseki instance to a Kubernetes (k8s) cluster. NiFi processes the incoming data stream and stores the semantified data (a set of triples) in the knowledge graph held in Fuseki, which can be queried afterwards.


Deploying Jena Fuseki using Helm 

I used Helm, a package manager for Kubernetes, similar to dpkg in Debian. Helm charts simplify the deployment and management of applications on Kubernetes clusters by providing a consistent way to package, configure, and deploy applications and services. Similar to Docker Hub, many Helm charts are available on https://artifacthub.io/. I tested one of them for deploying Fuseki on the k8s cluster: https://artifacthub.io/packages/helm/inseefrlab/jena.
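
As a rough sketch, the installation with Helm could look like the following; the chart repository URL and the release name fuseki are my own assumptions, so check the Artifact Hub page for the exact repository to add.

$ helm repo add inseefrlab https://inseefrlab.github.io/helm-charts   # repository URL: verify on the Artifact Hub page
$ helm repo update
$ helm install fuseki inseefrlab/jena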

Installing the Fuseki Helm chart directly works without problems on minikube on a local laptop. However, it turns out that there is an issue with persistent volumes (PV) when installing the chart on the k8s cluster. Solving it requires some familiarity with PVs, persistent volume claims (PVC), and storage classes in k8s.

Simply put, the things to consider include (a minimal sketch follows below):
  • Create the PV first so that the persistent volume claim (PVC) of the Fuseki deployment can bind to it
  • Make sure the volume (folder) on the (master) node is writable by the running Fuseki container
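
For example, a minimal sketch of both steps, assuming a data folder /data/fuseki on the node, a PV named fuseki-pv, a size of 8Gi, and a storage class name matching what the chart's PVC requests (all of these depend on your cluster and chart values):

$ sudo mkdir -p /data/fuseki && sudo chmod -R 777 /data/fuseki   # quick way to make the folder writable for the Fuseki container
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: fuseki-pv
spec:
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: standard        # must match the storage class requested by the chart's PVC
  hostPath:
    path: /data/fuseki
EOF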
You can check PV or PVC using commands:
$ kubectl get pv
$ kubectl get pvc


401 Unauthorized Issue for Accessing Fuseki

Another issue was a "401 Unauthorized" error when Apache NiFi tries to write data via the GSP endpoint of Jena Fuseki. In this case, the connection to the server requires authentication. Jena provides an authentication guide here.

...
// Authenticator that answers the server's HTTP authentication challenge with the Fuseki credentials
Authenticator authenticator = AuthLib.authenticator("username", "password");
HttpClient httpClient = HttpClient.newBuilder()
	.authenticator(authenticator)
	.build();
// Setup connection
RDFConnection connection = RDFConnectionFuseki.create()
	.destination(context.getProperty(DESTINATION).getValue())
	.gspEndpoint(context.getProperty(GSP_ENDPOINT).getValue())
	.acceptHeaderSelectQuery("application/sparql-results+json, application/sparql-results+xml;q=0.9")
	.httpClient(httpClient)
	.build();
...

The destination is set by one of the properties of the custom processor in NiFi, e.g., http://[IP address of Fuseki]:3030/ds/, and the GSP (SPARQL Graph Store Protocol) endpoint is also set by one of the properties, e.g., data, depending on your Fuseki setup.
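Before wiring the processor into the flow, the endpoint and credentials can be checked from the command line, e.g. with curl; the dataset name ds, the endpoint data, and the file triples.ttl below are placeholders matching the example values above, not fixed names.

$ curl -u username:password -X POST \
    -H "Content-Type: text/turtle" \
    --data-binary @triples.ttl \
    "http://[IP address of Fuseki]:3030/ds/data?default"

A 2xx response here indicates that the credentials and the GSP endpoint are set up correctly.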

Once authentication is added, the NiFi custom processor containing the above code snippet can communicate with the Fuseki instance on the k8s cluster without any problems and runs the same as in the local environment.
