Service Deployment
The Dynamic-Graph-Service can be deployed on a Kubernetes cluster using the Helm package manager.
Prerequisites
Kubernetes 1.19+
Helm 3.2.0+
PV provisioner support in the underlying infrastructure
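One way to confirm the Helm requirement before installing is to compare the local client version against 3.2.0. The sketch below is not part of the chart: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils, or a recent BSD/macOS `sort`). The Kubernetes server version can be read separately with `kubectl version`.

```shell
# version_ge A B: succeeds when version A >= version B.
# Hypothetical helper; relies on `sort -V` for version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED_HELM="3.2.0"

if command -v helm >/dev/null 2>&1; then
  # `helm version --template` prints the client version, e.g. v3.12.0
  helm_version="$(helm version --template '{{.Version}}')"
  helm_version="${helm_version#v}"   # drop the leading "v"
  if version_ge "$helm_version" "$REQUIRED_HELM"; then
    echo "helm $helm_version satisfies >= $REQUIRED_HELM"
  else
    echo "helm $helm_version is older than $REQUIRED_HELM"
  fi
else
  echo "helm not found on PATH"
fi
```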
Deploy Kafka Queue Service
The Dynamic-Graph-Service uses Kafka queues to store streaming graph updates and sampled results. Before deploying a DGS service, you must deploy a Kafka cluster and create the following Kafka queues:
dl2spl: receives graph updates from your data-source loaders; consumed by sampling workers.
spl2srv: receives sampled results from sampling workers; consumed by serving workers.
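As a rough sketch, the two queues could be created with Kafka's own `kafka-topics.sh` tool. Everything here is an assumption about your environment: the broker address, the replication factor, and the presence of `kafka-topics.sh` on PATH. The topic names and partition counts simply mirror the chart defaults listed under Kafka service parameters. The `run` and `create_topic` helpers are hypothetical, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
# Placeholders: adjust to your own Kafka cluster.
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"
REPLICATION_FACTOR="${REPLICATION_FACTOR:-1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# create_topic NAME PARTITIONS
create_topic() {
  run kafka-topics.sh --create \
    --bootstrap-server "$BOOTSTRAP_SERVER" \
    --topic "$1" \
    --partitions "$2" \
    --replication-factor "$REPLICATION_FACTOR"
}

create_topic record-batches 4   # dl2spl: dataloader -> sampling workers
create_topic sample-batches 4   # spl2srv: sampling workers -> serving workers
```

Whatever topic names and partition counts you choose, the same values need to be passed to the chart through the `kafka.dl2spl.*` and `kafka.spl2srv.*` parameters.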
Installing the Chart
Get repo info:
helm repo add dgs https://graphlearn.oss-cn-hangzhou.aliyuncs.com/charts/dgs/
helm repo update
Install the chart with release name my-release:
helm install my-release dgs/dgs \
--set-file graphSchema=/path/to/schema/json/file \
--set kafka.dl2spl.brokers=[your kafka broker list of dl2spl] \
--set kafka.dl2spl.topic="your_kafka_topic_of_dl2spl" \
--set kafka.dl2spl.partitions=your_kafka_partitions_of_dl2spl \
--set kafka.spl2srv.brokers=[your kafka broker list of spl2srv] \
--set kafka.spl2srv.topic="your_kafka_topic_of_spl2srv" \
--set kafka.spl2srv.partitions=your_kafka_partitions_of_spl2srv
The graph schema must be specified as a JSON string or file through the graphSchema parameter. You can follow the example schema file to write your own graph schema.
The dl2spl and spl2srv settings of your pre-deployed Kafka cluster must be configured when you install the chart; refer to Kafka service parameters.
These commands deploy Dynamic-Graph-Service on the Kubernetes cluster in the default configuration. The Parameters section lists the parameters that can be configured during installation.
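When the list of overrides grows, a values file can be easier to maintain than a long chain of `--set` flags. The sketch below writes one for the Kafka settings. The file name `dgs-values.yaml` and the broker host names are hypothetical placeholders; the keys follow the parameter names in the Kafka service parameters table, and the script only prints the install command rather than running it.

```shell
VALUES_FILE="${VALUES_FILE:-dgs-values.yaml}"   # hypothetical file name

# Nested YAML keys correspond to the dotted parameter names,
# e.g. kafka.dl2spl.topic. Broker addresses are placeholders.
cat > "$VALUES_FILE" <<'EOF'
kafka:
  dl2spl:
    brokers: ["kafka-0.example.internal:9092"]
    topic: "record-batches"
    partitions: 4
  spl2srv:
    brokers: ["kafka-0.example.internal:9092"]
    topic: "sample-batches"
    partitions: 4
EOF

echo "wrote $VALUES_FILE"
echo "helm install my-release dgs/dgs --set-file graphSchema=/path/to/schema/json/file -f $VALUES_FILE"
```

Values supplied with `-f` are merged over the chart defaults, so any parameter not listed in the file keeps the default shown in the Parameters section.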
Tip: List all releases using helm list
Uninstalling the Chart
To uninstall/delete the my-release deployment:
helm delete my-release
The command removes all the Kubernetes components associated with the chart and deletes the release.
Parameters
Global parameters
Name | Description | Value |
---|---|---|
kubeVersion | Override Kubernetes version | "" |
nameOverride | String to partially override common.names.fullname | "" |
fullnameOverride | String to fully override common.names.fullname | "" |
clusterDomain | Default Kubernetes cluster domain | cluster.local |
commonLabels | Labels to add to all deployed objects | {} |
commonAnnotations | Annotations to add to all deployed objects | {} |
graphSchema | The json string of graph schema, must be set during installation | "" |
configPath | The service configmap mount path | "/dgs_conf" |
glog.toConsole | Specify whether program logs are written to standard error as well as to files | false |
Kafka service parameters
Name | Description | Value |
---|---|---|
kafka.dl2spl.brokers | Kafka brokers of processed graph updates from dataloader to sampling workers | ["localhost:9092"] |
kafka.dl2spl.topic | Kafka topic of (dataloader -> sampling workers) | "record-batches" |
kafka.dl2spl.partitions | Topic partition number of (dataloader -> sampling workers) | 4 |
kafka.spl2srv.brokers | Kafka brokers of sampled updates from sampling workers to serving workers | ["localhost:9092"] |
kafka.spl2srv.topic | Kafka topic of (sampling workers -> serving workers) | "sample-batches" |
kafka.spl2srv.partitions | Topic partition number of (sampling workers -> serving workers) | 4 |
Image parameters
Name | Description | Value |
---|---|---|
image.registry | Core service image registry | "graphlearn" |
image.repository | Core service image repository | "dgs-core" |
image.tag | Core service image tag (immutable tags are recommended) | "1.0.0" |
image.pullPolicy | Core service image pull policy | IfNotPresent |
image.pullSecrets | Specify docker-registry secret names as an array | [] |
FrontEnd parameters
Name | Description | Value |
---|---|---|
frontend.ingressHostName | The host name of external ingress | "dynamic-graph-service.info" |
frontend.limitConnections | The number of concurrent connections allowed from a single IP address | 10 |
Common Pod parameters
All worker types share the same set of common pod parameters. In the table below, ${worker-type} stands for one of coordinator, sampling, or serving.
Name | Description | Value |
---|---|---|
${worker-type}.updateStrategy.type | Pod deployment strategy type | "RollingUpdate" |
${worker-type}.updateStrategy.rollingUpdate | Pod deployment rolling update configuration parameters | {} |
${worker-type}.podLabels | Extra labels for worker pod | {} |
${worker-type}.podAnnotations | Extra annotations for worker pod | {} |
${worker-type}.podAffinityPreset | Pod affinity preset. Ignored if ${worker-type}.affinity is set. Allowed values: soft or hard | "" |
${worker-type}.podAntiAffinityPreset | Pod anti-affinity preset. Ignored if ${worker-type}.affinity is set. Allowed values: soft or hard | "soft" |
${worker-type}.nodeAffinityPreset.type | Node affinity preset type. Ignored if ${worker-type}.affinity is set. Allowed values: soft or hard | "" |
${worker-type}.nodeAffinityPreset.key | Node label key to match. Ignored if ${worker-type}.affinity is set. | "" |
${worker-type}.nodeAffinityPreset.values | Node label values to match. Ignored if ${worker-type}.affinity is set. | [] |
${worker-type}.affinity | Affinity for pod assignment | {} |
${worker-type}.nodeSelector | Node labels for pod assignment | {} |
${worker-type}.tolerations | Tolerations for pod assignment | [] |
${worker-type}.resources.limits | The resource limits for the container | {} |
${worker-type}.resources.requests | The requested resources for the container | {} |
${worker-type}.persistence.enabled | Enable worker checkpoints persistence using PVC | false |
${worker-type}.persistence.storageClass | PVC Storage Class for checkpoint data volume | "" |
${worker-type}.persistence.accessModes | Persistent Volume Access Modes | ["ReadWriteOnce"] |
${worker-type}.persistence.size | PVC Storage Request for checkpoint data volume | 20Gi |
${worker-type}.persistence.annotations | Annotations for the PVC | {} |
${worker-type}.persistence.selector | Selector to match an existing Persistent Volume for checkpoint data PVC. | {} |
${worker-type}.persistence.mountPath | Mount path of the checkpoint data volume | "/${worker-type}_checkpoints" |
${worker-type}.livenessProbe.enabled | Enable livenessProbe on ${worker-type} containers | true |
${worker-type}.livenessProbe.initialDelaySeconds | Initial delay seconds for livenessProbe | 10 |
${worker-type}.livenessProbe.periodSeconds | Period seconds for livenessProbe | 10 |
${worker-type}.livenessProbe.timeoutSeconds | Timeout seconds for livenessProbe | 1 |
${worker-type}.livenessProbe.failureThreshold | Failure threshold for livenessProbe | 3 |
${worker-type}.livenessProbe.successThreshold | Success threshold for livenessProbe | 1 |
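To illustrate how the `${worker-type}` placeholder expands into concrete parameter names, the sketch below builds the `--set` flags that would enable checkpoint persistence for all three worker types. The choice to enable persistence everywhere and the variable name `set_args` are illustrative assumptions, not a recommendation from the chart; the script only prints the resulting command.

```shell
set_args=""

# Substitute each worker type into the common parameter names.
for worker in coordinator sampling serving; do
  set_args="$set_args --set ${worker}.persistence.enabled=true"
  set_args="$set_args --set ${worker}.persistence.size=20Gi"
done

# Print the command for review instead of executing it.
echo "helm install my-release dgs/dgs --set-file graphSchema=/path/to/schema/json/file$set_args"
```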
Coordinator parameters
Name | Description | Value |
---|---|---|
coordinator.readinessProbe.enabled | Enable readinessProbe on coordinator containers | true |
coordinator.readinessProbe.initialDelaySeconds | Initial delay seconds for readinessProbe | 5 |
coordinator.readinessProbe.periodSeconds | Period seconds for readinessProbe | 10 |
coordinator.readinessProbe.timeoutSeconds | Timeout seconds for readinessProbe | 1 |
coordinator.readinessProbe.failureThreshold | Failure threshold for readinessProbe | 6 |
coordinator.readinessProbe.successThreshold | Success threshold for readinessProbe | 1 |
coordinator.rpcService.port | Coordinator headless rpc service port for internal connections | 50051 |
coordinator.rpcService.clusterIP | Static clusterIP or None for Coordinator headless rpc service | "" |
coordinator.rpcService.sessionAffinity | Control where internal rpc requests go, to the same pod or round-robin | None |
coordinator.rpcService.annotations | Additional custom annotations for Coordinator headless rpc service | {} |
coordinator.httpService.port | Coordinator http service port for external admin requests | 8080 |
coordinator.httpService.sessionAffinity | Control where external http requests go, to the same pod or round-robin | None |
coordinator.httpService.externalTrafficPolicy | Coordinator http service external traffic policy | Cluster |
coordinator.httpService.annotations | Additional custom annotations for Coordinator http service | {} |
coordinator.workdir | Local ephemeral storage mount path for Coordinator working directory | "/coordinator_workdir" |
coordinator.connectTimeoutSeconds | The max timeout seconds when other workers connect to Coordinator | 60 |
coordinator.heartbeatIntervalSeconds | The heartbeat interval in seconds when other workers report statistics to coordinator | 10 |
Sampling parameters
Name | Description | Value |
---|---|---|
sampling.workerNum | Number of Sampling workers | 2 |
sampling.workdir | Local ephemeral storage mount path for Sampling working directory | "/sampling_workdir" |
sampling.actorLocalShardNum | Local computing shard number for each Sampling Worker pod | 4 |
sampling.dataPartitionNum | The total partition number of data across all Sampling Workers | 8 |
sampling.rocksdbEnv.highPriorityThreads | The thread number of high-priority rocksdb background tasks | 2 |
sampling.rocksdbEnv.lowPriorityThreads | The thread number of low-priority rocksdb background tasks | 2 |
sampling.sampleStore.memtableRep | The rocksdb memtable structure type of sample store | "hashskiplist" |
sampling.sampleStore.hashBucketCount | The hash bucket count of sample store memtable | 1048576 |
sampling.sampleStore.skipListLookahead | The look-ahead factor of sample store memtable | 0 |
sampling.sampleStore.blockCacheCapacity | The capacity (bytes) of sample store block cache | 67108864 |
sampling.sampleStore.ttlHours | The TTL hours for sampling data in sample store | 1200 |
sampling.subscriptionTable.memtableRep | The rocksdb memtable structure type of Sampling subscription table | "hashskiplist" |
sampling.subscriptionTable.hashBucketCount | The hash bucket count of subscription table memtable | 1048576 |
sampling.subscriptionTable.skipListLookahead | The look-ahead factor of subscription table memtable | 0 |
sampling.subscriptionTable.blockCacheCapacity | The capacity (bytes) of subscription table block cache | 67108864 |
sampling.subscriptionTable.ttlHours | The TTL hours for sampling rules in subscription table | 1200 |
sampling.recordPolling.threadNum | The number of threads consuming graph updates from Kafka queues | 2 |
sampling.recordPolling.retryIntervalMs | The retry interval (ms) when no record has been polled | 100 |
sampling.recordPolling.processConcurrency | The max processing concurrency for polled records | 100 |
sampling.samplePublishing.producerPoolSize | The max number of Kafka producers for sampling results | 2 |
sampling.samplePublishing.maxProduceRetryTimes | The maximum number of retries when producing a Kafka message | 3 |
sampling.samplePublishing.callbackPollIntervalMs | The interval (ms) for polling async producing callbacks | 100 |
sampling.logging.dataLogPeriod | Specify how many graph update batches should be processed between two logs | 10 |
sampling.logging.ruleLogPeriod | Specify how many sampling rules should be processed between two logs | 10 |
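The store defaults above are given in raw bytes and hours, which can be hard to read at a glance. The following sketch converts them with plain shell arithmetic and shows how a byte value for an override might be derived. The `mib_to_bytes` helper and the 128 MiB figure are hypothetical, used only for illustration.

```shell
block_cache_bytes=67108864   # default blockCacheCapacity
ttl_hours=1200               # default ttlHours

echo "blockCacheCapacity: $((block_cache_bytes / 1024 / 1024)) MiB"
echo "ttlHours: $((ttl_hours / 24)) days"

# Hypothetical helper: convert MiB to the byte value the chart expects.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Example: flag for a 128 MiB sample store block cache.
echo "--set sampling.sampleStore.blockCacheCapacity=$(mib_to_bytes 128)"
```

With the defaults, the block cache works out to 64 MiB and the TTL to 50 days.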
Serving parameters
Name | Description | Value |
---|---|---|
serving.workerNum | Number of Serving workers, each Serving Worker is an independent pod | 2 |
serving.readinessProbe.enabled | Enable readinessProbe on Serving worker containers | true |
serving.readinessProbe.initialDelaySeconds | Initial delay seconds for readinessProbe | 30 |
serving.readinessProbe.periodSeconds | Period seconds for readinessProbe | 10 |
serving.readinessProbe.timeoutSeconds | Timeout seconds for readinessProbe | 1 |
serving.readinessProbe.failureThreshold | Failure threshold for readinessProbe | 6 |
serving.readinessProbe.successThreshold | Success threshold for readinessProbe | 1 |
serving.httpService.port | The external port of Serving http service for inference queries | 10000 |
serving.httpService.sessionAffinity | Control where http requests go, to the same pod or round-robin | None |
serving.httpService.externalTrafficPolicy | The external traffic policy of Serving http service | Cluster |
serving.httpService.annotations | Additional custom annotations of Serving http service | {} |
serving.workdir | Local ephemeral storage mount path for Serving working directory | "/serving_workdir" |
serving.actorLocalShardNum | Local computing shard number for each Serving Worker pod | 4 |
serving.dataPartitionNum | The partition number of data for each Serving Worker | 4 |
serving.rocksdbEnv.highPriorityThreads | The thread number of high-priority rocksdb background tasks | 2 |
serving.rocksdbEnv.lowPriorityThreads | The thread number of low-priority rocksdb background tasks | 2 |
serving.sampleStore.inMemoryMode | Specify whether to open rocksdb in-memory mode of sample store | false |
serving.sampleStore.memtableRep | The rocksdb memtable structure type of sample store | "hashskiplist" |
serving.sampleStore.hashBucketCount | The hash bucket count of sample store memtable | 1048576 |
serving.sampleStore.skipListLookahead | The look-ahead factor of sample store memtable | 0 |
serving.sampleStore.blockCacheCapacity | The capacity (bytes) of sample store block cache | 67108864 |
serving.sampleStore.ttlHours | The TTL hours for serving data in sample store | 1200 |
serving.recordPolling.threadNum | The thread number for sample update consuming from kafka queues | 2 |
serving.recordPolling.retryIntervalMs | The retry interval (ms) when no record has been polled | 100 |
serving.recordPolling.processConcurrency | The max processing concurrency for polled records | 100 |
serving.logging.dataLogPeriod | Specify how many sample update batches should be processed between two logs | 10 |
serving.logging.requestLogPeriod | Interval of incoming inference query requests for logging serving statistics | 1 |
Other Parameters
Name | Description | Value |
---|---|---|
serviceAccount.create | Enable creation of ServiceAccount for pods | true |
serviceAccount.name | The name of the service account to use. If not set and create is true, a name is generated | "" |
serviceAccount.automountServiceAccountToken | Allows auto mount of ServiceAccountToken on the serviceAccount created | true |
serviceAccount.annotations | Additional custom annotations for the ServiceAccount | {} |
rbac.create | Whether to create & use RBAC resources or not | false |