[Doc] Add Shared-data Deployment doc on HDFS (backport #50468) (#50552)
Co-authored-by: 絵空事スピリット <[email protected]>
mergify[bot] and EsoragotoSpirit authored Sep 4, 2024
1 parent 1cef25f commit a4a474c
Showing 7 changed files with 377 additions and 76 deletions.
1 change: 1 addition & 0 deletions docs/docusaurus/sidebars.json
@@ -90,6 +90,7 @@
"deployment/shared_data/gcs",
"deployment/shared_data/azure",
"deployment/shared_data/minio",
"deployment/shared_data/hdfs",
"deployment/shared_data/feature-support-shared-data"
]
},
29 changes: 15 additions & 14 deletions docs/en/deployment/shared_data/azure.md
@@ -19,7 +19,7 @@ import SharedDataUse from '../../_assets/commonMarkdown/sharedDataUse.md'

The deployment of a shared-data StarRocks cluster is similar to that of a shared-nothing StarRocks cluster. The only difference is that you need to deploy CNs instead of BEs in a shared-data cluster. This section only lists the extra FE and CN configuration items you need to add in the configuration files of FE and CN **fe.conf** and **cn.conf** when you deploy a shared-data StarRocks cluster. For detailed instructions on deploying a StarRocks cluster, see [Deploy StarRocks](../../deployment/deploy_manually.md).

> **Note**
> **NOTE**
>
> Do not start the cluster until after it is configured for shared-storage in the next section of this document.
@@ -29,8 +29,9 @@ Before starting the cluster configure the FEs and CNs. An example configuration

### Example FE configuration for Azure Blob Storage

The example shared-data additions for your `fe.conf` can be added to the `fe.conf` file on each
of your FE nodes.
The example shared-data additions for your `fe.conf` can be added to the `fe.conf` file on each of your FE nodes.

- If you use the shared key to access Azure Blob Storage, add the following configuration items:

```Properties
run_mode = shared_data
@@ -73,13 +74,12 @@ of your FE nodes.
The running mode of the StarRocks cluster. Valid values:

- `shared_data`
- `shared_nothing` (Default).
- `shared_nothing` (Default)

> **Note**
>
> You cannot adopt the `shared_data` and `shared_nothing` modes simultaneously for a StarRocks cluster. Mixed deployment is not supported.
> **NOTE**
>
> Do not change `run_mode` after the cluster is deployed. Otherwise, the cluster fails to restart. The transformation from a shared-nothing cluster to a shared-data cluster or vice versa is not supported.
> - You cannot adopt the `shared_data` and `shared_nothing` modes simultaneously for a StarRocks cluster. Mixed deployment is not supported.
> - Do not change `run_mode` after the cluster is deployed. Otherwise, the cluster fails to restart. The transformation from a shared-nothing cluster to a shared-data cluster or vice versa is not supported.
#### cloud_native_meta_port

@@ -105,13 +105,14 @@ Supported from v3.1.0.
The type of object storage you use. In shared-data mode, StarRocks supports storing data in Azure Blob (supported from v3.1.1 onwards), and object storages that are compatible with the S3 protocol (such as AWS S3, Google GCS, and MinIO). Valid values:

- `S3` (Default)
- `AZBLOB`.
- `AZBLOB`
- `HDFS`

> Note
>
> If you specify this parameter as `S3`, you must add the parameters prefixed by `aws_s3`.
> **NOTE**
>
> If you specify this parameter as `AZBLOB`, you must add the parameters prefixed by `azure_blob`.
> - If you specify this parameter as `S3`, you must add the parameters prefixed by `aws_s3`.
> - If you specify this parameter as `AZBLOB`, you must add the parameters prefixed by `azure_blob`.
> - If you specify this parameter as `HDFS`, you must add the parameter `cloud_native_hdfs_url`.
#### azure_blob_path

@@ -129,7 +130,7 @@ The Shared Key used to authorize requests for your Azure Blob Storage.

The shared access signatures (SAS) used to authorize requests for your Azure Blob Storage.

> **Note**
> **NOTE**
>
> Only credential-related configuration items can be modified after your shared-data StarRocks cluster is created. If you changed the original storage path-related configuration items, the databases and tables you created before the change become read-only, and you cannot load data into them.
57 changes: 30 additions & 27 deletions docs/en/deployment/shared_data/gcs.md
@@ -2,7 +2,7 @@
displayed_sidebar: docs
---

# Deploy StarRocks using GCS
# Use GCS for shared-data

import SharedDataIntro from '../../_assets/commonMarkdown/sharedDataIntro.md'
import SharedDataCNconf from '../../_assets/commonMarkdown/sharedDataCNconf.md'
@@ -19,7 +19,7 @@ import SharedDataUse from '../../_assets/commonMarkdown/sharedDataUse.md'

The deployment of a shared-data StarRocks cluster is similar to that of a shared-nothing StarRocks cluster. The only difference is that you need to deploy CNs instead of BEs in a shared-data cluster. This section only lists the extra FE and CN configuration items you need to add in the configuration files of FE and CN **fe.conf** and **cn.conf** when you deploy a shared-data StarRocks cluster. For detailed instructions on deploying a StarRocks cluster, see [Deploy StarRocks](../../deployment/deploy_manually.md).

> **Note**
> **NOTE**
>
> Do not start the cluster until after it is configured for shared-storage in the next section of this document.
@@ -29,28 +29,25 @@ Before starting the cluster configure the FEs and CNs. An example configuration

### Example FE configuration for GCS

The example shared-data additions for your `fe.conf` can be added to the `fe.conf` file on each
of your FE nodes. Because GCS storage is accessed using the
[Cloud Storage XML API](https://cloud.google.com/storage/docs/xml-api/overview), the parameters
use the prefix `aws_s3`.
The example shared-data additions for your `fe.conf` can be added to the `fe.conf` file on each of your FE nodes. Because GCS storage is accessed using the [Cloud Storage XML API](https://cloud.google.com/storage/docs/xml-api/overview), the parameters use the prefix `aws_s3`.

```Properties
run_mode = shared_data
cloud_native_meta_port = <meta_port>
cloud_native_storage_type = S3
```Properties
run_mode = shared_data
cloud_native_meta_port = <meta_port>
cloud_native_storage_type = S3

# For example, testbucket/subpath
aws_s3_path = <s3_path>
# For example, testbucket/subpath
aws_s3_path = <s3_path>

# For example: us-east1
aws_s3_region = <region>
# For example: us-east1
aws_s3_region = <region>

# For example: https://storage.googleapis.com
aws_s3_endpoint = <endpoint_url>
# For example: https://storage.googleapis.com
aws_s3_endpoint = <endpoint_url>

aws_s3_access_key = <HMAC access_key>
aws_s3_secret_key = <HMAC secret_key>
```
aws_s3_access_key = <HMAC access_key>
aws_s3_secret_key = <HMAC secret_key>
```

### All FE parameters related to shared-storage with GCS

@@ -59,13 +56,12 @@ use the prefix `aws_s3`.
The running mode of the StarRocks cluster. Valid values:

- `shared_data`
- `shared_nothing` (Default).
- `shared_nothing` (Default)

> **Note**
> **NOTE**
>
> You cannot adopt the `shared_data` and `shared_nothing` modes simultaneously for a StarRocks cluster. Mixed deployment is not supported.
>
> Do not change `run_mode` after the cluster is deployed. Otherwise, the cluster fails to restart. The transformation from a shared-nothing cluster to a shared-data cluster or vice versa is not supported.
> - You cannot adopt the `shared_data` and `shared_nothing` modes simultaneously for a StarRocks cluster. Mixed deployment is not supported.
> - Do not change `run_mode` after the cluster is deployed. Otherwise, the cluster fails to restart. The transformation from a shared-nothing cluster to a shared-data cluster or vice versa is not supported.
#### cloud_native_meta_port

@@ -91,7 +87,14 @@ Supported from v3.1.0.
The type of object storage you use. In shared-data mode, StarRocks supports storing data in Azure Blob (supported from v3.1.1 onwards), and object storages that are compatible with the S3 protocol (such as AWS S3, Google GCS, and MinIO). Valid values:

- `S3` (Default)
- `AZBLOB`.
- `AZBLOB`
- `HDFS`

> **NOTE**
>
> - If you specify this parameter as `S3`, you must add the parameters prefixed by `aws_s3`.
> - If you specify this parameter as `AZBLOB`, you must add the parameters prefixed by `azure_blob`.
> - If you specify this parameter as `HDFS`, you must add the parameter `cloud_native_hdfs_url`.
#### aws_s3_path

@@ -110,7 +113,7 @@ The region in which your S3 bucket resides, for example, `us-west-2`.
Whether to use Instance Profile and Assumed Role as credential methods for accessing GCS. Valid values:

- `true`
- `false` (Default).
- `false` (Default)

If you use IAM user-based credential (Access Key and Secret Key) to access GCS, you must specify this item as `false`, and specify `aws_s3_access_key` and `aws_s3_secret_key`.

@@ -136,7 +139,7 @@ The ARN of the IAM role that has privileges on your GCS bucket in which your dat

The external ID of the AWS account that is used for cross-account access to your GCS bucket.

> **Note**
> **NOTE**
>
> Only credential-related configuration items can be modified after your shared-data StarRocks cluster is created. If you changed the original storage path-related configuration items, the databases and tables you created before the change become read-only, and you cannot load data into them.
123 changes: 121 additions & 2 deletions docs/en/deployment/shared_data/hdfs.md
@@ -1,5 +1,124 @@
---
unlisted: true
displayed_sidebar: docs
---

This document only exists in the Chinese docs
# Use HDFS for shared-data

import SharedDataIntro from '../../_assets/commonMarkdown/sharedDataIntro.md'
import SharedDataCNconf from '../../_assets/commonMarkdown/sharedDataCNconf.md'
import SharedDataUseIntro from '../../_assets/commonMarkdown/sharedDataUseIntro.md'
import SharedDataUse from '../../_assets/commonMarkdown/sharedDataUse.md'

<SharedDataIntro />

## Architecture

![Shared-data Architecture](../../_assets/share_data_arch.png)

## Deploy a shared-data StarRocks cluster

The deployment of a shared-data StarRocks cluster is similar to that of a shared-nothing StarRocks cluster. The only difference is that you need to deploy CNs instead of BEs in a shared-data cluster. This section only lists the extra FE and CN configuration items you need to add in the configuration files of FE and CN **fe.conf** and **cn.conf** when you deploy a shared-data StarRocks cluster. For detailed instructions on deploying a StarRocks cluster, see [Deploy StarRocks](../../deployment/deploy_manually.md).

> **NOTE**
>
> Do not start the cluster until after it is configured for shared-storage in the next section of this document.
## Configure FE nodes for shared-data StarRocks

Before starting FEs, add the following configuration items in the FE configuration file **fe.conf**.

### Example FE configurations for HDFS

These are example shared-data additions for your `fe.conf` file on each of your FE nodes.

```Properties
run_mode = shared_data
cloud_native_meta_port = <meta_port>
cloud_native_storage_type = HDFS

# Example: hdfs://127.0.0.1:9000/user/starrocks/
cloud_native_hdfs_url = <hdfs_url>
```
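
As a pre-start sanity check, the settings above can be verified with a small script. The following is a minimal sketch and not part of StarRocks: the helper names `parse_properties` and `check_hdfs_shared_data` are hypothetical, and the checks encode only the rules stated in this document (`run_mode` must be `shared_data`, and the `HDFS` storage type requires `cloud_native_hdfs_url`). The `hdfs://` scheme check is an assumption based on the example URL.

```python
from urllib.parse import urlparse


def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def check_hdfs_shared_data(conf):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if conf.get("run_mode") != "shared_data":
        problems.append("run_mode must be shared_data")
    if conf.get("cloud_native_storage_type") != "HDFS":
        problems.append("cloud_native_storage_type must be HDFS")
    url = conf.get("cloud_native_hdfs_url", "")
    if not url:
        problems.append("cloud_native_hdfs_url is required for HDFS")
    elif urlparse(url).scheme != "hdfs":
        # Assumption: the URL uses the hdfs:// scheme, as in the example above.
        problems.append("cloud_native_hdfs_url should start with hdfs://")
    return problems


# Sample fe.conf fragment mirroring the example above (values are illustrative).
SAMPLE = """
run_mode = shared_data
cloud_native_meta_port = 6090
cloud_native_storage_type = HDFS
# Example: hdfs://127.0.0.1:9000/user/starrocks/
cloud_native_hdfs_url = hdfs://127.0.0.1:9000/user/starrocks/
"""

if __name__ == "__main__":
    print(check_hdfs_shared_data(parse_properties(SAMPLE)) or "OK")
```

Returning a list of problems rather than raising on the first one lets a deployment script report every missing item in a single pass.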

### All FE parameters related to shared-storage with HDFS

#### run_mode

The running mode of the StarRocks cluster. Valid values:

- `shared_data`
- `shared_nothing` (Default)

> **NOTE**
>
> - You cannot adopt the `shared_data` and `shared_nothing` modes simultaneously for a StarRocks cluster. Mixed deployment is not supported.
> - Do not change `run_mode` after the cluster is deployed. Otherwise, the cluster fails to restart. The transformation from a shared-nothing cluster to a shared-data cluster or vice versa is not supported.
#### cloud_native_meta_port

The cloud-native meta service RPC port.

- Default: `6090`

#### enable_load_volume_from_conf

Whether to allow StarRocks to create the default storage volume by using the object storage-related properties specified in the FE configuration file. Valid values:

- `true` (Default) If you specify this item as `true` when creating a new shared-data cluster, StarRocks creates the built-in storage volume `builtin_storage_volume` using the object storage-related properties in the FE configuration file, and sets it as the default storage volume. However, if you have not specified the object storage-related properties, StarRocks fails to start.
- `false` If you specify this item as `false` when creating a new shared-data cluster, StarRocks starts directly without creating the built-in storage volume. You must manually create a storage volume and set it as the default storage volume before creating any object in StarRocks. For more information, see [Create the default storage volume](#use-your-shared-data-starrocks-cluster).

Supported from v3.1.0.

> **CAUTION**
>
> We strongly recommend you leave this item as `true` while you are upgrading an existing shared-data cluster from v3.0. If you specify this item as `false`, the databases and tables you created before the upgrade become read-only, and you cannot load data into them.
#### cloud_native_storage_type

The type of object storage you use. In shared-data mode, StarRocks supports storing data in Azure Blob (supported from v3.1.1 onwards), and object storages that are compatible with the S3 protocol (such as AWS S3, Google GCS, and MinIO). Valid values:

- `S3` (Default)
- `AZBLOB`
- `HDFS`

> **NOTE**
>
> - If you specify this parameter as `S3`, you must add the parameters prefixed by `aws_s3`.
> - If you specify this parameter as `AZBLOB`, you must add the parameters prefixed by `azure_blob`.
> - If you specify this parameter as `HDFS`, you must add the parameter `cloud_native_hdfs_url`.
#### cloud_native_hdfs_url

The URL of your HDFS storage, for example, `hdfs://127.0.0.1:9000/user/xxx/starrocks/`.

> **NOTE**
>
> Only credential-related configuration items can be modified after your shared-data StarRocks cluster is created. If you changed the original storage path-related configuration items, the databases and tables you created before the change become read-only, and you cannot load data into them.
If you want to create the default storage volume manually after the cluster is created, you only need to add the following configuration items:

```Properties
run_mode = shared_data
cloud_native_meta_port = <meta_port>
enable_load_volume_from_conf = false
```

## Configure CN nodes for shared-data StarRocks

<SharedDataCNconf />

## Use your shared-data StarRocks cluster

<SharedDataUseIntro />

The following example creates a storage volume `def_volume` for an HDFS storage, enables the storage volume, and sets it as the default storage volume:

```SQL
CREATE STORAGE VOLUME def_volume
TYPE = HDFS
LOCATIONS = ("hdfs://127.0.0.1:9000/user/starrocks/");

SET def_volume AS DEFAULT STORAGE VOLUME;
```
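
If cluster setup is scripted, the two statements above can be generated from the HDFS URL. The following is a minimal sketch, assuming a hypothetical helper `storage_volume_sql` that only builds the SQL text in the form shown above. It does not connect to StarRocks, and the name check is a conservative assumption rather than StarRocks's actual identifier rule.

```python
import re


def storage_volume_sql(name, hdfs_url):
    """Build the CREATE and SET DEFAULT statements for an HDFS storage volume."""
    # Assumption: restrict names to letters, digits, and underscores so the
    # name can be placed in the statement without quoting.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"unsupported volume name: {name!r}")
    if '"' in hdfs_url:
        raise ValueError("URL must not contain double quotes")
    create = (
        f"CREATE STORAGE VOLUME {name}\n"
        f"TYPE = HDFS\n"
        f'LOCATIONS = ("{hdfs_url}");'
    )
    set_default = f"SET {name} AS DEFAULT STORAGE VOLUME;"
    return [create, set_default]


if __name__ == "__main__":
    url = "hdfs://127.0.0.1:9000/user/starrocks/"
    for stmt in storage_volume_sql("def_volume", url):
        print(stmt, end="\n\n")
```

Rejecting unexpected characters up front keeps the generated text in the same shape as the documented statements instead of attempting to escape arbitrary input.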

<SharedDataUse />