A How-To for Migrating Elasticsearch to IBM Cloud Databases for Elasticsearch

Migrate your data from Compose to Databases for Elasticsearch

If you’re moving your data over to IBM Cloud Databases for Elasticsearch, you’ll need to take some steps to successfully migrate all of your data. We’ve got you covered. In this post, we’ll show you how to securely migrate your data from Compose to Databases for Elasticsearch using your own object storage.

If you’re an Elasticsearch user, you know that it’s a fantastic database for storing and searching for data quickly and efficiently. If you’re thinking about making the transition into the cloud, or moving from IBM Compose for Elasticsearch or another cloud provider to Databases for Elasticsearch, this article will guide you through the process.

Overall, the migration process entails taking snapshots of your current Elasticsearch database, storing them securely in your preferred IBM Cloud Object Storage or S3-compatible object storage bucket, and then restoring them in your Databases for Elasticsearch deployment. Don’t worry, we’ll take you step-by-step through the process.

We’re assuming that you want to make this transition with the least downtime possible. All you have to do is bring your database credentials, your own object storage, and make sure your application’s queries will work for Elasticsearch version 6.x.

Let’s get started.

Reindexing Elasticsearch from 2.x to 5.x to 6.x

One of the significant changes that you might have to make to your Elasticsearch 5.x deployment is updating its indices. While Elasticsearch version 6.x supports indices created in version 5.x, it is not compatible with indices created in version 2.x. Therefore, if you created indices in version 2.x and imported them into version 5.x, you’ll need to reindex them in version 5.x before switching over to version 6.x. There is a reindex API and a guide that show you how to do that.

Keep in mind that if you’ve got an index called searchguard in your Compose for Elasticsearch deployment, you’ll want to reindex it to a different name. searchguard is a reserved index name in IBM Cloud Databases for Elasticsearch.
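
As a rough sketch of what such a reindex request could look like, the snippet below builds the body for Elasticsearch’s _reindex API and shows, in a comment, how you might send it to your Compose deployment. The helper name reindex_body and the index name searchguard-migrated are made up for illustration, and the compose_* variables are the ones we set up later in this post. Keep in mind that reindexing doesn’t copy the settings or mappings of the source index, so you may want to create the destination index first.

```shell
# reindex_body is a hypothetical helper that prints the JSON body for the
# _reindex API. $1 = existing (source) index, $2 = new (destination) index.
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}' "$1" "$2"
}

# Example (not run here): copy an index named searchguard to a new,
# made-up name on the Compose deployment before migrating.
#   curl -H 'Content-Type: application/json' -sS -XPOST \
#     "https://${compose_username}:${compose_password}@${compose_endpoint}:${compose_port}/_reindex" \
#     -d "$(reindex_body searchguard searchguard-migrated)"
```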

Getting started with Databases for Elasticsearch 6.x

With your indices all set for the migration, you’ll need to set up a Databases for Elasticsearch deployment. Once it’s set up, you’ll need to allocate as many resources to the deployment as you have in your Compose deployment. You can do that either through the IBM Cloud dashboard or via the IBM Cloud CLI tool.

From your IBM Cloud dashboard, select the blue Create resource button in the upper, right-hand corner.

That will take you to the catalog where you can type in “elasticsearch,” and select the Databases for Elasticsearch tile.

That will take you to the deployment setup page where you can choose a name for the database and any options you might need for your specific deployment, such as resource allocation (e.g., memory and disk allocation). Once you’ve done that, click Create and your database will begin provisioning.

From the IBM Cloud CLI, you can also create a Databases for Elasticsearch deployment. Log into your IBM Cloud account and then create your deployment using the following command:

ibmcloud service create databases-for-elasticsearch standard <deployment_name>

That’s it!

Once your database has been provisioned, make sure that you generate service credentials for your Databases for Elasticsearch deployment or get the credentials using the IBM Cloud CLI with the cdb plugin (Cloud Databases) as follows:

ibmcloud cdb deployment-connections <your_elasticsearch_deployment_name>

If you don’t know the password for the database, you can either reset the admin password or use the password from the service credentials that can be generated for your database through the IBM Cloud UI. You’ll also need to retrieve and store the CA certificate for the database to a location on your system:

ibmcloud cdb cacert <your_elasticsearch_deployment_name>

After that’s done, we’ll need to set up an IBM Cloud Object Storage or S3 bucket so that we can store and read snapshots of our Elasticsearch databases securely.

Bring your own IBM Cloud Object Storage or S3 bucket

To migrate your Elasticsearch database over to IBM Cloud, we’ll take snapshots of your current Elasticsearch database and store those in an object storage bucket like IBM Cloud Object Storage (COS) or S3. Then, we’ll restore those snapshots from the object storage bucket into your new Elasticsearch database.

A snapshot is essentially a copy of your databases at a point in time. Depending on how your application uses your database at the time of migration, you could take a single snapshot or multiple snapshots in order to ensure that you’ve captured the most current data that’s been stored in Elasticsearch.

COS and S3 are compatible—you have the flexibility to use either object storage solution. Whichever you choose, you’ll need to generate credentials in the form of an HMAC key (access key ID and secret access key), which Elasticsearch will use when uploading and reading the snapshots.

To get the access key ID and secret access key for your IBM Cloud Object Storage service, select the Service credentials button on the left-hand menu bar from your object storage dashboard. That will take you to another screen where you’ll click the blue New credential button, which will give you a pop-up window. In that window, the most important option is Include HMAC credential. Click on the box next to that and watch the parameters inside the textbox at the bottom of the window change to {"HMAC": true}.

This option will generate the access key ID and secret access key, which Elasticsearch requires to store and read the database snapshots. Once you’ve generated credentials, click on View credentials and you’ll see the keys, access_key_id and secret_access_key, within the cos_hmac_keys JSON object.
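
If you’d rather not copy the keys by hand, something like the following could pull them out of a saved copy of the credentials. This is only a sketch: json_string_field is a helper we made up, the file name cos_credentials.json is an assumption, and the match is plain text rather than a real JSON parser.

```shell
# json_string_field is a hypothetical helper: it prints the value of the first
# "<name>": "<value>" pair found on standard input. It matches plain text and
# does not understand JSON nesting or escaped quotes.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example, assuming the credentials JSON was saved as cos_credentials.json:
#   access_key=$(json_string_field access_key_id < cos_credentials.json)
#   secret_key=$(json_string_field secret_access_key < cos_credentials.json)
```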

Now that we have all the information we need to start migrating, let’s give you the script.

Migrating to Databases for Elasticsearch

When migrating your Elasticsearch database to Databases for Elasticsearch, you’re essentially creating snapshots of your current database and restoring them into your new deployment. We’ve created a migration repository on Github that provides you with an example called elasticsearch_migration.sh. This file comprises the sequence of cURL commands we ran to successfully migrate a large Elasticsearch deployment from Compose to Databases for Elasticsearch.

The migration process essentially consists of running those cURL commands until you’ve migrated all your data over to Databases for Elasticsearch. Below is the sequence of steps for doing that, so let’s start.

Setting up environment variables

To make it easier to run the cURL commands, we’ve created a few environment variables that we’ll use in these commands:

  • compose_username – Compose database username
  • compose_password – Compose database password
  • compose_endpoint – Compose database endpoint
  • compose_port – Compose database port number
  • icd_username – Databases for Elasticsearch username
  • icd_password – Databases for Elasticsearch password
  • icd_endpoint – Databases for Elasticsearch endpoint
  • icd_port – Databases for Elasticsearch port number
  • CURL_CA_BUNDLE – the location of your Databases for Elasticsearch CA certificate
  • storage_service_endpoint – COS endpoint
  • bucket_name – COS bucket name that will store your migrations
  • access_key – the access key you got from your COS HMAC credentials
  • secret_key – the secret key you got from your COS HMAC credentials
  • path_to_snapshot_folder – the path inside your COS bucket pointing to your migrations

The elasticsearch_migration.sh file in our Github repository has an example of these variables. They might look like:

   compose_username=composetestuser
   compose_password=composetestpassword
   compose_endpoint=test.composedb.com
   compose_port=33999
   icd_username=icdtestuser
   icd_password=icdtestpassword
   icd_endpoint=my-es.test.databases.appdomain.cloud
   icd_port=24000
   export CURL_CA_BUNDLE=/path/to/icd/ssl/certificate
   storage_service_endpoint=s3-api.us-geo.objectstorage.service.networklayer.com
   bucket_name=myawesomebucket
   access_key=n9dh89h2189hd12hd
   secret_key=nd0nd021nd012n0dn102nd01n20dn120d
   path_to_snapshot_folder=elastic_search/deployment-1/migration

Note: A common mistake in the path_to_snapshot_folder variable is a leading or trailing forward slash. Make sure that your path looks like example/path, not /example/path or example/path/.
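
One way to guard against that mistake is to strip the slashes before using the value. The helper below is a small sketch; normalize_snapshot_path is a name we made up and isn’t part of the migration script.

```shell
# normalize_snapshot_path is a hypothetical helper that removes any leading
# and trailing forward slashes from the path passed as $1.
normalize_snapshot_path() {
  printf '%s\n' "$1" | sed -e 's|^/*||' -e 's|/*$||'
}

path_to_snapshot_folder=$(normalize_snapshot_path "/elastic_search/deployment-1/migration/")
echo "$path_to_snapshot_folder"   # prints elastic_search/deployment-1/migration
```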

Once you’ve set these variables in your shell, let’s move on to the next step, which is mounting your COS bucket to the Elasticsearch databases.
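
Since every command that follows depends on these variables, it can help to confirm that none of them is empty before going further. Here’s a minimal sketch of such a check; check_required_vars is a hypothetical helper of ours, not something the migration script provides.

```shell
# check_required_vars is a hypothetical helper: it prints the name of every
# listed variable that is unset or empty and returns non-zero if any is missing.
check_required_vars() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (not run here):
#   check_required_vars compose_username compose_password compose_endpoint compose_port \
#     icd_username icd_password icd_endpoint icd_port CURL_CA_BUNDLE \
#     storage_service_endpoint bucket_name access_key secret_key path_to_snapshot_folder \
#     || echo "Set the variables listed above before continuing."
```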

Mounting Elasticsearch to COS

The next step is to mount your COS bucket to both databases. Your Compose for Elasticsearch database will need to write to your COS bucket, while your Databases for Elasticsearch deployment will need to read from that bucket. To do that, you’ll need to run two cURL commands that use the variables created above:

   # Mount S3/COS bucket on Compose deployment
   curl -H 'Content-Type: application/json' -sS -XPOST \
   "https://${compose_username}:${compose_password}@${compose_endpoint}:${compose_port}/_snapshot/migration" \
   -d '{
     "type": "s3",
     "settings": {
       "endpoint": "'"${storage_service_endpoint}"'",
       "bucket": "'"${bucket_name}"'",
       "base_path": "'"${path_to_snapshot_folder}"'",
       "access_key": "'"${access_key}"'",
       "secret_key": "'"${secret_key}"'"
     }
   }'

This command will mount your COS bucket to the Compose database. Notice that we’re using Elasticsearch’s _snapshot API and naming the snapshot repository migration for demonstration purposes. Next, we’ll need to mount the COS bucket to Databases for Elasticsearch like this:

   # Mount S3/COS bucket on Databases for Elasticsearch
   curl -H 'Content-Type: application/json' -sS -XPOST \
   "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_snapshot/migration" \
   -d '{
     "type": "s3",
     "settings": {
       "readonly": true,
       "endpoint": "'"${storage_service_endpoint}"'",
       "bucket": "'"${bucket_name}"'",
       "base_path": "'"${path_to_snapshot_folder}"'",
       "access_key": "'"${access_key}"'",
       "secret_key": "'"${secret_key}"'"
     }
   }'

Notice two things here. First, we’re calling the _snapshot API with the same repository name, migration. Second, we’re setting readonly to true in the repository’s settings. This means that Databases for Elasticsearch can only read from the bucket and not write to it. Since we’re restoring the data from Compose to Databases for Elasticsearch, our new deployment only needs to read from the bucket.

The bucket should now be mounted to the databases. So, let’s move on to the next step, which is migrating the data.
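
Because the two mount requests differ only in the readonly setting, you could generate the body from a single helper, as in the sketch below. The name repository_body is made up, and passing readonly as false on the Compose side is our assumption (it’s the setting’s default). The commented example also reads the repository back, which is a quick way to confirm it was registered.

```shell
# repository_body is a hypothetical helper that prints the JSON used to
# register the S3/COS snapshot repository. Pass "true" for a read-only
# repository (the restoring side) or "false" for a writable one.
repository_body() {
  printf '{"type":"s3","settings":{"readonly":%s,"endpoint":"%s","bucket":"%s","base_path":"%s","access_key":"%s","secret_key":"%s"}}' \
    "$1" "$storage_service_endpoint" "$bucket_name" "$path_to_snapshot_folder" \
    "$access_key" "$secret_key"
}

# Example (not run here): register the read-only repository, then read it back.
#   curl -H 'Content-Type: application/json' -sS -XPOST \
#     "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_snapshot/migration" \
#     -d "$(repository_body true)"
#   curl -sS "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_snapshot/migration?pretty"
```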

Data Snapshots and Restores

Your database migration comprises taking snapshots of your Compose database and placing those in your COS bucket. Once the data has been successfully copied to the bucket, your Databases for Elasticsearch will read that data from the bucket and restore it into the database.

We assume that your current Compose for Elasticsearch database is writing new data as you’re taking snapshots. Therefore, you might need to take more than one snapshot and do more than one restore depending on the size of the original database. For smaller datasets, one snapshot/restore cycle might do, but for larger datasets, you’ll need more. For a multi-terabyte, heavy write database, you might need more than four snapshot/restore cycles. Nonetheless, we’ll take you through how you could do multiple snapshot/restores in the following example.

To perform the first snapshot you’d run the following command:

   # Perform 1st snapshot on Compose deployment
   curl -sS -XPUT \
  "https://${compose_username}:${compose_password}@${compose_endpoint}:${compose_port}/_snapshot/migration/snapshot-1?wait_for_completion=true"

Note here that we’ve named the snapshot snapshot-1 because, if you’re taking multiple snapshots, each one has to have a different name. You’ll also see wait_for_completion=true, which makes the request wait until the snapshot has finished instead of returning immediately. If your database is over 50 GB, you can remove this parameter or set it to false because the cURL command could otherwise time out. In that case, you can poll the database to find out whether the snapshot has completed successfully. Here’s that command:

   snapshot_name=snapshot-1 # replace snapshot name accordingly
   curl -sS -XGET \
   "https://${compose_username}:${compose_password}@${compose_endpoint}:${compose_port}/_snapshot/migration/${snapshot_name}?pretty"

As you can see above, when doing multiple snapshots on a large database, you’ll need to poll that particular snapshot by name. So in the example above, we have snapshot-1 as the snapshot we’re polling. Once your snapshot is done, you’ll get a response like the following:

{
  "snapshots" : [
    {
      "snapshot" : "snapshot-1",
      "uuid" : "wbashPcyR--zMR6v_Q2MVw",
      "version_id" : 6050299,
      "version" : "6.5.2",
      "indices" : [
        "logs-211998",
        "logs-231998",
        "logs-191998",
        "logs-181998",
        "logs-241998",
        "logs-221998",
        "logs-201998"
      ],
      "include_global_state" : true,
      "state" : "SUCCESS",
      "start_time" : "2019-03-08T18:48:48.857Z",
      "start_time_in_millis" : 1552070928857,
      "end_time" : "2019-03-08T18:49:34.063Z",
      "end_time_in_millis" : 1552070974063,
      "duration_in_millis" : 45206,
      "failures" : [ ],
      "shards" : {
        "total" : 35,
        "failed" : 0,
        "successful" : 35
      }
    }
  ]
}

What you’re looking for here is "state" : "SUCCESS". The first snapshot will take longer than subsequent ones. That’s due to the snapshot gathering all the existing data from your indices and writing them to your COS bucket. Subsequent snapshots will take less time since they are gathering your most recent data that was written since the latest snapshot was taken.
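
If you don’t want to rerun the polling command by hand, a small loop can do the waiting for you. The following is a sketch under a few assumptions: snapshot_state and wait_for_snapshot are names we made up, the 30-second interval is arbitrary, and the state is extracted with a plain-text match rather than a JSON parser.

```shell
# snapshot_state is a hypothetical helper: it reads a snapshot status response
# on standard input and prints the value of its first "state" field.
snapshot_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# wait_for_snapshot is also hypothetical: it polls the Compose deployment until
# the snapshot named in $1 is no longer IN_PROGRESS. It returns 0 on SUCCESS
# and 1 for any other final state or an unreadable response.
wait_for_snapshot() {
  snapshot_name=$1
  while :; do
    state=$(curl -sS -XGET \
      "https://${compose_username}:${compose_password}@${compose_endpoint}:${compose_port}/_snapshot/migration/${snapshot_name}?pretty" \
      | snapshot_state)
    echo "snapshot ${snapshot_name}: ${state:-unknown}"
    case "$state" in
      SUCCESS)     return 0 ;;
      IN_PROGRESS) sleep 30 ;;
      *)           return 1 ;;
    esac
  done
}

# Example (not run here):
#   wait_for_snapshot snapshot-1 && echo "ready to restore"
```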

If you take a look at your COS bucket, you’ll see the snapshot files stored under the path you set in path_to_snapshot_folder.

With the snapshot successfully stored in COS, we’ll now restore it into your Databases for Elasticsearch deployment by running:

   curl -H 'Content-Type: application/json' -sS -XPOST \
   "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_snapshot/migration/snapshot-1/_restore?wait_for_completion=true"

The command for restoring a snapshot is similar to the one for taking it, except that the URL ends with _restore, which tells Elasticsearch to restore the indices. Again, if you have a large database to restore, remove wait_for_completion=true or set it to false because the request might time out. If you’ve kept the wait_for_completion option, you’ll get a SUCCESS returned after the restore has completed. However, if you’ve set it to false or removed the option entirely, you can poll the restore by checking the Databases for Elasticsearch cluster’s state with the following command:

   curl -sS "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_cluster/state?pretty"

When the restore has completed, the end of the response will look like the following:

...
      {
          "state" : "STARTED",
          "primary" : true,
          "node" : "kN9tN1RvRN2B1JPqaPSUSA",
          "relocating_node" : null,
          "shard" : 4,
          "index" : "logs-201998",
          "allocation_id" : {
            "id" : "iZf9sH_cR7m8RCZNPci82g"
          }
        }
      ]
    }
  },
  "snapshot_deletions" : {
    "snapshot_deletions" : [ ]
  },
  "snapshots" : {
    "snapshots" : [ ]
  },
  "restore" : {
    "snapshots" : [ ]
  }
}

The important part here is:

  "restore" : {
    "snapshots" : [ ]
  }

If that’s empty, then the restore process has finished.
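
The cluster state can be very long, so you might prefer to let the shell do the checking. Here’s a rough sketch: restore_finished is a hypothetical helper that strips whitespace and looks for the empty restore section shown above. It’s a plain-text match, so treat it as a convenience rather than a robust parser.

```shell
# restore_finished is a hypothetical helper: it reads a _cluster/state response
# on standard input and succeeds only if the "restore" section lists no
# snapshots. Whitespace is removed first so pretty-printed output also matches.
restore_finished() {
  tr -d ' \t\r\n' | grep '"restore":{"snapshots":\[\]}' > /dev/null
}

# Example (not run here): check every 30 seconds until the restore is done.
#   until curl -sS "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_cluster/state?pretty" \
#       | restore_finished; do
#     sleep 30
#   done
```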

If you’re ready to finalize your migration right here without making more snapshots and restoring them, skip over the rest until you come to the section “Final Snapshot and Restore”, below. If you have a large database with lots of writes, then read on to learn how to continue the snapshot/restore process to decrease the write downtime you’ll need to take for the final snapshot/restore cycle.

As mentioned above, if your database is large and is write-heavy, you’ll need to do this process more than once. The caveat here, however, is that for subsequent snapshots/restores, you’ll need to close the indices of your Databases for Elasticsearch deployment prior to restoring another snapshot. Existing indices can only be restored if they are closed. After they are closed, Elasticsearch will open them back up during the restore process and restore the new data inside the indices.

To close the indices of your Databases for Elasticsearch deployment run the following command:

   curl -sS "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_cat/indices/?h=index" | \
   grep -v -e '^searchguard$' | \
   while read index; do
     curl -sS -XPOST "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/$index/_close"
   done

This command will close all the indices, except for the searchguard index, which is a special index where global security presets are stored. If you have an index named searchguard in your Compose deployment, you’ll need to reindex it under a different name to avoid errors.

Once the indices are closed, create the next snapshot of your database. Again, run the same snapshot command you ran previously, but change the name of the snapshot. For this example, we changed it to snapshot-2.
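
You’ll run this close command several times during the migration, so it may be convenient to wrap it in a function. The sketch below does that; indices_to_close and close_all_indices are names we made up, and the awk filter is our substitute for the grep above (it also ignores blank lines and surrounding whitespace).

```shell
# indices_to_close is a hypothetical helper: it reads one index name per line
# (the output of _cat/indices/?h=index) and drops blank lines and the
# reserved searchguard index.
indices_to_close() {
  awk 'NF && $1 != "searchguard" { print $1 }'
}

# close_all_indices is also hypothetical: it closes every remaining index
# on the Databases for Elasticsearch deployment.
close_all_indices() {
  curl -sS "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_cat/indices/?h=index" |
    indices_to_close |
    while read -r index; do
      curl -sS -XPOST "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/${index}/_close"
    done
}

# Example (not run here):
#   close_all_indices
```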

   curl -sS -XPUT \
   "https://${compose_username}:${compose_password}@${compose_endpoint}:${compose_port}/_snapshot/migration/snapshot-2?wait_for_completion=true"

If you set the wait_for_completion option to false or removed it entirely, wait until the snapshot has finished before attempting to restore the data. Once the snapshot has completed, you can restore the data into Databases for Elasticsearch. You’ll notice that subsequent snapshots will take less time than the first.

Since the indices are already closed, you can restore the data like:

   curl -H 'Content-Type: application/json' -sS -XPOST \
   "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_snapshot/migration/snapshot-2/_restore?wait_for_completion=true" 

After that’s done, just close the indices on the Databases for Elasticsearch deployment again:

   curl -sS "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_cat/indices/?h=index" | \
   grep -v -e '^searchguard$' | \
   while read index; do
     curl -sS -XPOST "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/$index/_close"
   done

Continue with this snapshot/restore cycle until you’re ready to make the final snapshot/restore of your database.
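
To make the repetition easier to follow, here is a sketch of one follow-up cycle as a single function. Everything in it (request, close_icd_indices, run_cycle, and the DRY_RUN switch) is hypothetical and not part of elasticsearch_migration.sh. It keeps wait_for_completion=true for simplicity, so for a large database you’d want to drop that parameter and poll instead, as described earlier.

```shell
# All names below (request, close_icd_indices, run_cycle, DRY_RUN) are
# hypothetical helpers for illustration only.

# Send one request to "compose" or "icd". With DRY_RUN=1, only print it.
request() {
  method=$1; target=$2; req_path=$3
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$method $target $req_path"
    return 0
  fi
  case "$target" in
    compose) base="https://${compose_username}:${compose_password}@${compose_endpoint}:${compose_port}" ;;
    icd)     base="https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}" ;;
    *)       echo "unknown target: $target" >&2; return 1 ;;
  esac
  curl -H 'Content-Type: application/json' -sS -X"$method" "${base}${req_path}"
}

# Close every index on Databases for Elasticsearch except searchguard.
close_icd_indices() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "POST icd /<each index except searchguard>/_close"
    return 0
  fi
  request GET icd "/_cat/indices/?h=index" |
    awk 'NF && $1 != "searchguard" { print $1 }' |
    while read -r index; do
      request POST icd "/${index}/_close"
    done
}

# One follow-up cycle: close the indices, snapshot Compose, restore the snapshot.
run_cycle() {
  n=$1
  close_icd_indices &&
    request PUT compose "/_snapshot/migration/snapshot-${n}?wait_for_completion=true" &&
    request POST icd "/_snapshot/migration/snapshot-${n}/_restore?wait_for_completion=true"
}

# Example: preview the second cycle without contacting either database.
DRY_RUN=1 run_cycle 2
```

Running it with DRY_RUN=1 first lets you confirm the order of the requests before anything is sent to either database.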

Final Snapshot and Restore

Determining when to make the final snapshot is entirely up to you. For the final snapshot and restore, you’ll have to stop all writes to your Compose database so that the final snapshot contains all the data that’s been freshly written since the last snapshot/restore cycle. Therefore, you will have to determine an acceptable amount of time you’re prepared not to accept writes to your database to finish the cycle.

Consider this scenario to help you determine when to execute the final snapshot/restore. Assume you’re at the point in the migration where you’re repeating snapshot/restore cycles. At this time, you’re not anticipating any heavy write spikes, so you can safely assume that each snapshot/restore will be faster than the previous one. By timing how long your last snapshot/restore cycle takes to complete, you can roughly estimate the time it might take to complete the next one. Once you feel that it’s an acceptable amount of time to stop all writes, proceed with the final snapshot/restore.

Once you settle on a timeframe to make the last snapshot, you’ll need to remember to close all indices on your Databases for Elasticsearch deployment. Again, the command to close all indices, except searchguard, is the following:
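
For the snapshot half of a cycle, the status response already reports how long it took in duration_in_millis. The sketch below reads that value; snapshot_duration_seconds is a hypothetical helper, and it says nothing about the restore, which you’d need to time separately (for example, with the shell’s time keyword).

```shell
# snapshot_duration_seconds is a hypothetical helper: it reads a snapshot
# status response on standard input and prints its duration_in_millis value
# converted to whole seconds (0 if the field is not found).
snapshot_duration_seconds() {
  millis=$(sed -n 's/.*"duration_in_millis"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
  echo $(( ${millis:-0} / 1000 ))
}

# Example (not run here): how long did snapshot-2 take?
#   curl -sS "https://${compose_username}:${compose_password}@${compose_endpoint}:${compose_port}/_snapshot/migration/snapshot-2?pretty" \
#     | snapshot_duration_seconds
```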

   curl -sS "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_cat/indices/?h=index" | \
   grep -v -e '^searchguard$' | \
   while read index; do
     curl -sS -XPOST "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/$index/_close"
   done

Now, stop all writes to your Compose for Elasticsearch database. After that, create the final snapshot.

   curl -sS -XPUT \
   "https://${compose_username}:${compose_password}@${compose_endpoint}:${compose_port}/_snapshot/migration/snapshot-3?wait_for_completion=true"

Once that’s done, perform the last restore to Databases for Elasticsearch:

   curl -H 'Content-Type: application/json' -sS -XPOST \
   "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_snapshot/migration/snapshot-3/_restore?wait_for_completion=true" 

Then, open all the indices on your Databases for Elasticsearch deployment:

   curl -sS -XPOST "https://${icd_username}:${icd_password}@${icd_endpoint}:${icd_port}/_all/_open"

Finally, change over the connection strings in your application to point to Databases for Elasticsearch to start writing new data.

Stay tuned for more info on migrating your databases from Compose to IBM Cloud Databases

We’ve taken you through the process of migrating your Compose for Elasticsearch database over to Databases for Elasticsearch. In our migration Github repository, we have an example of how we migrated one of our Elasticsearch deployments from Compose to Databases for Elasticsearch. In that repository, we’ve created four snapshots/restores, but that doesn’t mean you need four snapshot/restore cycles; you might need more or fewer depending on how large your database is and how many writes you have to your Compose database throughout the snapshot/restore process. Nonetheless, by following the steps in this article, you’ll get a general overview of how to start the migration process to begin using Databases for Elasticsearch.

This quick guide is part of a series of articles that explains how to migrate your databases from Compose to IBM Cloud Databases—the latest generation of open-source databases on IBM Cloud. We examined Elasticsearch in this article, but if you’re looking to migrate other databases—like Redis—we’ve got you covered in a previous article, too.

If you still have questions, reach out—our support team is always ready to lend a hand.

You can consult our Github repository Cloud Databases Migration Examples for more migration examples to come.
