IBM Support

DSE 5.0 SSTables with UDTs corrupted after upgrading to DSE 5.1, 6.0, or 6.7

Troubleshooting


Problem

Summary

There are SSTable corruption messages in the Apache Cassandra system.log after upgrading from DSE 5.0 to DSE 5.1, DSE 6.0, or DSE 6.7.

Applies to

  • DataStax Enterprise 6.7
  • DataStax Enterprise 6.0
  • DataStax Enterprise 5.1

Symptoms

- In some cases, the corruption prevents DSE from starting up.

- One or more UDTs are part of the table definition for the table whose sstables are corrupted.

- The error messages look like any of the following:

ERROR [CompactionExecutor:6] 2019-03-12 08:33:37,121 CassandraDaemon.java:122 - Exception in thread Thread[CompactionExecutor:6,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: \
  Corrupted: /var/lib/cassandra/data/my_keyspace/my_test_table1-924c55872e3a345bb10c12f37c1ba895/mc-24-big-Data.db
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:167)
        at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:101)
        at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:33)
        ...
Caused by: java.io.IOException: Corrupt value length 774861946 encountered, as it exceeds the maximum of 268435456, \
  which is set via max_value_size_in_mb in cassandra.yaml
ERROR [CompactionExecutor:2] 2019-02-28 07:24:41,751 StartupDiskErrorHandler.java:41 - \
  Exiting forcefully due to file system exception on startup, disk failure policy "stop"
org.apache.cassandra.io.sstable.CorruptSSTableException: \
  Corrupted: /var/lib/cassandra/data/my_keyspace/my_test_table2-24101c25a2ae3af787c1b40ee1aca33f/mc-33-big-Data.db
        ...
Caused by: java.io.IOException: Corrupt (negative) value length encountered
ERROR [CompactionExecutor:3] 2019-02-05 09:52:48,472 CassandraDaemon.java:119 - Exception in thread Thread[CompactionExecutor:3,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: \
  Corrupted: /var/lib/cassandra/data/my_keyspace/my_test_table3-5e7583b5f3f43af19a39b7e1d6f5f11f/mc-49-big-Data.db
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:167)
        ...
Caused by: java.io.EOFException: EOF after 876569 bytes out of 1158599

- A typical/normal scrub does not resolve the issue. In later DSE versions (5.1.14, 6.0.6, 6.7.2), the scrub output looks something like:

$ sstablescrub -s my_keyspace my_test_table4
Pre-scrub sstables snapshotted into snapshot pre-scrub-1934746623111
INFO sstable /var/lib/cassandra/data/my_keyspace/my_test_table4-96489b7980be3e14a70166a0b9159450/mc-98-big: \
  Column 'col_using_my_udt1' needs to be updated from type 'my_udt1' to 'frozen<my_udt1>'
INFO sstable /var/lib/cassandra/data/my_keyspace/my_test_table4-96489b7980be3e14a70166a0b9159450/mc-98-big: \
  Column 'col_using_my_udt2' needs to be updated from type 'my_udt2' to 'frozen<my_udt2>'
INFO sstable /var/lib/cassandra/data/my_keyspace/my_test_table4-96489b7980be3e14a70166a0b9159450/mc-58-big: \
  Column 'col_using_my_udt3' needs to be updated from type 'my_udt3' to 'frozen<my_udt3>'
INFO Not fixing identified and fixable serialization-header issues.
Unfixed, but fixable errors in serialization-header detected, aborting. \
  Use a non-validating mode ('-e fix' or '-e fix-only') for --header-fix

Cause

In Apache Cassandra 3.0/DSE 5.0, UDTs and tuples can only be frozen; you are not allowed to create a table column of a non-frozen UDT or tuple type. When Apache Cassandra writes the metadata for frozen UDT data, it is supposed to enclose the user-defined type in an org.apache.cassandra.db.marshal.FrozenType(...) "bracket", both in the schema and in the SerializationHeader component of the SSTable's -Statistics.db file. Unfortunately, Apache Cassandra 3.0 omits that FrozenType "bracket" from the SerializationHeader.Component in the -Statistics.db file.
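For illustration, a DSE 5.0 schema with a UDT column must declare it frozen. The keyspace, table, type, and column names below are hypothetical, echoing those in the log excerpts above:

```sql
-- Hypothetical DSE 5.0 / Cassandra 3.0 schema.
CREATE TYPE my_keyspace.my_udt1 (
    a int,
    b text
);

CREATE TABLE my_keyspace.my_test_table1 (
    id int PRIMARY KEY,
    -- In Cassandra 3.0, a non-frozen 'my_udt1' column would be rejected;
    -- only frozen<my_udt1> is accepted.
    col_using_my_udt1 frozen<my_udt1>
);
```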

As a result, later versions of Apache Cassandra (3.6 and newer), which support both frozen and non-frozen UDTs (non-frozen UDT support came with CASSANDRA-7423), run into a problem when reading the earlier Apache Cassandra 3.0 SSTable data: the absence of the org.apache.cassandra.db.marshal.FrozenType(...) "bracket" in the serialization header signifies a non-frozen UDT, while the actual SSTable data contains a frozen UDT.

The problem manifests itself in a wide variety of errors/exceptions (such as CorruptSSTableException, EOFException, OutOfMemoryError).

Solution

Upgrade to a DSE version that correctly handles UDTs in the serialization header: DSE 5.1.14, 6.0.6, 6.7.2, or later.

The DSE release notes for these versions identify the issue that is resolved:

  • DSE 5.0 SSTables with UDTs are corrupted in DSE 5.1, DSE 6.0, and DSE 6.7. (DB-2954, CASSANDRA-15035)

If the DSE 5.0.x schema contains user-defined types (UDTs), the SSTable serialization headers are fixed automatically when the node starts on DSE 6.0.6 or later.

If issues with UDTs in SSTables persist after upgrading from DSE 5.0.x, run an offline scrub on the SSTables of any tables that have, or had, UDT columns created in DSE 5.0.x:

$ sstablescrub -e fix-only your_keyspace_name your_table_name

This problem cannot be fixed using online scrub.

Note: The -e fix-only scrub option only fixes the serialization-header problems (in the -Statistics.db SSTable file) caused by UDTs; it does not perform a "normal" scrub, which rewrites the SSTable -Data.db file. If other corruption is present besides the UDT serialization-header problem, you should also run a "normal" scrub to fix the non-UDT corruption.
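Putting the two steps together, the repair sequence might look like the following sketch. It assumes a package install where DSE runs as a service, and that the node can be taken down for the offline scrub; adjust the service commands, keyspace, and table names for your environment:

```shell
# Offline scrub requires the node to be down (assumed service name: dse).
sudo service dse stop

# Step 1: fix only the UDT serialization headers (-Statistics.db files).
sstablescrub -e fix-only my_keyspace my_test_table1

# Step 2 (only if other corruption is also present): a "normal" scrub
# that rewrites the -Data.db files.
sstablescrub my_keyspace my_test_table1

sudo service dse start
```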

Here is an example of "sstablescrub -e fix-only" output when run against sstable files that have the UDT serialization problem:

$ sstablescrub -e fix-only my_keyspace my_test_table5
Pre-scrub sstables snapshotted into snapshot pre-scrub-1553549712434
INFO sstable /var/lib/cassandra/data/my_keyspace/my_test_table5-d20c52c2584911e89d93b99c05bcb0d0/mc-1-big: Column 'col_using_my_udt1' needs to be updated from type 'my_udt1' to 'frozen<my_udt1>'
INFO sstable /var/lib/cassandra/data/my_keyspace/my_test_table5-d20c52c2584911e89d93b99c05bcb0d0/mc-1-big: Column 'col_using_my_udt2' needs to be updated from type 'my_udt2' to 'frozen<my_udt2>'
INFO sstable /var/lib/cassandra/data/my_keyspace/my_test_table5-d20c52c2584911e89d93b99c05bcb0d0/mc-1-big: Column 'col_using_my_udt3' needs to be updated from type 'my_udt3' to 'frozen<my_udt3>'
INFO sstable /var/lib/cassandra/data/my_keyspace/my_test_table5-d20c52c2584911e89d93b99c05bcb0d0/mc-1-big: Column 'col_using_my_udt4' needs to be updated from type 'my_udt4' to 'frozen<my_udt4>'
INFO sstable /var/lib/cassandra/data/my_keyspace/my_test_table5-d20c52c2584911e89d93b99c05bcb0d0/mc-2-big: Column 'col_using_my_udt1' needs to be updated from type 'my_udt1' to 'frozen<my_udt1>'
INFO sstable /var/lib/cassandra/data/my_keyspace/my_test_table5-d20c52c2584911e89d93b99c05bcb0d0/mc-2-big: Column 'col_using_my_udt2' needs to be updated from type 'my_udt2' to 'frozen<my_udt2>'
INFO sstable /var/lib/cassandra/data/my_keyspace/my_test_table5-d20c52c2584911e89d93b99c05bcb0d0/mc-2-big: Column 'col_using_my_udt3' needs to be updated from type 'my_udt3' to 'frozen<my_udt3>'
INFO sstable /var/lib/cassandra/data/my_keyspace/my_test_table5-d20c52c2584911e89d93b99c05bcb0d0/mc-2-big: Column 'col_using_my_udt4' needs to be updated from type 'my_udt4' to 'frozen<my_udt4>'
INFO Writing new metadata files
INFO Writing new metadata file /var/lib/cassandra/data/my_keyspace/my_test_table5-d20c52c2584911e89d93b99c05bcb0d0/mc-1-big-Statistics.db
INFO Writing new metadata file /var/lib/cassandra/data/my_keyspace/my_test_table5-d20c52c2584911e89d93b99c05bcb0d0/mc-2-big-Statistics.db
INFO Finished writing new metadata files
Not continuing with scrub, since '--header-fix fix-only' was specified.

As the output above shows, the "-e fix-only" scrub stops after fixing the UDT serialization problem (in the -Statistics.db files); it makes no attempt to fix or rewrite any of the -Data.db files. To scrub and rewrite the -Data.db files, run a "normal" scrub (that is, scrub without the "-e fix-only" option).

Document Location

Worldwide


Historical Number

ka0Ui0000000MVBIA2

Document Information

Modified date:
30 January 2026

UID

ibm17258606