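Cassandra: What are tombstones?
What are tombstones?
In Cassandra, a tombstone is a marker written in place of data that has been deleted. Tombstones can be generated by:
- Using a CQL DELETE statement
- Expiring data with time-to-live (TTL)
- Using internal operations, such as materialized views
- INSERT or UPDATE operations with a null value
- UPDATE operations with a collection column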
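When a tombstone is created, it can be marked on different parts of a partition. Based on the location of the marker, tombstones fall into one of the categories covered in the sections below: partition, row, range, ComplexColumn, cell, and TTL. Each category typically corresponds to one unique type of data deletion operation.
Tombstones go through the write path and are written to SSTables on one or more nodes. A key differentiator of a tombstone is its built-in expiration, set by gc_grace_seconds and known as the grace period. When the grace period ends, the tombstone is deleted as part of the normal compaction process.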
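The grace period arithmetic can be sketched as follows. This is a minimal illustration, not Cassandra's implementation: the helper name purge_eligible_at is hypothetical, the timestamp is borrowed from the sample sstabledump output later in this article, and 864000 seconds (10 days) is the default gc_grace_seconds for a table. Passing this time only makes a tombstone eligible for removal; it is actually dropped when a compaction processes the SSTable that holds it.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GC_GRACE_SECONDS = 864000  # 10 days, the default table setting


def purge_eligible_at(local_delete_time, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return the earliest time at which compaction may drop a tombstone.

    local_delete_time is a UTC string such as "2018-05-16T19:40:06Z",
    the format sstabledump prints. (Hypothetical helper for illustration.)
    """
    deleted = datetime.strptime(local_delete_time, "%Y-%m-%dT%H:%M:%SZ")
    deleted = deleted.replace(tzinfo=timezone.utc)
    return deleted + timedelta(seconds=gc_grace_seconds)


eligible = purge_eligible_at("2018-05-16T19:40:06Z")
print(eligible.isoformat())  # prints 2018-05-26T19:40:06+00:00
```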
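An excessive number of tombstones in a table can negatively impact application performance. A large number of tombstones often indicates a potential problem in the data model or in the application.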
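Creating the keyspace and tables
The following examples use the cycling keyspace to illustrate the different tombstone categories. Two tables are used: rank_by_year_and_name and cyclist_career_teams.
The examples use shell tools (nodetool and sstabledump) alongside cqlsh and CQL commands, so using two different terminals is recommended. Alternatively, use one terminal for cqlsh and issue the CQL commands with DataStax Studio.
Before you begin, copy the following commands into the cqlsh prompt to create the cycling keyspace, create the two tables, and insert data into the rank_by_year_and_name table.
You will insert data into the cyclist_career_teams table later, in the ComplexColumn tombstones and TTL tombstones sections.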
CREATE KEYSPACE cycling WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE cycling.rank_by_year_and_name (
  race_year int,
  race_name text,
  rank int,
  cyclist_name text,
  PRIMARY KEY ((race_year, race_name), rank)
) WITH CLUSTERING ORDER BY (rank ASC);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Benjamin PRADES', 1);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Adam PHELAN', 2);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Thomas LEBAS', 3);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Giro d''Italia - Stage 11 - Forli > Imola', 'Ilnur ZAKARIN', 1);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Giro d''Italia - Stage 11 - Forli > Imola', 'Carlos BETANCUR', 2);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2014, '4th Tour of Beijing', 'Phillippe GILBERT', 1);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2014, '4th Tour of Beijing', 'Daniel MARTIN', 2);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2014, '4th Tour of Beijing', 'Johan Esteban CHAVES', 3);
CREATE TABLE cycling.cyclist_career_teams (
  id UUID PRIMARY KEY,
  lastname text,
  teams set<text>
);
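Flushing to SSTables
After each modification to a table, run the nodetool flush command on the cycling keyspace to flush the data from memtables to SSTables on disk. This step is required before running sstabledump to view the output.
nodetool flush cycling
After flushing the cycling keyspace, run the sstabledump command on the SSTable, as shown in the following example.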
cd /var/lib/cassandra/data/cycling/rank_by_year_and_name-bc05fba12baf11e8b4a8ad2b042f3e18
sstabledump mc-2-big-Data.db
Note: The sstabledump tool is available in Apache Cassandra™ 3.0, DDAC, DSE 5.0, and later. For earlier versions, use the sstable2json utility instead.
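Because the table directory name ends in a generated table ID and the SSTable file name changes with every flush, it can help to look up the data files rather than type them by hand. The sketch below assumes the on-disk layout implied by the cd command above (data root, keyspace, then a directory named after the table plus its ID); find_data_files is a hypothetical helper, not part of any Cassandra tool.

```python
from pathlib import Path


def find_data_files(data_root, keyspace, table):
    """List the *-Data.db files for one table under a Cassandra data directory.

    Assumes the layout <data_root>/<keyspace>/<table>-<table id>/.
    (Hypothetical helper for illustration.)
    """
    keyspace_dir = Path(data_root) / keyspace
    if not keyspace_dir.is_dir():
        return []
    data_files = []
    # Table directories are named "<table>-<id>", so match on that prefix.
    for table_dir in keyspace_dir.glob(f"{table}-*"):
        data_files.extend(table_dir.glob("*-Data.db"))
    return sorted(data_files)


if __name__ == "__main__":
    for path in find_data_files("/var/lib/cassandra/data", "cycling", "rank_by_year_and_name"):
        print(path)
```

Each path returned can then be passed to sstabledump.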
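Partition tombstones
Partition tombstones are generated when an entire partition is deleted explicitly. In the CQL DELETE statement, the WHERE clause is an equality condition against the partition key.
DELETE FROM cycling.rank_by_year_and_name WHERE race_year = 2014 AND race_name = '4th Tour of Beijing';
In the sstabledump output for this partition, the deletion_info tombstone marker is at the partition level and is not associated with any row or cell within the partition.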
{
  "partition" : {
    "key" : [ "2014", "4th Tour of Beijing" ],
    "position" : 0,
    "deletion_info" : { "marked_deleted" : "2018-05-16T19:40:06.454282Z", "local_delete_time" : "2018-05-16T19:40:06Z" }
  },
  "rows" : [ ]
}
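To check for this marker programmatically, a short script can parse the JSON and look for deletion_info directly under partition. The following is a minimal sketch that assumes a single entry shaped like the output above; partition_tombstone is a hypothetical helper name.

```python
import json

# Entry copied from the sstabledump output shown above.
SAMPLE_ENTRY = json.loads("""
{
  "partition" : {
    "key" : [ "2014", "4th Tour of Beijing" ],
    "position" : 0,
    "deletion_info" : {
      "marked_deleted" : "2018-05-16T19:40:06.454282Z",
      "local_delete_time" : "2018-05-16T19:40:06Z"
    }
  },
  "rows" : [ ]
}
""")


def partition_tombstone(entry):
    """Return the partition-level deletion_info dict, or None when absent."""
    return entry.get("partition", {}).get("deletion_info")


info = partition_tombstone(SAMPLE_ENTRY)
print(SAMPLE_ENTRY["partition"]["key"], info["local_delete_time"])
```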
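Row tombstones
Row tombstones are generated when a particular row within a partition is deleted explicitly. The schema has a composite primary key that includes both a partition key and a clustering key. In the CQL DELETE statement, the WHERE clause is an equality condition against both the partition key and the clustering key columns.
DELETE FROM cycling.rank_by_year_and_name WHERE race_year = 2015 AND race_name = 'Giro d''Italia - Stage 11 - Forli > Imola' AND rank = 2;
In the sstabledump output for this partition, the deletion_info tombstone marker is at the row level and is identified by the clustering key under the partition. Neither the partition nor the row's cells carry a tombstone marker.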
{
  "partition" : {
    "key" : [ "2015", "Giro d'Italia - Stage 11 - Forli > Imola" ],
    "position" : 0
  },
  "rows" : [
    {
      "type" : "row",
      "position" : 74,
      "clustering" : [ 2 ],
      "deletion_info" : { "marked_deleted" : "2018-05-18T15:29:06.227148Z", "local_delete_time" : "2018-05-18T15:29:06Z" },
      "cells" : [ ]
    }
  ]
}
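Row tombstones can be picked out of a parsed entry by looking for deletion_info on the row objects themselves. This sketch assumes an entry shaped like the output above, written here as a Python dict; deleted_row_keys is a hypothetical helper name.

```python
# Entry based on the sstabledump output shown above.
SAMPLE_ENTRY = {
    "partition": {"key": ["2015", "Giro d'Italia - Stage 11 - Forli > Imola"], "position": 0},
    "rows": [
        {
            "type": "row",
            "position": 74,
            "clustering": [2],
            "deletion_info": {
                "marked_deleted": "2018-05-18T15:29:06.227148Z",
                "local_delete_time": "2018-05-18T15:29:06Z",
            },
            "cells": [],
        }
    ],
}


def deleted_row_keys(entry):
    """Return the clustering values of rows that carry a row-level deletion_info."""
    return [
        row.get("clustering")
        for row in entry.get("rows", [])
        if row.get("type") == "row" and "deletion_info" in row
    ]


print(deleted_row_keys(SAMPLE_ENTRY))  # prints [[2]]
```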
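Range tombstones
Range tombstones occur when several rows within a partition that can be expressed through a range search are deleted explicitly. The schema has a composite primary key that includes both a partition key and a clustering key. In the CQL DELETE statement, the WHERE clause is an equality condition against the partition key, plus an inequality condition against the clustering key.
Before running this example, drop the rank_by_year_and_name table, then recreate it and populate it with the necessary data.
DELETE FROM cycling.rank_by_year_and_name WHERE race_year = 2015 AND race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu' AND rank > 1;
In the sstabledump output for this partition, the deletion_info tombstone marker is at the row level. A special boundary marker, range_tombstone_bound, marks the scope of the deleted rows (identified by the clustering key values).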
{
  "partition" : {
    "key" : [ "2015", "Tour of Japan - Stage 4 - Minami > Shinshu" ],
    "position" : 252
  },
  "rows" : [
    {
      "type" : "range_tombstone_bound",
      "start" : {
        "type" : "inclusive",
        "deletion_info" : { "marked_deleted" : "2018-05-18T16:09:21.474713Z", "local_delete_time" : "2018-05-18T16:09:21Z" }
      }
    },
    {
      "type" : "range_tombstone_bound",
      "end" : {
        "type" : "exclusive",
        "clustering" : [ 1 ],
        "deletion_info" : { "marked_deleted" : "2018-05-18T16:09:21.474713Z", "local_delete_time" : "2018-05-18T16:09:21Z" }
      }
    }
  ]
}
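The boundary markers can be summarized by walking the rows array and collecting every range_tombstone_bound. The sketch below assumes an entry shaped like the output above (deletion_info timestamps are kept, other fields trimmed); range_tombstone_bounds is a hypothetical helper name.

```python
# Entry based on the sstabledump output shown above.
DELETION = {
    "marked_deleted": "2018-05-18T16:09:21.474713Z",
    "local_delete_time": "2018-05-18T16:09:21Z",
}
SAMPLE_ENTRY = {
    "partition": {"key": ["2015", "Tour of Japan - Stage 4 - Minami > Shinshu"], "position": 252},
    "rows": [
        {"type": "range_tombstone_bound",
         "start": {"type": "inclusive", "deletion_info": DELETION}},
        {"type": "range_tombstone_bound",
         "end": {"type": "exclusive", "clustering": [1], "deletion_info": DELETION}},
    ],
}


def range_tombstone_bounds(entry):
    """Return (side, bound type, clustering) for each range_tombstone_bound marker.

    clustering is None when the bound does not list a clustering value.
    """
    bounds = []
    for row in entry.get("rows", []):
        if row.get("type") != "range_tombstone_bound":
            continue
        for side in ("start", "end"):
            if side in row:
                bound = row[side]
                bounds.append((side, bound["type"], bound.get("clustering")))
    return bounds


print(range_tombstone_bounds(SAMPLE_ENTRY))
# prints [('start', 'inclusive', None), ('end', 'exclusive', [1])]
```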
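ComplexColumn tombstones
ComplexColumn tombstones are generated when inserting or updating a collection type column, such as a set, list, or map.
The cyclist_career_teams table was created earlier. Run the following cqlsh command to insert data into this table.
INSERT INTO cycling.cyclist_career_teams (id, lastname, teams) VALUES (cb07baad-eac8-4f65-b28a-bddc06a0de23, 'ARMITSTEAD', { 'Boels-Dolmans Cycling Team', 'AA Drink - Leontien.nl', 'Team Garmin - Cervelo' });
In the sstabledump output for this partition, no explicit manual deletion occurred on the partition, but a deletion_info tombstone marker is listed at the cell level for the collection type column teams.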
{
  "partition" : {
    "key" : [ "cb07baad-eac8-4f65-b28a-bddc06a0de23" ],
    "position" : 0
  },
  "rows" : [
    {
      "type" : "row",
      "position" : 130,
      "liveness_info" : { "tstamp" : "2018-05-18T16:26:23.779724Z" },
      "cells" : [
        { "name" : "lastname", "value" : "ARMITSTEAD" },
        { "name" : "teams", "deletion_info" : { "marked_deleted" : "2018-05-18T16:26:23.779723Z", "local_delete_time" : "2018-05-18T16:26:23Z" } },
        { "name" : "teams", "path" : [ "AA Drink - Leontien.nl" ], "value" : "" },
        { "name" : "teams", "path" : [ "Boels-Dolmans Cycling Team" ], "value" : "" },
        { "name" : "teams", "path" : [ "Team Garmin - Cervelo" ], "value" : "" }
      ]
    }
  ]
}
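Writing a whole collection replaces whatever the column held before, so Cassandra records a tombstone for the old contents alongside the new elements. In the output above the marker's marked_deleted timestamp is one microsecond earlier than the row's tstamp, so it covers older elements but not the ones just written. The sketch below separates the marker from the live elements for one column; it assumes cells shaped like the output above, and split_collection_cells is a hypothetical helper name.

```python
# Cells copied from the sstabledump output shown above.
SAMPLE_CELLS = [
    {"name": "lastname", "value": "ARMITSTEAD"},
    {"name": "teams", "deletion_info": {
        "marked_deleted": "2018-05-18T16:26:23.779723Z",
        "local_delete_time": "2018-05-18T16:26:23Z"}},
    {"name": "teams", "path": ["AA Drink - Leontien.nl"], "value": ""},
    {"name": "teams", "path": ["Boels-Dolmans Cycling Team"], "value": ""},
    {"name": "teams", "path": ["Team Garmin - Cervelo"], "value": ""},
]


def split_collection_cells(cells, column):
    """Separate a collection column's tombstone marker from its live elements.

    Returns (deletion_info or None, list of element paths).
    """
    marker = None
    elements = []
    for cell in cells:
        if cell.get("name") != column:
            continue
        if "path" in cell:
            # Each element of the collection is stored as its own cell.
            elements.append(cell["path"][0])
        elif "deletion_info" in cell:
            marker = cell["deletion_info"]
    return marker, elements


marker, elements = split_collection_cells(SAMPLE_CELLS, "teams")
print(marker["marked_deleted"], elements)
```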
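Cell tombstones
Cell tombstones are generated when a value is explicitly deleted from a cell, such as a column for a specific row of a partition, or when a cell is inserted or updated with a null value, as shown in the following example.
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2018, 'Giro d''Italia - Stage 11 - Osimo > Imola', null, 1);
In the sstabledump output for this partition, the deletion_info tombstone marker is associated with a specific cell.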
{
  "partition" : {
    "key" : [ "2018", "Giro d'Italia - Stage 11 - Osimo > Imola" ],
    "position" : 0
  },
  "rows" : [
    {
      "type" : "row",
      "position" : 80,
      "clustering" : [ 1 ],
      "liveness_info" : { "tstamp" : "2018-05-18T17:13:42.602827Z" },
      "cells" : [
        { "name" : "cyclist_name", "deletion_info" : { "local_delete_time" : "2018-05-18T17:13:42Z" } }
      ]
    }
  ]
}
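Cell tombstones can be located by scanning each row's cells for a deletion_info that stands in for a value. The sketch below assumes an entry shaped like the output above. It also relies on a heuristic drawn only from the outputs in this article: the cell tombstone shows just local_delete_time, while the collection-level marker in the ComplexColumn example also carries marked_deleted. The helper name cell_tombstones is hypothetical.

```python
# Entry based on the sstabledump output shown above.
SAMPLE_ENTRY = {
    "partition": {"key": ["2018", "Giro d'Italia - Stage 11 - Osimo > Imola"], "position": 0},
    "rows": [
        {
            "type": "row",
            "position": 80,
            "clustering": [1],
            "liveness_info": {"tstamp": "2018-05-18T17:13:42.602827Z"},
            "cells": [
                {"name": "cyclist_name",
                 "deletion_info": {"local_delete_time": "2018-05-18T17:13:42Z"}},
            ],
        }
    ],
}


def cell_tombstones(entry):
    """Return (clustering, column name) for cells holding a tombstone instead of a value."""
    found = []
    for row in entry.get("rows", []):
        for cell in row.get("cells", []):
            info = cell.get("deletion_info")
            # Heuristic (assumption): skip collection-level markers, which
            # include marked_deleted in the outputs shown in this article.
            if info is not None and "marked_deleted" not in info:
                found.append((row.get("clustering"), cell["name"]))
    return found


print(cell_tombstones(SAMPLE_ENTRY))  # prints [([1], 'cyclist_name')]
```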
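TTL tombstones
TTL tombstones are generated when the TTL (time-to-live) period expires. The TTL expiration marker can occur at either the row or the cell level. However, Cassandra marks TTL data differently from data that was explicitly deleted. Even if a partition has only one row (no clustering key), the TTL marker is still made at the row level.
The following statement sets a TTL for an entire row.
INSERT INTO cycling.cyclist_career_teams (id, lastname, teams) VALUES (e7cd5752-bc0d-4157-a80f-7523add8dbcd, 'VAN DER BREGGEN', { 'Rabobank-Liv Woman Cycling Team', 'Sengers Ladies Cycling Team', 'Team Flexpoint' }) USING TTL 1;
The following statement sets a TTL for a single cell.
UPDATE cycling.rank_by_year_and_name USING TTL 1 SET cyclist_name = 'Cloudy Archipelago' WHERE race_year = 2018 AND race_name = 'Giro d''Italia - Stage 11 - Osimo > Imola' AND rank = 1;
In the sstabledump output for these partitions, the first CQL statement marks the row (partition key: e7cd5752-bc0d-4157-a80f-7523add8dbcd) with a TTL expiration marker, "expired" : true, in the liveness_info section.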
{
  "partition" : {
    "key" : [ "e7cd5752-bc0d-4157-a80f-7523add8dbcd" ],
    "position" : 0
  },
  "rows" : [
    {
      "type" : "row",
      "position" : 134,
      "liveness_info" : { "tstamp" : "2018-05-18T17:38:13.135226Z", "ttl" : 1, "expires_at" : "2018-05-18T17:38:14Z", "expired" : true },
      "cells" : [
        { "name" : "lastname", "value" : "VAN DER BREGGEN" },
        { "name" : "teams", "deletion_info" : { "marked_deleted" : "2018-05-18T17:38:13.135225Z", "local_delete_time" : "2018-05-18T17:38:13Z" } },
        { "name" : "teams", "path" : [ "Rabobank-Liv Woman Cycling Team" ], "value" : "" },
        { "name" : "teams", "path" : [ "Sengers Ladies Cycling Team" ], "value" : "" },
        { "name" : "teams", "path" : [ "Team Flexpoint" ], "value" : "" }
      ]
    }
  ]
}
The second CQL statement marks the cell (partition key: 2018, clustering key: 1, column name: cyclist_name) with a TTL expiration marker, "expired" : true, on the cell itself.
{
  "partition" : {
    "key" : [ "2018", "Giro d'Italia - Stage 11 - Osimo > Imola" ],
    "position" : 0
  },
  "rows" : [
    {
      "type" : "row",
      "position" : 95,
      "clustering" : [ 1 ],
      "cells" : [
        { "name" : "cyclist_name", "value" : "Cloudy Archipelago", "tstamp" : "2018-05-18T18:22:52.532855Z", "ttl" : 1, "expires_at" : "2018-05-18T18:22:53Z", "expired" : true }
      ]
    }
  ]
}
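Since expired TTL data is flagged with "expired" : true rather than a deletion_info block, a scan for it has to check two places: liveness_info on the row and the cell objects themselves. The sketch below assumes entries shaped like the two outputs above (trimmed to the relevant fields); expired_ttl_markers is a hypothetical helper name.

```python
# Entries trimmed from the two sstabledump outputs shown above.
ROW_LEVEL = {
    "partition": {"key": ["e7cd5752-bc0d-4157-a80f-7523add8dbcd"], "position": 0},
    "rows": [{
        "type": "row",
        "liveness_info": {"tstamp": "2018-05-18T17:38:13.135226Z", "ttl": 1,
                          "expires_at": "2018-05-18T17:38:14Z", "expired": True},
        "cells": [{"name": "lastname", "value": "VAN DER BREGGEN"}],
    }],
}
CELL_LEVEL = {
    "partition": {"key": ["2018", "Giro d'Italia - Stage 11 - Osimo > Imola"], "position": 0},
    "rows": [{
        "type": "row",
        "clustering": [1],
        "cells": [{"name": "cyclist_name", "value": "Cloudy Archipelago",
                   "tstamp": "2018-05-18T18:22:52.532855Z", "ttl": 1,
                   "expires_at": "2018-05-18T18:22:53Z", "expired": True}],
    }],
}


def expired_ttl_markers(entry):
    """Return ("row", None) or ("cell", column name) for each expired TTL marker."""
    markers = []
    for row in entry.get("rows", []):
        if row.get("liveness_info", {}).get("expired"):
            markers.append(("row", None))
        for cell in row.get("cells", []):
            if cell.get("expired"):
                markers.append(("cell", cell["name"]))
    return markers


print(expired_ttl_markers(ROW_LEVEL))   # prints [('row', None)]
print(expired_ttl_markers(CELL_LEVEL))  # prints [('cell', 'cyclist_name')]
```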