Most Popular
Recently Added
Recently Updated

SymmetricDS Purge Process in 2.x

There are three purge jobs that run in the background:

  • Outgoing Purge
  • Incoming Purge
  • Data Gap Purge

The automatic startup of these jobs are all controlled by the same parameter: start.purge.job.

Outgoing Purge

Outgoing purge deletes captured and routed data from sym_data, sym_data_event, and sym_outgoing_batch that has already been delivered and is older than the purge.retention.minutes.

First, stranded batches are updated. If a node is no longer enabled and there is data queued up for it, then those batches are marked as OK so the subsequent purge will clean up the captured data:

update sym_outgoing_batch set status='OK' where node_id not in (select node_id from sym_node where sync_enabled=1) and status != 'OK'

Next, sym_data is purged. In order to purge sym_data we first query the range of data_ids that need purged:

select min(data_id), max(data_id) from sym_data where data_id < (select max(data_id) from sym_data)

Next, we delete from sym_data. The following SQL is run multiple times for a sub-range that is specified by job.purge.max.num.data.to.delete.in.tx until the entire range is purged.

delete from sym_data where data_id between :MIN and :MAX and create_time < :CUTOFF_TIME and data_id in (select e.data_id from sym_data_event e where e.data_id between :MIN and :MAX) and data_id not in (select e.data_id from sym_data_event e where e.data_id between :MIN and :MAX and (e.data_id is null or e.batch_id in (select batch_id from sym_outgoing_batch where status not in ('OK','IG'))))

Next, we delete sym_data_event and sym_outgoing_batch. Both of these tables are keyed by batch_id.

We get the range of batch_ids to purge:

select min(batch_id), max(batch_id) from sym_outgoing_batch where create_time < ? and status in ('OK','IG') and batch_id < (select max(batch_id) from sym_outgoing_batch)

Then, we delete from sym_data_event by batch_id. The following SQL is run multiple times for a sub-range that is specified by job.purge.max.num.data.events.to.delete.in.tx until the entire range is purged.

delete from sym_data_event where batch_id not in (select batch_id from sym_outgoing_batch where batch_id between :MIN and :MAX and status not in ('OK','IG')) and batch_id between :MIN and :MAX

Finally, we delete from sym_outgoing_batch. The following SQL is run multiple times for a sub-range that is specified by job.purge.max.num.batches.to.delete.in.tx until the entire range is purged.

delete from sym_outgoing_batch where status in ('OK','IG') and batch_id between :MIN and :MAX and batch_id not in (select batch_id from sym_data_event where batch_id between :MIN and :MAX)

Incoming Purge

Incoming purge deletes from sym_incoming_batch that is older than the purge.retention.minutes.

First, a range is retrieved:

select node_id, min(batch_id), max(batch_id) from sym_incoming_batch where create_time < ? and status = 'OK' group by node_id

Then, we delete from sym_incoming_batch. The following SQL is run multiple times for a sub-range that is specified by job.purge.max.num.batches.to.delete.in.tx for each node range until each is purged.

delete from sym_incoming_batch where batch_id between ? and ? and node_id = ? and status = 'OK'

Data Gap Purge

The data gap purge deletes from sym_data_gap where data is older than routing.data.reader.type.gap.retention.period.minutes.

delete from sym_data_gap where last_update_time < ? and status != 'GP'


Properties ID: 000004   Views: 3765   Updated: 12 years ago
Filed under: