

[SOLVED] Batch Deleting from Large Tables




3 Posts   383 Views

zenmonkey

Community Member, 532 Posts

18 October 2013 at 4:13pm

In SS 3.0.x I'm trying to batch delete from a large dataset: 16K records linked to Files and other DataObjects, so all told, with relations, it's probably about 250K rows. To work out which records to delete I have a CSV of 10,000 entries that I need to compare against.

I figured the easiest way is to loop through my DataObject (the 16K table) and, if a record isn't in the CSV, delete it and all its related records. Right now I have PHP memory set at 512MB and the process timeout at 10 minutes, and it's still choking even if I check 500 records at a time.

I'm wondering if I should just temporarily max out my PHP environment and try to push through, or if there is a way to optimize the actual process so it runs more efficiently.

Bambii7

Community Member, 254 Posts

21 October 2013 at 12:18pm

Hi zenmonkey,

I would probably tackle it one object at a time, and hook into the onBeforeDelete function on each object to clean up its relationships (if you're not already).
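For example, here's a minimal sketch of that hook (the class and relation names are hypothetical, adjust them to your schema):

```php
<?php
// Hypothetical DataObject with a has_many relation to File.
class CatalogueEntry extends DataObject {

    public static $has_many = array('Attachments' => 'File');

    protected function onBeforeDelete() {
        parent::onBeforeDelete();
        // Delete related files first so deleting the entry
        // doesn't leave orphaned rows behind.
        foreach ($this->Attachments() as $file) {
            $file->delete();
        }
    }
}
```

With that in place, a single $entry->delete() call cleans up the whole object graph for you.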

Small side note: batch deleting isn't replication-safe for master/slave and NDB Cluster setups.

Use fgetcsv() to read one row at a time; for examples, see http://php.net/manual/en/function.fgetcsv.php
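For instance, here's a self-contained sketch that builds a lookup table of IDs from a CSV one row at a time (the in-memory stream just stands in for your real file):

```php
<?php
// Write a sample CSV to an in-memory stream, then read it back row by row.
$fh = fopen('php://memory', 'r+');
fwrite($fh, "101,alpha\n102,beta\n");
rewind($fh);

$keep = array();
while (($row = fgetcsv($fh)) !== false) {
    $keep[$row[0]] = true; // index by the ID column for fast isset() checks
}
fclose($fh);

var_dump(isset($keep[101])); // the 16K-table loop can now test membership cheaply
```

Indexing by ID means each membership check is O(1), instead of re-scanning the CSV for every record in the big table.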

Then, I'm not sure what best practice is here, but I like to call set_time_limit() inside the loop. Each call resets the timeout, so you can give each object 30 seconds (more if required) to do its thing.

$handle = fopen('records.csv', 'r'); // hypothetical filename
while (($row = fgetcsv($handle)) !== false) {
    set_time_limit(30); // fresh 30 second window for each record
    $object = MyDataObject::get()->byID($row[0]); // hypothetical class name
    if ($object) {
        // ... process (or delete) $object here ...
    }
}
fclose($handle);

Trying to bulk process 500 objects with active record can be costly. In the past I've tried to process thousands (not in SS) and found it took 15 minutes to load all the data and about 3 minutes to process it. Doing it one at a time will be slow (maybe an hour), but it will be safe and easier on server resources.
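If one-at-a-time turns out too slow, a middle ground (my sketch, not tested on your data; the class name and chunk size are assumptions) is to page through the table in small chunks so only a handful of objects are hydrated at once:

```php
<?php
// Hypothetical: MyRecord is the 16K-row DataObject class.
$chunk = 100;
$offset = 0;
while (($records = MyRecord::get()->limit($chunk, $offset)) && $records->count() > 0) {
    set_time_limit(30); // fresh timeout for each chunk
    foreach ($records as $record) {
        // ... process (or delete) $record here ...
    }
    // NOTE: if you delete rows inside the loop, keep $offset at 0 instead,
    // otherwise the shrinking result set makes offset paging skip rows.
    $offset += $chunk;
}
```

Each pass only keeps $chunk objects in memory at a time, which should keep the footprint well under the 512MB ceiling mentioned above.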

zenmonkey

Community Member, 532 Posts

21 October 2013 at 1:38pm

Thanks, this appears to be working, and since I'm no longer timing out, I don't actually need to use the CSV.

Cheers,