18 October 2013 at 4:13pm
In SS3.0.x I'm trying to batch delete from a large dataset: 16K records linked to Files and other DataObjects. All told, with relations, it's probably about 250K records. To decide which records to delete, I basically have a CSV of 10,000 entries that I need to compare against.
I figured the easiest way is to loop through my DataObject (the 16K table) and, if a record isn't in the CSV, delete it and all its related records. Right now I have the PHP memory limit set at 512MB and the process timeout at 10 minutes, and it's still choking even if I batch-check 500 records at a time.
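One way to keep the "check 500 at a time" idea from loading the whole 16K table up front is keyset pagination: fetch the next 500 IDs greater than the last one seen, diff them against the CSV IDs, and move on. The sketch below is in Python with an in-memory SQLite table purely for illustration; the table name `record`, the `id` column and the `ids_not_in_csv` helper are hypothetical, not SilverStripe's schema or API.

```python
import sqlite3
from typing import Iterator, List, Set


def ids_not_in_csv(conn: sqlite3.Connection, keep_ids: Set[int],
                   batch_size: int = 500) -> Iterator[List[int]]:
    """Yield batches of record IDs that are absent from keep_ids.

    Keyset pagination (WHERE id > last_seen ORDER BY id LIMIT n) means only
    one batch of IDs is held in memory at a time. Because we remember the
    last ID seen, the caller may delete each yielded batch before asking
    for the next one without skipping rows.
    Assumes a hypothetical table `record` with positive integer IDs.
    """
    last_seen = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM record WHERE id > ? ORDER BY id LIMIT ?",
            (last_seen, batch_size),
        ).fetchall()
        if not rows:
            return
        last_seen = rows[-1][0]
        stale = [row[0] for row in rows if row[0] not in keep_ids]
        if stale:
            yield stale


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO record (id, title) VALUES (?, ?)",
                     [(i, f"item {i}") for i in range(1, 1201)])
    keep = set(range(1, 1201, 2))  # pretend the CSV lists the odd IDs
    total = sum(len(batch) for batch in ids_not_in_csv(conn, keep))
    print(total)  # 600
```

The point of the design is that memory is bounded by the batch size and the CSV's ID set, not by the size of the table.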
I'm wondering if I should just temporarily max out my PHP environment and try to get through it, or if there is a way I could optimize the actual process to get through it more efficiently.
21 October 2013 at 12:18pm
I would probably tackle it one object at a time, and hook into the onBeforeDelete() method on each object to clean up its relationships (if you're not already doing so).
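In SilverStripe this would be an onBeforeDelete() override on the DataObject subclass (calling parent::onBeforeDelete()). The Python below only illustrates the shape of that pattern with a hypothetical in-memory mini-ORM; `Store`, `Record`, `Product` and the `attachment` table are made up for the example.

```python
from typing import Dict


class Store:
    """Hypothetical in-memory stand-in for a database: table -> {id: row}."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[int, dict]] = {}

    def table(self, name: str) -> Dict[int, dict]:
        return self.tables.setdefault(name, {})


class Record:
    """Hypothetical active-record base class with a before-delete hook."""

    table_name = "record"

    def __init__(self, store: Store, record_id: int) -> None:
        self.store = store
        self.id = record_id

    def on_before_delete(self) -> None:
        """Override in subclasses to clean up related rows."""

    def delete(self) -> None:
        self.on_before_delete()  # the hook runs first, like onBeforeDelete()
        self.store.table(self.table_name).pop(self.id, None)


class Product(Record):
    table_name = "product"

    def on_before_delete(self) -> None:
        # Remove attachments that point at this product before the product itself.
        attachments = self.store.table("attachment")
        related = [a_id for a_id, row in attachments.items()
                   if row["product_id"] == self.id]
        for a_id in related:
            del attachments[a_id]
        super().on_before_delete()  # mirrors calling parent::onBeforeDelete()
```

Because the cleanup lives in the hook, every delete path (a script, the CMS, a one-off call) removes the related rows too, rather than only the batch-delete script.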
Small side note: batch deleting isn't replication-safe for master/slave and ndbcluster setups.
Use fgetcsv() to read one row at a time; for examples, see http://php.net/manual/en/function.fgetcsv.php
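The same streaming idea in Python uses csv.reader, which, like fgetcsv() in a loop, pulls one row at a time from the file handle. This sketch assumes the ID is in the first column and that the file has a header row; both are guesses about the CSV, so adjust `id_column` and `has_header` to match.

```python
import csv
import io
from typing import Iterable, Set


def load_keep_ids(lines: Iterable[str], id_column: int = 0,
                  has_header: bool = True) -> Set[int]:
    """Read the CSV row by row and collect the IDs to keep.

    csv.reader consumes its input lazily, so only the resulting set of
    integer IDs stays in memory, not the parsed rows.
    Assumes (hypothetically) that column `id_column` holds an integer ID.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row if present
    keep: Set[int] = set()
    for row in reader:
        if not row:  # skip blank lines
            continue
        keep.add(int(row[id_column]))
    return keep


if __name__ == "__main__":
    sample = io.StringIO("ID,Title\n3,foo\n7,bar\n\n12,baz\n")
    print(sorted(load_keep_ids(sample)))  # [3, 7, 12]
```

With a real file, open it with `newline=""` and pass the handle in, for example `with open(path, newline="") as f: keep = load_keep_ids(f)`. A set of 10,000 integers is small, and membership checks against it are constant time on average.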
I'm not sure what the best practice is here, but I like to set the time limit inside the loop with set_time_limit(). Each call restarts the timeout counter from zero, so you can give each object 30 seconds (more if required) to do its thing.
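Python has no direct equivalent of set_time_limit(), so the sketch below models the idea as a hypothetical cooperative `Watchdog`: each `reset()` starts a fresh per-item budget instead of sharing one limit across the whole run. Unlike PHP's limit, it only fires when the work calls `check()`. The clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable, Iterable


class Watchdog:
    """Per-item time budget that restarts on every reset() call.

    Loosely mirrors calling set_time_limit() inside the loop. This is a
    hypothetical helper and is cooperative: it raises only from check().
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.deadline = float("inf")

    def reset(self, seconds: float) -> None:
        self.deadline = self.clock() + seconds

    def check(self) -> None:
        if self.clock() > self.deadline:
            raise TimeoutError("item exceeded its time budget")


def process_all(items: Iterable[int],
                handle: Callable[[int, Watchdog], None],
                per_item_seconds: float = 30.0) -> int:
    """Run handle() on each item, giving every item its own fresh budget."""
    dog = Watchdog()
    done = 0
    for item in items:
        dog.reset(per_item_seconds)  # restart the budget for this item
        handle(item, dog)            # handle() may call dog.check() as it works
        done += 1
    return done
```

The effect is the same as the PHP approach: a long total run is fine as long as no single object takes longer than its own budget.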
Trying to bulk-process 500 objects with active record can be costly. In the past (not in SS) I tried processing thousands of records at once and found it took 15 minutes to load all the data and about 3 minutes to process it. Doing it one at a time will be slow (maybe an hour), but it will be safe and less demanding on server resources.
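A minimal sketch of the one-at-a-time approach, again in Python with SQLite standing in for the real database: only one record ID is in flight at a time, and each record is deleted together with its related rows in its own small transaction. The `record` and `attachment` tables and the `record_id` foreign key are hypothetical.

```python
import sqlite3
from typing import Set


def delete_one_at_a_time(conn: sqlite3.Connection, keep_ids: Set[int]) -> int:
    """Delete records not in keep_ids, one record (plus its attachments) per transaction.

    Assumes hypothetical tables `record(id)` and `attachment(record_id)`.
    Returns the number of records deleted.
    """
    deleted = 0
    last_seen = 0
    while True:
        row = conn.execute(
            "SELECT id FROM record WHERE id > ? ORDER BY id LIMIT 1",
            (last_seen,),
        ).fetchone()
        if row is None:
            return deleted
        last_seen = row[0]
        if last_seen in keep_ids:
            continue
        with conn:  # commits on success, rolls back if either DELETE fails
            conn.execute("DELETE FROM attachment WHERE record_id = ?", (last_seen,))
            conn.execute("DELETE FROM record WHERE id = ?", (last_seen,))
        deleted += 1
```

Each iteration does a tiny amount of work, so memory stays flat and an interrupted run can simply be restarted: records already removed are gone, and the rest are picked up on the next pass.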