Drupal Tutorial: Drupal Batch, Cron, and Queue.

by: outlierdavid

Performing large operations on a PHP-based web server is no easy task, especially when every request has to bootstrap Drupal. The key is to avoid timeout errors without hampering the user experience. In this tutorial, I'm going to show you how to combine Drupal's Batch, Cron, and Queue APIs to process large jobs in the background, on your server's own time.

The Project:

We are going to write an email application that sends to lists of contacts (which in this case are nodes) grouped by taxonomy terms. For instance, we have a taxonomy vocabulary called "mailing_list" whose terms are the individual mailing lists. The contact content type has a multi-value term reference field (field_subscriber_list in the code below) pointing at that vocabulary, so a contact can belong to multiple mailing lists.

The Challenge:

We need to make sure this email application can email 10,000 contacts or more, and potentially over 100,000. We could use the Batch API to send an email directly to every single contact while the end user watches a very slow-moving progress bar...

OR

We can do something much faster: write three small pieces of data per recipient, which are very quick to obtain, to Drupal's database queue. That cuts the batch run time significantly. Then let a cron job work through the queue at its own pace, so the web server decides how much it can handle at a time.

STEP 1: Create the form

I'm going to gloss over this step, since it is fairly simple. We are going to create a form that lets our users set the sender's name and email address, plus the subject and body of the email. We're also going to provide a list of mailing lists to choose from, so the user can designate which mailing list to send to. The mailing lists are terms in a taxonomy vocabulary.

function email_module_compose_email($form, &$form_state) {
	$form['campaign_options'] = array(
		'#type' => 'fieldset',
		'#title' => t('Campaign options'),
		'#collapsible' => TRUE,
		'#collapsed' => FALSE,
		'campaign_name' => array(
			'#type' => 'textfield',
			'#title' => t('Campaign Name'),
			'#name' => 'campaign_name',
			'#default_value' => '',
			'#size' => 60,
			'#maxlength' => 128,
			'#required' => FALSE,
			'#description' => t('Provide an alias this email is known by. For internal use only. If left blank, the campaign name will default to the subject of the email.')
		),
	);	
	$form['sender'] = array(
		'#type' => 'fieldset',
		'#title' => t('Sender'),
		'#collapsible' => TRUE,
		'#collapsed' => FALSE,
		'sender_name' => array(
			'#title' => t('Sender\'s name'),
			'#description' => t('Enter the sender\'s human readable name.'),
			'#type' => 'textfield'
		),
		'sender_email' => array(
			'#title' => t('Sender\'s e-mail'),
			'#description' => t('Enter the sender\'s e-mail address.'),
			'#type' => 'textfield',
			'#required' => TRUE,
		),
	);	
	$form['recipients'] = array(
		'#type' => 'fieldset',
		'#title' => t('Recipients'),
		'#collapsible' => TRUE,
		'#collapsed' => FALSE,
	);
	// Get the vocabulary ID (vid) from its machine name.
	$mailing_lists = taxonomy_vocabulary_machine_name_load('mailing_list');
	$list_vid = $mailing_lists->vid;
	$options = array();
	// Build and execute the query 
	$query = new EntityFieldQuery();
	$query->entityCondition('entity_type', 'taxonomy_term')
		->propertyCondition('vid', $list_vid);		
	$result = $query->execute();
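	// Populate the $options array with the mailing lists and their TIDs.
	// EntityFieldQuery omits the 'taxonomy_term' key when no terms match.
	if (!empty($result['taxonomy_term'])) {
		foreach ($result['taxonomy_term'] as $tid) {
			$term = taxonomy_term_load($tid->tid);
			// Count the contacts on this list. The :tid placeholder keeps the query
			// safe, and COUNT() avoids fetching every matching row.
			$nodeCount = db_query("
				SELECT COUNT(n.nid) FROM {node} n
				INNER JOIN {field_data_field_subscriber_list} sl
					ON sl.entity_id = n.nid
				WHERE n.type = 'contacts'
				AND sl.field_subscriber_list_tid = :tid
			", array(':tid' => $tid->tid))->fetchField();
			$options[$term->tid] = $term->name . ' (' . $nodeCount . ' recipient' . ($nodeCount != 1 ? 's)' : ')');
		}
	}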
	$form['recipients']['list'] = array(
		'#title' => t('Mailing List'),
		'#description' => t('Choose a mailing list you wish to send this email to.'),
		'#required' => TRUE,
		'#type' => 'select',
		'#options' => $options,
	);	
	$form['campaign'] = array(
		'#type' => 'fieldset',
		'#title' => t('Campaign Content'),
		'#collapsible' => TRUE,
		'#collapsed' => FALSE,
	);	
	$form['campaign']['subject'] = array(
		'#title' => t('Subject'),
		'#description' => t('Enter the e-mail\'s subject.'),
		'#required' => TRUE,
		'#type' => 'textfield',
	);	
	$form['campaign']['message'] = array(
		'#title' => t('Message'),
		'#description' => t('Enter the body of the message.'),
		'#type' => 'text_format',
	);	
	$form['submit'] = array(
		'#type' => 'submit',
		'#value' => t('Next'),
	);	
	return $form;
}
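If you'd like to check the option labels without bootstrapping Drupal, the label logic from the loop above can be pulled out into a plain function. This is a minimal sketch: build_mailing_list_options() is a hypothetical helper, and the term IDs, names, and counts in the example are made up. Inside Drupal itself, format_plural() is the usual way to pluralize a translatable string like this.

```php
<?php
// Hypothetical helper (not part of Drupal or the original module): builds the
// select element's #options from term names and per-term recipient counts.
function build_mailing_list_options(array $term_names, array $counts): array {
    $options = [];
    foreach ($term_names as $tid => $name) {
        // Database drivers often return counts as strings, so cast them.
        // Lists with no subscribers still appear, with a count of 0.
        $count = (int) ($counts[$tid] ?? 0);
        $noun = ($count === 1) ? 'recipient' : 'recipients';
        $options[$tid] = sprintf('%s (%d %s)', $name, $count, $noun);
    }
    return $options;
}

// Example with made-up term IDs and counts.
$options = build_mailing_list_options(
    [12 => 'Newsletter', 15 => 'Board members', 18 => 'Beta testers'],
    [12 => '2048', 15 => '1']
);
print_r($options);
```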

STEP 2: Queue up your job using the Batch API

Now we're going to take our form submission and load one item per recipient, holding the email_nid and contact_nid, into the queue. Because this list could potentially have over 10,000 recipients, we are going to use Drupal's Batch API to load all of the emails that are going to be sent into the queue. I won't paste the code for the email_module_save_draft function here; all it does is create a node with the fields from the form function above and return the node ID of the saved node.

function email_module_compose_email_submit($form, &$form_state) {
	if($nid = email_module_save_draft($form, $form_state)) {
		$list = taxonomy_select_nodes($form_state['values']['list'], FALSE);
		$list_total = count($list);
		$batch = array(
			'title' => t('Queue email for sending.'),
			'operations' => array(
				array('email_module_queue_email', array($nid, $list, $list_total))
			),
			'init_message' => t('The email queue process is starting.'),
			'progress_message' => t('Queueing emails...'),
			'error_message' => t('The queue process has encountered an error.'),
			'finished' => 'email_module_queue_email_finished',
		);
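		batch_set($batch);
		// In a form submit handler, batch_set() is enough: the Form API runs the
		// batch and then redirects to $form_state['redirect'] when it finishes.
		$form_state['redirect'] = 'node/1';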
	}
}

For those unfamiliar with the Batch API, this is the basic syntax needed to run a batch job. First, $nid is set to the node ID of the saved email. Every email gets its own ID, which lets us keep a tracking history for each specific email. taxonomy_select_nodes() then returns a single-dimensional array of the node IDs tagged with the chosen mailing list term. (It reads the taxonomy_index table, which only tracks published nodes, so unpublished contacts are skipped.)

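Once we have our email ID and an array of the people who should receive this particular email, we're ready to go. The $batch array tells the Batch API what to run ('operations': the callback plus its arguments), which messages to show while it runs, and which function to call when it finishes; batch_set() registers it.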
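To see why an operation like email_module_queue_email() gets called repeatedly, it helps to model the Batch API's contract in isolation. The following is a standalone simulation, not Drupal code: simulate_batch() and toy_operation() are hypothetical names. It assumes the documented Drupal 7 behavior: $context['finished'] is 1 at the start of each call, and the engine calls the operation again for as long as the operation sets it below 1.

```php
<?php
// Standalone simulation of the Batch API calling convention (not Drupal code).
// Before each call the engine sets $context['finished'] to 1; the operation
// lowers it below 1 when it needs to be called again.
function simulate_batch(callable $operation, array $args): array {
    $context = ['sandbox' => [], 'results' => [], 'message' => '', 'finished' => 1];
    $calls = 0;
    do {
        $context['finished'] = 1;
        // Append $context by reference, the way Drupal passes it to operations.
        $params = $args;
        $params[] = &$context;
        call_user_func_array($operation, $params);
        $calls++;
    } while ($context['finished'] < 1);
    return ['calls' => $calls, 'results' => $context['results']];
}

// A toy operation that handles three list entries per call.
function toy_operation(array $list, array &$context): void {
    if (!isset($context['sandbox']['progress'])) {
        $context['sandbox']['progress'] = 0;
    }
    $chunk = array_slice($list, $context['sandbox']['progress'], 3);
    foreach ($chunk as $value) {
        $context['results'][] = $value;
        $context['sandbox']['progress']++;
    }
    // Only report partial progress while work remains (this also avoids
    // dividing by zero for an empty list).
    if ($context['sandbox']['progress'] < count($list)) {
        $context['finished'] = $context['sandbox']['progress'] / count($list);
    }
}

$run = simulate_batch('toy_operation', [range(1, 7)]);
printf("%d calls, %d results\n", $run['calls'], count($run['results']));
```

Seven entries at three per call take three calls. The module's own operation follows the same contract: it only sets $context['finished'] while the number of queued items is still short of $list_total.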

STEP 3: Write your data into the queue

function email_module_queue_email($email_nid, $list, $list_total, &$context) {
	$limit = 10;
	if(!isset($context['sandbox']['offset'])) {
		$context['sandbox']['offset'] = 0;
		$context['sandbox']['progress'] = 0;
	} else {
		$context['sandbox']['offset'] = $context['sandbox']['offset'] + $limit + 1;
	}
	$queue = DrupalQueue::get('email_module');
	/*
	 * For this next loop: instead of re-querying the entire list and selecting only the next few rows,
	 * we already have the entire list in an array in memory, so we simply step through it.
	 * The loop runs from offset to offset + $limit inclusive, so each pass handles $limit + 1
	 * array rows (11 with $limit = 10). Adjust $limit to suit your server.
	 */
	for($i = $context['sandbox']['offset']; $i <= $context['sandbox']['offset'] + $limit; $i++) {
		if(isset($list[$i])) {
			// Write the email address, contact NID, and email NID to the queue for processing later.
			// (This allows us to have multiple email campaigns queued at the same time.)
			$contact = node_load($list[$i]);
			$item = array('to' => $contact->title, 'contactNID' => $list[$i], 'emailNID' => $email_nid);
			$queue->createItem($item);
			$context['sandbox']['progress']++;
			$context['message'] = t('Now queueing email %current of %total', array('%current' => $context['sandbox']['progress'], '%total' => $list_total));
		}
	}
	// Report a fraction below 1 to keep the batch running. When this is not
	// set, the Batch API treats the operation as finished.
	if($context['sandbox']['progress'] != $list_total) {
		$context['finished'] = $context['sandbox']['progress'] / $list_total;
	}
}

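Without getting too deep into the Batch API, let's focus on what we're doing here. The three arguments $email_nid, $list, and $list_total are passed in from email_module_compose_email_submit (the 'operations' entry of the $batch array), and the Batch API appends $context. This function is called over and over until the batch is done, so it needs a way to tell the Batch API how much has been completed. That is what the $context['sandbox'] bookkeeping is for: it remembers the current offset and the number of items queued between calls, and $context['finished'] reports the completed fraction (between 0 and 1) that drives the progress bar. The progress message itself shows a plain count; you could add a percentage to it, but that is not done here.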
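Because the loop bound in email_module_queue_email() uses <=, each pass actually visits $limit + 1 array keys. The sketch below replays only the offset arithmetic in plain PHP to show that the passes cover keys 0-10, 11-21, 22-32 and so on, without skipping or repeating any contact. queue_windows() is a hypothetical name, and $i < $list_total stands in for isset($list[$i]) on a 0-indexed list.

```php
<?php
// Standalone replay of the offset arithmetic in email_module_queue_email():
// returns, for each batch pass, the array keys that pass visits.
function queue_windows(int $list_total, int $limit): array {
    $windows = [];
    $offset = null;
    $progress = 0;
    while ($progress < $list_total) {
        // First pass starts at 0; later passes start just past the previous window.
        $offset = ($offset === null) ? 0 : $offset + $limit + 1;
        $keys = [];
        // Inclusive upper bound, as in the original loop.
        for ($i = $offset; $i <= $offset + $limit; $i++) {
            // Stands in for isset($list[$i]) on a 0-indexed list.
            if ($i < $list_total) {
                $keys[] = $i;
                $progress++;
            }
        }
        $windows[] = $keys;
    }
    return $windows;
}

$windows = queue_windows(25, 10);
foreach ($windows as $pass => $keys) {
    printf("pass %d: keys %d-%d\n", $pass + 1, reset($keys), end($keys));
}
```

With 25 contacts and $limit = 10 it prints three passes covering keys 0-10, 11-21 and 22-24.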

The magic begins here:


$queue = DrupalQueue::get('email_module');

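This static method from the Drupal Queue API returns the queue named email_module. By default it is a database-backed queue (the SystemQueue class) that stores its items in the {queue} table, so there is nothing to create up front.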

Then we loop through our array, picking up where the previous pass left off. The efficiency here is that we are not re-generating the array of 10,000+ contacts on every pass; it was handed to the function as an argument. However, we don't want to loop through all 10,000+ contacts in one request either (that would defeat the purpose of using the Batch API). Instead, the first pass handles array keys 0-10, the next 11-21, the next 22-32, and so on, until every contact has its own item in the database queue, each holding three values...

And those values are:

$contact = node_load($list[$i]);
$item = array('to' => $contact->title, 'contactNID' => $list[$i], 'emailNID' => $email_nid);
$queue->createItem($item);

Essentially, we are writing 3 values to the queue that say "This is the email address it's going to", "This is the node ID of the contact the email address belongs to", and "This is the node ID of the email I want them to get".

Remember: These are simple values. Either a string or an integer. We are writing them to a QUEUE for Drupal to process later. What we don't want is for the user to have to sit and watch a progress bar while we query all this info and send the email.  It would simply take too long.
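Keeping the item this small pays off in storage too. Drupal 7's default queue class, SystemQueue, runs serialize() on each item before writing it to the {queue} table and unserializes it again when cron claims it. The plain-PHP snippet below mimics that round trip with made-up values, to show that the worker receives exactly the same three values and that each queued email costs roughly a hundred bytes rather than a whole serialized contact node.

```php
<?php
// A queue item shaped like the one email_module_queue_email() creates
// (the address and node IDs are made-up example values).
$item = ['to' => 'jane@example.com', 'contactNID' => 4512, 'emailNID' => 87];

// Drupal 7's default SystemQueue serializes the data on createItem() and
// unserializes it on claimItem(); this mimics that round trip.
$stored = serialize($item);
$restored = unserialize($stored);

echo $stored, PHP_EOL;
printf("%d bytes per queued email\n", strlen($stored));
```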

STEP 4: Parse the Queue on Cron Run

So, we have our queue.  Now what?

function email_module_cron_queue_info() {
	$queues = array();
	$queues['email_module'] = array(
		'worker callback' => 'email_module_send_queued_mail',
		// Maximum number of seconds to spend on this queue per cron run.
		'time' => 30,
	);
	return $queues;
}

This nifty hook declares our queue to Drupal's cron system. On every cron run, Drupal claims items from the queue named by the array key (email_module) and passes each one to the worker callback, email_module_send_queued_mail, as $item, until the queue is empty or the 'time' limit (in seconds) runs out. Remember the three values we wrote to the queue above (email address, contact NID, and email NID)? Those are all now available to us in the variable $item.
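In Drupal 7, drupal_cron_run() does roughly this for each queue declared in hook_cron_queue_info(): compute an end time from 'time', then keep claiming items, passing each item's data to the worker and deleting the item once the worker returns. Below is a standalone, plain-PHP simulation of that loop. InMemoryQueue and run_queue_worker() are hypothetical stand-ins, not Drupal classes, and the clock is injected so the example is deterministic.

```php
<?php
// In-memory stand-in for a Drupal 7 queue (illustration only): items are
// claimed, handed to a worker, then deleted once the worker returns.
class InMemoryQueue {
    private array $items = [];   // item_id => data
    private array $claimed = []; // item_id => true while claimed
    private int $nextId = 1;

    public function createItem($data): void {
        $this->items[$this->nextId++] = $data;
    }

    // Returns an object with item_id and data, or false if nothing is unclaimed.
    public function claimItem() {
        foreach ($this->items as $id => $data) {
            if (!isset($this->claimed[$id])) {
                $this->claimed[$id] = true;
                return (object) ['item_id' => $id, 'data' => $data];
            }
        }
        return false;
    }

    public function deleteItem(object $item): void {
        unset($this->items[$item->item_id], $this->claimed[$item->item_id]);
    }

    public function numberOfItems(): int {
        return count($this->items);
    }
}

// The cron loop shape: work until the queue is empty or $time seconds pass.
function run_queue_worker(InMemoryQueue $queue, callable $worker, int $time, callable $clock): int {
    $processed = 0;
    $end = $clock() + $time;
    while ($clock() < $end && ($item = $queue->claimItem())) {
        $worker($item->data);
        $queue->deleteItem($item);
        $processed++;
    }
    return $processed;
}

// Example: 100 queued emails, each "taking" one simulated second to send.
$now = 0;
$clock = function () use (&$now) { return $now; };
$sent = [];
$worker = function (array $data) use (&$now, &$sent) {
    $sent[] = $data['to'];
    $now++;
};

$queue = new InMemoryQueue();
for ($n = 0; $n < 100; $n++) {
    $queue->createItem(['to' => "contact$n@example.com", 'contactNID' => 1000 + $n, 'emailNID' => 87]);
}

$first_run = run_queue_worker($queue, $worker, 30, $clock);
printf("sent %d this run, %d still queued\n", $first_run, $queue->numberOfItems());
```

Note that the time budget is only checked before the next item is claimed, so one slow send can overrun it. In Drupal 7, an exception thrown by the worker is logged and the item stays in the queue to be retried once its claim expires.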

function email_module_send_queued_mail($item) {
	/*
	 * This is where we gather all of the information about the email being sent and who it is going to.
	 * Everything drupal_mail() needs goes into $params, which Drupal hands to our hook_mail() implementation.
	 */
	$email = node_load($item['emailNID']);
	// If the email node has been deleted there is nothing to send; returning
	// normally lets cron remove this item from the queue.
	if (!$email) {
		return;
	}
	$params = array();
	$params['to'] = $item['to'];
	$params['from'] = $email->field_from[LANGUAGE_NONE][0]['value'];
	$params['subject'] = $email->field_subject[LANGUAGE_NONE][0]['value'];
	$body = $email->body[LANGUAGE_NONE][0]['value'];
	$params['body'] = array($body);
	$params['headers'] = array(
		'MIME-Version' => '1.0',
		// Drupal stores text as UTF-8, so declare UTF-8 rather than iso-8859-1.
		'Content-Type' => 'text/html; charset=UTF-8; format=flowed',
		'Content-Transfer-Encoding' => '8Bit',
		'X-Mailer' => 'Drupal',
		'From' => $params['from'],
		'Sender' => $params['from'], // Ideally this would also carry the sender's name.
		'Return-Path' => $params['from'],
	);
	drupal_mail('email_module', 'email_module', $params['to'], language_default(), $params, $params['from'], TRUE);
}



function email_module_mail($key, &$message, $params) {
	switch($key) {
		case 'email_module':
			$message['subject'] = $params['subject'];
			$message['to'] = $params['to'];
			$message['from'] = $params['from'];
			$message['body'] = $params['body'];
			$message['headers'] = $params['headers'];
			break;
	}
}
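The worker's headers carry a note that Sender should include the sender's name, which the form collects as sender_name. One way to build such a value is sketched below in plain PHP. format_from_header() is a hypothetical helper, and in the module the name would come from whichever field the draft node stores it in (not shown in this tutorial). The sketch strips line breaks to block header injection and quotes the display name; names with non-ASCII characters would additionally need RFC 2047 encoding, for example via mb_encode_mimeheader().

```php
<?php
// Hypothetical helper: formats a mailbox as "Display Name" <address>.
function format_from_header(string $name, string $email): string {
    // Remove CR/LF so user input cannot inject extra headers.
    $name = trim(str_replace(["\r", "\n"], '', $name));
    $email = trim(str_replace(["\r", "\n"], '', $email));
    if ($name === '') {
        // No display name: fall back to the bare address.
        return $email;
    }
    // Quote the name and escape backslashes and double quotes inside it.
    return '"' . addcslashes($name, '\\"') . '" <' . $email . '>';
}

echo format_from_header('Jane Doe', 'jane@example.com'), PHP_EOL;
```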

STEP 5: Get your Cron Job to run frequently, without killing your site's performance

There are quite a few options for this, but I strongly recommend Ultimate Cron. Ultimate Cron will find all of your queue cron jobs and, by default, schedule them to run every minute. It also prevents every single cron job from running every time cron runs, which makes Drupal cron far more efficient. Even so, the runs are best triggered by a true (system) cron job.

What you want is a system cron job (or an external service) that requests your site's cron URL every one to five minutes, depending on how often you want your server to check for and send out emails. How you set this up varies from server to server.
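How long will a big list take to go out? That depends mostly on your mail transport, so every input below is an assumption you should measure; the sketch is only arithmetic. estimate_drain() is a hypothetical helper that assumes cron runs never overlap and ignores bootstrap time. It works in integer milliseconds so the division stays exact, and it counts emails per run the way a check-the-clock-before-each-item loop behaves.

```php
<?php
// Back-of-the-envelope estimate of how long a cron-driven queue takes to drain.
// Every input is an assumption: measure your own mail transport and cron schedule.
function estimate_drain(int $queued, int $budget_seconds, int $ms_per_email, int $interval_minutes): array {
    $budget_ms = $budget_seconds * 1000;
    // The worker loop checks the clock before each item, so one run sends
    // ceil(budget / cost) emails (at least one for any positive budget).
    $per_run = intdiv($budget_ms + $ms_per_email - 1, $ms_per_email);
    // Ceiling division: a partly filled last run still counts as a run.
    $runs = intdiv($queued + $per_run - 1, $per_run);
    return [
        'per_run' => $per_run,
        'runs' => $runs,
        'minutes' => $runs * $interval_minutes,
    ];
}

// 100,000 contacts, a 30-second 'time' budget, an assumed 100 ms per email,
// and cron triggered once a minute.
print_r(estimate_drain(100000, 30, 100, 1));
```

With those made-up numbers that is 300 emails per run and 334 runs, or roughly five and a half hours at one run per minute.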

Awesome! What have we just built, exactly?

  • We have built a batch function that efficiently loops through and loads a potentially LARGE amount of data by minimizing the amount of data that is being written into the queue.
  • We have built a cron worker that works through that database queue. Because it runs on cron, emails are sent in the background while your site operates as normal! Once the emails are in the queue, your content developers can set it and forget it: the server works through the queue as often as you schedule cron (in your server's cron configuration) and sends as many emails as it can within the time limit you set (30 seconds in this example; see email_module_cron_queue_info). Depending on the size of the mailing list, that may take several cron runs.

What else can I do with it?

  • The hardest thing about processing large amounts of data in PHP is timeouts, and loading the data into the queue was the tough part. That is already solved in this example.
  • Because you have a queue, why not build a page that shows your users how many emails out of the grand total have been sent?
    • You can get the grand total by checking which mailing list the current email NID was sent to and counting the contacts on it.
  • Templatize! Create an area where users can manage templates, and save the chosen template on the email node. Then, when you send the email, inject the template's markup into the body of the email!
  • Write your templates to have pseudo-logic so that you can pull dynamic content into your emails!
  • This list can go on and on thanks to Drupal API!
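For the templating ideas, the simplest starting point is placeholder substitution. A minimal plain-PHP sketch follows; render_template() and the {{name}} placeholder syntax are hypothetical, not a Drupal API (Drupal 7 core's own token system, token_replace(), is the built-in option). Per-contact values are HTML-escaped because the body is sent as HTML, while keys listed in $raw_keys (such as an already-filtered message body) are inserted as-is.

```php
<?php
// Hypothetical template renderer: replaces {{key}} placeholders with values.
function render_template(string $template, array $values, array $raw_keys = []): string {
    $replacements = [];
    foreach ($values as $key => $value) {
        $value = (string) $value;
        // Escape everything except keys explicitly marked as trusted HTML.
        if (!in_array($key, $raw_keys, true)) {
            $value = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        }
        $replacements['{{' . $key . '}}'] = $value;
    }
    // strtr() does a single pass, so placeholders inside inserted values are
    // not expanded a second time; unknown placeholders are left untouched.
    return strtr($template, $replacements);
}

$template = '<div class="header">{{list}}</div><div>{{body}}</div><p>Sent to {{to}}</p>';
echo render_template($template, [
    'list' => 'News & Events',
    'body' => '<p>Hello!</p>',
    'to' => 'jane@example.com',
], ['body']);
```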

Recommended Modules to use with this code:

  1. Ultimate Cron
  2. Mime Mail

Outlier

Video, Web, and Design Agency