Truncating text while maintaining HTML


flipsidenz

Community Member, 49 Posts

11 September 2012 at 7:22am

Hi,

I've built a news section for my site where the landing page is a summary of the latest 10 news items, with a "read more" link below each.

I'm using $Content.FirstParagraph to truncate my content before the read more.

The key issue arises when, for some reason, the first paragraph sits within a div:

<div>
<p>Hello World!</p>
</div>

The opening div tag will come through in my summary, but not the closing div tag - resulting in all sorts of styling headaches.

I'm wondering if anyone has tackled this issue before? What's the best way to show a summary of your content while retaining HTML opening and closing tags? Is this possible?
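
The behaviour described above can be reproduced outside SilverStripe with a rough sketch. It assumes the summary is produced by cutting the content at the first closing </p>. The functions firstParagraphSketch and tagBalance are hypothetical stand-ins for illustration, not framework code.

```php
<?php
// Hypothetical stand-in for a "cut at the first </p>" summary.
// This is an assumption about the behaviour, not SilverStripe's actual code.
function firstParagraphSketch($html)
{
    $end = strpos($html, '</p>');
    if ($end === false) {
        return $html; // no closing </p> found: return everything
    }
    return substr($html, 0, $end + strlen('</p>'));
}

// Hypothetical helper: opening tags minus closing tags for one tag name.
function tagBalance($html, $tagName)
{
    $name = preg_quote($tagName);
    $opened = preg_match_all('{<' . $name . '\b[^>]*>}i', $html, $m);
    $closed = preg_match_all('{</' . $name . '\s*>}i', $html, $m);
    return $opened - $closed;
}

$content = "<div>\n<p>Hello World!</p>\n</div>\n<p>Second paragraph.</p>";
$summary = firstParagraphSketch($content);

echo $summary, "\n";
echo 'Unclosed div tags: ', tagBalance($summary, 'div'), "\n";
```

The cut keeps the opening div but stops before its closing tag, which is the mismatch that breaks the page styling.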

flipsidenz

Community Member, 49 Posts

23 October 2012 at 11:57am

I didn't have much luck finding a Sapphire-specific answer to this question, so I went searching for a PHP function to help me with this and came across this post: http://stackoverflow.com/questions/1193500/php-truncate-html-ignoring-tags

Here is my end result. I'm still curious as to whether there is an easier way to achieve this?

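	/**
	 * Get a summary of the entry: the first paragraph (BBCode is parsed first
	 * when WYSIWYG editing is off), cut to 800 characters with open tags closed.
	 */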
	function ParagraphSummary(){
		if(self::$allow_wysiwyg_editing) {
			return $this->printTruncated(800, $this->obj('Content')->FirstParagraph('html'));
		} else {
			$parser = new BBCodeParser($this->Content);
			$html = new HTMLText('Content');
			$html->setValue($parser->parse());
			return $this->printTruncated(800, $html->FirstParagraph('html'));
		}
	}
	/**
	* Responsible for closing any tags which have been left open in the summary...
	*/
	public function printTruncated($maxLength, $html, $isUtf8=true)
	{
		$printedLength = 0;
		$position = 0;
		$tags = array();
		$string = '';
	
		// For UTF-8, we need to count multibyte sequences as one character.
		// Tag names may contain digits (h1-h6) and upper-case letters.
		$re = $isUtf8
			? '{</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;|[\x80-\xFF][\x80-\xBF]*}'
			: '{</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;}';
	
		while ($printedLength < $maxLength && preg_match($re, $html, $match, PREG_OFFSET_CAPTURE, $position))
		{
			list($tag, $tagPosition) = $match[0];
	
			// Print text leading up to the tag.
			$str = substr($html, $position, $tagPosition - $position);
			if ($printedLength + strlen($str) > $maxLength)
			{
				$string .= substr($str, 0, $maxLength - $printedLength);
				$printedLength = $maxLength;
				break;
			}
	
			$string .= $str;
			$printedLength += strlen($str);
			if ($printedLength >= $maxLength) break;
	
			if ($tag[0] == '&' || ord($tag) >= 0x80)
			{
				// Pass the entity or UTF-8 multibyte sequence through unchanged.
				$string .= $tag;
				$printedLength++;
			}
			else
			{
				// Handle the tag.
				$tagName = $match[1][0];
				if ($tag[1] == '/')
				{
					// This is a closing tag.
	
					$openingTag = array_pop($tags);
					assert($openingTag == $tagName); // check that tags are properly nested.
	
					$string .= $tag;
				}
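				else if ($tag[strlen($tag) - 2] == '/' || in_array(strtolower($tagName), array('br', 'hr', 'img', 'input')))
				{
					// Self-closing tag, or a void element written without the slash (e.g. <br>).
					$string .= $tag;
				}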
				else
				{
					// Opening tag.
					$string .= $tag;
					$tags[] = $tagName;
				}
			}
	
			// Continue after the tag.
			$position = $tagPosition + strlen($tag);
		}
	
		// Print any remaining text.
		if ($printedLength < $maxLength && $position < strlen($html))
			$string .= substr($html, $position, $maxLength - $printedLength);
	
		// Close any open tags.
		while (!empty($tags))
			$string .= sprintf('</%s>', array_pop($tags));
		
		return $string;
	}

Bigfork

Community Member, 23 Posts

23 October 2012 at 11:57pm

I had the same problem and never solved it. Fortunately I didn't need to in the end, but if I had, a custom function would have been required.
You could use PHP's strip_tags to selectively remove tags like <div> while keeping tags like <strong>, then truncate the cleaned-up version.
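
The strip_tags idea above could be sketched roughly as follows. The helper name stripToInlineTags and the choice of allowed tags are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: keep a small whitelist of inline tags and drop
// everything else (div, p, ...). The whitelist is an assumption; adjust it.
function stripToInlineTags($html, $allowed = '<strong><em><a>')
{
    // strip_tags() removes every tag not named in its second argument.
    return trim(strip_tags($html, $allowed));
}

$content = "<div>\n<p>Hello <strong>World</strong>!</p>\n</div>";
$clean = stripToInlineTags($content);

echo $clean, "\n";
```

Cutting the cleaned string at a fixed character count can still land inside one of the remaining inline tags, so this would probably need to be combined with a tag-aware truncation such as printTruncated above.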

What I never worked out was how to add such a custom function without modifying core code, so I'd be interested to know whether this is possible by extending a class.

martimiz

Forum Moderator, 1391 Posts

24 October 2012 at 5:34am

I've never tested this, but I think you should be able to create a decorator for the Text class, something like this:

class ExtendedText extends Extension {

	function ParagraphSummary() {
		...
		return $data;
	}
}

Then in config: Object::add_extension('Text', 'ExtendedText');
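
For illustration, the extension outlined above might be filled in along these lines. This is an untested sketch, not a confirmed implementation: closeOpenTags is a hypothetical helper, the list of void elements is an assumption, and the small Extension stand-in exists only so the file can run outside SilverStripe (inside a site the framework's own Extension class would be used instead).

```php
<?php
// Stand-in so the sketch runs outside SilverStripe. Inside a real site the
// framework's Extension class already exists and this block is skipped.
if (!class_exists('Extension')) {
    class Extension
    {
        protected $owner;

        public function setOwner($owner)
        {
            $this->owner = $owner;
        }
    }
}

class ExtendedText extends Extension
{
    // First paragraph of the owning field, with unclosed tags closed.
    public function ParagraphSummary()
    {
        $html = $this->owner->FirstParagraph('html');
        return $this->closeOpenTags($html);
    }

    // Hypothetical helper: append closing tags for anything left open.
    public function closeOpenTags($html)
    {
        $void = array('br', 'hr', 'img', 'input', 'meta', 'link'); // assumed list
        $open = array();

        preg_match_all(
            '{<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>}',
            $html,
            $matches,
            PREG_SET_ORDER
        );

        foreach ($matches as $m) {
            $name = strtolower($m[2]);
            if ($m[3] === '/' || in_array($name, $void)) {
                continue; // self-closing or void element: nothing to close
            }
            if ($m[1] === '/') {
                // Closing tag: pop it only if it matches the innermost open tag.
                if (end($open) === $name) {
                    array_pop($open);
                }
                continue;
            }
            $open[] = $name; // opening tag
        }

        // Close whatever is still open, innermost first.
        while ($open) {
            $html .= '</' . array_pop($open) . '>';
        }
        return $html;
    }
}

// Registration, as in the post above (only valid inside SilverStripe):
// Object::add_extension('Text', 'ExtendedText');
```

If the extension registers as expected, a template could then call $Content.ParagraphSummary in place of $Content.FirstParagraph.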