Truncating text while maintaining HTML



4 Posts   1025 Views

flipsidenz

11 September 2012 at 7:22am Community Member, 49 Posts

Hi,

I've built a news section for my site where the landing page is a summary of the latest 10 news items, with a "read more" link below each.

I'm using $Content.FirstParagraph to truncate my content before the "read more" link.

The key issue is when, for some reason, the first paragraph sits inside a div:

<div>
<p>Hello World!</p>
</div>

The opening div tag comes through in my summary but the closing div tag does not, which causes all sorts of styling headaches.

Has anyone tackled this issue before? What's the best way to show a summary of your content while keeping the HTML opening and closing tags matched? Is this possible?
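To see why the closing tag goes missing, here is a minimal sketch of a cut that keeps everything up to and including the first </p>. It assumes FirstParagraph('html') behaves roughly like this; naiveFirstParagraph is a hypothetical stand-in, not SilverStripe's actual code.

```php
<?php
// Hypothetical stand-in for a "first paragraph" helper: keep everything
// up to and including the first </p>, as the behaviour described above suggests.
function naiveFirstParagraph($html)
{
    $end = strpos($html, '</p>');
    if ($end === false) {
        return $html; // no paragraph found: return the content unchanged
    }
    return substr($html, 0, $end + strlen('</p>'));
}

$content = "<div>\n<p>Hello World!</p>\n</div>";
$summary = naiveFirstParagraph($content);
echo $summary, "\n";

// The wrapping <div> is opened in the summary but never closed.
echo substr_count($summary, '<div>'), ' opened, ', substr_count($summary, '</div>'), " closed\n"; // 1 opened, 0 closed
```

Any wrapper that starts before the first paragraph and ends after it is cut the same way, so the fix has to track which tags are still open at the cut-off point.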

flipsidenz

23 October 2012 at 11:57am Community Member, 49 Posts

I didn't have much luck finding a Sapphire-specific answer to this question, so I went searching for a PHP function to help with this and came across this post: http://stackoverflow.com/questions/1193500/php-truncate-html-ignoring-tags

Here is my end result. I'm still curious whether there is an easier way to achieve this.

   /**
    * Get a summary of the entry: the first paragraph of the content (parsed
    * from BBCode when WYSIWYG editing is off), truncated to 800 characters
    * with any tags left open by the cut closed again.
    */
   public function ParagraphSummary(){
      if(self::$allow_wysiwyg_editing) {
         return $this->printTruncated(800, $this->obj('Content')->FirstParagraph('html'));
      } else {
         $parser = new BBCodeParser($this->Content);
         $html = new HTMLText('Content');
         $html->setValue($parser->parse());
         return $this->printTruncated(800, $html->FirstParagraph('html'));
      }
   }

   /**
    * Truncate $html to $maxLength visible characters (an entity or a UTF-8
    * multibyte sequence counts as one character) and close any tags that
    * are still open at the cut-off point.
    */
   public function printTruncated($maxLength, $html, $isUtf8=true)
   {
      // Void elements never take a closing tag, even when written without "/>".
      $voidTags = array('area', 'base', 'br', 'col', 'embed', 'hr', 'img',
         'input', 'link', 'meta', 'param', 'source', 'wbr');
      $printedLength = 0;
      $position = 0;
      $tags = array();
      $string = '';

      // For UTF-8, we need to count multibyte sequences as one character.
      // Tag names may be upper case or contain digits (e.g. <H1>, <h2>).
      $re = $isUtf8
         ? '{</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;|[\x80-\xFF][\x80-\xBF]*}'
         : '{</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;}';

      while ($printedLength < $maxLength && preg_match($re, $html, $match, PREG_OFFSET_CAPTURE, $position))
      {
         list($tag, $tagPosition) = $match[0];

         // Print text leading up to the tag.
         $str = substr($html, $position, $tagPosition - $position);
         if ($printedLength + strlen($str) > $maxLength)
         {
            $string .= substr($str, 0, $maxLength - $printedLength);
            $printedLength = $maxLength;
            break;
         }

         $string .= $str;
         $printedLength += strlen($str);
         if ($printedLength >= $maxLength) break;

         if ($tag[0] == '&' || ord($tag) >= 0x80)
         {
            // Pass the entity or UTF-8 multibyte sequence through unchanged.
            $string .= $tag;
            $printedLength++;
         }
         else
         {
            // Handle the tag; compare names case-insensitively.
            $tagName = strtolower($match[1][0]);
            if ($tag[1] == '/')
            {
               // This is a closing tag.
               $openingTag = array_pop($tags);
               assert($openingTag == $tagName); // check that tags are properly nested.

               $string .= $tag;
            }
            else if ($tag[strlen($tag) - 2] == '/' || in_array($tagName, $voidTags))
            {
               // Self-closing or void tag: nothing to close later.
               $string .= $tag;
            }
            else
            {
               // Opening tag: remember it so it can be closed at the end.
               $string .= $tag;
               $tags[] = $tagName;
            }
         }

         // Continue after the tag.
         $position = $tagPosition + strlen($tag);
      }

      // Print any remaining text.
      if ($printedLength < $maxLength && $position < strlen($html))
         $string .= substr($html, $position, $maxLength - $printedLength);

      // Close any open tags, innermost first.
      while (!empty($tags))
         $string .= sprintf('</%s>', array_pop($tags));

      return $string;
   }
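On the question of an easier way: one alternative, not taken from this thread or the Stack Overflow post, is to let PHP's DOM extension repair the fragment. libxml's HTML parser closes any element still open when it reaches </body>, so re-serialising the body's children yields balanced markup. This is a sketch assuming the dom extension is available and PHP 5.3.6+ (for saveHTML($node)); closeOpenTags is a hypothetical helper name, and it only balances tags, so it would still be paired with a cut such as FirstParagraph('html').

```php
<?php
// Sketch (alternative technique, not from the thread): balance the tags in an
// HTML fragment by parsing it with DOMDocument and serialising <body> again.
// Assumes the dom extension; saveHTML($node) needs PHP 5.3.6 or later.
function closeOpenTags($fragment)
{
    $doc = new DOMDocument();

    // The meta tag tells libxml the input is UTF-8 (it assumes ISO-8859-1 otherwise).
    $page = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
          . '<body>' . $fragment . '</body></html>';

    // Collect parser warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($page);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing was parsed into <body>
    }

    // Serialise every node the parser placed inside <body>; elements left
    // open in $fragment were closed by the parser at </body>.
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

echo closeOpenTags("<div>\n<p>Hello World!</p>"), "\n";
```

The trade-off is that the fragment goes through a full parse and is re-serialised, so whitespace and attribute quoting may come back slightly different, but void elements and odd nesting are handled without a hand-written tag stack.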

feejin

23 October 2012 at 11:57pm Community Member, 22 Posts

I had the same problem and never solved it. Fortunately I didn't need to, but if I had, a custom function would have been required.
You could use PHP's strip_tags() to selectively remove tags such as <div> while keeping, for example, <strong>, and then truncate the cleaned-up version.
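A minimal sketch of the strip_tags() idea, assuming you want to drop block-level wrappers such as <div> and <p> and keep a few inline tags; stripToInline and the chosen tag list are hypothetical. strip_tags() keeps only the tags named in its second argument.

```php
<?php
// Hypothetical helper: remove wrappers such as <div> and <p> but keep the
// inline tags listed in $allowedTags, then trim leftover whitespace.
function stripToInline($html, $allowedTags = '<strong><em><a>')
{
    return trim(strip_tags($html, $allowedTags));
}

$content = "<div>\n<p>Hello <strong>World</strong>!</p>\n</div>";
echo stripToInline($content), "\n"; // Hello <strong>World</strong>!
```

The inline tags that survive can still be left open by a plain character cut, so the cleaned-up text needs a tag-aware truncation like printTruncated in the previous post.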

What I never worked out was how to add such a custom function without modifying core code, so I would be interested to know whether this is possible by extending a class.

martimiz

24 October 2012 at 5:34am Forum Moderator, 1091 Posts

I've never tested this, but I think you should be able to create a decorator (an Extension) for the Text class, something like this:

class ExtendedText extends Extension {

   function ParagraphSummary() {
      // $this->owner is the Text field object this extension is attached to.
      ...
      return $data;
   }
}

Then register the extension in your _config.php:

Object::add_extension('Text', 'ExtendedText');