Truncating text while maintaining HTML





flipsidenz

Community Member, 49 Posts

11 September 2012 at 7:22am

Hi,

I've built a news section for my site where the landing page is a summary of the latest 10 news items, with a "read more" link below each.

I'm using $Content.FirstParagraph to truncate my content before the read more.

The key issue arises when, for some reason, the first paragraph sits within a div:

<div>
<p>Hello World!</p>
</div>

The opening div tag will come through in my summary, but not the closing div tag - resulting in all sorts of styling headaches.

I'm wondering if anyone has tackled this issue before? What's the best way to show a summary of your content while retaining HTML opening and closing tags? Is this possible?

flipsidenz

Community Member, 49 Posts

23 October 2012 at 11:57am

I didn't have much luck finding a Sapphire-specific answer to this question, so I went searching for a PHP function to help me with this and came across this post: http://stackoverflow.com/questions/1193500/php-truncate-html-ignoring-tags

Here is my end result. I'm still curious whether there is an easier way to achieve this.

   /**
    * Get a bbcode parsed summary of the blog entry
    */
   function ParagraphSummary(){
      if(self::$allow_wysiwyg_editing) {
         return $this->printTruncated(800, $this->obj('Content')->FirstParagraph('html'));
      } else {
         $parser = new BBCodeParser($this->Content);
         $html = new HTMLText('Content');
         $html->setValue($parser->parse());
         return $this->printTruncated(800, $html->FirstParagraph('html'));
      }
   }
   /**
   * Responsible for closing any tags which have been left open in the summary...
   */
   public function printTruncated($maxLength, $html, $isUtf8=true)
   {
      $printedLength = 0;
      $position = 0;
      $tags = array();
      $string = '';
   
      // For UTF-8, we need to count multibyte sequences as one character.
      // Tag names may contain digits (h1 to h6), so allow them after the first letter.
      $re = $isUtf8
         ? '{</?([a-z][a-z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;|[\x80-\xFF][\x80-\xBF]*}'
         : '{</?([a-z][a-z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;}';
   
      while ($printedLength < $maxLength && preg_match($re, $html, $match, PREG_OFFSET_CAPTURE, $position))
      {
         list($tag, $tagPosition) = $match[0];
   
         // Print text leading up to the tag.
         $str = substr($html, $position, $tagPosition - $position);
         if ($printedLength + strlen($str) > $maxLength)
         {
            $string .= substr($str, 0, $maxLength - $printedLength);
            $printedLength = $maxLength;
            break;
         }
   
         $string .= $str;
         $printedLength += strlen($str);
         if ($printedLength >= $maxLength) break;
   
         if ($tag[0] == '&' || ord($tag) >= 0x80)
         {
            // Pass the entity or UTF-8 multibyte sequence through unchanged.
            $string .= $tag;
            $printedLength++;
         }
         else
         {
            // Handle the tag.
            $tagName = $match[1][0];
            if ($tag[1] == '/')
            {
               // This is a closing tag.
   
               $openingTag = array_pop($tags);
               assert($openingTag == $tagName); // check that tags are properly nested.
   
               $string .= $tag;
            }
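            else if ($tag[strlen($tag) - 2] == '/'
               || in_array($tagName, array('area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr')))
            {
               // Self-closing tag, or a void element such as <br> written without
               // the trailing slash. Neither must be pushed onto the open-tag stack,
               // otherwise a bogus closing tag (e.g. </br>) would be appended later.
               $string .= $tag;
            }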
            else
            {
               // Opening tag.
               $string .= $tag;
               $tags[] = $tagName;
            }
         }
   
         // Continue after the tag.
         $position = $tagPosition + strlen($tag);
      }
   
      // Print any remaining text.
      if ($printedLength < $maxLength && $position < strlen($html))
         $string .= substr($html, $position, $maxLength - $printedLength);
   
      // Close any open tags.
      while (!empty($tags))
         $string .= sprintf('</%s>', array_pop($tags));
      
      return $string;
   }

Bigfork

Community Member, 22 Posts

23 October 2012 at 11:57pm

I had the same problem and never solved it. Fortunately I didn't need to, but if I had, a custom function would have been required.
You could use PHP's strip_tags() to selectively remove things like <div> while leaving <strong>, for example, then truncate the cleaned-up version.
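
For illustration, the strip_tags() step might look roughly like the sketch below. The helper name stripWrapperTags and the list of allowed tags are made up for this example and are not part of SilverStripe. Note that it only removes the wrapper tags: the result can still contain <p> or <strong>, so any truncation applied afterwards still has to be tag-aware.

```php
<?php
// Hypothetical helper: the name and the default allowed-tag list are
// assumptions for this example, not SilverStripe API.
function stripWrapperTags($html, $allowed = '<p><strong><em><a>')
{
    // strip_tags() removes every tag not listed in $allowed but keeps
    // the text inside it; trim() drops the leftover line breaks.
    return trim(strip_tags($html, $allowed));
}

$html = "<div>\n<p>Hello <strong>World</strong>!</p>\n</div>";
echo stripWrapperTags($html); // <p>Hello <strong>World</strong>!</p>
```

With the wrapping <div> gone, the unclosed-div case from the first post no longer occurs, though an inline tag cut in half by a plain substr() would still need handling.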

What I never worked out was how to add this custom function without modifying core code, so I would be interested to know whether this is possible by extending a class.

martimiz

Forum Moderator, 1106 Posts

24 October 2012 at 5:34am

I've never tested this but I think you should be able to create a decorator for the Text class, something like this:

class ExtendedText extends Extension {

   function ParagraphSummary() {
      ...
      return $data;
   }
}

Then in config: Object::add_extension('Text', 'ExtendedText');