Monday, August 25, 2008

ColdFusion - Stripping tags with rereplacenocase

When you output part of an RSS feed or other content that contains tags, you'll want to strip out at least the image tags. If you know that all tags are lowercase, you can use rereplace; otherwise, use rereplacenocase.

To get rid of all tags:
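rereplacenocase(variable,"<[^<]*>","","all")
The arguments are the input string, the regular expression, the replacement (an empty string here) and the scope. This regular expression matches every pattern that begins with a <, has zero or more characters in the middle that are not a <, and ends with a >.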
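
As a minimal sketch of that call in context, assuming a hypothetical variable named feedText that holds the markup (the name and the sample string are made up for illustration):

```cfml
<!--- feedText is a hypothetical sample; in practice it would hold the feed item's content. --->
<cfset feedText = '<p>Hello <b>world</b></p>'>

<!--- Arguments: input string, regular expression, replacement (empty string), scope. --->
<cfset plainText = rereplacenocase(feedText, "<[^<]*>", "", "all")>

<!--- Outputs: Hello world --->
<cfoutput>#plainText#</cfoutput>
```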

To get rid of image tags only:
rereplacenocase(variable,"<img[^<]*>","","all")
This expression works like the one for all tags, except that it only matches patterns beginning with <img. The third argument is the empty replacement string.
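
To illustrate why the case-insensitive version matters, here is a small sketch. The sample string and variable names are hypothetical; it runs the same image pattern through rereplace and rereplacenocase on input that mixes lowercase and uppercase tags.

```cfml
<!--- Hypothetical sample with one lowercase and one uppercase image tag. --->
<cfset sample = 'a<img src="1.gif">b<IMG SRC="2.gif">c'>

<!--- Case-sensitive: only the lowercase tag is removed. --->
<!--- Result: ab<IMG SRC="2.gif">c --->
<cfset caseSensitive = rereplace(sample, "<img[^<]*>", "", "all")>

<!--- Case-insensitive: both tags are removed. --->
<!--- Result: abc --->
<cfset caseInsensitive = rereplacenocase(sample, "<img[^<]*>", "", "all")>

<!--- htmleditformat escapes the leftover tag so it shows up as text instead of rendering. --->
<cfoutput>
#htmleditformat(caseSensitive)#<br>
#htmleditformat(caseInsensitive)#
</cfoutput>
```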

To get rid of two or more specific tags:
rereplacenocase(variable,"<(img|/?div|br)[^<]*>","","all")
This expression matches patterns beginning with <img, <div, </div or <br, and replaces each match with an empty string. I'm using * instead of + because a closing tag such as </div> has no characters between the beginning pattern and the closing >.
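
A quick sketch of the * versus + difference, using a made-up sample string (the variable names are hypothetical too):

```cfml
<!--- Hypothetical sample: a div wrapping text, a line break and an image, followed by a paragraph. --->
<cfset sample = '<div class="x">Text<br/>more<img src="1.gif"></div><p>kept</p>'>

<!--- With *, the opening and closing div, the br and the img are all removed. --->
<!--- Result: Textmore<p>kept</p> --->
<cfset cleaned = rereplacenocase(sample, "<(img|/?div|br)[^<]*>", "", "all")>

<!--- With +, at least one character is required before the closing >, --->
<!--- so </div> no longer matches and is left behind. --->
<!--- Result: Textmore</div><p>kept</p> --->
<cfset withPlus = rereplacenocase(sample, "<(img|/?div|br)[^<]+>", "", "all")>
```

Tags not listed in the alternation, like the paragraph tags here, are left alone.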

19 comments:

  1. <img src="url" alt="image description" /> ;-)

  2. o, lol, I knew what an image tag was, I just didn't make the connection.

    but I don't get the whole outputting RSS feed that contains tags, why would you want to take out the images?

  3. Because when you're outputting partial feeds, like mike in midwood on his blogroll, if you have image tags, they will render.

  4. ahh I see, so you're saying that on MikeInMidwood's blog roll, because it's a partial feed, it will give anything that's in the beginning of the post, even if it's an image?

  5. Yah, so the programmer needs to make sure to strip out images and generally all tags, unless he wants to keep bold, italic and maybe titles.

  6. I have never seen images on it and the babysitter has an image at the beginning of every post.

  7. Dude, this is a programming post, not general post, hence the title. For programmers, not for users.

  8. MikeInMidwood: he isn't talking about your blog posts. He's talking about your Blog Roll.

    but wait

    Moshe: he's right I do have an image by every post. Since he hasn't removed the image tag, then how come no image shows up in his blog roll thing?

  9. Again, if you don't understand what the title says, don't know what rereplacenocase is and don't know what regular expressions are, this post is not for you. And the tags for this post are "programming" and "coldfusion".

  10. Ha, my knowledge of programming extends just until the "font color=" part. Sometimes, I admit, I dabble in *gasp* "bold"

    Anyway, Mr. Programmer, I didnt understand a word of this post, and therefore I'm going to assume that it's a really deep, thought out post. I'll recommend it to friends

  11. ya nye panimayu (I don't understand)
    i can't even tell what language you are speaking altogether

  12. tormoz (slowpoke)
    coldfusion and regexp, duh.

  13. Mo, check ur facebook inbox please.

  14. Ok, that's fair play, but say you are pulling from a feed and you do want to keep an image, but only one, to keep the formatting in place. E.g., you pull in the RSS through cfhttp, drop it into xmlcontent and find that description.xmltext has 8 photos coming through, which just looks naff. What regex would remove all of them apart from the first one?

  15. Not sure. What you can do is look for first occurrence, use mid() to grab everything after that and use the regexp on that substring. Then concatenate them back together.

  16. Found this handy tag http://www.cflib.org/index.cfm?event=page.udfbyid&udfid=425

    allows me to do it very nicely

  17. Fantastic help. Thanks.

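
The approach suggested in comment 15 (find the first image tag, grab everything after it with mid(), run the regexp on that substring, then concatenate) might look roughly like the sketch below. The variable names and the sample string are hypothetical, and this is only one way to do it, not the UDF linked in comment 16.

```cfml
<!--- Hypothetical sample; in practice this would be the description text from the feed. --->
<cfset feedText = '<p>Intro</p><img src="a.jpg"><p>More</p><img src="b.jpg"><IMG src="c.jpg">'>

<!--- Passing true as the fourth argument returns the position and length of the match. --->
<cfset firstImg = refindnocase("<img[^<]*>", feedText, 1, true)>

<cfif firstImg.pos[1] gt 0>
    <!--- Index of the last character of the first image tag. --->
    <cfset splitAt = firstImg.pos[1] + firstImg.len[1] - 1>
    <!--- Everything up to and including the first image tag stays untouched. --->
    <cfset head = left(feedText, splitAt)>
    <cfset tail = "">
    <cfif splitAt lt len(feedText)>
        <!--- mid() grabs everything after the first image tag. --->
        <cfset tail = mid(feedText, splitAt + 1, len(feedText) - splitAt)>
    </cfif>
    <!--- Strip image tags from the remainder only, then concatenate. --->
    <cfset feedText = head & rereplacenocase(tail, "<img[^<]*>", "", "all")>
</cfif>

<!--- feedText is now: <p>Intro</p><img src="a.jpg"><p>More</p> --->
```

If the text contains no image tag at all, the cfif is skipped and feedText is left unchanged.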