Removing HTML from RSS Feeds

Sometimes, you want to use an RSS feed as a data source. This may be for publishing on your website or for mashing up data with another service.

Recently, I’ve been building a corporate intranet that can embed RSS feeds within its content. It’s a great way of generating fresh content in a controlled environment with minimum effort. Unfortunately, quite a few RSS feeds include embedded HTML links for sharing the item via services like ShareThis or FeedBurner’s FeedFlare. One such example was the Post Online RSS Feed.

This seemed like an ideal job for Yahoo Pipes to tidy up. Here is the process I went through to create the sanitized version:

  1. Create a Pipe on Yahoo Pipes (you may need to sign in with a Yahoo! account).
  2. Drag Fetch Feed from Source onto the canvas.
  3. Enter the URL of your feed.
  4. Drag Regex from Operators onto the canvas.
  5. Select the item to sanitize, e.g. item.description.
  6. Enter <(.|\n)*?> in the replace text box; this matches any HTML tag, even one that spans several lines.
  7. Leave the with text box empty.
  8. Check the g (global) checkbox so that every tag is replaced, not just the first.
  9. Drag a connection from the bottom of Fetch Feed to the top of Regex.
  10. Drag a connection from the bottom of Regex to the top of Pipe Output.
  11. To test your Pipe, highlight the Pipe Output box and check the results in the Debugger panel at the bottom of the browser window.
  12. If all has gone to plan, you can Save your new Pipe, give it an appropriate name and start using your new clean feed.
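
If you would rather do the same clean-up in your own code, the steps above can be sketched roughly as follows in Python. This is only an illustration, not what Pipes runs internally: the names strip_tags, sanitize_feed and SAMPLE_FEED are made up for the example, and it assumes a plain RSS 2.0 layout with item elements inside a channel. The pattern is the one from step 6.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as step 6: a "<", then the shortest possible run of any
# characters (newlines included), then a ">".
TAG_PATTERN = re.compile(r"<(.|\n)*?>")


def strip_tags(text):
    """Replace every match with nothing, like the Regex module with g checked."""
    return TAG_PATTERN.sub("", text)


def sanitize_feed(rss_xml):
    """Return the feed with tags stripped from each item's description.

    Assumes RSS 2.0 (rss > channel > item > description); other feed
    formats would need different element names.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        description = item.find("description")
        if description is not None and description.text:
            description.text = strip_tags(description.text)
    return ET.tostring(root, encoding="unicode")


# Hypothetical feed used only for this example; the HTML in the description
# is entity-escaped, which the XML parser decodes before the regex runs.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>Story</title>
<description>&lt;p&gt;Hello &lt;a href="http://example.com/share"&gt;Share
this&lt;/a&gt; world&lt;/p&gt;</description></item>
</channel></rss>"""

if __name__ == "__main__":
    print(strip_tags('<p>Hello <a href="x">link</a></p>'))  # prints: Hello link
    print(sanitize_feed(SAMPLE_FEED))
```

Note that the pattern only removes the tags themselves. The text inside a sharing link (the "Share this" above) is left behind, so a feed with wordy share widgets may need a more specific pattern.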

Hopefully, you’ll find this useful. If you do, leave a comment below. Equally, if you encounter any issues with the process, share the problem and how you may have fixed it.

Written by Si

March 6th, 2009 at 12:14 pm