Why would you ever need to screen scrape content from your own website? Why not just use RSS feeds? If you want to use your own RSS feeds to display your own content somewhere else on your site, you can do that! But what if there isn’t an RSS feed for the section in question? What can you do?
I desperately needed to do just that and worked nonstop until I got it hashed out. The blah blah below is what I did and why I had to do it.
Task: build a site using the expensive shopping cart that your client purchased. The client has big expectations and you are faced with a big problem: the shopping cart doesn’t create an RSS feed, and you used WordPress to power the rest of the website. Now you have 2 separate components running one site: a flat-file shopping cart and WordPress as a CMS. How do you get them to play nice? Are you a PHP programmer? (In my case, sadly, no.) Well, if you aren’t a programmer you can still use wp-blog-header.php to do neat enough things, like pulling your WordPress theme’s navigation menu into the shopping cart header and your WordPress theme’s footer into the shopping cart, too. In the end you’ve worked really hard getting 2 very different PHP applications to look exactly the same. But you aren’t happy yet. You want some automation. You don’t want to make your clients work that hard.
WordPress is easy to use – my clients quickly catch on to how they can do pretty much anything with it and they can do it themselves.
But how do I get that shop content onto the home page? Like, dynamically? The home page features their upcoming events and very carefully handcrafted (by me) Featured Products Posts that match the look of the shop pages. But the whole point of this exercise was for my non-HTML-savvy clients to be able to update their website themselves. I installed a few plugins to help them along, such as Post Template. With this they can copy and paste image and page URLs, add some tags and a title, and publish. But there is still all that damn HTML. I’d rather they not have to see it at all.
Another thing is that they sometimes run out of a product that was featured in a post on the home page. The product then pretty much ceases to exist in the shopping cart. When this happens and someone follows the link in the Featured Product Post, very bad things happen. Babies start crying and birds cease their joyful song. I’m kidding: the link leads to a page with a nasty PHP error printed on it.
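For what it’s worth, here is a rough sketch of how you could catch those dead Featured Product links before a visitor does. It is not what I actually run; it’s Python using only the standard library, and everything in it is an assumption on my part: the function names are made up, and the error markers are just strings PHP often prints on an error page, so swap in whatever your cart really spits out.

```python
import urllib.error
import urllib.request

# Assumed markers: text PHP commonly prints on an error page.
# Adjust these to match what your own shopping cart actually outputs.
ERROR_MARKERS = ("Fatal error", "Parse error", "Warning</b>:")


def fetch(url, timeout=10):
    """Return (status, body) for a URL; HTTP errors are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except urllib.error.URLError:
        return None, ""  # site unreachable


def is_broken(status, body):
    """A link is broken if the status is not 200 or the page shows a PHP error."""
    if status != 200:
        return True
    return any(marker in body for marker in ERROR_MARKERS)


def find_broken_links(urls, fetcher=fetch):
    """Return the URLs whose pages look broken, in the order given."""
    return [url for url in urls if is_broken(*fetcher(url))]
```

You would feed `find_broken_links` the product URLs used in your Featured Product Posts. The body check matters because a cart that prints a PHP error may still answer with a normal 200 status.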
Obviously the next step really was RSS. Or getting some sweet RSS action from a page that wasn’t putting out. I went looking for services like Page2RSS, Feedity and Feed4All. I looked at Dapper and Yahoo Pipes. On my Code page you can see an example of the use of Page2RSS and the MultiFeedSnap plugin displaying content from my client’s shopping cart.
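If you are curious what services like these are doing under the hood, here is a minimal sketch of the classic do-it-yourself version: parse the page, pick out the product links, and write them out as RSS 2.0 items. It is Python with the standard library only, and it leans on assumptions: the `class="product"` markup is hypothetical (your cart’s HTML will differ), and `ProductLinkParser` and `page_to_rss` are names I invented for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin


class ProductLinkParser(HTMLParser):
    """Collect (title, href) pairs from <a class="product"> tags (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None  # href of the product link we are currently inside
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "product" in classes and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None


def page_to_rss(html, page_url, feed_title):
    """Build an RSS 2.0 document (as a string) from the product links in html."""
    parser = ProductLinkParser()
    parser.feed(html)
    parser.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = "Scraped from " + page_url
    for title, href in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        # Resolve relative shop links against the page they came from.
        ET.SubElement(item, "link").text = urljoin(page_url, href)
    return ET.tostring(rss, encoding="unicode")
```

Save the returned string somewhere your feed plugin can read it and you have the feed the shopping cart never gave you. The catch is the usual one with screen scraping: when the cart’s HTML changes, the parser has to change with it.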
Dapper was really neat (you can make Flash badges!), so I am definitely not giving up on them, but I couldn’t make sense of Yahoo Pipes. Feedity inserted ads in place of the product images and Feed4All just didn’t seem to work. So far my experiment has let me avoid classic screen scraping and I feel like I’m getting close to what I want. There are other plugins to explore, too. But MultiFeedSnap works pretty well, I must add.
Have you had any clever triumphs in this arena? I’d love to hear about them.