Site Ripping Bastards: collecting the proof…

One of my last posts was about someone else ripping off one of my company’s sites, and at least this one may go somewhere. So the powers-that-be want some documentation.

So we decided to give them printouts of all the site pages and a copy of the site on disc.

For “downloading” the offending company’s website, we needed to use an auto-download program (because doing it by hand would have just been a colossal waste of time). Plus whoever ripped off the site in the first place probably did the same thing (though I really hope they weren’t that smart and spent a ridiculous amount of time doing it manually).

After trying several Firefox extensions and some other trial-crippled software packages, I landed on HTTrack Website Copier, a free open-source application. You give it the URL of a website and it spiders the site and downloads the pages and images, retaining the linking structure of the site, so that you have a navigable (is that a word?) version stored locally. Worked like a champ. There are Windows, Linux and Mac versions available.
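For the curious, here is a rough idea of what "spidering" means. This is only a minimal sketch in Python of one step (finding the links on a page that stay on the same site), not how HTTrack itself is written. The names `LinkCollector` and `same_site_links` are made up for the illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href/src values from a page's tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def same_site_links(page_url, page_html):
    """Return absolute URLs found in page_html that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(page_html)
    host = urlparse(page_url).netloc
    found = []
    for link in collector.links:
        absolute = urljoin(page_url, link)  # resolve relative links
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found
```

A real copier would repeat this for every page it finds, download each one, and rewrite the links to point at the local copies. That is the part HTTrack takes care of for you.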

It also works with dynamic sites, which is handy if you ever have a client that wants their dynamic, database-driven PHP site stored on a CD so they can browse it locally at a tradeshow, desert island, Iditarod dog race in Alaska or some other place that doesn’t have internet access. (Of course, it may also really help to have a static version if your database ever craps out on you and you still want a functional site.)

Now for the printouts. Doing those by hand would take a while, so here was my workaround:

  1. Use Yahoo Site Explorer to return a list of pages* it has in its index and use the “Export to TSV” option usually found at the bottom right of the results page. This will give you a full list of the pages Yahoo has indexed for that site, rather than just the first 10 links in the paginated results.
  2. Open the TSV file in Excel and save it as HTML
  3. Open the HTML file in Internet Explorer (I prefer Firefox, but IE is what makes the next step possible)
  4. Choose to print that one page with all the links, and in the print options look for “print linked documents”
  5. Kick back and reload your paper and toner as necessary
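
If you don’t have Excel handy, steps 2 and 3 could probably be replaced with a few lines of script. Here is a minimal sketch in Python that turns a TSV export into a single HTML page of links. It assumes the export has a header row with a column named "URL"; I have not checked that against the actual export, so treat the column name (and the function name `tsv_to_link_page`) as placeholders.

```python
import csv
import html
import io


def tsv_to_link_page(tsv_text, url_column="URL"):
    """Build one HTML page with a link for every URL in the TSV text.

    url_column is an assumed header name; change it to match your export.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    items = []
    for row in reader:
        url = (row.get(url_column) or "").strip()
        if url:
            safe = html.escape(url, quote=True)  # escape & and quotes
            items.append(f'<li><a href="{safe}">{safe}</a></li>')
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
```

Save the returned string as an .html file, open it in IE and carry on with step 4.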

This was just a hack for getting this done and I am sure there are other, better options. There is probably a Firefox extension I am not aware of that would have really helped with this. If you know of one, I would love to know and link to it.

*Since I was depending on Yahoo’s index of the site, it is possible I missed some pages, but it gave me more than enough to show what we needed. Just a disclaimer.

Hope this helps anyone who might find themselves in the same boat.

P.S.
There may be some Mac folks out there thinking “An Automator workflow would have been able to take care of all that” and you may be right, but in the 3-4 hours I spent with it I never managed to connect the steps correctly, even after trying the very promising “Download URLs as PDFs” action. Automator was one of the coolest features I had seen in an OS and it made me quite interested in moving to Mac, but so far, for me, it has been nothing but a big tease: it looks like it can do almost anything and makes it seem easy, but the pieces never really fit together the way I need them to. But I’ll keep trying.

Posted in software, tips & tricks

Quick and easy Online Sitemaps.xml generator

Great tool to share today.

Ran across this website, XML-Sitemaps.com, that does exactly what you would want it to do.

You enter the website you want it to crawl and generate the XML sitemap for, and a few minutes later it provides links to XML (compressed and uncompressed), text and HTML sitemaps for easy use.

It also allows you to specify the frequency, priority and created dates.
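
If you have never looked inside one of these files, here is a minimal sketch in Python of what a sitemap with those fields looks like when built by hand. It uses the standard sitemaps.org tags (`loc`, `lastmod`, `changefreq`, `priority`); I am assuming the site’s "created dates" end up in `lastmod`. The function name `build_sitemap` and the example pages are made up.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for a list of dicts with 'loc' and optional fields."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Optional per-page hints for the search engines
        for field in ("lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical example pages
print(build_sitemap([
    {"loc": "http://www.example.com/", "changefreq": "weekly", "priority": 1.0},
    {"loc": "http://www.example.com/about.html", "lastmod": "2008-01-15"},
]))
```

The online tool does the hard part, which is crawling the site to find the pages in the first place.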

There is a 500-page cap on the service, but they also offer a PHP script version for about $20 that you can install on your own server to generate sitemaps for unlimited pages.

Check them out and save yourself some time.

XML-sitemaps.com: Free Online Google Sitemap generator

Posted in seo, tips & tricks, tools

Site Ripping Bastards: Redux!

For anyone who read my previous post on getting one of my sites ripped off, this is not a follow-up. If you read that post, you’ll understand that I don’t expect any resolution to that individual issue, for the very reasons that made it so frustrating.

However, another site has done the same thing, and even worse. And these guys are open season.

But what’s really interesting is how we found these site-ripping bastards:

We work closely with a specialized 3rd party ecommerce service provider that offers a hosted service, where multiple clients have “stores” all on their servers. This service provider was contacted by a client with an odd situation. Apparently they billed and sent a gift card to a customer who had not ordered from their site but from another site in another state.

How on earth could that have happened? (Ahh, come on, you know how.)

The other business ripped off the site we built for our customer and did not change one of the links that led to our customer’s hosted store. So when their customer followed the “bad” link they hadn’t changed, it sent that customer to our customer’s store, and since the two sites looked identical, the customer had no idea they were buying from another company entirely.

Quite amusing, actually, especially for my co-workers who were really hoping that I would call the offending company (they really enjoy it when I rip into people for some reason).

But alas, while it was our work that was ripped off, it is work that we were paid for, so it is our client’s property and their fight. They have been informed. They are not happy. And they are the ones with the lawyers on staff.

At least no bureaucratic nonsense should prevent this one from getting interesting.

Posted in wtf?

The Bread Truck, The Backlash and “selling air”…

I haven’t posted in a while because, to be honest, I have been avoiding the computer as much as possible in my down time. Work has been crazy and there’s been little time between tasks to really see what else was going on in the rest of the world that I might comment on. So, I decided to comment quickly on my current feelings of burnout and overload.

So what has a passionate web guy wanting to give it all up to drive a bread truck or go back to his days as a bookstore clerk or pizza delivery guy?
There just seems to be too much going on and just keeping up with what’s going on in web development nowadays seems to be a full-time job on its own.

I’ll probably get crucified by the web community as a whole for saying it, but sometimes I wish the Internet would just SLOW THE F**K DOWN!

The signs of an impending backlash against our society’s always-on, broadband, social media networking (r)evolution are getting stronger and stronger. Multitasking is bullshit and we get less done when we try to get more done. Email is evil and makes workers less productive and reduces effective communication. And at the end of it all, I am spending all my time creating things that do not physically exist and will most likely be obsolete before they are even complete.

My movie buff mind keeps falling back to a scene in “City Slickers” where, during a “what does your parent do” presentation at his son’s elementary school, Billy Crystal, a radio advertising exec staring down the barrel of a mid-life crisis, comes to the realization that when you really break it down, he “sells air.” Intangible, fleeting moments in time. Air. So what is the difference between air and a line of code, a well-placed pixel or a search engine ranking? Well, people actually breathe air.

Is it really a good idea for us to eat, sleep and breathe the other stuff?
Guess I need to learn how to drive a stick. The bread truck is calling.

Posted in wtf?

R.I.P. Netscape Browser: Time of Death… Feb 1, 2008.

The once great, now irrelevant Netscape browser has an official date to put on its tombstone.

As of February 1, 2008, Netscape will no longer be supporting the browser that helped start it all. Of course, they are the ones who started the Mozilla Foundation and became a victim of their own success with Firefox, but everyone pretty much saw this coming when AOL bought them.

Doesn’t seem like they are too broken up about it either. You can read the official announcement at the official Netscape Blog:

End of Support for Netscape web browser 

Posted in browsers