Jul 30
Sneaky Web Robots Scrape Content
Filed Under F5, Networking, Virtualization
One of my clients has several content sites on a variety of topics. His sites are constantly being hit by screen scrapers. There are several ways to mitigate this, such as installing an F5 ASM (Application Security Manager), but it seems the robots have found a new way around them.
Here is one robot that is particularly aggravating to my customer:
204.236.xxx.xx - - [28/Jul/2011:14:50:28 -0500] "GET /dir/page-on-site.php HTTP/1.1" 200 2750 www.clientdomain.com "-" "Mozilla/5.0 (compatible; Bender; http://XXXXXXXXXXXXXXXXX.tumblr.com)"
Notice the trickery: Bender uses a subdomain on the well-known site Tumblr. The page there gives only the briefest of instructions about using robots.txt. I have my doubts as to whether they really respect robots.txt files. I will update this post after some more testing.
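If you want to pull requests like this out of your own access logs, a short Python sketch can match on the user-agent field. The regular expression below is an assumption inferred from the single sample line above (Apache-style combined fields with the virtual host after the byte count), and `SUSPECT_AGENTS` is a hypothetical list; adjust both to your server's actual log format.

```python
import re

# Assumed layout, inferred from the one sample line:
# ip ident user [time] "request" status bytes vhost "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'(?P<vhost>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical blocklist of user-agent substrings; "Bender" comes from the log.
SUSPECT_AGENTS = ("Bender",)

SAMPLE = ('204.236.xxx.xx - - [28/Jul/2011:14:50:28 -0500] '
          '"GET /dir/page-on-site.php HTTP/1.1" 200 2750 www.clientdomain.com '
          '"-" "Mozilla/5.0 (compatible; Bender; http://XXXXXXXXXXXXXXXXX.tumblr.com)"')


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


def is_suspect(entry, tokens=SUSPECT_AGENTS):
    """True if any suspect token appears in the entry's user-agent string."""
    return any(token in entry["agent"] for token in tokens)


entry = parse_line(SAMPLE)
print(entry["ip"], entry["agent"], is_suspect(entry))
```

Keep in mind that the user-agent is self-reported, so this only catches robots that announce themselves.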
I recommended to my customer that they simply block the robot's IP address range at the firewall and be done with it.
However, I had recommended exactly this to them a few months earlier. Why were they still being scraped?
It turns out that the IP address the robot was using belonged to Amazon.com. One’s first reaction might be that Amazon is scraping content. However, that is not the case. These are rented servers on Amazon’s EC2 (Elastic Compute Cloud) network, so their IP addresses are basically temporary: the scraper can rent a new one whenever it likes. If you block one, the scraper simply moves on, and you may end up blocking a legitimate customer who rents that address from Amazon later.
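The weakness of blocking by address range can be sketched with Python's standard `ipaddress` module. The ranges below are hypothetical stand-ins (203.0.113.0/24 and 198.51.100.0/24 are reserved for documentation), not the actual EC2 block from the log; the point is that a range rule matches whoever currently holds an address, not who the scraper is.

```python
import ipaddress

# Hypothetical firewall blocklist; 203.0.113.0/24 is a documentation range
# standing in for a rented cloud address block.
BLOCKED = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(addr, networks=BLOCKED):
    """True if addr falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


# Hypothetical scenario: the scraper's current address is blocked...
print(is_blocked("203.0.113.25"))   # True
# ...so is a different tenant who later rents an address in the same block...
print(is_blocked("203.0.113.99"))   # True
# ...while the scraper, now on a freshly rented address elsewhere, is not.
print(is_blocked("198.51.100.7"))   # False
```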
I hope the robots.txt file changes make a difference. Otherwise, it is going to be a lot of billable hours for me.
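For reference, here is a minimal sketch of a robots.txt entry aimed at this robot, checked with Python's standard `urllib.robotparser`. It assumes Bender looks for the token `Bender` (taken from its user-agent string) when reading robots.txt; whether it actually honors the file is exactly what remains to be tested. `robotparser` matches on the robot's name token, so the sketch passes `"Bender"` rather than the full Mozilla-style user-agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out the "Bender" token entirely and
# leave every other robot unrestricted (an empty Disallow allows all).
ROBOTS_TXT = """\
User-agent: Bender
Disallow: /

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("Bender", "/dir/page-on-site.php"))        # False
print(rp.can_fetch("SomeOtherBot", "/dir/page-on-site.php"))  # True
```

This only shows how a well-behaved crawler would read the file; a robot that ignores robots.txt is not stopped by it.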
Screen-scraping is bad.
Jul 29
I downloaded the CentOS 6 netinstall ISO; I figured it would be better than downloading their 5-gigabyte DVD ISOs. Why so much crud?
During installation, it asks for a URL from which to get the installation files.
The installer wants to know where the “install.img” file is. After several attempts, I used this:
http://centos.mirrors.tds.net/pub/linux/centos/6.0/os/x86_64/images/
I am using a standard virtual machine for it. I chose Red Hat 6 (64-bit) as the guest OS type, with two NICs and 1 GB of RAM. CentOS 6 is a rebuild of Red Hat Enterprise Linux 6, so that setting fits.
It went fine. I was left with a much smaller install as well. I did not install the GUI.
Jul 21
VirtualBox: Cloning a Virtual Machine
Filed Under Computers, Linux, Virtualization
TO CLONE A VIRTUAL MACHINE, JUST EXPORT YOUR CURRENT MACHINE AND IMPORT IT BACK IN.
I like VirtualBox for desktop virtualization. I had a lot of problems with it when using it for a server guest, but for workstation tasks it is great.
Do not use VirtualBox for server virtualization; it is not designed for that. Use Xen or VMware instead.
So, I am sitting here with a nicely built Ubuntu 11 virtual machine. I have updated all the software and made everything just right for my needs. Now, I want to clone the machine so I can have a few of them running. I like to separate my work functions into different VMs for organization and efficiency.
Well, how do you do it? I do not see a “clone” function built into the GUI. I do see that you can export your virtual machine to an OVA file, so I figure I can just import the exported file as a new machine.
So, I started the export, and it looks like it will take a very long time. There must be a better way!
VirtualBox 4.1, which was released just a few days ago, seems to have a clone feature. However, the release notes say that it has been disabled until it is fixed in a future maintenance release. (Oracle owns VirtualBox.)
It looks like the way to do it is at the command line.
http://www.virtualbox.org/manual/ch08.html#vboxmanage-clonevm
This is the section I need:
VBoxManage clonevm <uuid>|<name>
[--snapshot <uuid>|<name>]
[--mode machine|all]
[--options link|keepallmacs|keepnatmacs|
keepdisknames]
[--name <name>]
[--basefolder <basefolder>]
[--uuid <uuid>]
[--register]
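On a VirtualBox release where clonevm is available, cloning the Ubuntu machine could be scripted roughly as follows. The VM names are hypothetical, and the sketch only builds the argument list from the synopsis above (`--name` sets the clone's name, `--register` adds it to VirtualBox); passing the list to `subprocess.run(cmd, check=True)` would actually execute it.

```python
import shlex


def clonevm_cmd(source, new_name, register=True):
    """Build a VBoxManage clonevm command line (source = VM name or UUID)."""
    cmd = ["VBoxManage", "clonevm", source, "--name", new_name]
    if register:
        cmd.append("--register")  # make the clone show up in VirtualBox
    return cmd


# Hypothetical VM names for illustration.
cmd = clonevm_cmd("Ubuntu 11", "Ubuntu 11 clone")
print(" ".join(shlex.quote(part) for part in cmd))
```

`shlex.quote` is used only for display, so names containing spaces print in a form you could paste into a Bash shell.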
OK, I am running Windows 7 Enterprise on the host computer, so I will need to run VBoxManage from the command line there. I installed the GNU utilities on my computer, so I get to use a nice Bash shell instead of the regular Windows command prompt.
And the command does not seem to work. Since I am running 4.0, I guess I am stuck without clonevm. There is a clonevdi command, however.
OK, the old-school way is to clone the virtual hard drive and edit the other files by hand. It is only slightly more painful.
So, I started the clonevdi command. It seems to be running really slowly as well; it sat at 0% for a long time.
So, I took a look at the file system, and the clone looked about two-thirds done already, despite the 0% display. I think I’ll go back to the export command.
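Peeking at the file system like this can be automated by comparing the size of the new disk file with the source. This is a rough heuristic, not something VBoxManage reports; the paths are hypothetical, and because dynamically allocated disks can be stored more compactly in the copy, the ratio is only an estimate.

```python
import os


def clone_progress(src_path, dst_path):
    """Estimate copy progress as bytes written so far / source size (0.0 to 1.0)."""
    src_size = os.path.getsize(src_path)
    if src_size == 0:
        return 1.0
    dst_size = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    return min(dst_size / src_size, 1.0)


# Hypothetical paths; point these at the source disk and the clone in progress.
SRC = "Ubuntu11.vdi"
DST = "Ubuntu11-clone.vdi"
if os.path.exists(SRC):
    print(f"{clone_progress(SRC, DST):.0%} copied (rough estimate)")
```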
This is turning out to be NOTES.
OK, I used the export function. Initially it estimated 30 minutes, but it ended up taking 5. So ignore the estimated time to completion; otherwise, you will be scared off like I was. It sat at 0% for a long time at the beginning.
Now, to import it again. You get the option to rename the machine and change the location of the files. Importing took 4 minutes.
So, just export and import your virtual machine. It will be good.
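The same export-then-import clone can be scripted with VBoxManage instead of the GUI. This is a sketch under assumptions: the VM and file names are hypothetical, and the `import` options (`--vsys 0 --vmname`) are taken from the VBoxManage manual for later releases, so check them against your version's `VBoxManage import` documentation before relying on them. By default it only prints the commands.

```python
import shlex
import subprocess


def export_cmd(vm_name, ova_path):
    """VBoxManage command to export a VM to an OVA appliance file."""
    return ["VBoxManage", "export", vm_name, "-o", ova_path]


def import_cmd(ova_path, new_name):
    """VBoxManage command to import an OVA as a new VM under a new name.

    --vsys 0 selects the first virtual system in the appliance; --vmname
    renames it, like the rename option in the GUI import wizard.
    """
    return ["VBoxManage", "import", ova_path, "--vsys", "0", "--vmname", new_name]


def clone_via_ova(vm_name, new_name, ova_path, dry_run=True):
    """Export, then re-import; with dry_run=True, only print the commands."""
    for cmd in (export_cmd(vm_name, ova_path), import_cmd(ova_path, new_name)):
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop if either step fails


# Hypothetical names; pass dry_run=False to actually run VBoxManage.
clone_via_ova("Ubuntu 11", "Ubuntu 11 copy", "ubuntu11.ova")
```

Running the two steps in sequence mirrors the GUI process above; the OVA file can be deleted once the import finishes.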
Jul 21
What is an RFC?
Filed Under Computers, Networking
RFC stands for Request for Comments. RFCs are the documents in which Internet communication standards are formalized; the IETF (Internet Engineering Task Force) publishes its standards this way.
http://www.ietf.org/rfc.html
The IETF is an important organization, and RFCs serve an important function: without agreed-upon standards, devices on the Internet would not be able to work together and communicate.
The standards are put up for comment, basically published to the Web. Then many big companies and smart individuals weigh in. Often after several years, a standard is agreed upon (by the important people), and companies try to follow it. Compliance is voluntary, but you would be crazy to go your own way without a lot of cash.
Jul 18
BIG-IP Version 11 Beta Testing
Filed Under F5, Networking, Virtualization
I have been playing with the BIG-IP version 11 beta. The GUI looks better than version 10’s, and there are some interesting new features. Several of my customers would probably like DNS Express, for example. Basically, it serves DNS records from memory, making the box one of the fastest DNS responders on the market.
Generally speaking, I do not immediately upgrade my clients to the latest build when major releases come out. However, I just wanted to take a quick look and decided to test it out on my favorite ESXi server.
The upgrade from 10.2.2 went smoothly. I just uploaded the .iso file to the box using the web GUI and then applied it to the other boot slot. After a reboot, I was in version 11. My stuff worked. (Granted, I only had a few VIPs.)
