Ruby

How to write code efficiently (and this has nothing to do with code)

Posted on Updated on

The world is falling down with books, methodologies, techniques, tips and tricks on how to be more efficient in life, work, and everything else you can think of. What isn’t often discussed is how desktops, both real and virtual, can be tailored to act as a strong foundation and take things to the next level.

I’ve been thinking about this ever since I tidied my (physical) desktop a couple of weeks ago. Wow.. what a difference. Not my strength keeping that tidy, but it got me thinking about my virtual desktop too. As an Ubuntu user with multiple virtual desktops available to me, I’ve always had a strong sense of standard placement of specific applications, but after some thought I have taken that to the next level.

The Physical Desktop

I love my laptop, and there are times when I get in the habit of sitting in front of the fire for a few days with it on my lap, but nothing beats the productivity benefits of a desk, a second monitor, a real keyboard and mouse, kick ass sound, and somewhere to set your coffee.

desk

I’m not really going to say much more about it than that – the picture says 1000 words.

The Virtual Desktop(s)

Now things get interesting.  If you are a Windows user then unless things have changed since I last braved using one, you are SOL when it comes to virtual desktops.  For Mac and Linux users, multiple virtual desktops are things that we’ve been using (or haven’t bothered to use) for years.

Anyhoo. Ubuntu and Mac users can have as many virtual desktops as they like.  Not sure about Mac, but with Ubuntu you can configure how they are arranged, and maybe because of my old Cube days, I like 4 virtual desktops side by side, configured so that when you get to one end you wrap straight around to the other.

That means I have four full desktops that I can flick back and forward between, and each has 2 monitors.. 8 distinct areas.

So the secret to making this a haven of joy and efficiency is always keeping things in the same place relative to each other.  For example,  say all you use is a browser and Word Processor all day long as part of your core job, then an email client, and a music player.

You might do something like this:

Virtual Desktop 1

Laptop: Music Player

Monitor: Email

Virtual Desktop 2

Laptop: Browser

Monitor: Word Processor

Why is that good?  If you are working on a document, and you need to send an email, you know that the email client is just over there on the Virtual Desktop to the right.  Not a great example, but what happens when things get a bit more complex?  I’m a Rails developer, and at the very least that involves:

  • a browser
  • a terminal running the rails server (and putting out useful information)
  • an editor
  • a terminal running a rails console (used constantly while writing code)
  • often a MySQL GUI

As if this isn’t enough, in development one is typically working with many open files at the same time, so multiple tabs within the editor.  It’s this last part that made me really rethink my old strategy and for the first time move editing and the browser to different virtual desktops.. and try something that has turned out to be incredibly valuable.

I now have 4 different editors open, with the folder tree in each open to a specific folder of a rails project.

  1. Models
  2. Views
  3. Controllers
  4. Project Root (for other.. migrations, configs, helpers, css / js)

Each of these editors

  • is not full screen
  • has a corner visible no matter what (so you can get to it with a click)
  • is always in the same place on the screen (I go M/V/C/Other clockwise starting top corner).

It’s all kinds of awesome.  The result of this separation of browser and editor left some really great gaps for other things.  Here’s a rundown of my 4 Virtual Desktops going left to right.

face1

face2

face3

face4

I have to say after running with this for a couple of weeks, I can’t imagine going back.  My fingers have absolutely learned things like CTRL-S, CTRL-ALT-LEFT, F5 (save code, move left one desktop, refresh browser), and having the rails server beside the browser and the rails console beside the editor makes so much sense.

Give it a go!

Fast Reliable Proxy Servers for page scraping and page spider anonymity.

Understanding Proxy Servers. What are they for? Why would you use one?

Proxy servers are really used for one of two things:

1) A stepping stone.
Think of a Proxy Server as a step between you and where you want to go.  For example, you can configure a proxy server in most browser configurations. The result is something like this:

– you want to view google.com
– your browser sends the request to the proxy server
– the proxy server pulls up google.com
– the proxy server tunnels whatever it gets back to you.

Typical uses:

– ANONYMITY. Google.com doesn’t see your address in the request. It just sees the Proxy Server. Of course there is no real anonymity on the web, but we are talking about what some sysadmin at Google can tell about the traffic.
– SPOOFING LOCATION. If you are in the US, and there’s a website in the UK that is locked down to only allow people in the UK to access content, then by using a UK Proxy Server, that website thinks you are in the UK and you are good to go.
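
To make the stepping stone idea a bit more concrete, here is a minimal Ruby sketch using the standard library's Net::HTTP. The proxy host and port are made up for illustration (proxy.example.com:8080 is not a real service), so swap in a proxy you actually have access to. Nothing connects until you make a request, which is why the actual fetch is left commented out.

```ruby
require 'net/http'

# Hypothetical proxy details; substitute a proxy you actually have access to.
PROXY_HOST = 'proxy.example.com'
PROXY_PORT = 8080

# Build an HTTP client whose requests are sent to the proxy rather than
# straight to the target host. No connection is opened until a request is made.
def http_via_proxy(target_host, target_port = 80)
  Net::HTTP.new(target_host, target_port, PROXY_HOST, PROXY_PORT)
end

http = http_via_proxy('www.google.com')
puts http.proxy?         # true: requests will go through the proxy
puts http.proxy_address  # proxy.example.com
# response = http.get('/')  # uncomment once PROXY_HOST points at a real proxy
```

The target site then sees the proxy's address on the incoming connection, not yours.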

2) A Gateway

This is frequently more of a corporate thing, where a company comes up with various reasons why they need a Proxy Server, but in general it’s so they can control the crap out of you. If you work for a big company then while on the internal network you might only be able to surf the internet through the corporate Proxy Server. There are legit and useful reasons to do things this way, but more often than not, a biggie is that with all traffic tunneling through a single point it’s much easier for the IT dept to block access to certain sites or make sure you aren’t watching donkey porn at work.

For this post, we are more interested in (1), the stepping stone, and what options you have, and that really depends on what you are doing.

TYPE A: FREE (or pay once) PROXY SERVICES

If you are on vacation in Mexico and you want to use http://www.hulu.com, then the best thing to do is go to somewhere like http://hidemyass.com/proxy-list which publishes lists of Open Proxy Servers. The more recent the listing, the more likely it’s going to work.

Results will vary. A proxy might work, or it might not, and it might be horribly slow. For sure, a proxy that works today probably won’t work tomorrow.  You also have no idea WHY this proxy is available, and it’s definitely a possibility that it’s sniffing whatever traffic you are putting through it, so not the best time to Instant Message your credit card details to someone.

If these services have a Premium offering, the basic message is the same – you just get the list delivered to you in a better format.

TYPE B: PAID PROXY SERVICES

There are a bunch out there, and if you are doing some sort of web scraping or web spidering or similar where reliability and performance are important to you, there is no other way to go.  PAID proxies typically ensure that they can keep up with needs by charging by throughput. That makes them a BAD idea for your Vacation Property in Mexico for Hulu / Netflix because media bandwidth adds up quickly.

Here are some of the things that differentiate offerings.

1) Revolving IPs

This is a good thing. Basically this means that you always call the same proxy server, but every time that proxy server makes an outbound call it cycles round-robin through a bunch of IP addresses.. so if there are 10 revolving IP addresses and you hit a page 10 times, then in theory each time is with a different IP.

Typically the IP addresses are sequential, so it’s not exactly rocket science for someone on the receiving end to see what is going on, but it’s still a bonus.

2) Multiple Proxy Servers

A decent service will give you credentials for a number of proxy servers, possibly in different countries. That means that you can round-robin call each of them from your application to further tangle the path between you and your final destination (See this post for an example of how you can easily do this with Ruby).
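
Round-robin from your own application can be as simple as the sketch below. The proxy list is entirely hypothetical (hosts, ports and credentials are placeholders), and this is just one way to do it: Ruby's Array#cycle gives you an enumerator that loops over the list forever, so each call hands back the next server in turn.

```ruby
# Hypothetical proxies as [host, port, user, password]; replace with your own.
PROXIES = [
  ['proxy1.example.com', 8080, 'user', 'secret'],
  ['proxy2.example.com', 8080, 'user', 'secret'],
  ['proxy3.example.com', 8080, 'user', 'secret']
].freeze

# Array#cycle returns an enumerator that repeats the list endlessly.
PROXY_ROTATION = PROXIES.cycle

# Hand back the next proxy in the rotation, wrapping around at the end.
def next_proxy
  PROXY_ROTATION.next
end

# Four calls walk through proxy1, proxy2, proxy3 and then wrap back to proxy1.
4.times { puts next_proxy.first }
```

If you don't care about strict ordering, picking a random entry with sample does much the same job with less state to carry around.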

3) Short Term IP use

Not only do decent services round-robin between different IP addresses, they often throw those IP addresses away periodically and start with new ones. The advantage here is that if someone sees incoming traffic and blocks an IP, then that block is only good until the Proxy Service throws that IP away.

A really good PAID proxy service. ProxyMesh.com

It’s been about 6 months since I really dug in, but I really like ProxyMesh.com

– Their prices are reasonable starting at $1 / gig.
– They have highly maintained US and UK proxies with revolving Short Term IPs
– They also manage a list of Open Proxy servers (like the ones listed at hidemyass.com) but THEY manage the list on their end. You just call the same ProxyMesh Proxy, and they farm out the request to any one of hundreds of proxy servers – and that list is very fluid as proxy servers come and go worldwide. These proxy servers are of course much less reliable than their core service, but offer significantly more anonymity.

That’s all folks.

Build your own Linux / Ubuntu System and Network Health Monitor Application – Part I

At home I have various devices doing various things.. and it’s important to me that I know they are working. There are many tools out there designed to keep an eye on server health, but they don’t do everything that I want in the way that I want, and I’m a big believer that coding = creating, so there’s nothing wrong with reinventing the wheel just for the sake of it.

For me it started recently when I repurposed an old Android phone as an IP webcam pointing out the front window. After that it was a natural progression to install the excellent Linux app Motion to pick up movement from that feed, and store images and mpegs to disk – for security. Well, no point in storing the images on the nice machine that is likely to be stolen in the event of a break-in.. so let’s store them on an old EEE-Box that’s running headless in a hidden corner of the basement 😉

Well there you have it. Definite needs for a frequent health check:

– is the camera working?
– is the EEE nfs mount mounted?

I wrote a Ruby script and a cron job to solve that problem, but then it just grew. Now I have the system running on multiple machines:

– checking each other
– making sure all my websites are responding to ping
– validating that certain URLs are giving a 200 response
– making sure that disks aren’t filling up
– checking that certain processes / background applications are running
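
To give a flavour of what checks like these can look like, here is a rough sketch. It is not the actual code from my system; the method names, the 90% threshold and the use of df and pgrep are all assumptions made for illustration.

```ruby
require 'net/http'
require 'uri'

# True when the URL answers with HTTP 200; any network error counts as a failure.
def url_ok?(url)
  Net::HTTP.get_response(URI(url)).code == '200'
rescue StandardError
  false
end

# Pull the "use%" column out of `df -P` output and return it as an integer.
def disk_percent_used(df_output)
  df_output.lines.last.split[4].to_i
end

# True when the filesystem holding path is at or above the limit
# (90% is an arbitrary default).
def disk_filling_up?(path, limit = 90)
  disk_percent_used(IO.popen(['df', '-P', path], &:read)) >= limit
end

# True when a process with exactly this name is running (relies on pgrep).
def process_running?(name)
  system('pgrep', '-x', name, out: File::NULL, err: File::NULL) == true
end

# Example calls (names and URLs are placeholders):
#   url_ok?('http://www.example.com/')
#   disk_filling_up?('/')
#   process_running?('motion')
```

Each method returns a plain true or false, which keeps the code that decides what to do about a problem separate from the code that detects it.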

On top of that:

– my laptop knows if I’m at home or away and tests accordingly
– the system creates Desktop icons for each problem (and removes them when problems resolve)
– it can generate Ubuntu desktop notifications
– it can notify me of issues with a text message
– each machine manages hostname|issue style files on Dropbox so wherever I am my laptop can tell me what’s going on at home.
– it shuts up at night
– for every problem it finds, it checks to see if there are instructions to try to correct it

All this is achieved with:

– 200 lines of core ruby methods that perform all the tests
– 50 lines of code relating to specific issue resolution
– 100 lines of control code –  one liners which are basically “If you are this machine, then do this test”
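
The "if you are this machine, then do this test" idea can be sketched like this. Note that I've swapped the one-liners for a lookup table keyed by hostname, and the hostnames and check names are invented for the example.

```ruby
require 'socket'

# Hypothetical hostnames and check names, purely for illustration.
CHECKS_BY_HOST = {
  'laptop'  => [:websites_respond, :nfs_mounted],
  'eee-box' => [:camera_working, :disk_space]
}.freeze

# Return the checks a given machine should run; unknown machines run none.
def checks_for(hostname = Socket.gethostname)
  CHECKS_BY_HOST.fetch(hostname, [])
end

# On a machine not listed above this prints nothing.
checks_for.each { |check| puts "running #{check}" }
```

The same script can then be dropped on every machine, and each one works out for itself what it is responsible for.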

In Part II we’ll take the very basics of that code, and create a simple single Ruby script to keep an eye on a machine’s own health.

How to add some anonymity to your data scraping Ruby and Rails apps

I have no idea how often people ACTUALLY look at their logs looking for someone scraping their pages, but sometimes you want to just fly under the radar.  I generally don’t agree with stealing web content by scraping, but I do believe that if someone is in the data distribution business, but they suck at it,  it’s ok to bend the rules a little. For example, if they offer an RSS feed that is buggy, slow, huge etc. but their homepage offers the same information more reliably – go for it.

There are basically two things at play here:

  1. Spoofing a user-agent  (pretending to be a browser not a script)
  2. Spoofing the source of the request.

Here’s a little function you can call to get a random user agent, based on a list of really common user agents.. Thanks to the guy who posted this to a blog, sorry, I don’t have the reference anymore.

def self.random_desktop_user_agent
    user_agents = [
      "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
      "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/536.26.17 (KHTML, like Gecko) Version/6.0.2 Safari/536.26.17",
      "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
      "Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0.1",
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; PeoplePal 6.2)",
      "Mozilla/5.0 (Windows NT 5.1; rv:5.0.1) Gecko/20100101 Firefox/5.0.1",
      "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1",
      "Opera/9.80 (Windows NT 5.1; U; en) Presto/2.10.289 Version/12.01",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11",
      "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)",
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) )",
      "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)",
      "Mozilla/5.0 (Windows NT 6.1; rv:2.0b7pre) Gecko/20100921 Firefox/4.0b7pre",
      "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322)",
      "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11",
      "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 3.5.30729)",
      "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11",
      "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
      "Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.0) Opera 7.02 Bork-edition [en]",
      "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.02",
      "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
      "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0",
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 5.8 (build 4157); .NET CLR 2.0.50727; AskTbPTV/5.11.3.15590)",
      "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0",
      "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
      "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0",
      "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2",
      "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
      "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.91 Safari/537.11",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/536.26.17 (KHTML, like Gecko) Version/6.0.2 Safari/536.26.17",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0",
      "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0",
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; TencentTraveler ; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727)",
      "Mozilla/5.0 (iPad; CPU OS 6_0_1 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A523 Safari/8536.25",
      "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) Gecko/20100101 Firefox/16.0",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11",
      "Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20100101 Firefox/17.0",
      "Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20100101 Firefox/17.0"]
    return user_agents.sample
end

I usually put such things in a model, utility.rb, so I can just call it with Utility.random_desktop_user_agent

That takes care of the user agent; now on to the proxy.  You’re going to go out and get your own list of proxies.. whether you find some reliable free ones or pay for a service.  With the best services you call a single proxy of theirs, and it cycles round robin through a bunch of IP addresses, a different one for each call.  They then dump those IPs every 30 minutes or so.. not bad.

def self.random_proxy_server
    # Each entry is [host, port, user, password]; replace these placeholders with your own proxies
    proxies = [["proxy_server1", "proxy_port", "proxyuser_if_authenticated", "proxypassword_if_authenticated"],
      ["proxy_server2", "proxy_port", "proxyuser_if_authenticated", "proxypassword_if_authenticated"]]
    return proxies.sample
end

Again – I put that in the Utility.rb model.

So put it all together..

Calling a page looks something like this with open-uri

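require 'open-uri'
proxy = Utility.random_proxy_server
# open-uri wants the proxy as a full URI (this assumes the list holds bare hostnames), then the user and password
URI.open(url, :proxy_http_basic_authentication => ["http://#{proxy[0]}:#{proxy[1]}", proxy[2], proxy[3]], "User-Agent" => Utility.random_desktop_user_agent)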
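
If some of your proxies need a login and some don't, a small helper can pick the right open-uri option for each. This is only a sketch: the helper name open_uri_options is made up, and it assumes that a blank user in the list means "no authentication" and that hosts are stored without the http:// prefix. The two option names, :proxy and :proxy_http_basic_authentication, are the ones open-uri provides.

```ruby
# Build the open-uri options hash for one [host, port, user, password] entry.
# A nil or empty user is taken to mean the proxy needs no authentication.
def open_uri_options(proxy, user_agent)
  host, port, user, password = proxy
  proxy_url = "http://#{host}:#{port}"
  options = { 'User-Agent' => user_agent }
  if user.nil? || user.empty?
    options[:proxy] = proxy_url
  else
    options[:proxy_http_basic_authentication] = [proxy_url, user, password]
  end
  options
end

opts = open_uri_options(['proxy.example.com', '8080', 'me', 'secret'], 'Mozilla/5.0')
# With a real proxy: require 'open-uri', then URI.open(url, opts).read
```

Only one of the two proxy options ends up in the hash, which matters because open-uri won't accept both at once.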

That’s about it.

K