
Thursday, November 08, 2007

Why iSCSI

I really should blog more. Oh well.

It's hard to have a conversation these days without someone trying to convince me that Fibre Channel is dead and iSCSI is where it's at. Whilst I agree with this in the long term, I don't see it in the short term. Sure, anyone can get iSCSI working in a lab by hacking a few free things together, but that's not comparing apples with apples. You need to compare deploying iSCSI in a best-practice environment to an FC one.

Here is my current argument; email me if you think I am wrong or have missed something.
  • Fibre Channel switch costs. If you only have two hosts, which can be common in a VMware environment, you can get away with no FC switches: just connect each port of a dual-port HBA in each host to a storage processor on your SAN. If you have more than two hosts then you are going to need some FC switches, say $3.5K each. If you are following best practices for your iSCSI you would have a separate switching infrastructure for the storage, and those switches are going to cost you about the same. Let alone what a 10G switch will cost.
  • HBA cards. A dual-port HBA card is going to cost you about $2K. Two high-end network cards with TOE are going to cost you about $1.6K, so that's not a great saving.
  • A lot of the arguments are based on 10G Ethernet. Most people are not already running this, so they would have to go out and purchase bleeding-edge switches and cards compared to the now-commodity Fibre Channel equivalents.
Now there are lots of other elements to consider, but on the big-ticket items, for the size of installations I see (two to ten servers running VMware), the cost difference is not a drop-dead argument that iSCSI is so much cheaper than FC. A rough worked example is below.
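To put some rough numbers on it, take a hypothetical four-host VMware cluster using the figures above. FC: four dual-port HBAs at $2K each plus two switches at $3.5K each comes to $15K. Best-practice iSCSI: four pairs of TOE cards at $1.6K plus two dedicated Gigabit switches at a comparable $3.5K each comes to $13.4K. That is a saving of about $1.6K, or roughly 10%, on the storage plumbing; hardly a knockout blow.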

If it were not for VMware, I think FC would be dead a lot sooner; as it is, it still has some legs for a while.

Monday, March 19, 2007

Update on OpenFiler

Just an update on my old post on OpenFiler.

I wanted to give iSCSI a play, and I noticed that the OpenFiler people have a prebuilt VMware appliance. Sweet.

I downloaded the appliance, created a new virtual disk and presented it via iSCSI. I then installed the Microsoft Initiator and connected things up. It was reasonably straightforward. The only two issues I had were that you can't configure anything as iSCSI until you set up some networks, and that CHAP authentication caused me grief for quite a while. Lastly, there is very little debugging or technical info for OpenFiler, at least that I could find. However, I did get it running in a short period of time, which was cool.
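If you would rather script the Windows end than click through the GUI, the Microsoft Initiator ships with a command-line tool, iscsicli, and something along these lines should do the same job (the portal address and target name here are made up for illustration):

iscsicli QAddTargetPortal 192.168.1.50
iscsicli ListTargets
iscsicli QLoginTarget iqn.2006-01.com.openfiler:tsn.mydisk

One CHAP gotcha worth knowing about: the Microsoft Initiator insists on secrets of 12 to 16 characters, so a shorter secret fails the login. That is worth checking first if CHAP gives you grief like it gave me.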

A greener data center

Having built a few smaller data centers in my time, I have an interest in the topic.

I came across a new organisation, The Green Grid. They have an interesting white paper on guidelines for efficient data centers.

There is some interesting info and it's very practical. There is the usual stuff like hot and cold aisles and so on. However, it does mention some issues which I have seen in the field but which you don't hear reported much, such as this one:

COORDINATION OF AIR CONDITIONERS
Many datacenters have multiple air conditioners that actually fight each other. One may actually heat while another cools and one may dehumidify while another humidifies. The result is gross waste that may require a professional assessment to diagnose.

Tuesday, January 30, 2007

Google, phpBB and session ids

Okay, an update on that hungry Googlebot sucking up bandwidth. I ended up emailing Google to ask them to slow the bot down and got an interesting reply. According to Google, it's the session IDs that cause 'problems for our robots'. Google refers you to a posting about removing session IDs from phpBB.

The only problem is that the information they refer to is from 2004. If you look over at the phpBB forum, there is a knowledge base article, "Why doesn't google spider my forum?", which again refers to session IDs and has the same code changes, but this one is from 2002.

As it's now 2007 and phpBB has changed a bit since 2002 and 2004, it's not such a simple change to make. Here is a quick summary of what I found and what I did.

First, the change they suggest to sessions.php will not work as expected any more. The line of code they modify does not exist any more; it has been broken up. Also, further down the function, after the point of their change, the code updates or sets the session data, and if it can't do that it sets the session ID to an md5 hash of a random number, undoing the change. Finally, the page code is a bit naughty and references both the global $SID and the session_id entry in the $userdata hash.

My solution to all of this was to declare global $HTTP_SERVER_VARS; at the start of the session_begin function, and then, right at the end of the function before it assigns all of the data to the $userdata hash, overwrite the session ID there.

Therefore the end of the function now looks like this; the first few lines, the Googlebot check, are the new code.


// New code: blank the session ID for Googlebot so it never ends up in URLs.
// Needs "global $HTTP_SERVER_VARS;" declared at the top of session_begin().
if (strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'], 'Googlebot'))
{
    $session_id = '';
}

// Existing phpBB code: build the $userdata hash with the (possibly blanked) ID.
$userdata['session_id'] = $session_id;
$userdata['session_ip'] = $user_ip;
$userdata['session_user_id'] = $user_id;
$userdata['session_logged_in'] = $login;
$userdata['session_page'] = $page_id;
$userdata['session_start'] = $current_time;
$userdata['session_time'] = $current_time;
$userdata['session_admin'] = $admin;
$userdata['session_key'] = $sessiondata['autologinid'];

setcookie($cookiename . '_data', serialize($sessiondata), $current_time + 31536000, $cookiepath, $cookiedomain, $cookiesecure);
setcookie($cookiename . '_sid', $session_id, 0, $cookiepath, $cookiedomain, $cookiesecure);

// The global $SID is also rebuilt here, so it ends up empty for the bot too.
$SID = 'sid=' . $session_id;

return $userdata;

The only problem I can see happening is that the function will still insert all of the session keys into the database earlier on, and they might not get cleaned up. I doubt it, but it is something to check. In a day or two I should be able to see how it goes.

I tested with wget. Without forcing the user agent to Googlebot I got ?sid= with real IDs in just about every URL on a page. With the user agent specified, the same GET of the page had an empty ?sid= value. So it looks a go.
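For the record, the test was along these lines (the forum address is a placeholder, not the real site):

wget -q -O - http://forum.example.org/index.php | grep 'sid='
wget -q -O - --user-agent="Googlebot" http://forum.example.org/index.php | grep 'sid='

The first fetch shows URLs carrying real session IDs; the second should show sid= with nothing after it.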

Tuesday, January 09, 2007

Hungry Googlebot

Wow, that Googlebot can get hungry. I admin a site and forum for my 4WD club, and the Google crawler has been sucking up the bandwidth big time: over 60% of the bandwidth for the month is from Google. If you do a search for "excessive bandwidth usage by googlebot" it turns out there are a few people having the problem.

The forum is phpBB, so there is a lot of cruft around the messages which has to get sent to Google with every page it requests, even though the actual content is small. Some clever person should write some code that determines if the browser agent is a bot and returns only a very simple version of the page, with the key content and some plain links for it to follow, something like the sketch below.
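Just to make the idea concrete, here is a minimal sketch; the bot list and the lite template name are my own inventions, not anything phpBB actually ships:

<?php
// Very rough sketch of serving bots a stripped-down page.
// The bot names and the 'lite' template below are hypothetical.
$bots = array('Googlebot', 'Slurp', 'msnbot');

$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Check the user agent against the known bot names.
$is_bot = false;
foreach ($bots as $bot)
{
    if (strstr($agent, $bot))
    {
        $is_bot = true;
        break;
    }
}

if ($is_bot)
{
    // Just the post text and plain links - no theme, no per-user widgets.
    include('templates/lite_page.php'); // hypothetical stripped template
    exit;
}
?>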

Well, hopefully it will finish indexing soon ...