Update on OpenFiler

Monday, March 19, 2007

Just an update on my old post on OpenFiler.

I wanted to give iSCSI a play and I noticed that the OpenFiler people have a prebuilt VMware appliance, sweet.

Downloaded the appliance, created a new virtual disk and presented it via iSCSI. I then installed the Microsoft Initiator and connected things up. It was reasonably straightforward. The only two issues I had were that you can't configure anything as iSCSI until you set up some networks, and that CHAP authentication caused me grief for quite a while. Lastly, there is very little debugging or technical info for OpenFiler that I could find. However, I did get it running in a short period of time, which was cool.

A greener data center

Having built a few smaller data centers in my time, I have an interest in the topic.

Came across a new organisation, The Green Grid. They have an interesting white paper on Guidelines for efficient data centers.

There is some interesting info and it's very practical. There is the usual stuff like hot and cold aisles and so on. However, it does mention some issues which I have seen in the field but which you don't hear reported much, such as this one.

COORDINATION OF AIR CONDITIONERS
Many datacenters have multiple air conditioners that
actually fight each other. One may actually heat while
another cools and one may dehumidify while another
humidifies. The result is gross waste that may require
a professional assessment to diagnose.

Google, phpBB and session ids

Tuesday, January 30, 2007

Okay, an update on that hungry Googlebot sucking bandwidth. I ended up emailing Google to ask them to slow the bot and got an interesting reply. According to Google, it's the session ids that cause 'problems for our robots'. Google refers you to a posting about removing session ids from phpBB.

Only problem is that the information they refer to is from 2004. If you look over at the phpBB forum there is a knowledge base article "Why doesn't google spider my forum?" This again refers to session ids and has the same code changes, but this one is from 2002.

As it's now 2007, phpBB has changed a bit since 2002 and 2004, so it's not so simple a change to make. Here is a quick summary of what I found and what I did.

First, the change they suggest to sessions.php will not work as expected any more. The line of code they change no longer exists; it's been broken up. Also, further down the code after their change, the function updates or sets the session data. If it can't do that, it sets the session id to an md5 hash of a random number. Also, the code for the pages is a bit naughty and references both the global $SID and the session_id in the $userdata hash.

My solution to all of this was to declare global $HTTP_SERVER_VARS; at the start of the function session_begin, and then, right at the end of the function, just before it assigns all of the data to the $userdata hash, overwrite the session_id.

Therefore the end of the function now looks like this. The if block at the top is the new code.



// New: give Googlebot an empty session id so it never ends up in crawled URLs.
if (strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'], 'Googlebot')) {
    $session_id = '';
}

$userdata['session_id'] = $session_id;
$userdata['session_ip'] = $user_ip;
$userdata['session_user_id'] = $user_id;
$userdata['session_logged_in'] = $login;
$userdata['session_page'] = $page_id;
$userdata['session_start'] = $current_time;
$userdata['session_time'] = $current_time;
$userdata['session_admin'] = $admin;
$userdata['session_key'] = $sessiondata['autologinid'];

setcookie($cookiename . '_data', serialize($sessiondata), $current_time + 31536000, $cookiepath, $cookiedomain, $cookiesecure);
setcookie($cookiename . '_sid', $session_id, 0, $cookiepath, $cookiedomain, $cookiesecure);

$SID = 'sid=' . $session_id;

return $userdata;

The only problem I can think of is that the session keys will still have been inserted into the database earlier in the function, and they might not get cleaned up. I doubt it, but it's something to check. In a day or two I should be able to see how it goes.

I tested with wget. Without forcing the user agent to Googlebot, I got ?sid= with real ids in just about every URL in a page. With the user agent specified, the same GET for a page had an empty ?sid= value. So it looks like a go.
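To make the effect of the patch easier to see outside phpBB, here is a minimal standalone sketch of the same idea. The functions sid_for_agent and link_with_sid are hypothetical names of my own, not phpBB functions, and link_with_sid is only a rough stand-in for how forum links pick up the sid parameter.

```php
<?php
// Hypothetical helper, not part of phpBB: choose the session id to expose in URLs.
function sid_for_agent(string $userAgent, string $sessionId): string
{
    // Same substring check as the patch: Googlebot gets an empty id.
    if (strstr($userAgent, 'Googlebot') !== false) {
        return '';
    }
    return $sessionId;
}

// Hypothetical stand-in for the way a forum link has the sid appended.
function link_with_sid(string $url, string $userAgent, string $sessionId): string
{
    // Use '?' for the first query parameter, '&' otherwise.
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . 'sid=' . sid_for_agent($userAgent, $sessionId);
}
```

With a normal browser agent the link carries the real id, and with a Googlebot agent the sid value is empty, which matches what the wget test showed.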

Hungry Googlebot

Tuesday, January 09, 2007

Wow, that Googlebot can get hungry. I admin a site and forum for my 4WD club and the Google crawler has been sucking up the bandwidth big time. Over 60% of the bandwidth for the month is from Google. If you do a search for "excessive bandwidth usage by googlebot" it turns out there are a few people who are having the problem.

The forum is phpBB, so there is a lot of cruft around the messages which has to get sent to Google with every page that they request, even though the content is only small. Some clever person should write some code that determines if the browser agent is a bot and, if so, returns only a very simple version of the page with the key content and some simple links for it to follow.
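A rough sketch of what that code might look like is below. Everything in it is an assumption for illustration: the BOT_MARKERS list, is_bot and render_for_bot are hypothetical names, nothing here is phpBB code, and a real forum would need to hook this into its own templates.

```php
<?php
// Hypothetical list of substrings that mark a crawler; extend as needed.
const BOT_MARKERS = ['googlebot', 'slurp', 'msnbot'];

// Returns true when the user agent contains one of the known crawler markers.
function is_bot(string $userAgent): bool
{
    $agent = strtolower($userAgent);
    foreach (BOT_MARKERS as $marker) {
        if (strpos($agent, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical renderer: just the title, the post text and plain links to follow.
function render_for_bot(string $title, string $body, array $links): string
{
    $html = '<html><head><title>' . htmlspecialchars($title) . '</title></head><body>';
    $html .= '<p>' . htmlspecialchars($body) . '</p>';
    foreach ($links as $href => $label) {
        $html .= '<a href="' . htmlspecialchars((string) $href) . '">'
            . htmlspecialchars((string) $label) . '</a>';
    }
    return $html . '</body></html>';
}
```

The page script would then call is_bot($_SERVER['HTTP_USER_AGENT'] ?? '') and pick either render_for_bot or the normal template.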

Well hopefully it will finish indexing soon ...

DNS mystery

Wednesday, November 29, 2006

DNS has always been an interest of mine. I read all of O'Reilly's DNS book and joined AUDA when it was founded in Australia. Still, a lot of people just don't get it ...

I stumbled across a good DNS checker site, http://www.dnsreport.com/. You type in your domain name and it does a full test of everything, especially the MX and mail servers. In these times of increasing spam, the setup of your MX is becoming really important, as many providers are becoming so pedantic about everything being just right.

It's because of MX problems that I found this site. My provider, godaddy.com, has some nasty email requirements. Sometimes people get a "553 Bogus helo" error when emailing people hosted at GoDaddy. Can be hard to get this fixed.
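If you just want a quick look at your own MX records from a script, a minimal sketch along these lines can help. It is nowhere near the full set of checks that site runs. The function names sort_mx and lookup_mx are hypothetical; dns_get_record with DNS_MX is the standard PHP call, and it needs working network access.

```php
<?php
// Order MX records by preference (lowest first), the order a sending server tries them,
// and return just the mail host names.
function sort_mx(array $records): array
{
    usort($records, function ($a, $b) {
        return $a['pri'] <=> $b['pri'];
    });
    return array_column($records, 'target');
}

// Look up the MX records for a domain; returns an empty list if the lookup fails.
function lookup_mx(string $domain): array
{
    $records = dns_get_record($domain, DNS_MX);
    return sort_mx($records === false ? [] : $records);
}
```

Calling lookup_mx on your own domain should list your mail servers in the order other servers will try them.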
