Thursday, December 11, 2008

Hebrew SEO: Google, Hebrew, and the Definite Article (ה"א הידיעה)


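It seems that Google does not handle the Hebrew definite article (the prefix ה) correctly.
If we search, for example, for הממשלה ("the government"), the results include "פורטל השרותים של ממשלת ישראל" (the Israeli government services portal), but Google bolds only the word הממשלה and not the words ממשלה or ממשלת, which are perfectly valid inflections.
In the opposite direction, if we search for ממשלה we get completely different results, in which the words ממשלת and ממשלה are highlighted but the word הממשלה is not.
Judging by the differences in the search results, the problem seems to affect not only highlighting but also indexing.
I also checked searches for הילדים versus ילדים, and some other words. The same appears to hold for Microsoft's Live (not to mention Yahoo, which did not bother to support Hebrew morphology at all).
The conclusion: for search engine optimization on Google, it is worth defining the site's keywords both with and without the definite article.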
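
As a rough illustration of that conclusion, here is a minimal Python sketch that expands a list of bare keywords into both forms. The function name keyword_variants and the sample words are my own assumptions. It only ever adds the prefix ה and never tries to strip it, since many Hebrew words simply begin with that letter.

```python
HE = "\u05d4"  # the Hebrew letter He, used as the definite-article prefix


def keyword_variants(bare_keywords):
    """Return every keyword twice: as given, and with the He prefix added.

    The input is assumed to be a list of keywords WITHOUT the definite article.
    """
    variants = []
    for word in bare_keywords:
        variants.append(word)       # form without the article
        variants.append(HE + word)  # form with the article
    return variants


# Hypothetical keyword list for a site's meta keywords.
print(", ".join(keyword_variants(["ממשלה", "ילדים"])))
```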



Thursday, November 20, 2008

Drupal memcache items disappearing from the cache

The scenario: you are using memcached as your Drupal cache system. You do a cache_set for some item. You do a cache_get for this item, and the item is there. You do a cache_get again after a few seconds or minutes, and the item is gone.

You might be experiencing this issue in the memcache module. In short, the memcache module flushes the entire cache cluster whenever someone does a wildcard cache_clear_all. Since it is very likely that all your bins are in a single cluster, whenever someone on the other side of the world does something that is probably completely unrelated to you, your item still gets flushed from the cache.

Note that since a flush does not actively remove the items from the cache but rather marks them as expired, you will not see any memcached stats that hint at this issue.

I have commented on the issue. Here are some highlights:
...we are currently using gaolei's patch which is fast albeit expensive. We are using it on a large production system that depends heavily on memcache and we did not see any problems yet. (thank you gaolei).

We have tried to implement a lock-add-unlock scenario such as was suggested in several comments but this will be a definite nightmare for high-traffic/high-update sites...

I would suggest letting the user choose which flush mechanism to use: the current one, or the salt one. I see no reason why the module's developers should decide for me. On a site with few updates and little memory I would prefer to have the whole memcache flushed; on a site with many updates and tons of memory I'm willing to sacrifice space for the sake of good performance. A simple memcache_flush_method variable would do the job just fine.
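
To make the "salt" idea concrete, here is a toy Python sketch. It is my simplified reading of the approach, not gaolei's patch and not the memcache module's code: SaltedCache, clear_bin and the bin names are hypothetical, and a plain dict stands in for the memcached cluster.

```python
class SaltedCache:
    """Toy stand-in for one memcached cluster shared by several cache bins."""

    def __init__(self):
        self._store = {}  # plays the role of the memcached cluster
        self._salts = {}  # per-bin salt; a real setup would keep this in the cache too

    def _key(self, bin_name, cid):
        # Every key embeds the bin's current salt.
        salt = self._salts.setdefault(bin_name, 0)
        return f"{bin_name}:{salt}:{cid}"

    def cache_set(self, bin_name, cid, value):
        self._store[self._key(bin_name, cid)] = value

    def cache_get(self, bin_name, cid):
        return self._store.get(self._key(bin_name, cid))

    def clear_bin(self, bin_name):
        # "Wildcard clear" for one bin: bump the salt so old keys become
        # unreachable. Nothing is deleted, which is the memory cost.
        self._salts[bin_name] = self._salts.get(bin_name, 0) + 1


cache = SaltedCache()
cache.cache_set("cache_menu", "primary", "menu data")
cache.cache_set("cache_page", "front", "page data")
cache.clear_bin("cache_menu")

print(cache.cache_get("cache_menu", "primary"))  # None: this bin was cleared
print(cache.cache_get("cache_page", "front"))    # page data: other bin untouched
```

The trade-off is the one described in the comment above: the stale entries stay in memory until memcached evicts them, in exchange for not flushing unrelated bins.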

Monday, November 10, 2008

MySQL Query Optimization: Avoiding ORs

So, you've probably heard that using ORs in your queries is heavy and should be avoided when possible. Here is a live example. We had the following query running on a medium-sized table in MySQL (around 200K rows):

SELECT count(*) FROM interactions p
WHERE (
(p.employer = 69 AND p.flag1 = 1)
OR
(p.employee = 69 AND p.flag2 = 0)
)

Let's think of interactions as a table holding the interaction between employers and employees, and this query should tell the user the total number of interactions where user number 69 is involved.

The query looked pretty innocent on an idle MySQL server, taking around 200ms, but when the server got busy the query execution time reached 5-6 seconds.

Although we did have indexes on interactions.employer and interactions.employee, doing EXPLAIN showed that MySQL was not using them.

After some digging and fiddling and trying we ended up with:

SELECT count(*) as count FROM (
SELECT 1 FROM interactions p
WHERE (p.employer = 69 AND p.flag1 = 1)
UNION ALL
SELECT 1 FROM interactions p
WHERE (p.employee = 69 AND p.flag2 = 0)
) t2

On the busy server this query took less than 250 milliseconds, and on the idle server less than 80ms. Doing EXPLAIN showed that MySQL was doing two queries and using the correct index on each one.

After some more digging we noticed that if we EXPLAIN the original query on the idle server, the optimizer occasionally converts it to a UNION, but for some reason this did not always happen, and in any case it took about twice as long as the UNION query.

To sum up the results here is a simple table

Query type     Server activity    Query time
Using OR       Idle               200ms
Using OR       Busy               5500ms
Using UNION    Idle               78ms
Using UNION    Busy               189ms


Conclusion: always consider alternatives to OR, but make sure you check them well against real-time examples.
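
One way to check such a rewrite is to run both forms against the same small data set and compare the counts. The following is a minimal sketch in Python that uses an in-memory SQLite database as a stand-in for MySQL, so it only says something about correctness, not timing. The sample rows are made up, and it assumes both branches of the UNION read from interactions.

```python
import sqlite3

OR_QUERY = """
SELECT count(*) FROM interactions p
WHERE (p.employer = 69 AND p.flag1 = 1)
   OR (p.employee = 69 AND p.flag2 = 0)
"""

UNION_QUERY = """
SELECT count(*) FROM (
    SELECT 1 FROM interactions p WHERE p.employer = 69 AND p.flag1 = 1
    UNION ALL
    SELECT 1 FROM interactions p WHERE p.employee = 69 AND p.flag2 = 0
) t2
"""


def counts(rows):
    """Load rows into a fresh table and return (OR count, UNION ALL count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interactions "
        "(employer INTEGER, employee INTEGER, flag1 INTEGER, flag2 INTEGER)"
    )
    conn.executemany("INSERT INTO interactions VALUES (?, ?, ?, ?)", rows)
    or_count = conn.execute(OR_QUERY).fetchone()[0]
    union_count = conn.execute(UNION_QUERY).fetchone()[0]
    conn.close()
    return or_count, union_count


rows = [
    (69, 1, 1, 1),  # matches the employer branch only
    (2, 69, 0, 0),  # matches the employee branch only
    (3, 4, 1, 0),   # matches neither branch
]
print(counts(rows))                      # (2, 2)

# A row that satisfies BOTH branches is counted twice by UNION ALL.
print(counts(rows + [(69, 69, 1, 0)]))   # (3, 4)
```

So the two queries agree only as long as no single row can match both conditions. If that can happen in your data, the UNION version needs an extra condition in one branch to exclude the overlap.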

A Further Note: When timing queries in MySQL, always use the hint /*! SQL_NO_CACHE */ to make sure you are not getting results from the query cache (if you have one set up).

Wednesday, September 10, 2008

Internet Explorer does not support http vary header

It was always assumed, but I just saw this quote from Microsoft:

Internet Explorer does not fully implement the VARY header per Requests for Comments (RFC) 2616. The Internet Explorer implementation of VARY is that it does not cache any data except for Vary-Useragent

Tuesday, September 9, 2008

Internet explorer cannot open the internet site: operation aborted

During an integration with a third party provider we were unfortunate enough to get the error
Internet explorer cannot open the internet site http://localhost: operation aborted
Yes, we have read http://support.microsoft.com/kb/927917 and did move the 3rd-party's script to be just before the </BODY> tag but to no avail.

After a long and hard effort, the provider pointed out that while view-source indeed showed the script just before </BODY>, the DOM itself (they used DOM Inspector, I used the IE Developer Toolbar) showed that the script was inside a <DIV> element.

As it turns out, there was some code on the page that wrote an unclosed <DIV>. This made IE "fill in the blanks" and guess (wrongly) where it should place the closing </DIV>. Fixing this JavaScript made the error go away on most pages. But not all.

I found out that the broken pages used some jQuery plugins (tooltips, jqModal, etc.) to produce fancy decorations and effects. These scripts hooked into $(document).ready and did $(...).appendTo('body'). This effectively added a couple of items to the DOM between the 3rd-party script and the </BODY>. I am still not quite sure why this should cause IE to choke, but since we had a tight schedule we simply changed those scripts to add their stuff to some other elements. This indeed solved our problem.

The funny (or maybe sad) thing was that on Firefox we did not get any error, but the page simply disappeared. Chrome worked just fine.

Tuesday, August 26, 2008

Misplaced elements with position:relative

We had a situation where three <DIV> elements were stacked in a column and the middle element had content dynamically loaded into it after the page had finished loading. When the middle element finished loading, some of the content of the last element seemed to forget its parenthood and floated nicely over the second element.


[Image: elements with position:relative not placed properly after dynamic content loading]

Naturally this only happened in Internet Explorer (what else).

After a lot of digging we noticed that the misplaced content of <DIV>#3 had a position:relative style (indicated as #4 in the image above). It is "well known" that IE does not handle relative positioning properly when the layout of the page changes.

Luckily I chanced upon an article by Holly Bergevin et al. called On having layout, which was enlightening. Understanding that IE will only respect the element's state if it has the hasLayout property quickly solved the problem.

In our specific case I used display:inline-block to force <DIV>#3 to have hasLayout set to true, which made IE respect the element's style and re-render the element's content (i.e. <DIV>#4) after re-positioning it. My only concern now is what effect this might have on rendering performance, but it will have to do for now.

Tip: you can use IE Developer Toolbar to check if an element hasLayout. It will show the hasLayout property as set to -1 if hasLayout is true.

Further credit is due to Ingo Chao who wrote relatively positioned parent and floated child – disappearance.

Monday, July 14, 2008

Absolute vs. Relative URLs and SEO

I've seen several places where people suggest that one should use absolute URLs (http://domain.com/page) for internal linking instead of relative URLs (/page). Those people refer to a Google article which says:
We also suggest you link to other pages of your site using absolute, rather than relative, links with the version of the domain you want to be indexed under. For instance, from your home page, rather than link to products.html, link to http://www.example.com/products.html . And whenever possible, make sure that other sites are linking to you using the version of the domain name that you prefer

This seems to me a classic example of taking things out of context. If you read the entire article you can see that Ms. Fox is talking about a specific case where your site can be accessed by more than one domain name (e.g. http://www.domain.com and http://domain.com).

I could not find anything Google ever wrote that says absolute URLs are better if your site is only accessed by one domain name.

There is one exception I can think of: if your domain is "coolstuff.com", for example, and you do use absolute URLs, then the word "coolstuff" will appear in your pages a lot. This might boost your ranking with regard to the word "coolstuff". But this is just a guess.

More reading: Google Canonicalization Problems? - Crawling, indexing, and ranking | Google Groups

Please note: this is my personal opinion. At least two SEO experts have told me that absolute URLs give better Google ranking. Since I'm strong-headed I will hold to my opinion until I am proven wrong (with numbers). Feel free to comment with supporting or conflicting opinions and data.

Sunday, July 6, 2008

Save binary file in Tcl under V6

This issue pops up every couple of years. It's nothing new, but still worth documenting.

The Challenge: To save a posted file to a file system, not using SUBMIT_STATIC_FILE.

The Problem: Vignette V6 pre-processes the submitted form data so that any <input type="file"> is encoded as a hex string. If you ERROR_TRACE the variable for the file, you will see a string that looks like 0x1D00ABADD1D9....

The Solution: binary format to the rescue. With a bit of trimming, you get it all in a few lines:

proc save_files { } {
    # FileData, FileExtention, Field1 are posted from the <form>

    set filename [generate_filename]
    set bin_filename "${filename}.[SHOW FileExtention]"
    set xml_filename "${filename}.xml"

    ### This will save a binary file.
    set file [open $bin_filename "w"]
    fconfigure $file -translation binary
    # Trim the leading "0x" and the last character, then decode the hex digits.
    puts -nonewline $file [binary format H* [string range [SHOW FileData] 2 end-1]]
    close $file

    ### This will save a utf-8 xml (text) file
    set file [open $xml_filename "w"]
    fconfigure $file -encoding "utf-8"
    puts $file "<?xml version='1.0' encoding='utf-8' ?>"
    puts $file "<formdata>"
    puts $file "<Field1>[HTML_ESCAPE [SHOW Field1]]</Field1>"
    #...more fields...
    puts $file "<Attachment>$bin_filename</Attachment>"
    puts $file "</formdata>"
    close $file
}


* Remember to use H* and not h*.
* The example above also saves a text file in utf-8 with extra data from the form
* Make sure your form is enctype=multipart/form-data
* This will not work in Storyserver 4.2.
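
To show why H* and h* give different files, here is a small Python sketch of the two decodings as I understand Tcl's binary format: H reads each pair of hex digits high nibble first, h reads it low nibble first. The helper names format_H and format_h and the sample digits are mine, and the sketch assumes an even number of hex digits.

```python
def format_H(hex_digits: str) -> bytes:
    """Mimic Tcl's `binary format H*`: each pair is high nibble, then low nibble."""
    return bytes.fromhex(hex_digits)


def format_h(hex_digits: str) -> bytes:
    """Mimic Tcl's `binary format h*`: each pair is low nibble, then high nibble."""
    swapped = "".join(
        hex_digits[i + 1] + hex_digits[i] for i in range(0, len(hex_digits), 2)
    )
    return bytes.fromhex(swapped)


digits = "1D00ABAD"  # sample digits, as they would look after trimming "0x"
print(format_H(digits).hex())  # 1d00abad  (the original bytes)
print(format_h(digits).hex())  # d100bada  (every byte has its nibbles swapped)
```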

Thursday, June 5, 2008

Sunday, June 1, 2008

Apache Rewrites and SetEnv

It seems that variables set with Apache's SetEnv (and hence SetEnvIf and BrowserMatch) are ignored by RewriteRule and RewriteCond. Rumor has it that this is a design issue. The only way I could find around this problem is to set the variables with the [E=...] flag of RewriteRule.

#Use RewriteRule & RewriteCond as a replacement for SetEnv and BrowserMatch
#SetEnv browser="-ie"
RewriteRule ^(.*)$ - [E=browser:-ie]
#BrowserMatch "IE7" browser="-ie7"
RewriteCond %{HTTP_USER_AGENT} "MSIE 7"
RewriteRule ^(.*)$ - [E=browser:-ie7]
#BrowserMatch "Firefox" browser="-moz"
RewriteCond %{HTTP_USER_AGENT} "Firefox"
RewriteRule ^(.*)$ - [E=browser:-moz]

RewriteCond %{DOCUMENT_ROOT}/cache/index%{ENV:browser}.html -f
RewriteRule ^(.*)$ cache/index%{ENV:browser}.html [L]
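
For readers less used to mod_rewrite, the decision the rules above make can be sketched in Python. This is only my paraphrase of the rule order: later matches override earlier ones, and the last rule serves the file only if it exists. The function names and the sample User-Agent strings are assumptions.

```python
import re


def browser_suffix(user_agent: str) -> str:
    """Mirror the three E=browser rules: a default, then two conditional overrides."""
    suffix = "-ie"                          # unconditional default
    if re.search("MSIE 7", user_agent):     # RewriteCond %{HTTP_USER_AGENT} "MSIE 7"
        suffix = "-ie7"
    if re.search("Firefox", user_agent):    # RewriteCond %{HTTP_USER_AGENT} "Firefox"
        suffix = "-moz"
    return suffix


def cached_page(user_agent: str) -> str:
    """Path that the final rule would serve, provided the file exists on disk."""
    return f"cache/index{browser_suffix(user_agent)}.html"


print(cached_page("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"))
print(cached_page("Mozilla/5.0 (Windows; U; rv:1.9) Gecko Firefox/3.0"))
print(cached_page("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```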


PS. If you need to debug rewrite rules you can add the following lines to your httpd.conf:
RewriteLog logs/rewrite.log
RewriteLogLevel 5

Wednesday, May 14, 2008

Annoying FireBug Bug

I came across an annoying bug in firebug. If there is a global variable with the same name as a local variable then firebug does not correctly show the value of the local variable in the tooltips and the watch window.

The most annoying thing is that it was reported several months ago, but there is still no fix.

UPDATE: Confirmed Fixed in 1.2 on FF3.

Sunday, March 23, 2008

Drupal SMTP Authentication with PEAR::Mail

If you want to use SMTP authentication with Drupal you can use PEAR::Mail with drupal_mail_wrapper.


  1. Install PEAR::Mail. On linux you can just run the following command as root:
     pear install --alldeps Mail


  2. Create a new file, say includes/smtpmail/smtpmail.inc and paste the following code into it:

    <?php
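    require_once 'Mail.php';
    require_once 'PEAR.php';

    // Replacement mail backend: Drupal calls this instead of PHP's mail().
    function drupal_mail_wrapper($mailkey, $to, $subject, $body, $from, $headers) {
      global $smtpmail; // SMTP settings defined in settings.php

      $headers['From'] = $from;
      if (empty($headers['To'])) {
        $headers['To'] = $to;
      }
      $headers['Subject'] = $subject;

      $recipients = $to;
      $mail_object =& Mail::factory('smtp', $smtpmail);
      $output = $mail_object->send($recipients, $headers, $body);
      // send() returns TRUE on success or a PEAR_Error object on failure,
      // so convert the result to a plain boolean for Drupal.
      return !PEAR::isError($output);
    }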


  3. Edit your settings.php and add the following at the end:

    $conf = array(
    'smtp_library' => 'includes/smtpmail/smtpmail.inc'
    );
    global $smtpmail;
    $smtpmail= array();
    $smtpmail["host"] = 'your.mail.host';
    $smtpmail["auth"] = TRUE;
    $smtpmail["username"] = 'your_user_name';
    $smtpmail["password"] = 'your_password';



From now on every call to drupal_mail will go through your drupal_mail_wrapper and will be sent using PEAR::Mail with SMTP authentication.

I'm sure some more serious drupaler would have made this a nice module, with configuration screens and the lot, but I'm too lazy today.

Monday, March 17, 2008

"The system cannot execute the specified program" Error

You are trying to run some program on Windows (such as apache.exe or htpasswd.exe) and you are getting "The system cannot execute the specified program" error. This usually means that the program you are trying to run was compiled against DLLs that are not on your system. You should use Dependency Walker to see which DLLs are missing, and then install them.

The Apache 2.x binary distribution for Windows, specifically, was compiled against the Visual Studio 2005 redistributable package, which you can download from Microsoft.

Further reading (1) and (2)

Edit (Nov 2011):
Following this comment (by Anonymous), please note that you may need to download and use the 2008 package and not the 2005 one: the official download for the Microsoft Visual C++ 2008 SP1 Redistributable Package (x86) or the official download for the Microsoft Visual C++ 2008 Redistributable Package (x64).


Sunday, February 17, 2008

Remote Root Access to MySql

To help me remember:

mysql -u root
mysql> SET PASSWORD FOR 'root'@'localhost' = PASSWORD('new_password');
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'password' WITH GRANT OPTION;
mysql> FLUSH PRIVILEGES;
mysql> exit;

Source: ben robison :: Howto: Remote Root Access to MySql

Wednesday, January 16, 2008

phpize: command not found

If you ever get a phpize: command not found error when trying to install a PEAR package on a Linux system, this is probably because the php-devel package is not installed.

Uninstall Google Desktop Gadget

In order to uninstall any Google Desktop gadget you can do the following:
1. Close Google Desktop
2. Open 'My Documents' folder and locate and open 'My Google Gadgets' folder in it
3. Find the gadget you want to remove and delete it.
4. Open Google Desktop again.

(source: uninstall RSStoSpeech Google Desktop gadget - RSS To Speech Gadget | Google Groups)

Thursday, January 3, 2008

document.body.scrollTop in IE

It seems that document.body.scrollTop does not work in IE6 if the document's doctype is defined as
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
To find out the scrolling offset in a way that works for both DTD3 and DTD4.01 you can use
(document.documentElement.scrollTop?document.documentElement.scrollTop:document.body.scrollTop)
Same goes for scrollLeft.
See more at document.body.scrollTop in IE