PHP Zend's Lucene highlighter and Unicode

This one is driving me mad. I am trying to get search results out of Lucene, but it just won't behave. Here is what I am doing:

$userQuery = Zend_Search_Lucene_Search_QueryParser::parse($_GET['query'], 'utf-8');
$search->results = $this->index->find($userQuery);

Then I retrieve the hits and try to highlight the matches in each one:

$html = $userQuery->highlightMatches($hit->body, 'utf-8');

When I search for "attività", it finds the correct hit but highlights nothing: it outputs the complete text with the accents intact, so I see the word "attività" unhighlighted.

If I omit the 'utf-8' parameter in highlightMatches, it highlights the word, but the "à" character is dropped from the output, so it displays "attivit". The output string in this case is ASCII encoded.

What is wrong? My pages are UTF-8 encoded. I add documents with the following logic:

// Following two lines are at the initialization so they hold for all code
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('UTF-8');
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
  new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive()
);
...
$doc->addField(Zend_Search_Lucene_Field::Text($fieldName, $fieldValue, "UTF-8"));
...

Any help is greatly appreciated!

Answer

Solution:

I encountered the same problem when using the Zend Lucene highlighter. It appears that Zend tries to convert the string being highlighted to UTF-8 with iconv before returning it to your view or to further processing.

In my case, iconv could not tell that my string was already UTF-8, so the conversion failed. The following line in the reset() function of Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8 caused the problem:

$this->_input = iconv($this->_encoding, 'UTF-8', $this->_input);

I simply commented that line out, and then it worked. Since no conversion takes place anymore, there is nothing left to fail.
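If you would rather not disable the conversion entirely, a gentler variant is to skip iconv only when the input is already valid UTF-8. The following is just a sketch of that idea, not Zend code: toUtf8IfNeeded() is a hypothetical helper name, and it assumes the iconv and PCRE extensions are available. I have not tested it inside reset() itself.

```php
<?php
// Hypothetical helper, NOT part of Zend Framework: convert $input to UTF-8
// only when a conversion is actually needed.
function toUtf8IfNeeded(string $input, string $encoding): string
{
    // Accept an empty encoding or the common spellings of UTF-8.
    $labelledUtf8 = $encoding === ''
        || strcasecmp($encoding, 'UTF-8') === 0
        || strcasecmp($encoding, 'UTF8') === 0;

    // preg_match() with the /u modifier returns 1 only for valid UTF-8.
    if ($labelledUtf8 && preg_match('//u', $input) === 1) {
        return $input; // already UTF-8, nothing to convert
    }

    // iconv() returns false on failure; keep the original bytes in that case.
    $converted = @iconv($encoding, 'UTF-8', $input);
    return $converted === false ? $input : $converted;
}

// Already UTF-8: returned unchanged.
echo toUtf8IfNeeded('attività', 'utf-8'), "\n";
// ISO-8859-1 bytes: converted to UTF-8.
echo toUtf8IfNeeded("attivit\xE0", 'ISO-8859-1'), "\n";
```

In reset(), the offending line would then become something like $this->_input = toUtf8IfNeeded($this->_input, $this->_encoding); which keeps the conversion for input that really is in another encoding.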

I hope it helps.
