<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
	<channel>
		
		<title>typo3-media.com: Latest TYPO3 Blog News</title>
		<link>http://www.typo3-media.com/</link>
		<description>Latest News of the TYPO3 Blog</description>
		<language>en</language>
		<image>
			<title>typo3-media.com: Latest TYPO3 Blog News</title>
			<url>http://www.typo3-media.com/fileadmin/tt_news_article.gif</url>
			<link>http://www.typo3-media.com/</link>
			<width></width>
			<height></height>
			<description>Latest News of the TYPO3 Blog</description>
		</image>
		<generator>TYPO3 - get.content.right</generator>
		<docs>http://blogs.law.harvard.edu/tech/rss</docs>
		
		
		
		<lastBuildDate>Mon, 09 Apr 2012 18:10:00 +0200</lastBuildDate>
		
		
		<item>
			<title>Find top caches - Cachemgm 2</title>
			<link>http://www.typo3-media.com/blog/monitor-cache-activity.html</link>
			<description>Follow up post - introducing 2 more features in the cachemgm extension.</description>
			<content:encoded><![CDATA[In the previous post I wrote about the new features in the cachemgm extension regarding the logging of the cache activity.
Now there are two more features available:
<ol><li>Use Tx_Cachemgm_Cache_Frontend_LogablePhpFrontend to also log PHP Caches</li><li>Use the new cli command &quot;cachemgm_top&quot; to watch the top caches</li></ol>
<h3>Find top cache identifiers</h3>
Once you have set up the Logable Cache Frontends you can simply call:
 <i>cli_dispatch.phpsh cachemgm_top</i>
to see a continuously refreshing list of the top cache activity.
All filter arguments also apply here, so you can use:
<i>cli_dispatch.phpsh cachemgm_top --filterAction=MISS</i>
to only see the top cache misses.

It gives you something like this:
&nbsp;<img src="uploads/RTEmagicC_cachemgm-top.png.png" height="292" width="579" alt="" />
Read the next post to find out what this can be useful for...
]]></content:encoded>
			<category>news</category>
			
			
			<pubDate>Mon, 09 Apr 2012 18:10:00 +0200</pubDate>
			
		</item>
		
		<item>
			<title>Don't slow applications with caches</title>
			<link>http://www.typo3-media.com/blog/cachemgm-cache-log.html</link>
			<description>Ok the title is a bit provocative - it could have also been something like „select your caches...</description>
			<content:encoded><![CDATA[Ok the title is a bit provocative - it could have also been something like „select your caches wisely“ or „caching is easy - but caching right is difficult“<br /><br />TYPO3 comes with a flexible Caching Framework, backported from FLOW3. The key concept is that you can easily use caches by selecting a frontend and a backend. With the frontend you decide what you want to save (most likely a variable) and with the backend you decide where to save it. <br /><br />What the framework does not provide out of the box are statistics for the caches. But this can easily be achieved by using a cache frontend that is able to provide some logs. Read on if you want to know more :-)
<h2>New features in the „cachemgm“ extension</h2>
<br />Some of you may know this extension already - it was initiated by Kasper and offers functions to check and analyse the cached pages.<br /><br />In the current version on forge this extension comes with two additional features to check the caches offered by the caching framework:<br /><br />
<h3>Overview of the configured caches:</h3>
<img src="uploads/RTEmagicC_cachemgm-be.png.png" height="322" width="444" alt="" />
This module should be self-explanatory: it simply shows all configured caches and lets you check further details.<br /><br />
<h3>Analyse your caches with the cache log:</h3>
The extension comes with an extended VariableFrontend. In addition to the standard frontend it logs all operations for later evaluation. Inspired by varnish, the logging happens as fast as possible by using an OS message queue or shared memory. This way it shouldn't have a significant impact on cache times. But it requires PHP to be compiled with shared memory support.<br /><br />To use this frontend you need to configure it in your localconf.php like this:<br /><br /><i>$GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations']['extbase_object']['frontend'] = 'Tx_Cachemgm_Cache_Frontend_LogableVariableFrontend';</i><br /><br />Then of course you need some tools to read the log (since the log is not persisted). You can use the cachemgm_log cli command for this:<br /><br /><i>php cli_dispatch.phpsh cachemgm_log</i>
<br />And you should get something like this:
<img src="uploads/RTEmagicC_cachemgm-log.png.png" height="156" width="559" alt="" /><br /><br />After terminating the log reader (Ctrl+C) you should get a summary like this:<br /><img src="uploads/RTEmagicC_cachemgm-stat.png.png" height="513" width="277" alt="" /><br /><br />By using the parameters „--cache“ and „--filterUrl“ you can limit the logs that are evaluated to a certain cache and/or request URL.<br /><br /><br />
<h3>Dependency Injection Caching Analysis:</h3>
Let's take a small example: a website with an extbase USER_INT object. In this case there are around 150 objects that need to be created by the DI container per request. What do you think is faster: using the „NullBackend“ or the default „DbBackend“ for the „extbase_object“ cache? I did a (really small and not representative) benchmark against that URL (20 requests) while running cachemgm_log - and here are the results:
<b><br />Using Database Backend I got the following result</b>:<br />Time per request:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2223.411 [ms] (mean)<br /><br />And the cachemgm_log showed:<br /><br />Cache &quot;extbase_object&quot;:<br />-------------------------<br />&nbsp;get method:<br />&nbsp;&nbsp; Hits:2997<br />&nbsp;&nbsp; Misses:<br />&nbsp;&nbsp; Hit-Rate:0.996<br />&nbsp;&nbsp; Average GET-Hit time:935ns<br />&nbsp;&nbsp; Overall Hit Time:2.8sec<br />&nbsp;has method:<br />&nbsp;&nbsp; Hits:2966<br />&nbsp;&nbsp; Misses:<br />&nbsp;&nbsp; Overall Time:3.35sec<br />&nbsp;set method:<br />&nbsp;&nbsp;&nbsp; Sucess Writes:<br />&nbsp;&nbsp;&nbsp; Failed Writes:0<br />&nbsp;&nbsp;&nbsp; Overall Write Time:0ns<br /><b><br />Using no cache at all I got the following result:<br /></b>Time per request:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2147.321 [ms] (mean)<br /><br />And the cachemgm_log showed:<br /><br />Cache &quot;extbase_object&quot;:<br />-------------------------<br />&nbsp;get method:<br />&nbsp;&nbsp; Hits:<br />&nbsp;&nbsp; Misses:<br />&nbsp;&nbsp; Overall Hit Time:0ns<br />&nbsp;has method:<br />&nbsp;&nbsp; Hits:<br />&nbsp;&nbsp; Misses:2844<br />&nbsp;&nbsp; Overall Time:954ms<br />&nbsp;set method:<br />&nbsp;&nbsp;&nbsp; Sucess Writes:2823<br />&nbsp;&nbsp;&nbsp; Failed Writes:42<br />&nbsp;&nbsp;&nbsp; Average Write time:364ns<br />&nbsp;&nbsp;&nbsp; Overall Write Time:1.03sec<br />&nbsp;&nbsp;&nbsp; Sucess Writes:<br />&nbsp;&nbsp;&nbsp; Failed Writes:0<br />&nbsp;&nbsp;&nbsp; Overall Write Time:0ns<br /><br /><br />To summarize: using the database cache backend, a total of over 6 seconds was spent reading from the cache. The overhead of using the NullBackend was only about 2 seconds (some of which may be caused by the cache logging itself). 
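The two totals can be re-derived from the logged values. This is just a small Python sketch for checking the arithmetic; the variable and function names are made up for the illustration, and the numbers are copied from the two cachemgm_log summaries:

```python
# Overall times reported by cachemgm_log, in seconds (copied from the logs).
db_backend = {"get": 2.8, "has": 3.35, "set": 0.0}
null_backend = {"get": 0.0, "has": 0.954, "set": 1.03}


def total_cache_time(times):
    """Sum the overall time spent in get, has and set calls."""
    return round(sum(times.values()), 3)


db_total = total_cache_time(db_backend)      # time inside the cache API with DbBackend
null_total = total_cache_time(null_backend)  # same for NullBackend

print("DbBackend:  ", db_total, "s")
print("NullBackend:", null_total, "s")
print("Difference: ", round(db_total - null_total, 3), "s over 20 requests")
```

Keep in mind that these are the summed times over all 20 requests, and that the log itself adds some overhead to both runs.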
That means that for the above test scenario it's slower to read the information from the cache than to build it from scratch.<br /><br />This is a general downside of the database cache backend when it is used for frequently accessed caches! Especially on high-traffic sites, or even worse in load-balanced environments with a single database, this effect could be much bigger.<br /><br />What this shows is that selecting and benchmarking different caches is crucial for a high-performance application. And of course there are more numbers to watch than just the cache times.<br /><br />]]></content:encoded>
			<category>Inside TYPO3</category>
			<category>development</category>
			
			
			<pubDate>Thu, 05 Apr 2012 15:48:00 +0200</pubDate>
			
		</item>
		
		<item>
			<title>TYPO3 Caching and the cHash</title>
			<link>http://www.typo3-media.com/blog/chash-caching-typo3.html</link>
			<description>Explains TYPO3 caching and the need for a flexible cHash calculation</description>
			<content:encoded><![CDATA[<h1>cHash?</h1>
There is a pretty old but nice article by Kasper that explains the purpose of the cHash [1]. 
To summarize: if a cHash (or cache hash) is part of the current URL, and if that cHash is correct, TYPO3 generates a cache entry for the generated content. The cHash is only correct if TYPO3 itself generated the URL.
So with the help of the cHash TYPO3 can cache multiple variants of a page - depending on its request parameters. This is used for example to cache all the detail views of&nbsp;your news records.
<h2>What is the goal of caching</h2>
If we look at the problem of caching a bit more generically, we can find the following goals:
<ol><li>Content that can be cached should be cached, and we want as few cache entries as possible. Caching content means that the generated content is likely to be valid for a certain number of visitors.</li><li>Parts of the content that should not be cached should not be cached, but the rest of the page should be cached. For example you might have dynamic elements like a shopping basket or a username.</li><li>We never want uncached page rendering. </li></ol>
Let's look at these three things in more detail:
<h3>1) Cache generated pages in TYPO3</h3>
Since the content on a page can differ depending on many things, TYPO3 caches separate variants based on:
<ul><li>Different usergroups</li><li>The different states of all TypoScript conditions. (This depends on your TypoScript setup)</li><li>Different values of predefined Core parameters if they are valid (id, type, MP)</li><li>Other incoming parameters if the cHash matches. This last point is important to understand, since this is what you need to make your extension output cacheable the way you want.</li></ul>
<h3>2) Have dynamic content</h3>
TYPO3 supports dynamic elements on a page.<br />It works in such a way that the page itself is cached, but the cache can contain placeholders that get replaced before the content is sent to the user.
This can be achieved using _INT TypoScript objects like USER_INT or COA_INT.
Extbase supports uncached actions - but in the end it uses the same concept, since the Extbase plugin converts itself to USER_INT if an uncached action is called.
There are third-party extensions that modify the _INT behaviour so that the dynamic elements are loaded by ajax calls on the client side. This is a good approach and gives additional performance - if you can rely on JavaScript being enabled.
<h3>3) Avoid uncached content:</h3>
That is easy. Just set both the disableNoCacheParameter setting and the pageNotFoundOnCHashError setting in your localconf.php to 1.

<h2>cHash details</h2>
Imagine the following Use-Cases: 
<b>News Single Plugin:</b>
You have a single view that displays some data based on a &quot;tx_news_pi1[uid]&quot; parameter. &nbsp;<br />So this is straightforward: if you have a valid cHash for that parameter, TYPO3 caches a separate variant of this page (e.g. id=12&amp;tx_plugin_pi1[uid]&amp;cHash=&lt;correct_news_cHash&gt; )<br /><br /><b>Search Plugin:</b>
It's a different story for a search plugin: you have a search that looks for things based on a query. You must not have a cHash parameter then. 
There are two reasons for this: 1) You don't want to cache every search. 2) The user enters the search word, so it's hard to generate the correct cHash.
Your URLs need to look like this:<br />&nbsp;&nbsp; id=12&amp; tx_search_pi1[query]=test<br />&nbsp;&nbsp; id=12&amp; tx_search_pi1[query]=test2

TYPO3 will then have only one cache entry for that page. But your plugin will be a USER_INT and can therefore evaluate this parameter each time and display the correct search results.<br /><br />
<h2><b>Current cHash problems and limitations</b></h2>
It is not possible to mix these behaviours on one page, that is, to have elements that should be cached based on some parameters together with elements that need to stay uncached. And there are many use cases for this.
To come back to the two examples, imagine you have the search and the news plugin on one page:

Url:&nbsp; <i>id=12&amp;tx_search_pi1[query]=test&amp;tx_plugin_pi1[uid]=1&amp;cHash= &lt;correct_news_cHash&gt;</i>
=&gt; will not work since the cHash is not valid anymore. (It triggers a 404 or delivers an uncached page, depending on your settings)
Url: <i>id=12&amp; tx_search_pi1[query]=test&amp;tx_plugin_pi1[uid]=1<br /></i>Will not work, since with a missing cHash TYPO3 might deliver the wrong cached content for the news extension.<br /><br />Url:&nbsp; <i>id=12&amp; tx_search_pi1[query]=test&amp;tx_plugin_pi1[uid]=1&amp;cHash=&lt;the_new_correct_cHash&gt;<br /></i>If you manage to generate these URLs, it will work, but you pollute the cache table with new entries for every tx_search_pi1 parameter.<br /><br />The current extbase solution for uncached actions is another example of the same problem. Since extbase replaces USER with USER_INT on the fly if an uncached action is called, it needs to have the cHash parameter set. This results in a new cache entry for every parameter combination of your uncached action, which is not what you want.<br /><br />A third aspect of the problem is that parameters like &quot;L&quot;, or other parameters that might be part of some conditions, are also used for the cHash calculation. That can cause tricky bugs and uncached pages. Parameters that are only evaluated in TypoScript conditions do not need a cHash.<br /><b><br /></b>
<h2><b>The solution: Flexible cHash calculation</b></h2>
The cHash should (and has to) be calculated and evaluated only for parameters that are used in cacheable plugins (or actions). &nbsp;<br /><br />For the above example that means:
Url: <i>id=12&amp;tx_search_pi1[query]=foo&amp;tx_plugin_pi1[uid]=1&amp;cHash=&lt;correct_news_cHash&gt;</i><br />and Url:<i> id=12&amp;tx_search_pi1[query]=bar&amp;tx_plugin_pi1[uid]=1&amp;cHash=&lt;correct_news_cHash&gt;</i><br />should both result in a cache hit.<br /><br />Url:<i> id=12&amp;tx_search_pi1[query]=test&amp;tx_plugin_pi1[uid]=1&amp;cHash=wrongcHash</i><br />&nbsp;should throw an error (404) since it's a wrong cHash<br /><br />Url:<i> id=12&amp;tx_search_pi1[query]=test&amp;tx_plugin_pi1[uid]=1<br /></i>should throw an error since the cHash is missing but required for tx_plugin_pi1[uid]
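The expected behaviour for these URLs can be modelled in a few lines. The following Python sketch only illustrates the idea and is not TYPO3's actual implementation: the function names, the secret, the parameter sets and the use of MD5 over a sorted parameter list are all assumptions made for this example.

```python
import hashlib

# Assumption for this sketch: parameters that take part in the cHash.
CACHE_RELEVANT = {"id", "tx_plugin_pi1[uid]"}
# Assumption: parameters that must not appear without a valid cHash.
REQUIRES_CHASH = {"tx_plugin_pi1[uid]"}
# Hypothetical secret, standing in for an installation-specific key.
SECRET = "hypothetical-encryption-key"


def calculate_chash(params):
    """Hash only the cache-relevant parameters, in a stable order."""
    relevant = sorted((k, v) for k, v in params.items() if k in CACHE_RELEVANT)
    raw = SECRET + "|" + "|".join(k + "=" + v for k, v in relevant)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def check_request(params):
    """Return 'cache' if the request may use the page cache, else 'error'."""
    given = params.get("cHash")
    if given is None:
        # A missing cHash is only an error if a parameter requires one.
        needs_chash = any(k in params for k in REQUIRES_CHASH)
        return "error" if needs_chash else "cache"
    return "cache" if given == calculate_chash(params) else "error"


news_hash = calculate_chash({"id": "12", "tx_plugin_pi1[uid]": "1"})
for query in ("foo", "bar"):
    request = {"id": "12", "tx_search_pi1[query]": query,
               "tx_plugin_pi1[uid]": "1", "cHash": news_hash}
    print(query, check_request(request))
```

Because tx_search_pi1[query] is not part of the hashed parameters, both search words map to the same cHash and therefore to the same cache entry.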
Thanks to all the reviewers - especially Tolleiv - the improvement made it into the Core [2] - and you now have the possibility to control which parameters should be used for cHash logic in the install tool:
cHashExcludedParameters: The given parameters will be ignored in the cHash calculation. Example: L,tx_search_pi1[query]
<br />cHashOnlyForParameters: If set, only the given parameters will be evaluated in the cHash calculation. Example: tx_news_pi1[uid]. Only use this if you are sure you know all relevant parameters.
cHashRequiredParameters: Configure parameters that require a cHash. If no cHash is given but one of the parameters is set, then TYPO3 triggers the configured cHash error behaviour.
cHashExcludedParametersIfEmpty: Configure parameters that are only relevant for the cHash if there's an associated value available. An asterisk &quot;*&quot; can be used to skip all empty parameters.<br /><br />]]></content:encoded>
			<category>news</category>
			
			
			<pubDate>Fri, 02 Mar 2012 22:58:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title>SOLR Search Request Handlers explained</title>
			<link>http://www.typo3-media.com/blog/solr-search-request-handlers.html</link>
			<description>With SOLR you can execute complex queries over your indexed documents. Like with other software the...</description>
			<content:encoded><![CDATA[With SOLR you can execute complex queries over your indexed documents. <br />As with other software, the possibilities have grown over time, and there are many different configurations and parameters that can be used to specify a query in SOLR. <br />This post tries to summarize the main concepts and parameters and how they can be combined, with the goal of giving a broader picture of how SOLR and its search request handlers work - and how powerful they can be.
(Take your time to read the article and have a SOLR installation at hand if you want to run the examples. I hope this article helps you gain some more insight into how to search with SOLR.)
<h1>Request Handlers</h1>
Let's start with the basics: in SOLR you have so-called request handlers that are responsible for answering your request. All request handlers for your SOLR installation are configured in the solrconfig.xml. Request handlers have a name and an assigned class that is responsible for handling the request. If the name starts with a &quot;/&quot; you can reach the request handler by calling the correct path.
For example the update Handler is configured like this:<br /><i>&lt;requestHandler name=&quot;/update&quot; class=&quot;solr.XmlUpdateRequestHandler&quot; /&gt;</i>
That means you can reach this handler by calling &lt;your_solr_url&gt;<i>/update&nbsp;<br /></i>
If the name does not start with &quot;/&quot; you can by default call the request handler with the path <i>select</i> and the parameter <i>qt</i> like this: &quot;<b>/select?qt=standard</b>&quot;... 
This kind of request handler is normally reserved for handlers that search for something (the so-called &quot;<b>search handlers</b>&quot;).&nbsp;
You can have a look at your solrconfig.xml for more details regarding the configured handlers, or call the plugin handler like this: <i>/admin/plugins</i>&nbsp; (if it is configured of course).
You can check the solrconfig.xml examples here: <a href="http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_5/solr/example/solr/conf/solrconfig.xml">solrconfig.xml in solr SVN</a> and the <a href="https://svn.typo3.org/TYPO3v4/Extensions/solr/trunk/resources/solr/typo3cores/conf/solrconfig.xml">solrconfig.xml included in the TYPO3 solr extension</a> (which is used in the examples below)
<h2>Search Handlers</h2>
So we can define the search handlers as SOLR request handlers that return a list of results to you. Often they have names that do not start with &quot;/&quot; and are therefore used by specifying the qt parameter in the URL like &quot;.../select?qt=&lt;searchhandlername&gt;&quot;
The <b>search handlers</b>&nbsp;normally use different components that each do part of the work. For example there are components for querying, faceting, highlighting etc. 
All search handlers should understand the so-called <b>common query parameters</b> [3]. The main ones are:
<ul><li>q: The query string that is parsed using a query parser</li><li>sort: For defining the sorting</li><li>start, rows - to define offset and result count</li><li>fq - to define filter queries - you can use several of them - and the results are cached. (The caching can also be disabled)</li><li>fl - specify the list of fields that should be returned for the matched documents. For the upcoming SOLR 4.0 this parameter has nice additional features - like pseudo fields, function results as field etc [7]</li><li>defType - specify the query parser</li><li>debugQuery - set this to &quot;on&quot; to see the parsed query and details for scoring calculation.</li></ul>
<br />In addition, each component that is used &quot;understands&quot; its own parameters. 
Please also note that some of these parameters (and all other parameters) can be configured with default values in the solrconfig.xml.
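As a rough illustration, the common query parameters can be assembled into a request URL like this. It is only a Python sketch: the base URL, the default field list and the helper name build_select_url are assumptions for the example and not part of SOLR.

```python
from urllib.parse import urlencode

# Assumption: a local SOLR instance; adjust to your own installation.
SOLR_BASE_URL = "http://localhost:8983/solr"


def build_select_url(query, filters=(), fields=("id", "title", "score"),
                     start=0, rows=10, sort=None, debug=False):
    """Build a /select URL from the common query parameters."""
    params = [("q", query), ("start", start), ("rows", rows),
              ("fl", ",".join(fields))]
    # fq may be given several times, once per filter query.
    params.extend(("fq", fq) for fq in filters)
    if sort:
        params.append(("sort", sort))
    if debug:
        params.append(("debugQuery", "on"))
    # urlencode takes care of escaping spaces, colons and commas.
    return SOLR_BASE_URL + "/select?" + urlencode(params)


print(build_select_url("forum", filters=["pagetype:app"], debug=True))
```

Passing the parameters as a list of pairs (instead of a dictionary) is what makes it possible to repeat fq.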
<h2>Query Parsers:</h2>
The solr.SearchHandler can be configured to use different query parsers for &quot;translating&quot; the value of the &quot;q&quot; parameter into the correct Lucene query. The query parser is therefore one of the most important parts of the search handlers - and understanding what's happening behind the scenes is useful.
Unless configured otherwise the query parser used is the standard <b>lucene</b> query parser.
There are other parsers that are often used - like dismax and edismax.&nbsp; All have their pros and cons. The parser that is used is defined with the defType parameter (either explicitly or implicitly by using a configured search handler). 
Another way of specifying the parser that should be used is the LocalParams[1] syntax. We will have a look at some examples later.
 Here is the full list of query parsers, which allow you to build really fancy queries:
<ul><li><b>lucene</b> - standard (the standard parser) - see below for details</li><li><b>dismax</b> - aims to deal with a &quot;human query string&quot; - see below for details. (there is also the extended edismax parser in newer versions)</li><li><b>func</b> - to build function queries. Not really useful as a standalone parser - but often used together with others (_val_ hook for lucene parser and bf parameter for dismax parser)</li><li><b>boost</b> - to boost a query </li><li><b>frange</b> - can be used to speed up range queries</li><li><b>field</b> - simple field query, useful in filter queries</li><li><b>prefix</b> - simple prefix query - useful e.g. in filter queries</li><li><b>raw</b> / <b>term</b> - create raw term query from input</li><li><b>query</b> - allows combining different query types (&quot;nesting&quot;)</li></ul>

<h1>A closer look at the query parsers</h1>
<h2>The lucene query parser (standard)</h2>
The lucene query parser understands a subset of the Lucene query syntax (see [2])<br /><br />Let's have a look at a first basic example:<br /><i>/select?q=forum&amp;rows=10&amp;qt=standard&amp;wt=standard&amp;debugQuery=on</i>
Please note two details here: first, <i>debugQuery=on</i> is used, which includes the result of the query parsing together with ranking details. Second, the lucene parser is used by setting the request handler to <i>standard</i> (query type: <i>qt=standard</i>). The &quot;standard&quot; search request handler is normally defined to use the lucene parser because of the following snippet in the solrconfig.xml. (It might be different for your installation)
<i>&lt;requestHandler name=&quot;standard&quot; class=&quot;solr.SearchHandler&quot;&gt;<br />&nbsp;&nbsp;&nbsp; &lt;!-- default values for query parameters --&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp; &lt;lst name=&quot;defaults&quot;&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;str name=&quot;echoParams&quot;&gt;explicit&lt;/str&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp; &lt;/lst&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp; &lt;arr name=&quot;last-components&quot;&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;str&gt;spellcheck&lt;/str&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp; &lt;/arr&gt;<br />&nbsp; &lt;/requestHandler&gt;</i>
 An alternative way of specifying this query would have been to set the query parser with the defType parameter like this:<br /><i>/select?q=forum&amp;rows=10&amp;wt=standard&amp;debugQuery=on&amp;defType=lucene</i>
The resulting parsed query of this example is: <i>&quot;content:forum&quot;</i> meaning it will return all documents with the term <i>&quot;forum&quot;</i> in the content field.<br />The scoring of the documents is calculated mainly from termFrequency (= how often the term matches in the field) and fieldNorm (= the overall length of the field)<br /><br />Here is another example:<br /><i>/select?q=forum billing&amp;rows=10&amp;defType=lucene&amp;wt=standard&amp;debugQuery=on</i>
With the resulting query: <i>&quot;+content:forum +content:billing&quot;</i> - note that both terms are marked as required.
<b>Examples - search in different fields:<br /></b>You can use the lucene parser to explicitly search in specific fields. For easier explanation the next examples only show the q parameter:<br /><i>q=content:forum title:billing</i><br />Resulting Query: <i>&quot;+content:forum +title:billing&quot;</i>
So this example will return documents that match the term &quot;<i>forum</i>&quot; in the content and &quot;<i>billing</i>&quot; in the title. Note that the terms are automatically combined using &quot;AND&quot; (both are required). If you don't want this you can set the parameter q.op to &quot;OR&quot; or explicitly write the query like:<br /><i>q=content:forum OR title:billing<br /></i>Resulting Query:<i> &quot;content:forum title:billing&quot;<br /></i><br />You can also boost certain term matches for the default score calculation like this:<br /><i>q=+content:nokia content:prepaid^10 title:billing<br /></i><i>q.op=OR<br /></i>Resulting Query: <i>&quot;+content:nokia content:prepaid^10.0 title:billing&quot;</i>
Guess what this query does? It will find all documents that match the term &quot;nokia&quot;, and it will additionally boost documents that match the term &quot;prepaid&quot; in the content and &quot;billing&quot; in the title. The term &quot;prepaid&quot; is 10 times more relevant for the score calculation, but documents that don't contain the term &quot;prepaid&quot; are still listed, since the term is not marked as required.
Multiple terms in one field can also be written like this:<br /><i>q=title:(nokia handy)</i><br />Resulting Query: <i>&quot;+title:nokia +title:handy&quot;</i>
<b>Hooks and function queries:</b><br />The standard query parser allows much more, like negative term queries, fuzzy search and range queries. Last but not least, two &quot;hooks&quot; are available: <b>_val_</b> and <b>_query_</b>. They allow combined usage with function queries (_val_) and support nested subqueries of any type (_query_). <br />An example query using _val_:<br /><i>q=forum _val_:&quot;{!func}id&quot;<br /></i>Resulting Query: &quot;<i>+content:forum +FunctionQuery(str(id))</i>&quot;<br /><br />So this query will find documents with the term &quot;forum&quot; and it will add the id to the score value (of course there are better use cases, for example a function query that adds a value to the score based on a date field). We will see examples of nested queries later.<br /><br /><br /><b>To summarize the pros of the lucene parser:</b>
<ul><li>used to select on specific fields if you need fine-grained control</li><li>enables powerful usage of the Lucene query syntax with a lot of features [2]</li><li>two hooks that allow more complex use cases</li></ul>
<h2>The dismax query parser</h2>
The dismax parser is designed to interpret user input like &quot;nokia akku tips&quot; or &quot;smartphone nokia -iphone&quot;. It also has a lot of parameters to control the score calculation using phrase queries, boost functions etc.<br />Let's see some examples:<br /><i>/select?q=forum&amp;rows=10&amp;qf=content&amp;qt=dismax&amp;wt=standard&amp;debugQuery=on<br /></i><br />Because we set the request handler to &quot;dismax&quot; with the qt parameter, the request handler configured as follows is used:<br />&nbsp;&lt;requestHandler name=&quot;dismax&quot; class=&quot;solr.SearchHandler&quot; &gt;<br />&nbsp;&nbsp;&nbsp; &lt;lst name=&quot;defaults&quot;&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp; &lt;str name=&quot;defType&quot;&gt;dismax&lt;/str&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp; &lt;str name=&quot;echoParams&quot;&gt;explicit&lt;/str&gt;<br />&nbsp;&nbsp;&nbsp; &lt;/lst&gt;<br />&nbsp; &lt;/requestHandler&gt;<br /><br />So in fact this will use the dismax query parser without any predefined parameters and the query will expand to: <i>&quot;+(content:forum) ()&quot;. </i>Pay attention to the extra parameter qf (query fields), which needs to contain the list of fields to search in.<br />Another way of forcing the dismax query parser would have been to use the LocalParams syntax like this: <i>q={!dismax}forum</i>
Let's look at a second example (again only the q parameter is shown)<br /><i>q=forum +must -dont</i><br />Resulting Query: <i>&quot;+( ( (content:forum) +(content:must) -(content:dont))~1) ()&quot;</i><br /><br />So the parser also supports the explicit absence or existence of a term. The &quot;<i>~1</i>&quot; in the parsed query means that only one of the expressions needs to match (see the mm parameter below)<br /><br />The dismax parser allows you to search in multiple fields with different relevancy by setting the qf (query fields) parameter:<br /><i>q=forum<br />qf=content^2 title^10</i><br />Resulting Query: <i>&quot;+(content:forum^2.0 | title:forum^10.0) ()&quot;</i>
<br />The dismax parser comes with a number of parameters that help to translate the human query into a relevant lucene query [4] - let's have a closer look at some of them.
<b>The mm (minimum should match) parameter:</b><br />Here you define how many terms must match. Let's say you search for &quot;nokia sony panasonic&quot;: an mm of &quot;2&quot; means that only 2 terms need to match (so it will, for example, find documents that contain only nokia and sony)<br />To make it more concrete look at this example:<br /><i>/select?q=nokia sony panasonic&amp;rows=10&amp;qt=dismax&amp;debugQuery=on&amp;mm=2&amp;qf=content^10 title^20</i>
This will result in the parsed query: <br /><i>&nbsp; (<br />&nbsp;&nbsp;&nbsp; (content:nokia^10.0 | title:nokia^20.0) <br />&nbsp;&nbsp;&nbsp; (content:sony^10.0 | title:sony^20.0) <br />&nbsp;&nbsp;&nbsp; (content:panasonic^10.0 | title:panasonic^20.0)<br />&nbsp; )~2</i>
This is where you see how powerful the dismax parser is - the parsed queries soon get very big, especially when using many query fields and the other dismax parameters.
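The effect of mm can be pictured with a toy model. The Python sketch below is not how SOLR evaluates queries internally; it only counts how many of the query terms occur in a text and compares that count with mm. The function names and the sample documents are invented for the illustration.

```python
def count_matching_terms(text, query):
    """Count how many query terms occur as words in the text."""
    words = set(text.lower().split())
    return sum(1 for term in query.lower().split() if term in words)


def matches_mm(text, query, mm):
    """True if at least mm of the query terms occur in the text."""
    return count_matching_terms(text, query) >= mm


# Invented sample documents, only for this illustration.
documents = {
    "a": "nokia and sony announce new phones",
    "b": "panasonic camera review",
    "c": "nokia sony panasonic compared",
}

for doc_id, text in documents.items():
    print(doc_id, matches_mm(text, "nokia sony panasonic", mm=2))
```

With mm=2 the first and the third document qualify, while the second one matches only a single term and is dropped.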
<b>Other Parameters:</b><br />The dismax parser supports many more parameters - mainly to control the scoring:
<ul><li>Phrases (for boosting documents where all terms match in close proximity) <ul><li>pf - phrase fields (e.g. content^2)</li><li>ps - phrase slop (e.g. 15 ) </li></ul></li><li>bf - boost function, shorthand for the _val_ hook (similar to the lucene parser)</li><li>bq - boost query - any additional query that is executed and added to the scoring</li></ul>
For more details see [4]
So the big advantage of the dismax parser is that you get a good score calculation based on normal queries (much better than a simple lucene parser result).
<h2>Boost Query Parsers:</h2>
Let's have a look at the boost query parser. You can use it with the LocalParams syntax like this:<br /><i>q={!boost b=&lt;the boost function query&gt;}&lt;any other query&gt;</i>
Note that the boost function itself is set with the local param &quot;b&quot; and that the boost parser always needs another &quot;normal&quot; query string as well.
For example <i>q={!boost b=id}forum</i><br />will result in the parsed query: <i>&quot;boost(content:forum,str(id))&quot;</i>
The result means that all documents that match &quot;forum&quot; in the content field will be returned and additionally boosted by the id field (which is not very useful of course but serves as an example). You also see that the lucene parser is used for the appended query string &quot;forum&quot;, since this is the default parser.
If you want to use the boost query together with a dismax query you can do it like this:<br /><i>q={!boost b=id}{!dismax}nokia sony panasonic<br /></i>Resulting Query: <i>&quot;boost(+(((content:nokia^40.0 | title:nokia ....))~2 (content:&quot;nokia sony panasonic&quot;~15^2.0),str(id))&quot;</i>
(If you want to use this please note that the configured default parameters for the dismax parser are used - but you can also override them inline in the localParams syntax)
<h2>Function Query Parser</h2>
The last query parser for this article is the function query parser. This parser allows you to use functions in your query - the normal use case is to influence the scoring the way you would like to have it (e.g. boosting newer documents...)
Let's look at a simple example:
<i>/select?q={!func}id&amp;rows=10&amp;debugQuery=on</i>
This results in this query: str(id)
It will simply find all documents (a function query itself does not filter) and the score equals the value of the field id.
Another example is this:
<i>/select?q={!func}product(2,price_f)&amp;debugQuery=on&amp;fl=id,score,price_f&amp;fq=id:2</i>
The resulting query is: <i>&quot;product(const(2.0),sfloat(price_f))&quot;</i>
Please note that we used the parameter &quot;fq&quot; to filter for the document with the id 2 and that we also displayed the &quot;score&quot; field, so this query gives us the doubled price of the document in the score field. You can use this to calculate and return fancy stuff for documents (like term frequency etc.).
<b>Typical usage of function query<br /></b>
Typically the function query is used together with other parsers. Most of the parsers support function queries:
<ul><li>lucene parser: using the _val_ hook</li><li>dismax parser: using the _val_ hook or the bf parameter</li></ul>
Example dismax and bf usage:<br /><i>/select?q=nokia&amp;debugQuery=on&amp;qf=content title&amp;qt=dismax&amp;bf=id</i>
Will result in: <i>&quot;+(content:nokia | title:nokia) () str(id)&quot;</i>
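The difference between the two ways of attaching a function can be pictured with a toy model. This Python sketch is not Solr code: it ignores query normalisation and similar factors, and the relevance scores and field values are made-up numbers. It assumes what the parsed queries suggest, namely that {!boost} multiplies the score of the wrapped query by the function value, while bf adds the function value as one more optional clause.

```python
def boost_parser_score(relevance, function_value):
    """Toy model of {!boost b=...}: relevance multiplied by the function value."""
    return relevance * function_value


def dismax_bf_score(relevance, function_value):
    """Toy model of dismax bf=...: function value added to the relevance."""
    return relevance + function_value


# Made-up documents: (id, relevance of the text match, value of the boosted field)
docs = [("a", 0.1, 10.0), ("b", 2.0, 1.0), ("c", 1.0, 3.0)]

by_boost = [d[0] for d in sorted(
    docs, key=lambda d: boost_parser_score(d[1], d[2]), reverse=True)]
by_bf = [d[0] for d in sorted(
    docs, key=lambda d: dismax_bf_score(d[1], d[2]), reverse=True)]

print(by_boost)
print(by_bf)
```

With the additive variant a large field value can dominate a weak text match, while the multiplicative variant keeps the text relevance in play. That is worth keeping in mind when choosing between the two.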
<h2>Combining dismax and lucene</h2>
With the above knowledge it's easy to combine the two parsers in different ways:
<b>Lucene in dismax:</b><br />Using the bq parameter of dismax to add a lucene query:
<i>/select?q=car&amp;debugQuery=on&amp;qf=content&amp;qt=dismax&amp;bq=pagetype:app</i>
Resulting query: <i>&quot;+(content:car) () pagetype:app&quot;</i>
<b>Dismax in lucene:</b>
Using the _query_ hook for lucene parser to add a dismax query:
<i>/select?q=pagetype:app _query_:&quot;{!dismax%20qf=content}car&quot;&amp;debugQuery=on&amp;qt=standard</i>
Resulting query: <i>&quot;+pagetype:app +(+(content:car) ())&quot;</i>
<h1>The fq (filter query) parameter</h1>
If you go back to the beginning of the article you see that the fq parameter is supported by all search handlers. There are some special things about this parameter:
<ul><li>You can also use different query parsers (using LocalParams syntax). The default (and the only one that really makes sense) is lucene</li><li>You can have multiple fq parameters in the URL: all of them are evaluated</li><li>The result of the query is cached (this can be disabled)</li><li>The filter query does not influence the scoring calculation, therefore we can filter without fear of changing the score of individual documents</li></ul>
It's suggested to use filter queries (as the name says) for facet filtering - the caching feature is more helpful for complex query parts.
The advantage is that the filter query uses the filterCache from SOLR. It works something like this:
<ol><li>All filter queries are executed separately. If caching is not disabled, the resulting document ids are cached. So if the same filter query is used again it's not executed, but retrieved from the cache.</li><li>Before the final result is returned, SOLR calculates the intersection between the main query result and all filter query results.</li></ol>
I still want to do some tests to see under which circumstances the filterCache and the intersection calculation may be worse than a single main query (but that's a topic of its own). In general, using filter queries results in a better usage of the SOLR caching. See also [8]
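The two steps above can be sketched in a few lines. This is a simplified Python model of the idea and not how SOLR implements its filterCache: the class name, the in-memory index and the restriction to filters of the form field:value are all assumptions made for the example.

```python
class FilterCacheModel:
    """Toy model of a filter cache: fq string -> set of matching document ids."""

    def __init__(self, index):
        self.index = index      # document id -> dict of field values
        self.cache = {}         # fq string -> frozenset of document ids
        self.executions = 0     # how often a filter really had to be executed

    def _run_filter(self, fq):
        # Only the simple form "field:value" is supported in this sketch.
        field, value = fq.split(":", 1)
        self.executions += 1
        return frozenset(doc_id for doc_id, doc in self.index.items()
                         if doc.get(field) == value)

    def doc_set(self, fq):
        # Step 1: execute the filter only if its result is not cached yet.
        if fq not in self.cache:
            self.cache[fq] = self._run_filter(fq)
        return self.cache[fq]

    def search(self, main_hits, fqs):
        # Step 2: intersect the main query hits with every filter result.
        result = set(main_hits)
        for fq in fqs:
            result = result.intersection(self.doc_set(fq))
        return result


index = {
    1: {"pagetype": "app", "lang": "en"},
    2: {"pagetype": "app", "lang": "de"},
    3: {"pagetype": "news", "lang": "en"},
}
model = FilterCacheModel(index)
first = model.search({1, 2, 3}, ["pagetype:app", "lang:en"])
second = model.search({2, 3}, ["pagetype:app"])
print(first, second, model.executions)
```

The second search reuses the cached result for pagetype:app, so only two filter executions are counted for three filter usages.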
<h1>Search components</h1>
&quot;Search components enable a SearchHandler to chain together reusable pieces of functionality to create custom search handlers without writing code. &quot; [6]
Search components can be enabled for the search request handlers (by default most of them are already enabled). They are configured with certain parameters and normally modify the result by adding more information. This article won't explain them in detail, but for completeness the most important ones are listed here:
<ul><li>FacetComponent - adds information that can be used to show filters (facets) with the search result.</li><li>Highlighting - adds preview snippets for the result documents based on the query</li><li>StatsComponent - similar to the FacetComponent but only returns information like min and max values of a certain field (useful to display range filters)</li><li>SpellCheckComponent - advanced spell checking for the query</li><li>QueryElevationComponent - adds or boosts documents based on an editorially maintained file</li><li>TermVectorComponent - can return information like frequent terms in a field (if the field is configured to store termVectors)</li><li>TermsComponent - provides access to the indexed terms in a field - often used for autosuggest</li></ul>]]></content:encoded>
			<category>news</category>
			<category>development</category>
			
			
			<pubDate>Mon, 23 Jan 2012 08:23:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title>JavaScript device detection</title>
			<link>http://www.typo3-media.com/blog/javascript-device-detection.html</link>
			<description>
For some recent project we required some device specific logic. After searching for javascript...</description>
			<content:encoded><![CDATA[
For a recent project we needed some device-specific logic. After searching for JavaScript-based libraries we couldn't find an appropriate one and decided to write a small device detection library based on user agent parsing.
You can find it on GitHub: <link https://github.com/danielpoe/DeviceDetection _blank>https://github.com/danielpoe/DeviceDetection</link>


 
Please feel free to use it and improve it with further checks and user agents. The lib comes with some basic QUnit test cases that can also be extended over time.
Example Usage:

<pre>var detection = DeviceDetection(); <br />detection.isApple(); <br />detection.isTouchDevice(); <br />detection.isAndroid(); <br />detection.isSmartPhone(); <br />detection.isDesktop(); <br />detection.isTablet(); </pre>
]]></content:encoded>
			<category>news</category>
			
			
			<pubDate>Sat, 21 Jan 2012 16:07:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title>Sessions with Extbase</title>
			<link>http://www.typo3-media.com/blog/sessions-extbase-typo3.html</link>
			<description>Sessions are used to save information that is bound to the current website user. In most cases...</description>
			<content:encoded><![CDATA[Sessions are used to save information that is bound to the current website user. In most cases this information can be transient, meaning it is only relevant while the user visits the website.
A typical example is something like a basket, or information about the current marketing campaign for the visitor.
Therefore in most cases the session is a relevant concept for your domain. Imagine a simple BasketController; you may want to use something like this:
<pre>class BasketController {   <br />   ...<br />   public function addItemAction(Product $product) {<br />      $basket = $this-&gt;userSession-&gt;getBasket();<br />      $basket-&gt;addItemByProduct($product);<br />      $this-&gt;userSession-&gt;saveBasket($basket);<br />   }<br /> } </pre>


If you have a look at the attached UML sketch: this means there is a Session class in the domain layer of your application that should have singleton scope and that defines a well-defined interface for your domain (in this example, to get the basket instance, which acts as aggregate root for the items in the basket).
For the actual storage of the objects in the session, the persistence implementation is part of your application's system layer. Here you can find a sample SessionStorage implementation that works for TYPO3 4.x:
<pre>class Tx_Extkey_System_Session_SessionStorage implements t3lib_Singleton {<br />	<br />	const SESSIONNAMESPACE = 'tx_extkey';<br />	<br />	/**<br />	 * Returns the value stored in the user's session<br />	 * @param string $key<br />	 * @return mixed the stored value<br />	 * @throws LogicException if nothing is stored for the key<br />	 */<br />	public function get($key) {<br />		$sessionData = $this-&gt;getFrontendUser()-&gt;getKey('ses', self::SESSIONNAMESPACE.$key);<br />		if ($sessionData == '') {<br />			throw new LogicException('No value for key found in session '.$key);<br />		}<br />		return $sessionData;<br />	}<br />	<br />	/**<br />	 * Checks if a value is stored in the user's session<br />	 * @param string $key<br />	 * @return boolean<br />	 */<br />	public function has($key) {<br />		$sessionData = $this-&gt;getFrontendUser()-&gt;getKey('ses', self::SESSIONNAMESPACE.$key);<br />		if ($sessionData == '') {<br />			return false;<br />		}<br />		return true;<br />	}<br /> <br />	/**<br />	 * Writes a value to the storage<br />	 * @param string $key<br />	 * @param string $value<br />	 * @return	void<br />	 */<br />	public function set($key,$value) {<br />		$this-&gt;getFrontendUser()-&gt;setKey('ses', self::SESSIONNAMESPACE.$key, $value);<br />		$this-&gt;getFrontendUser()-&gt;storeSessionData();<br />	}<br />	<br />	/**<br />	 * Writes an object to the session; if the key is empty, the class name is used<br />	 * @param object $object<br />	 * @param string $key<br />	 * @return	void<br />	 */<br />	public function storeObject($object,$key=null) {<br />		if (is_null($key)) {<br />			$key = get_class($object);<br />		}<br />		$this-&gt;set($key,serialize($object));<br />	}<br />	<br />	/**<br />	 * Reads an object from the storage and unserializes it<br />	 * @param string $key<br />	 * @return	object<br />	 */<br />	public function getObject($key) {<br />		return unserialize($this-&gt;get($key));<br />	}<br /> <br />	/**<br />	 * Cleans up the session: removes the stored object from the user session<br />	 * @param string $key<br />	 * @return	void<br />	 */<br />	public function clean($key) {<br />		$this-&gt;getFrontendUser()-&gt;setKey('ses', self::SESSIONNAMESPACE.$key, NULL);<br />		$this-&gt;getFrontendUser()-&gt;storeSessionData();<br />	}<br />	<br />	/**<br />	 * Gets the frontend user from TSFE-&gt;fe_user.<br />	 *<br />	 * @return	tslib_feUserAuth	The current frontend user object<br />	 * @throws	LogicException<br />	 */<br />	protected function getFrontendUser() {<br />		if ($GLOBALS['TSFE']-&gt;fe_user) {<br />			return $GLOBALS['TSFE']-&gt;fe_user;<br />		}<br />		throw new LogicException('No frontend user found in session!');<br />	}<br />}</pre>
And finally this is the sample code for the UserSession Class:
<pre>class Tx_Extkey_Domain_UserSession implements t3lib_Singleton {<br />	/**<br />	 * @var Tx_Extkey_System_Session_SessionStorage<br />	 */<br />	private $sessionStorage;<br />	<br />...	<br /><br />	public function __construct(Tx_Extkey_System_Session_SessionStorage $sessionStorage) {<br />		$this-&gt;sessionStorage = $sessionStorage;<br />	}<br />	<br /><br />	public function getBasket() {<br />		if ($this-&gt;sessionStorage-&gt;has('Basket')) {<br />			return $this-&gt;sessionStorage-&gt;getObject('Basket');<br />		}<br />		else {<br />			return $this-&gt;objectManager-&gt;create('Basket');<br />		}<br />	}<br />	<br />	public function saveBasket(Basket $basket) {<br />		$this-&gt;sessionStorage-&gt;storeObject($basket);<br />	}<br />		<br />}</pre>
<h3>Serialisation drawbacks..</h3>
The objects are stored in the user session as a serialized string. This can cause several problems:
<ul><li>Not all associated framework objects are serializable. For example, you will have problems serializing domain entities (too big, lost connection to the persistence manager...)</li><li>Cyclic relations will cause problems</li><li>The serialized version can simply be too big</li></ul>
That's why you need to think about what data should be stored in your serialized objects. PHP5 gives you the possibility to clean up your data in the __sleep method and to reconstitute it in the __wakeup method. This is where you:
<ul><li>can store ids instead of entities and try to get them with a repository on __wakeup</li><li>can throw injected singletons away and try to get them back on __wakeup</li></ul>
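The idea of storing ids instead of entities is not specific to PHP. The following Python sketch shows the same pattern with two explicit methods standing in for __sleep and __wakeup; it is an analogy, not Extbase code. Product, ProductRepository and Basket are hypothetical classes invented for this example, and JSON is used instead of PHP's serialize().

```python
import json


class Product:
    def __init__(self, uid, title):
        self.uid = uid
        self.title = title


class ProductRepository:
    """Hypothetical repository; a real one would query the database."""
    _products = {1: Product(1, "Phone"), 2: Product(2, "Camera")}

    @classmethod
    def find_by_uid(cls, uid):
        return cls._products[uid]


class Basket:
    def __init__(self):
        self.items = []

    def add_item(self, product):
        self.items.append(product)

    def sleep_state(self):
        # Counterpart of __sleep: keep only the ids, drop the entities.
        return json.dumps({"item_uids": [p.uid for p in self.items]})

    @classmethod
    def wake_up(cls, stored):
        # Counterpart of __wakeup: fetch the entities again by id.
        basket = cls()
        for uid in json.loads(stored)["item_uids"]:
            basket.add_item(ProductRepository.find_by_uid(uid))
        return basket


basket = Basket()
basket.add_item(ProductRepository.find_by_uid(2))
stored = basket.sleep_state()       # this string would go into the session
restored = Basket.wake_up(stored)
print(stored, restored.items[0].title)
```

The stored string contains only the uid, so it stays small and does not drag framework objects or cyclic relations into the session.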
<h3>Dependency Injection</h3>
There are two ways of injecting objects after deserialization:
<ol><li>in the __wakeup method using the Extbase ObjectManager (t3lib_div::makeInstance('Tx_Extbase_Object_ObjectManager'))</li><li>explicitly in the UserSession class, like:</li></ol>
<pre>	public function getBasket() {<br /> 		if ($this-&gt;sessionStorage-&gt;has('Basket')) {<br /> 			$basket = $this-&gt;sessionStorage-&gt;getObject('Basket');<br /> 			$basket-&gt;injectSomeThing($this-&gt;objectManager-&gt;get('Something'));<br /> 			return $basket;<br /> 		}<br /> 		else {<br /> 			return $this-&gt;objectManager-&gt;create('Basket');<br /> 		}<br /> 	} </pre>
<h2>In FLOW3</h2>
Well, in FLOW3 you don't need to deal with a &quot;SessionStorage&quot; implementation or think about __sleep and __wakeup... this is all handled by the framework and the @scope session annotation.
See Robert's post about this:
http://robertlemke.de/blog/posts/2010/08/19/session-handling-and-object-serialization
]]></content:encoded>
			<category>pattern</category>
			
			
			<pubDate>Mon, 21 Nov 2011 21:05:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title>Finding more with SOLR</title>
			<link>http://www.typo3-media.com/blog/solr-noun-expansion.html</link>
			<description> For a recent project we launched we had to deal with very different quality of the indexed...</description>
			<content:encoded><![CDATA[For a recent project we launched, we had to deal with indexed documents of very different quality, and I want to share some approaches that you can use with SOLR to match terms fuzzily (for different languages).
<b>For example:</b> If someone searches for &quot;Autonavigation&quot; it should find documents with &quot;car navigation&quot; and &quot;auto navigationsgerät&quot;.
There are the following issues to solve here:
<ol><li>You need to split the word &quot;Autonavigation&quot; into &quot;Auto&quot; and &quot;navigation&quot;</li><li>You need to search for the translated tokens, their synonyms and the untranslated tokens.</li><li>Be fuzzy enough</li><li>Be relevant</li></ol>
<h3>Using&nbsp;DictionaryCompoundWordTokenFilterFactory to split nouns</h3>
A stable way to split words into meaningful subwords is the solr.DictionaryCompoundWordTokenFilterFactory filter. This filter uses a dictionary of words and splits tokens by detected words:

<pre>&lt;!-- 1 split subwords german nouns --&gt;<br />&lt;filter class=&quot;solr.DictionaryCompoundWordTokenFilterFactory&quot; dictionary=&quot;wordlists/german-common-nouns.txt&quot; minWordSize=&quot;5&quot; minSubwordSize=&quot;4&quot; maxSubwordSize=&quot;15&quot; onlyLongestMatch=&quot;true&quot;/&gt;<br />&lt;!-- 2 split subwords english nouns --&gt;<br />&lt;filter class=&quot;solr.DictionaryCompoundWordTokenFilterFactory&quot; dictionary=&quot;wordlists/english-common-nouns.txt&quot; minWordSize=&quot;5&quot; minSubwordSize=&quot;4&quot; maxSubwordSize=&quot;15&quot; onlyLongestMatch=&quot;true&quot;/&gt;<br />			</pre>
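To get a feeling for what the four parameters do, here is a simplified Python sketch of dictionary-based compound splitting. It is not the Lucene implementation: it only returns the detected subwords, and it interprets onlyLongestMatch as keeping the longest dictionary word per start position, which is an assumption of this example. The tiny dictionary is made up.

```python
def split_compound(token, dictionary, min_word_size=5, min_subword_size=4,
                   max_subword_size=15, only_longest_match=True):
    """Return the dictionary words found inside token (simplified sketch)."""
    token = token.lower()
    if len(token) < min_word_size:
        return []  # short tokens are never split
    subwords = []
    for start in range(len(token) - min_subword_size + 1):
        longest = None
        for size in range(min_subword_size, max_subword_size + 1):
            if start + size > len(token):
                break
            candidate = token[start:start + size]
            if candidate in dictionary:
                if only_longest_match:
                    longest = candidate  # sizes grow, so the last hit is the longest
                else:
                    subwords.append(candidate)
        if longest is not None:
            subwords.append(longest)
    return subwords


nouns = {"auto", "navigation", "gerät"}
print(split_compound("Autonavigation", nouns))
print(split_compound("Navigationsgerät", nouns))
```

The quality of the result depends entirely on the word list: a dictionary containing short, very common words quickly produces splits that make no sense, which is why minSubwordSize is worth tuning.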
<h3>Using SynonymFilterFactory to translate</h3>

<pre>&lt;filter class=&quot;solr.SynonymFilterFactory&quot; synonyms=&quot;wordlists/translationsgroups-english-german-nouns.txt&quot; ignoreCase=&quot;true&quot; expand=&quot;true&quot;/&gt;<br />				 </pre>
The file has synonym groups like &quot;auto,car&quot;, so that the search always uses both the English and the German version of a noun.
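The effect of expand=&quot;true&quot; together with ignoreCase=&quot;true&quot; can be pictured like this. The Python sketch below is a simplified model with made-up groups, not the SOLR filter itself; it only handles single-word synonyms.

```python
def load_groups(lines):
    """Map every term of a comma separated group to the whole group."""
    lookup = {}
    for line in lines:
        group = [term.strip().lower() for term in line.split(",") if term.strip()]
        for term in group:
            lookup[term] = group
    return lookup


def expand(tokens, lookup):
    """Replace each known token by all terms of its group; keep unknown tokens."""
    expanded = []
    for token in tokens:
        expanded.extend(lookup.get(token.lower(), [token]))
    return expanded


lookup = load_groups(["auto,car", "gerät,device"])
print(expand(["Auto", "Handy"], lookup))
```

Because every member of a group is expanded to the whole group, it does not matter whether the indexed document or the search used the German or the English word.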
<h3>Using the correct Stemmer</h3>
Try out different stemmers and analyse the results using the SOLR admin-&gt;analyse GUI. For German, for example, the stemmer order (from most to least aggressive) should be like this:

<ul><li>GermanStemFilterFactory</li><li>SnowballPorterFilterFactory (language German)</li><li>GermanLightStemFilterFactory</li><li>GermanMinimalStemFilterFactory</li></ul>
<h3>Be relevant</h3>
Using an aggressive stemmer and the explained token expansions you will get more hits when you search. But there is the risk of finding documents that are not so relevant (the old recall vs. precision problem).
Therefore I prefer to have a &quot;text&quot; field that is configured less aggressively and doesn't use language expansion. In addition you could configure an &quot;expandedtext&quot; field that uses the described configuration.
When searching you need to send a query that searches in both fields, but with a higher boost on the &quot;text&quot; field.

<pre>&lt;fieldType name=&quot;expandedtext&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;&gt;<br />	&lt;analyzer type=&quot;index&quot;&gt;<br />		&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/&gt;<br />		&lt;filter class=&quot;solr.HyphenatedWordsFilterFactory&quot;/&gt;<br />		&lt;filter class=&quot;solr.WordDelimiterFilterFactory&quot; generateWordParts=&quot;1&quot; generateNumberParts=&quot;1&quot; catenateWords=&quot;1&quot; catenateNumbers=&quot;1&quot; catenateAll=&quot;0&quot; splitOnCaseChange=&quot;1&quot;/&gt;<br />		&lt;!-- Case insensitive stop word removal.<br />			add enablePositionIncrements=true in both the index and query<br />			analyzers to leave a 'gap' for more accurate phrase queries.<br />		--&gt;<br />		&lt;filter class=&quot;solr.LowerCaseFilterFactory&quot;/&gt;<br />		&lt;filter class=&quot;solr.StopFilterFactory&quot;<br />				ignoreCase=&quot;true&quot;<br />				words=&quot;stopwords_de.txt&quot;<br />				enablePositionIncrements=&quot;true&quot;<br />				/&gt;<br />		&lt;filter class=&quot;solr.StopFilterFactory&quot;<br />				ignoreCase=&quot;true&quot;<br />				words=&quot;stopwords_en.txt&quot;<br />				enablePositionIncrements=&quot;true&quot;<br />				/&gt;<br />		&lt;!-- 1 split subwords german nouns --&gt;<br />		&lt;filter class=&quot;solr.DictionaryCompoundWordTokenFilterFactory&quot; dictionary=&quot;wordlists/german-common-nouns.txt&quot;<br />			minWordSize=&quot;5&quot; minSubwordSize=&quot;4&quot; maxSubwordSize=&quot;15&quot; onlyLongestMatch=&quot;true&quot;/&gt;<br />		&lt;!-- 2 split subwords english nouns --&gt;<br />		&lt;filter class=&quot;solr.DictionaryCompoundWordTokenFilterFactory&quot; dictionary=&quot;wordlists/english-common-nouns.txt&quot;<br />			minWordSize=&quot;5&quot; minSubwordSize=&quot;4&quot; maxSubwordSize=&quot;15&quot; onlyLongestMatch=&quot;true&quot;/&gt;<br />		&lt;!-- 3 expand english words to include the german translation --&gt;<br />		&lt;filter class=&quot;solr.SynonymFilterFactory&quot; synonyms=&quot;wordlists/translationsgroups-english-german-nouns.txt&quot; ignoreCase=&quot;true&quot; expand=&quot;true&quot;/&gt;<br />		&lt;filter class=&quot;solr.LowerCaseFilterFactory&quot;/&gt;<br />		&lt;filter class=&quot;solr.GermanStemFilterFactory&quot;/&gt;<br />		&lt;filter class=&quot;solr.SnowballPorterFilterFactory&quot; language=&quot;English&quot;/&gt;<br />		&lt;filter class=&quot;solr.RemoveDuplicatesTokenFilterFactory&quot;/&gt;<br />	&lt;/analyzer&gt;<br />	&lt;analyzer type=&quot;query&quot;&gt;<br />		&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/&gt;<br />		&lt;filter class=&quot;solr.WordDelimiterFilterFactory&quot; generateWordParts=&quot;1&quot; generateNumberParts=&quot;1&quot; catenateWords=&quot;0&quot; catenateNumbers=&quot;0&quot; catenateAll=&quot;0&quot; splitOnCaseChange=&quot;1&quot;/&gt;<br />		&lt;filter class=&quot;solr.LowerCaseFilterFactory&quot;/&gt;<br />		&lt;filter class=&quot;solr.GermanStemFilterFactory&quot;/&gt;<br />		&lt;filter class=&quot;solr.SnowballPorterFilterFactory&quot; language=&quot;English&quot;/&gt;<br />		&lt;filter class=&quot;solr.RemoveDuplicatesTokenFilterFactory&quot;/&gt;<br />	&lt;/analyzer&gt;<br />&lt;/fieldType&gt; </pre>
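Querying both fields with different weights could then look like the following Python sketch. It assumes that the dismax parser is used (selected here with defType=dismax) and that the boosts are passed in the qf parameter; the concrete boost values 4.0 and 1.0 and the function name are made up for this example and need tuning against real data.

```python
from urllib.parse import urlencode


def two_field_query(user_query, boosts):
    """Build dismax parameters that search several fields with different boosts."""
    qf = " ".join("%s^%s" % (field, boost) for field, boost in boosts.items())
    return {"q": user_query, "defType": "dismax", "qf": qf}


# Assumed boost values: a hit in the strict field counts four times as much.
params = two_field_query("Autonavigation", {"text": 4.0, "expandedtext": 1.0})
print(params["qf"])
print(urlencode(params))
```

A document that matches in the strict &quot;text&quot; field then ranks above one that only matches through compound splitting or translation in &quot;expandedtext&quot;.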
<h3>Downloads:</h3>
Actually, finding and creating the correct dictionaries and synonym lists takes most of the time. Here are the lists that I created based on the links at the end of the article. The translation was created using Google Translate.
<ul><li><link fileadmin/files/wordlists/english-common-nouns.txt>list of common english nouns</link></li><li><link fileadmin/files/wordlists/german-common-nouns.txt>list of common german nouns (Substantive)</link></li><li><link fileadmin/files/wordlists/translationsgroups-english-german-nouns.txt>noun synonym list with german, english translation groups</link></li></ul>
<h3>Links for wordlists</h3>
Open Source Wordlists:<br />http://sourceforge.net/projects/germandict/<br /><link http://wordlist.sourceforge.net>http://wordlist.sourceforge.net/</link> 
Dictionaries that can be used by open office:<br />http://wiki.services.openoffice.org/wiki/Dictionaries
List of most used words in different languages:<br />http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists<br /><br />German synonyms:<br />http://www.openthesaurus.de/about/download<br /><br />Other lists of lists:<br />http://www.dict.org/w/databases/dict<br />http://fmg-www.cs.ucla.edu/geoff/ispell-dictionaries.html
<h3>SOLR related links</h3>
List of all Filters:<br />http://lucene.apache.org/solr/api/org/apache/solr/analysis/package-summary.html
<h3>Other Links</h3>
Nice blog around text technologies:<br />http://www.texttechnologies.com/?pagename=about
&quot;Institut für Deutsche Sprache&quot;: http://www.ids-mannheim.de/kl/projekte/methoden/derewo.html
Solr suggestion of Stemmers:<br />http://wiki.apache.org/solr/LanguageAnalysis
Open Source text processing software GATE:<br />http://gate.ac.uk/<br /><br />Open Source linguistic text processing LingPipe:<br />http://alias-i.com/lingpipe/
Semantic Indexing<br />http://knowledgesearch.org/<br />http://www.cs.washington.edu/research/textrunner/reverbdemo.html]]></content:encoded>
			<category>news</category>
			
			
			<pubDate>Sat, 08 Oct 2011 23:15:00 +0200</pubDate>
			
		</item>
		
		<item>
			<title>Need for speed: TYPO3 Website-Caching</title>
			<link>http://www.typo3-media.com/blog/website-caching-login.html</link>
			<description>Caching is always a tricky thing and a good caching strategy always depends on the use cases of the...</description>
			<content:encoded><![CDATA[Caching is always a tricky thing and a good caching strategy always depends on the use cases of the website.&nbsp;
<h2>Website Caching Levels</h2>
Of course there are tons of possible ways to cache content on a website - I only want to look at the most promising cache levels:
<ol><li>Cache-Control Headers: If your website sends proper &quot;Cache-Control: public&quot; HTTP headers together with a proper expire time, that's perfect for caching: your browser will store the page in its internal cache - and there is nothing that could be faster than this.</li><li>Varnish: Varnish is a &quot;web application accelerator&quot; - it's built to cache content based on powerful configurations. (It also supports load balancing and failover handling)</li><li>TYPO3 Page Cache: The TYPO3 caching framework is very flexible and should be used of course. It can handle full-page caching based on parameters, logins and conditions. It also supports dynamic marker replacement and dynamic content parts (USER_INT).</li><li>Application Caches (feature-specific caches in our application logic)</li></ol>

<h2>How dynamic is my website?</h2>
The first thing to consider is whether your website looks different based on certain criteria. For example (from easy to hard to solve):
<ol><li>based on time (news articles)</li><li>based on a login (e.g. after login I see personalized information)</li><li>based on the user (even without login - e.g. based on location, cookie information, browser history...)</li></ol>

<h2>Decide on a cache strategy</h2>
<h3>1 Use Cache-Control where possible</h3>
The best you can do is allow client side caching. In TYPO3 this is pretty easy, you just need to set:
config.sendCacheHeaders=1
The precondition is that the page is cacheable (no USER_INT).
TYPO3 then sends proper Cache-Control headers. The expire time depends on the settings of the page: you can control the expire time for every page in the page properties (Cache expire). (The default is one day, or you set the default with config.cache_period.) You can also set the cache to expire at midnight with:
config.cache_clearAtMidnight =1&nbsp;
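How the two settings play together can be pictured with a small model. This Python sketch is not TYPO3's implementation: it assumes a cache period of at most one day and that &quot;clear at midnight&quot; simply caps the lifetime at the next local midnight. The function names are made up for this example.

```python
from datetime import datetime, timedelta


def cache_lifetime_seconds(now, cache_period=86400, clear_at_midnight=False):
    """Seconds a page may stay cached (simplified model, not TYPO3's code)."""
    if not clear_at_midnight:
        return cache_period
    next_midnight = datetime(now.year, now.month, now.day) + timedelta(days=1)
    until_midnight = int((next_midnight - now).total_seconds())
    return min(cache_period, until_midnight)


def cache_control_header(lifetime):
    """Format the lifetime as a Cache-Control header line."""
    return "Cache-Control: max-age=%d" % lifetime


now = datetime(2011, 9, 17, 23, 0, 0)
print(cache_control_header(cache_lifetime_seconds(now)))
print(cache_control_header(cache_lifetime_seconds(now, clear_at_midnight=True)))
```

A page requested at 23:00 would be cacheable for a full day with the default period, but only for one hour when it has to expire at midnight.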
<h4>Problem 1: &quot;USER_INT&quot;:</h4>
There are a lot of plugins, that are USER_INT. Some of them could be USER if done in a proper way:
<ul><li>Is cHash used correctly to get different cache entries based on parameters?</li><li>Does the plugin only show dynamic content after login? Then this is a nice solution:<br /><pre>plugin.tx_felogin_pi1 = USER <br />[usergroup = *]<br /> 	plugin.tx_felogin_pi1 = USER_INT <br />[global]</pre></li><li>Does the plugin only show dynamic content if a certain parameter is set? Then this is a nice solution:<br /><pre>[globalVar = GP:aoe_solr|sword=]<br />  plugin.aoe_solr = USER<br />[else]<br />  plugin.aoe_solr = USER_INT<br />[global]</pre></li></ul>

It's also always possible to let the client (browser) load the dynamic parts via ajax after DOM ready. This is a nice solution, since the user immediately sees the website and still gets the dynamic information. This is for example the case for the login box used on the typo3.org relaunch.
<h4>Problem 2: Login and User specific content</h4>
If your page offers a login, and the page looks different after login, you have problems with Cache-Control headers: when the URLs of the pages stay the same, your browser might have a cached page and therefore will not ask for a new one that includes the changed content. So if you still want to use cache headers, you have these possibilities:
<ul><li>Use ajax calls to add dynamic content (same as above). Especially when you have a highly personalized website with a lot of user-specific content, there are solutions around that use search technologies to display dynamic user-specific content (e.g. the searchperience recommendation engine and widget features)</li><li>or take care that the URLs look different after login (to force the client to load new content):<ul><li>The best is to switch from http to https after login. You can then tell TYPO3 not to send Cache-Control headers for HTTPS (the condition depends on the server - also, if you use Varnish you need to find another way of detecting https):<br /><br />[globalString = _SERVER|HTTPS=on]<br />&nbsp; config.sendCacheHeaders=0<br />[global]<br /><br /></li><li>You can also simply add an additional parameter to the URLs after login, and not send cache headers if that parameter exists ( config.linkVars=login )</li></ul></li></ul>
<h2>2. Use Varnish</h2>
If you only have Cache-Headers, TYPO3 still needs to deliver the page for every new user. By adding another layer of cache even this can be avoided.
Fabrizio wrote a nice blog article on setting up Nginx + Varnish: http://www.fabrizio-branca.de/nginx-varnish-apache-magento-typo3.html
<h3>Problem 1: Cookies</h3>
Cookies might be a problem. A typical Varnish configuration will not cache if the server sends cookies. And TYPO3 by default always sends cookies, because you need them for the login. There are several solutions:
<ul><li>If you don't need a login you may simply drop the cookies in Varnish or set this TYPO3 setting:<br />$TYPO3_CONF_VARS['FE']['dontSetCookie'] = TRUE;</li><li>It is better to tell TYPO3 to send a cookie only if there is a valid user logged in. This can be done by installing one of the extensions moc_varnish or cacheinfo. You may want to look at http://forge.typo3.org/projects/show/extension-cacheinfo , since this extension not only modifies the cookie behavior, but also adds several X-T3Cache* HTTP headers that help a lot...</li></ul>
<h3>Problem 2: rsaauth and PHPSESSID cookie</h3>
If you don't use https for your frontend login, TYPO3 offers a way to still secure your login. This is done with the help of the extension &quot;rsaauth&quot; and can be activated by setting the securityLevel to &quot;rsa&quot;. However, this extension is not functional with client side caching - so please use https and deactivate rsa...
A recommended login configuration is:
$TYPO3_CONF_VARS['FE']['loginSecurityLevel'] = 'normal';
and have &quot;saltedpasswd&quot; installed and activated in the extension settings.
(For the backend you should use lockSSL )
TYPO3 might still send PHPSESSID cookies - they can be dropped in Varnish. ( see also Bug: http://forge.typo3.org/issues/29927 )
<h3>ESI - Edge Side Includes?</h3>
ESI is a technique supported by Varnish to get dynamic content into a cached page. The extension moc_varnish has support for this (rewriting USER_INT to ESI includes).
But in fact I think ESI has no real advantages:
<ul><li>Varnish needs to wait until all ESI calls are finished before it can deliver the page. That means the page delivery time is much higher compared to a page without ESI.</li><li>The same could be achieved using ajax calls: then the user gets the page asap and a single ajax call can be done (instead of multiple ESI calls). I think for most websites that require something like this it's ok to require JavaScript support.</li></ul>



]]></content:encoded>
			<category>Inside TYPO3</category>
			
			
			<pubDate>Sat, 17 Sep 2011 23:03:00 +0200</pubDate>
			
		</item>
		
		<item>
			<title>TYPO3 Localisation with TemplaVoila</title>
			<link>http://www.typo3-media.com/blog/localisation-secrets.html</link>
			<description>This article should give a compact summary of a possible best practice multilanguage configuration...</description>
			<content:encoded><![CDATA[This article should give a compact summary of a possible best practice multilanguage configuration for TYPO3 together with TemplaVoila.
<h2>Precondition</h2>
Install the extension &quot;languagevisibility&quot; (see links below). This extension is required for the full localisation power.
You may also want to have a look at the &quot;l10n&quot; extension that adds mass translation support.
<h2>Website Languages</h2>
Add the &quot;website language&quot; records under the TYPO3 page root and configure the desired fallback order. (Typically you want to have at least &quot;default&quot; in your fallback order.)
<h2>TSConfig</h2>
<pre>mod.SHARED { 		   <br />  defaultLanguageFlag  = us<br />  defaultLanguageLabel = English - US 	 <br />} </pre>
<pre>mod.web_txtemplavoilaM1.enableLocalizationLinkForFCEs = 1</pre>
<pre>mod.web_txtemplavoilaM1.hideCopyForTranslation &gt;</pre>
<h2>TypoScript</h2>
<pre> config {<br />   sys_language_mode = ignore<br />   sys_language_overlay = hideNonTranslated<br />   sys_language_uid = 0<br />   language=en<br />   locale_all = en_US.utf8<br />...<br /> }<br />...<br /> [globalVar = GP:L = 1] 	<br />   config {<br /> 	  sys_language_uid = 1<br /> 	  language = de<br /> 	  locale_all = de_DE.utf8<br />        } <br />... <br /> [end]<br />...</pre>
<h2>TemplaVoila Localisation Modes</h2>
TemplaVoila has 4 localisation modes:
<ol><li>&quot;Disabled&quot; ( langDisable = 1 )</li><li>&quot;Inlinetranslation&quot; ( langChildren = 1 )</li><li>&quot;Inlinetranslation-separate&quot; ( langChildren = 0 )</li><li>&quot;databaseoverlay&quot; ( langDatabaseOverlay = 1 ) *</li></ol>
(* the mode &quot;databaseoverlay&quot; is added by the extension &quot;languagevisibility&quot; ) 
<h3>Container FCEs</h3>
You should use the mode &quot;disabled&quot; for all FCEs that act as pure containers:
<pre>&lt;meta type=&quot;array&quot;&gt;<br />        &lt;noEditOnCreation&gt;1&lt;/noEditOnCreation&gt;<br />  	&lt;langDisable&gt;1&lt;/langDisable&gt;<br />        &lt;default&gt;<br />          &lt;TCEForms&gt;<br />  	  	&lt;sys_language_uid&gt;-1&lt;/sys_language_uid&gt;<br />  	  &lt;/TCEForms&gt;<br />        &lt;/default&gt;<br />...  </pre>
This setup in the DS tells TYPO3 that this FCE should not be translated. It also tells TemplaVoila that this FCE should not be opened for editing on creation and that the default value for the language field should be &quot;All&quot;.
<h3>Content FCEs and mixed FCEs</h3>
Typical FCEs have fields with content that needs to be localised. For these you should use the mode &quot;databaseoverlay&quot;. (Inline translation has two big disadvantages: 1. no concurrent work on language versions in workspaces and 2. bad usability.)

<pre>&lt;meta type=&quot;array&quot;&gt;<br />  &lt;langDisable&gt;1&lt;/langDisable&gt;<br />  &lt;langDatabaseOverlay&gt;1&lt;/langDatabaseOverlay&gt;<br />...</pre>
If your FCE has translatable content and also container fields that allow further nesting, you need to configure the container fields so that they always use the default language content:

<pre>&lt;field_content type=&quot;array&quot;&gt;<br />...<br /> &lt;TCEforms type=&quot;array&quot;&gt;<br />     &lt;l10n_mode&gt;exclude&lt;/l10n_mode&gt;<br />...<br /> &lt;/TCEforms&gt;<br />&lt;/field_content&gt;<br />...</pre>
If your FCE has translatable content that is not required, you may want to fall back to the default language if the field has no content. This is useful for images, for example:
<pre>&lt;field_image type=&quot;array&quot;&gt;<br />...<br /> &lt;TCEforms type=&quot;array&quot;&gt;<br />     &lt;l10n_mode&gt;mergeIfNotBlank&lt;/l10n_mode&gt;<br />...<br />  &lt;/TCEforms&gt;<br />&lt;/field_image&gt;<br />...</pre>

(* the &quot;l10n_mode&quot; support for flexforms is added by the extension &quot;languagevisibility&quot; )
<h3>page DS</h3>
<pre>&lt;meta type=&quot;array&quot;&gt;<br />  &lt;langDisable&gt;0&lt;/langDisable&gt;<br />  &lt;langChildren&gt;1&lt;/langChildren&gt;<br />...</pre>
This configuration enables inline localisation for any TemplaVoila page property.
(In addition you have the possibility to maintain a completely separate content element structure on pages for certain languages.)
<h2>Special Attention</h2>
<ul><li>Don't forget to check that your UTF-8 setup is correct. (If your TYPO3 version is below 4.5 you need to set &quot;forceCharset = utf-8&quot; and maybe also setDBinit.)</li><li>TYPO3 uses the language ISO code as the key for inline translation in flexforms. If you have multiple localizations per language you need to add additional pseudo ISO codes. (Read more in the related article &quot;<link blog/article/multi-language-websites-with-same-language.html>multilanguage websites with same language</link>&quot;.)</li></ul>]]></content:encoded>
			<category>Inside TYPO3</category>
			
			
			<pubDate>Wed, 13 Jul 2011 21:08:00 +0200</pubDate>
			
		</item>
		
		<item>
			<title>Inspirations from San Francisco</title>
			<link>http://www.typo3-media.com/blog/typo3-san-francisco.html</link>
			<description>Two weeks ago I went on a business trip to San Francisco. As always I enjoyed being in the Silicon...</description>
			<content:encoded><![CDATA[Two weeks ago I went on a business trip to San Francisco. As always I enjoyed being in Silicon Valley and SF. The trip began with the <link http://www.lucenerevolution.org/>Lucene Revolution</link> conference, with interesting talks about Lucene and Solr - I presented our <link http://www.lucidimagination.com/events/conferences/revolution/2011/presentations-and-abstracts#panasonic-solr>Panasonic Search Case Study</link> there. It ended with the <link http://t3con11-sf.typo3.org/>TYPO3 Conference in San Francisco</link>, where I spoke about &quot;<link http://www.slideshare.net/aoemedia/congstar-web-it-t3-con11-sf>how a complete Telco runs on TYPO3</link>&quot;. In between there was the opportunity to do on-site workshops and meetings with some clients in the Bay Area.
Now that I am back in Germany and have shortened the to-do queue a bit, it's time to summarize some of the inspiration and information from this trip. In general it is always fascinating how much of internet technology is connected to the Bay Area. Not only are Google, Facebook and co. located there - you can also go to a Node.js meetup every week and speak with the core developers, or knock on the doors of the GitHub people...
<h2>Lucene Revolution</h2>
It is commonly known that the amount of data is growing exponentially and that it is more and more important to have access to the most recent and relevant information. Users also expect search and information delivery to be fast and to show the most relevant results. That means that information may have to be personalized, which in turn means that parts of a website need to differ per user (to fit their semantic context) - and that such content may not be cacheable at all anymore.
Lucene and Solr are great open source software for searching within a huge amount of different data. They are strong at scaling and have flexible relevancy (score) calculations.
Here are some tools and services that caught my interest during the conference:
<h4>realtime search</h4>
In times of Twitter &amp; Co. real-time search is getting more and more important, and Twitter in particular requires real-time indexing of new content. They are indexing 100 million tweets per day and serve about 2 billion searches per day - using Lucene! The Twitter blog explains some more details: <link http://engineering.twitter.com/2010/10/twitters-new-search-architecture.html>http://engineering.twitter.com/2010/10/twitters-new-search-architecture.html</link>
There is also a very nice blog by Lucene core developer Mike McCandless that also covers recent work on real-time search: http://blog.mikemccandless.com/
<h4>Big Data </h4>
Scaling and handling of big data is always very interesting. Someone published a nice quote: &quot;7 Dwarves of Big Data -- Hadoop, MongoDB, CouchDB, Cassandra, HBASE, memcached, Voldemort ...&quot;. A quick look at some of these tools:
<b>Apache Hadoop</b>: An open-source implementation of frameworks for reliable, scalable, distributed computing and data storage. For example, it includes components for map-reduce implementations.
<b>MongoDB, CouchDB</b>: NoSQL databases that are built for scaling, also across multiple servers.
<b>Cassandra</b>: &quot;The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together <link http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html>Dynamo's</link> fully distributed design and <link http://labs.google.com/papers/bigtable.html>Bigtable's</link> ColumnFamily-based data model.&quot;
<b>HBase</b> is the <link http://hadoop.apache.org/ - externalLink>Hadoop</link> database. Use it when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.
<h4>semantic (calais)</h4>
Including available semantic information for better search results is a good idea.
A nice extraction service is Calais: based on large training data, this service can extract semantic context from any English text. Try the online demo with an English newspaper article: http://viewer.opencalais.com/
<h4>edismax in solr 3.1</h4>
Old news but still great - the edismax handler is available in Solr 3.1 and makes life easier when you want to use dismax and Lucene syntax for your Solr queries.
<h4>Stemming and Language detection</h4>
Especially for European languages, or languages like Chinese and Japanese, it is hard to do good stemming based on algorithms alone. Basis Technology offers different dictionary-based parsers that do a great job for different languages. Unfortunately the products are not cheap.


<h2>TYPO3 Conference San Francisco</h2>
Of course one of the highlights was the TYPO3 Conference in San Francisco: great people, great location, great weather...
A special highlight was the keynote by Jez Humble (yes, the one who wrote one of the best IT books, &quot;Continuous Delivery&quot;).
Here are some of the many interesting topics at the conference:
<h4>continuous delivery</h4>
As written in previous posts, a continuous delivery process with the help of a deployment pipeline is a very good thing to have. We release nearly every project through an automated deployment pipeline and have learned a lot during the last years. The keynote was a good summary of the core ideas. I like the statement &quot;Without testing the default state of your application is broken - unless you prove otherwise. With a deployment-pipeline and automated tests the default state is ok and you are fine to deploy urgent changes to production.&quot;
<link http://vimeo.com/25089865>Video T3CON11-SF: Keynote</link> 
 <link http://robertlemke.de/blog/posts/2011/06/14/continuous-delivery-interview-with-jez-humble>Robert also recorded a nice interview with Jez.</link>
The mention of two possible deployment methods was also nice:
<b>canary releasing:</b> A method that routes only some users to the new version. This way you can monitor the application and the user behaviour and then decide whether to roll it out to everyone or not. That's what Google often does, too.
<b>dark launching:</b> A nice method to deploy a new feature that should replace an existing feature: you already run the new implementation with real traffic in the background, while the customers are still using the old implementation. This way you can test new implementations with less risk before making them visible.
<h4>FLOW3</h4>
Robert and Karsten did a great job preparing nearly 2 days of FLOW3 workshops. All of them are available in the Vimeo channel.
It is only a matter of days until the team finishes the last work on the FLOW3 beta release and the documentation - and then it is time to consider FLOW3 when deciding on a framework for a new project.
<h4>cloud deployment</h4>
The talk by Andrei about deploying and hosting TYPO3 projects in the cloud was also very interesting. The video is online: <link http://vimeo.com/25080432>Video T3CON11-SF: Fluffy TYPO3 Automatic Deployment of TYPO3 in the Cloud</link>
He also mentioned Chef and Puppet - both are tools that help to automate the setup and configuration of your infrastructure.
I am still searching for the code samples promised in the presentation as well as the mentioned TYPO3 improvements (like storing sessions in a key-value store).
<b>Puppet:</b> With Puppet you can describe your system configuration and dependencies in a central place and automate the deployment and infrastructure management. The learning curve is quite steep - but it seems to be worth it. http://www.puppetlabs.com/puppet/introduction/
<b>Scalr:</b> Seems to be a nice GUI based tool to set up your first cloud based infrastructure. http://www.scalr.net/

That's it for now - here are some relevant links:
<h3>Links:</h3>
<ul><li>T3CON11 San Francisco Vimeo Channel: http://vimeo.com/channels/207300/</li><li>Slideshare: <link http://www.slideshare.net/event/t3con11sf>http://www.slideshare.net/event/t3con11sf</link></li><li>FLOW3 News: http://news.typo3.org/news/article/on-the-road-to-flow3-10-beta-1/</li><li>FLOW3 Tutorials from Thomas: http://www.layh.com/work/flow3-fluid.html</li></ul>]]></content:encoded>
			<category>news</category>
			
			
			<pubDate>Thu, 23 Jun 2011 20:12:00 +0200</pubDate>
			
		</item>
		
	</channel>
</rss>
