Monthly Archives: November 2012

Getting Solr 4.0 running on Amazon EC2 for django-haystack

After almost two years, this is my attempt at getting back to blogging. These instructions are not too detailed, but I spent some time trying to get Solr 4.0 set up to serve my django-haystack search results and figured I would document the two hurdles I faced.

I have been running Apache Solr for my django-haystack search on my home Ubuntu Linux box. The whole setup was working great, but the requirement to keep this box powered on all day, coupled with its noisy fans, made me decide to move my Solr instance to the cloud.
After much googling I couldn't find anything warning against running Apache Solr on an Amazon micro instance, so I decided to give it a try, since it was going to be a dev instance anyway.

What I wanted to do: get a full Solr 4.0 setup that would index my Django dev database and serve search results. I am using a t1.micro instance on Amazon EC2.

Step 0: Get a new Amazon micro instance. I am using an Ubuntu 12.04 LTS 64-bit instance. I installed the required packages, Java, and tomcat6 on it (sorry, this step is short on details).

Step 1: Get and install apache-solr. I used the 4.0 release (dubbed SolrCloud), which is quite different from earlier Solr releases. From what I understand, Solr 4.0 has better support for distributed indexes.

tar -zxvf apache-solr-4.0.0.tgz
cd apache-solr-4.0.0

Step 2: Get the schema.xml from Django to play nice with the new Solr 4.0.
Solr 4.0 has changed how it lays things out. start.jar is in the same place, but the conf directory and schema.xml now live under per-collection directories, since Solr has split the data into collections. For now I decided to co-opt the collection1 directory to serve my Django index.

The schema.xml from Django (generated by running "python manage.py build_solr_schema > schema.xml") now needs to be placed into the example/solr/collection1/conf directory:

cp schema.xml $HOME/apache-solr-4.0.0/example/solr/collection1/conf

Edit this schema.xml to add the reserved _version_ field to the fields section. Since Solr's updateLog is on by default, the _version_ field is required in the configuration. Alternatively, you could turn the updateLog off by editing solrconfig.xml. I chose to leave that setting as it is and instead added the now-required field to schema.xml. Look for the "fields" section in the Django-generated schema.xml and add the _version_ line shown below anywhere inside that block. Here is what mine looks like after I added the _version_ field.

<!-- general -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="django_ct" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="django_id" type="string" indexed="true" stored="true" multiValued="false"/>
<!-- added this field for solrcloud , to play friendly with updatelog -->
<field name="_version_" type="long" indexed="true" stored="true"/>
<dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
<dynamicField name="*_s"  type="string"  indexed="true"  stored="true"/>
<dynamicField name="*_l"  type="long"   indexed="true"  stored="true"/>
<dynamicField name="*_t"  type="text_en"    indexed="true"  stored="true"/>
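If you regenerate the schema often, the manual edit above can be scripted. The following is only a minimal sketch, assuming the generated schema has a top-level "fields" element as in the excerpt above; the function name add_version_field is made up for illustration. Note that ElementTree drops XML comments when it re-serializes, so treat this as a convenience and not a replacement for checking the file by hand.

```python
import xml.etree.ElementTree as ET

# Attributes of the field Solr 4.0's update log expects (taken from the excerpt above).
VERSION_FIELD = {"name": "_version_", "type": "long", "indexed": "true", "stored": "true"}


def add_version_field(schema_xml):
    """Return schema_xml with a _version_ field added to <fields> if it is missing.

    Hypothetical helper: assumes <fields> is a direct child of the root element.
    """
    root = ET.fromstring(schema_xml)
    fields = root.find("fields")
    if fields is None:
        raise ValueError("no <fields> section found in schema")

    # Only add the field once, so the script is safe to run repeatedly.
    already_present = any(f.get("name") == "_version_" for f in fields.findall("field"))
    if not already_present:
        fields.append(ET.Element("field", VERSION_FIELD))

    return ET.tostring(root, encoding="unicode")


# Example usage (paths are whatever you use locally):
# with open("schema.xml") as fh:
#     patched = add_version_field(fh.read())
# with open("schema.xml", "w") as fh:
#     fh.write(patched)
```

Running it twice leaves a single _version_ entry, which is handy if the step ends up in a deploy script.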

Once this was done, I could start my Solr Jetty server and then ask Django to rebuild the index.
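For the rebuild to reach the new server, Haystack has to point at the Solr instance on EC2. The post does not show my settings, so the following is only a sketch, assuming Haystack 2.x style settings (older 1.x releases use HAYSTACK_SEARCH_ENGINE and HAYSTACK_SOLR_URL instead). The host name is a placeholder, and the /solr/collection1 path assumes the co-opted collection1 core on Jetty's default port 8983.

```python
# settings.py (fragment) -- Haystack 2.x style; host name below is a placeholder.
SOLR_HOST = "ec2-host.example.com"  # hypothetical: replace with your instance's public DNS name

HAYSTACK_CONNECTIONS = {
    "default": {
        # Solr backend that ships with django-haystack 2.x
        "ENGINE": "haystack.backends.solr_backend.SolrEngine",
        # Assumes the example Jetty server on port 8983 and the collection1 core
        "URL": "http://%s:8983/solr/collection1" % SOLR_HOST,
    },
}
```

Remember that the EC2 security group has to allow the port you use, and that an open Solr port should not be exposed to the whole internet.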

Step 3: Start the Solr Jetty server
In directory apache-solr-4.0.0/example
java -jar start.jar

Step 4: Rebuild the index

python manage.py rebuild_index

And everything is up and running.
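A quick way to confirm that the rebuild actually put documents into Solr is to ask the select handler for a match-all query with zero rows and read numFound from the JSON response. This is a rough sketch using only the standard library; the helper names are made up for illustration, and the base URL assumes the collection1 core on the default Jetty port.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base_url, query="*:*"):
    """Build a select URL that returns only the hit count, as JSON."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return "%s/select?%s" % (base_url.rstrip("/"), params)


def parse_num_found(body):
    """Extract the total hit count from a Solr JSON select response."""
    return json.loads(body)["response"]["numFound"]


def count_indexed_docs(base_url):
    """Query Solr over HTTP and return how many documents are indexed."""
    with urlopen(select_url(base_url), timeout=10) as resp:
        return parse_num_found(resp.read().decode("utf-8"))


# Example usage (assumed URL; adjust host and core to your setup):
# print(count_indexed_docs("http://localhost:8983/solr/collection1"))
```

If the count is zero after a rebuild, check that Haystack is pointed at the same core that received the new schema.xml.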
Edit: I had some trouble setting up an Ubuntu init script that would start and stop the Solr process and register it in the default runlevel so that it starts on boot. In the end I followed the clear instructions at this blog entry, and the script described there worked just great.