Django, Haystack and Elasticsearch – Part 1

I’m wrapping up a little side project at the moment (more on that very soon) which required full-text search, autocomplete, and a few other bits of search-related functionality.

After some research I landed on the combination of Elasticsearch and the awesome Django application Haystack.

First step was to get Elasticsearch up and running locally on OS X…

1) Download the latest zip from http://www.elasticsearch.org/overview/elkdownloads/ and extract it. A good spot is:

/opt/elasticsearch-1.1.x

2) Create the following directories:

/opt/elasticsearch-1.1.x/data
/opt/elasticsearch-1.1.x/work
/opt/elasticsearch-1.1.x/logs

3) Add the following to your .profile (allows you to run Elasticsearch from the command prompt without the full path):

4) Update the following values in the Elasticsearch config file:

5) Ensure all requirements are installed (django-haystack, pyelasticsearch, requests, simplejson):

6) You should now be able to start Elasticsearch:

7) Add Haystack to your Django config:

8) After you’ve added your search indexes, you can use manage.py to rebuild the search index:

$ python manage.py rebuild_index
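
For step 7, a minimal sketch of the relevant settings.py entries might look like the following. It assumes Elasticsearch is listening on its default port (9200) on the local machine; the index name is a placeholder and the other entries in INSTALLED_APPS are only there for illustration.

```python
# settings.py (fragment) -- a sketch, not the exact config used for this project.

INSTALLED_APPS = [
    'django.contrib.auth',
    'django.contrib.contenttypes',
    # ... your other apps ...
    'haystack',  # enables the rebuild_index / update_index management commands
]

HAYSTACK_CONNECTIONS = {
    'default': {
        # Haystack's built-in Elasticsearch backend
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        # Assumption: a local Elasticsearch node on the default HTTP port
        'URL': 'http://127.0.0.1:9200/',
        # Placeholder index name; pick whatever suits your project
        'INDEX_NAME': 'haystack',
    },
}
```

With these in place, rebuild_index should have a backend to write to once your search indexes are defined.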

Python Introspection

Python has a strong set of introspection features.

Take a look at the following built-in functions:

type()
dir()
id()
getattr()
hasattr()
globals()
locals()
callable()

type() and dir() are particularly useful for inspecting the type of an object and its set of attributes, respectively.
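
Here’s a quick sketch of a few of these in action. The Greeter class and the show_locals function are made up purely for illustration.

```python
class Greeter(object):
    """Hypothetical class, used only to have something to inspect."""
    greeting = "hello"

    def greet(self, name):
        return "%s, %s" % (self.greeting, name)


g = Greeter()

# type() tells you what kind of object you are holding
type_name = type(g).__name__

# dir() lists attribute names; filter out the underscore-prefixed ones
public_names = [n for n in dir(g) if not n.startswith("_")]

# hasattr() / getattr() look attributes up by name at runtime
has_greet = hasattr(g, "greet")
missing = getattr(g, "missing", "fallback")  # third argument is the default

# callable() distinguishes methods/functions from plain data attributes
greet_is_callable = callable(getattr(g, "greet"))
greeting_is_callable = callable(getattr(g, "greeting"))


def show_locals():
    x = 1
    # locals() returns a dict of the names defined in the current scope
    return sorted(locals())


print(type_name)             # Greeter
print(public_names)          # ['greet', 'greeting']
print(has_greet, missing)    # True fallback
print(greet_is_callable, greeting_is_callable)  # True False
print(show_locals())         # ['x']
```

These are handy in an interactive shell when you are poking at an unfamiliar object.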

Useful Python and Django Packages – Update

As a follow-up to my earlier post, Useful Python and Django Packages, here’s what I’ve been making use of lately:

BeautifulSoup (beautifulsoup4)
HTML/XML parser for quick-turnaround applications like screen-scraping.

Pillow
Python Imaging Library (Fork)

Scrapy
A high-level Python Screen Scraping framework

Unipath
Object-oriented alternative to os/os.path/shutil

celery
Distributed Task Queue

django-breadcrumbs
Easy to use generic breadcrumbs system for Django framework.

django-model-utils
Django model mixins and utilities

django-supervisor
Easy integration between Django and supervisord

django-user-agents
A django package that allows easy identification of visitor’s browser, operating system and device information (mobile phone, tablet or has touch capabilities).

django-widget-tweaks
Tweak the form field rendering in templates, not in python-level form definitions.

factual-api
Official Python driver for the Factual public API

unicode-slugify
A slug generator that turns strings into unicode slugs.

user-agents
A library to identify devices (phones, tablets) and their capabilities by parsing (browser) user agent strings

virtualenv
Virtual Python Environment builder

virtualenvwrapper
Enhancements to virtualenv

Pytz 2013b Fails with PIP 1.4

UPDATE: Pytz is now using a numeric naming convention (i.e. 2013.7) which fixes this bug.

As of version 1.4, pip only installs stable versions by default, meaning anything with a pre-release version name is ignored. Unfortunately, the pytz naming convention makes 2013b look like a beta release when it is actually the second release of 2013.
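
To see why the name trips things up, here is a deliberately simplified sketch of pre-release detection. This is not pip’s actual logic: the regular expression and the looks_like_prerelease helper are assumptions made up for illustration, and they only approximate the idea that a trailing a, b, c or rc marks a pre-release.

```python
import re

# Simplified assumption: dotted numbers followed by a/b/c/rc (and optional
# digits) are treated as a pre-release. Real version parsing is more involved.
_PRE_RELEASE = re.compile(r"^\d+(\.\d+)*(a|b|c|rc)\d*$")


def looks_like_prerelease(version):
    """Return True if the version string resembles an alpha/beta/rc release."""
    return bool(_PRE_RELEASE.match(version))


print(looks_like_prerelease("2013b"))   # True  -> skipped by default
print(looks_like_prerelease("2013.7"))  # False -> treated as stable
print(looks_like_prerelease("1.0rc1"))  # True
```

Under a rule like this, the old letter-suffixed pytz names look like betas, while the numeric style does not.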

More on this bug can be found here:
https://bugs.launchpad.net/pytz/+bug/1204837

For now the workaround is to add the version when installing:

pip install pytz==2013b

Useful Python and Django Packages

When I first started out with Python / Django it quickly became apparent that there are a ton of open source packages out there. Here’s a rundown of some of the ones I’ve found to be most helpful so far…

boto
Python interface to Amazon Web Services.

django-compressor
Compresses linked and inline javascript or CSS into a single cached file.

django-extensions
Useful extensions for the Django framework.

django-mptt
Utilities for implementing a modified pre-order traversal tree.

django-positions
A Django field for custom model ordering.

django-secure
Helping you remember to do the stupid little things to improve your Django site’s security.

django-storages
A collection of custom storage backends for Django.

oauth2
A fully tested, abstract interface to creating OAuth clients and servers.

python-memcached
A Python-based API for communicating with the memcached distributed memory object cache daemon.

pytz
pytz brings the Olson tz database into Python.

simplejson
simplejson is a simple, fast, extensible JSON encoder/decoder for Python.

South
Intelligent schema and data migrations for Django projects.

tweepy
Twitter for Python!

Werkzeug
Werkzeug is a WSGI utility library for Python.

yolk
Command-line tool for querying PyPI and Python packages installed on your system.

I’ll try to keep this list updated as I find more. Please add any must-haves in the comments.