
Sorry for a grouchy New Year's post, but I figured I would get started blogging for the year with this one. I wrote it a while back (October '09) when I was struggling to deploy a Python app, and much of what I talk about still holds true. Python deployment remains a problem with many solutions, and therefore quite a painful one, since there is still no sure-shot way to deploy a Python app across platforms.

It's been almost three weeks since I released my maiden Python/wxPython app. While I've been busy doing other things, I decided to write this post about how painful the "deployment" part of writing an application has been.
Ironically, the easiest platform to deploy for has been Windows.
Once I used py2exe and Inno Setup, I had a working setup.exe that installed the app on Windows XP and Vista, 32-bit and 64-bit.
The next most fun platform to create a deployment target for was Linux. It was almost trivial to use cx_Freeze to create a binary build for Ubuntu Linux, 32-bit and 64-bit. With the help of a VirtualBox VM running 64-bit SUSE, I could even release a binary build for that platform because someone asked.

But that's when the fun ended! The Mac has been an extremely painful platform to deploy Python applications on.
I am using py2app to create standalone binaries. The goal is to have an app that I can click on and have it launch on any Mac.
Some interesting hurdles:
1) If you create a standalone py2app build on Leopard, it is not guaranteed to work on Tiger.
2) If you create one on Leopard, it definitely does not work on Snow Leopard.

The only solution is to create a standalone app on Tiger, and then maybe it will work on Leopard.

In the end I did manage to solve my Mac deployment issues, which I detailed on this wiki page. Also, Chris Lasher pointed me to his series of links about Python deployment; Ian Bicking's response is especially worth noting.

Since October 2008, my good friend Atom (Deepak Singh to most of you) and I have spent many of our Sundays recording the Coast to Coast Bio podcast. With the last episode, the c2cbio podcast has completed 25 episodes, and we are thrilled. These 25 episodes have seen us talk about everything from version control, to meta-programming, to synthetic biology, to PubSubHubbub, to synchrotron data collection, to Hadoop and all things cloud computing. The unifying theme (if any) is that mixing technology and science makes both better and definitely a lot more fun.

It's mainly thanks to c2cbio that I finally decided to sit down and start coding. Thanks to our many conversations about what makes a good programmer, version control, IDEs, test-driven development and agile programming, I decided to try to put these into practice while I attempted to satisfy my code itch. Thanks to all the git topics from Atom, I too decided to start playing with git. The fun part was when I could see myself coding a lot more and writing code that even I reused. Another conversation had us talking about living code, and about how Paul Buchheit built Gmail and AdSense. The take-home message for me was to prototype, test extensively, and release often. All of these lessons, combined with version control, have ensured that I am a few days away from a simple app that hopefully makes creating and keeping track of crystallization solution grids easier.

Thanks to the podcast, I feel I get to keep up with the goings-on, especially when bench work leaves me little time to browse and catch up.

Thanks to everyone for listening and writing in. Here's to podcasting, coding and hopefully a lot more episodes of c2cbio.

Ahh! for argparse

It's been a while since I blogged. There has just been too much happening on the crystallography side of things to leave me time to blog.

I have also been coding a lot lately and have started doing some GUI writing in wxPython.

This post is about getting back into the groove by telling you about argparse. Having talked about optparse and command-line parsing before, I heard about argparse thanks to a talk I caught by clicking a link on the PyCon 2009 master schedule. This link has all the PyCon 2009 slides and video links in one place, which makes it a great resource.

For those interested in argparse, check out the "Plenary Evening Session" on Sunday, March 29th, 1:20 pm at PyCon 2009, where Steven Bethard talks about argparse.

The most important difference between argparse and optparse is that argparse handles positional arguments well, in addition to optional arguments. Argparse also generates usage information automatically and takes care of cases like the user forgetting to provide required arguments, in which case it prints a usage message and an error instead of failing somewhere inside your code.
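To make the positional/optional distinction concrete, here is a minimal sketch of an argparse parser. The program name `convert_example`, its arguments and the `--format` choices are hypothetical illustrations, not taken from maskconvert.py; the point is only to show required positional arguments alongside an optional flag whose short and long forms share one destination.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical file-conversion script."""
    parser = argparse.ArgumentParser(
        prog="convert_example",
        description="Hypothetical converter illustrating argparse.")
    # Positional arguments are required; argparse reports an error if they are missing.
    parser.add_argument("infile", help="input file to convert")
    parser.add_argument("outfile", help="where to write the result")
    # An optional argument with short and long forms mapped to one destination.
    parser.add_argument("-f", "--format", dest="fmt", default="txt",
                        choices=["txt", "csv"],
                        help="output format (default: %(default)s)")
    return parser


# parse_args() reads sys.argv[1:] by default; an explicit list is used here.
args = build_parser().parse_args(["in.dat", "out.dat", "--format", "csv"])
print(args.infile, args.outfile, args.fmt)
```

Run as a real script with no arguments, argparse prints a usage line plus an error naming the missing arguments and exits with status 2, and `-h` prints the generated help text.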

The key differences are summarized in the excellent documentation at this URL, so I will not rehash them. I have happily switched over to argparse from optparse and hope argparse becomes part of the Python standard library soon.

For some code examples, check out my GitHub repository, especially the script maskconvert.py.

I own a Kindle 2, and it's very easy to push PDFs to it. Contrary to the misconceptions out there, the Kindle 2 is NOT locked down: it offers "free" (see below) conversion of PDFs to the *.azw format, which you can then "push" to your Kindle 2.

Here is how it works:

When you buy a Kindle 2, you can register the device with your Amazon account and then associate a new email address with it, e.g. alberteinstein@kindle.com.

Once you do this, you can send PDFs to your device in the following two ways:
1) Email the PDF as an attachment to alberteinstein@kindle.com. Amazon converts the file and pushes it to your device wirelessly (over "Whispernet"). You get charged $0.10 per email for pushing it to your device.

OR 

2) Email the same attachment to alberteinstein@free.kindle.com (note the FREE). Amazon converts the file and emails it to your Amazon account email address (in my case, say, alberteinstein@gmail.com). You can then save the file to your computer and use the provided USB cable to copy the converted document to the "documents" folder on the Kindle. This way you don't get charged anything.
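If you send papers often, the email step can be scripted. Below is a minimal sketch using Python's standard `email` and `smtplib` modules; the sender address, SMTP host, port and credentials are placeholders you would have to supply, and the Kindle address is just the example one from this post. Only the message-building part is exercised here; `send` is shown for completeness and assumes an SMTP server that accepts SSL logins.

```python
import smtplib
from email.message import EmailMessage


def build_kindle_email(pdf_bytes, filename, sender, kindle_address):
    """Build an email with a single PDF attachment addressed to a Kindle."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = kindle_address
    msg["Subject"] = filename
    msg.set_content("Document attached for conversion.")
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf",
                       filename=filename)
    return msg


def send(msg, host, port, user, password):
    """Send via an SSL SMTP server (host, port and credentials are placeholders)."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        server.send_message(msg)


msg = build_kindle_email(b"%PDF-1.4 placeholder", "paper.pdf",
                         "me@example.com", "alberteinstein@free.kindle.com")
print(msg["To"], [part.get_filename() for part in msg.iter_attachments()])
```

Switching between the paid wireless push and the free email-back route is then only a matter of changing the `kindle_address` argument.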

In addition to the above "Amazon-blessed" ways of pushing content to your Kindle 2, there are standalone applications like Calibre and Mobipocket that convert documents (PDFs, Word documents, HTML pages) to the Mobipocket format on which the Kindle 2 *.azw format is based. Calibre runs on Mac, Windows and Linux, while Mobipocket is a Windows-only app. I have not yet tested Mobipocket, but Calibre by Kovid Goyal is a free and open-source app that offers a multi-platform, iTunes-like front end to push and manage third-party content on the Kindle 2. With Calibre you can convert PDFs to the *.mobi format and save them to the documents folder for reading on your Kindle.
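Calibre also ships a command-line converter, `ebook-convert`, which takes an input file and an output file and picks the formats from the extensions. A small wrapper sketch, assuming Calibre's command-line tools are installed and on your PATH (on a Mac they live inside the app bundle and may need to be installed separately); the helper names are hypothetical:

```python
import shutil
import subprocess
from pathlib import Path


def mobi_convert_command(pdf_path):
    """Return the ebook-convert command that turns a PDF into a .mobi next to it."""
    pdf = Path(pdf_path)
    return ["ebook-convert", str(pdf), str(pdf.with_suffix(".mobi"))]


def convert(pdf_path):
    """Run the conversion, failing clearly if Calibre's CLI is not available."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert not found; are Calibre's command-line tools on PATH?")
    subprocess.run(mobi_convert_command(pdf_path), check=True)


print(mobi_convert_command("papers/acta1999.pdf"))
```

The resulting .mobi file can then be copied to the Kindle's documents folder over USB, as described above.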

I recently tested an old Acta Cryst paper (1999) for conversion to the Kindle 2 format using all three approaches above. The conversion and upload to the device was trivial whether I used the Whispernet push, the free-email route followed by a USB transfer, or Calibre.app on the Mac. The PDF immediately shows up in your library. The text of the paper was immensely readable, but when it came to equations and symbols, all hell broke loose. Most of the equations in these papers were probably not embedded as images but instead as their equivalent fonts. I am sure encoding these fonts into the Mobipocket format is not trivial, and it shows in simple equations like

B = A + T

where the = and + symbols are replaced with (?) symbols once converted, i.e. B ? A ? T.

So it is quite difficult to read the paper on the Kindle 2 when many of the equations are garbled.
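As an analogy only (this is not a claim about what Amazon's converter actually does internally), the "?" pattern is exactly what you get when text is forced into a character set that cannot represent some of its characters. The sketch below assumes, purely for illustration, that the = and + signs came out of the PDF as private-use code points (U+F03D and U+F02B are hypothetical stand-ins for symbol-font glyphs) and that the target encoding is plain ASCII:

```python
def lossy_transcode(text, encoding="ascii"):
    """Encode text into a limited character set, replacing anything unrepresentable with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# Ordinary ASCII symbols survive the round trip.
print(lossy_transcode("B = A + T"))
# Hypothetical symbol-font code points for '=' and '+' do not.
print(lossy_transcode("B \uf03d A \uf02b T"))
# Greek letters suffer the same fate.
print(lossy_transcode("B = \u03b1 + \u03c4"))
```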

This is obviously an evolving space. Even with PDF, until five or six years ago it was not uncommon to see funky characters replacing our alphas and taus in the printed PDF. What's most interesting is that projects like Calibre bring the power of open-source approaches to the Kindle and other ebook reader platforms. I won't be surprised if the open-source world rallies behind open software that lets users create and share content that can be read on the ebook reader of their choice.

Seeing how it took PDF nearly ten years to become the format of choice for electronic WYSIWYG documents, I hope we don't have to wait too long for all content to be seamlessly transcoded for reading on any given ebook reader.

refs: Nature, in its 2 April 2009 issue, has two news features on ebook readers; ireadreview is a good site for reviews of all things ebook reader.

Command-line interfaces to programs are very empowering. I started using computers with the Linux command line, and I have never strayed too far from programs that are predominantly command-line based. Whether it was RasMol, POV-Ray, PHENIX or FFmpeg, I always found that the command line gave the program a more transparent interface. By that I mean it was easier (at least for me) to decipher a manual page than to go looking for a particular piece of functionality in a GUI window.

Now, in the case of Python, I have been writing scripts that take user input from the command line for a while. In these scripts, I would generally accept only one input: the first argument after the script name, sys.argv[1]. If I had more than one input, I would iterate over the argument list and try to figure out what the inputs were. Even worse, in most cases I would hard-code the order of inputs into my code (terrible practice). Fortunately for me, my discovery of the optparse module has changed all that.

The optparse module is an object-oriented (don't let that scare you) and super-intuitive way to add command-line options to any Python script. So say you want to add an input-file switch with the -i flag; all you have to do is:

from optparse import OptionParser

optparser_object = OptionParser()

# Map both -i and --infile to the same destination, options_object.infile
optparser_object.add_option("-i", "--infile", dest="infile", help="input file for script", metavar="[infile.txt]")

Once you do this, you can have the module parse the sys.argv list and make sense of it by adding the following line:

options_object, spillover_options = optparser_object.parse_args()

Then options_object.infile will hold the value of the input option, as specified by the dest argument of add_option. The nice thing about the module is that several switches can be mapped to the same options_object.infile destination; for example, I have mapped "-i" and "--infile" to the same destination. Even better is the option to add a help string with the help="help text" argument. This help is printed when the user runs the script with -h or --help, or when the code explicitly calls optparser_object.print_help(); an option the script cannot handle produces a usage line and an error message.
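Here is a self-contained version of the snippet above that parses an explicit list instead of sys.argv, so it can be tried in an interactive session. The -v/--verbose flag is a hypothetical addition to show a boolean option; everything else mirrors the -i/--infile example.

```python
from optparse import OptionParser


def build_parser():
    parser = OptionParser(usage="usage: %prog [options] [extra args]")
    # Both switches store their value into options.infile.
    parser.add_option("-i", "--infile", dest="infile",
                      help="input file for script", metavar="FILE")
    # Hypothetical boolean flag illustrating action="store_true".
    parser.add_option("-v", "--verbose", dest="verbose", action="store_true",
                      default=False, help="print progress messages")
    return parser


parser = build_parser()
# parse_args() reads sys.argv[1:] by default; an explicit list is passed here.
options, leftover = parser.parse_args(["--infile", "data.txt", "extra"])
print(options.infile, options.verbose, leftover)
```

Anything that is not a recognized option ends up in the second return value (`leftover` here, `spillover_options` above).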

For a concrete example of how to use the optparse module, consult the docs or my example code on GitHub.

My Mac laptop has been running Leopard (OS X) for quite some time. For the times when I wanted to use a Windows app, I had installed both VMware Fusion and Parallels Desktop 3. Both did get the job done, but at a price ($49 to $75).

A few days back I tried out a FREE virtualization application from Sun called VirtualBox.

Besides being free, VirtualBox was amazingly easy to install and very functional. I did have to go through a full Windows XP install, but after that I was up and running immediately. This was a lot better than, say, VMware Fusion, where I had to struggle to install the tools that let my mouse roam freely between the host OS and the guest OS. Also, on Linux, where I have been using it for some time, it's very stable.

While VirtualBox is free, as of version 2.1.3 it lacks some of the features that Parallels and VMware provide, like drag and drop, and it does not support the function key on Mac OS X (read this Ars Technica review). Then again, VirtualBox is improving at quite a pace, so a feature you are missing might already be in the newest version.

All in all, it's a great way to test your apps on Windows while running Linux, for free!

I don't know why I feel so ecstatic at the thought of running an experiment remotely. Maybe it's all the NASA TV I watched as a graduate student, or the several telemicroscopy talks I attended next door at the NCMI. This post is a little about the wonders of robotics and the great things engineers do to make it possible for scientists to do better fundamental research, and mostly about the little pieces that fit together to enable good science.

Last week we test-ran a remote data collection at the Berkeley synchrotron. Normally we would have flown all the way to Berkeley, manually mounted our crystals on the diffraction setup, and then spent 24 to 48 hours collecting diffraction data. This time, however, we shipped our dewar to the synchrotron and had the extremely helpful beamline scientists load our crystals, stored in specially designed pucks, onto the crystal-mounting robot. After that, we controlled the entire experiment remotely, sitting in the comfort of our lab at Brandeis or even at home over the weekend.

The amazing part of the experience was the real-time nature of the control. In the video you will see us align a crystal by clicking in a window of an NX client session. The robot responds almost immediately to our click. The video also shows the robot opening the dewar and mounting crystals. It's quite something when you realize that the video is a screen capture of our NX client session, so the images of the crystal moving during centering and of the robot's motions are all pushed through in near real time.

Now I am sure any network guru or video delivery specialist is saying that this technology has existed for a while and there is nothing magical about it. But I think it's nice to recognize technology such as this for its enabling power. Just as I am amazed when I hold a three-way video conference over iChat with my family in three separate countries, I am even more amazed that I can seamlessly control an experiment all the way across the country from the comfort of my home.

(The CCP4 wiki has a list of synchrotrons offering remote data collection services.)

The protein crystallization grid project I am undertaking has convinced me of the virtues of version control. Knowing I can revert to an older version means I spend more time being adventurous than being paranoid about going down the wrong track and not being able to trace my path back.

On Coast to Coast Bio, Atom and I have often talked about the many ways people are using git: blog posts, publication manuscripts and database entries, to name a few. Since crystallographic refinement occupies a significant portion of a crystallographer's time, I decided to see how my personal git workflow would adapt to it.

In crystallographic refinement, most of the routines are driven by script files, which typically manipulate binary data files and ASCII coordinate files (the Protein Data Bank format). Each step spits out a new coordinate file and a text log file that serves as a record of that operation. For example, a refinement step that calls the PHENIX refinement routine would be run as

phenix.refine myinput.pdb mydata.mtz > run1_myinput_mydata.log

Here the input PDB file and the output log are text files, and the data is a binary file (the MTZ format). When finished, this run outputs a PDB file that the program often "versions" automatically by giving it a different name, say "myinput_001.pdb". Versioning is achieved by keeping a series of input PDB, output PDB and log files all resident in the project directory. Retracing your path is easy if you know which version you want to go back to, based on a file name and perhaps some scribbled notes in a Readme or log file.

Now you would think this works. But imagine coming back to your refinement directory a few weeks or months later. All you see is a directory full of tens of PDB and log files, and hopefully a single Readme file detailing all the steps along the way. This can be quite difficult to follow. Stepping back is possible using an old model file, but once I do this I have to come up with a new naming system to understand the history of the refinement, or worse still, rely on timestamps. Also, and very importantly, stepping back to a previous step is only possible for files whose names changed every time their contents changed.

After using git for just one project, I am quite convinced that git has a lot to offer for crystallographic refinement. Git allows me to return my directory, at any point in time, to its state at an earlier commit. Say I ran a series of refinement steps that generated tens of log files and then suddenly decided I was getting nowhere. With the non-git setup, I could revert to an earlier model file, but that still leaves tens of log files cluttering up my working directory. In the git case, checking out a previous snapshot returns my working directory to its clutter-free earlier state without deleting the record of all my failed approaches.

Also, during most refinements I tend to use similar-sounding names for my model files, which can quickly get messy. With git, even if I accidentally reuse the same name, like myfinalmodel.pdb, every version of that file is still tracked without the need for descriptive suffixes. Importantly, git also preserves the history of commits as commit trees, which captures the sequence of steps far better than a flat directory hierarchy. Another big plus is that git allows me to work on multiple machines and merge my work between them. Without it, I am left moving files back and forth and making sure their contents didn't change while their names stayed the same.

At present I use the CCP4 and PHENIX GUIs extensively as front ends to manage my refinement "workflow", with git sitting on top of the resulting files and versioning things as they go along. If I had a few months of spare time (yeah, right), I would love to create a backend for ccp4i that builds SHA-1-based versioning into all files handled by all the refinement methods. It's quite nice to use git alongside and watch my commit trees to keep track of my refinement. I have just begun using git in this way and hope to have screencasts detailing my git workflow soon.
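The "SHA-1-based versioning" idea is easy to see in miniature. Git names each file's contents (a "blob") by the SHA-1 of a short header plus the raw bytes, so two versions of myfinalmodel.pdb with different contents get different IDs no matter what they are called, while identical contents always map to the same ID. A minimal sketch of that blob hashing in plain Python (commit and tree IDs are computed differently and are not covered here; the PDB contents below are made-up placeholders):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID git assigns to a file's contents (a 'blob')."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two versions of the same file name with different contents -> different IDs.
round1 = b"REMARK placeholder model after refinement round 1\n"
round2 = b"REMARK placeholder model after refinement round 2\n"
print("myfinalmodel.pdb v1:", git_blob_id(round1))
print("myfinalmodel.pdb v2:", git_blob_id(round2))
```

For a real file, the result should match what `git hash-object myfinalmodel.pdb` reports.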

I just realized I need to revert my PDB file to its state from two build sessions back, so it's time to

git checkout 3ac94e79552c11025d7bb01f9a98b7afc1637e60 myfinalpdb.pdb

Over my few years of writing scripts in Python or Perl, I always told myself that the next time I had to solve a problem I would make my code more object-oriented and reusable. Like an addict trying to break a habit, I always failed miserably. It was just too easy to write one giant script to get the job done. So much so that when I started writing this blog almost two years ago, I once again stated my desire to break this habit. Along the same lines, I always felt I should be using some form of version control. Even for a script, it made sense to be able to experiment with code and move back and forth on a source tree. But again, I just did not have the time to do any of that!

I am quite happy to say that I have finally managed to write an object-oriented set of Python classes that create dispense lists for a liquid-handling robot. I also went ahead and used git for version control all along the way.

The problem I was solving was fairly routine. I needed a simple software routine to create dispense lists for the Formulatrix liquid-handling robot in the lab. Each dispense list was a list of the volumes the robot had to dispense into each well of a 96-well plate.

Crystal screens typically keep some components constant across a 96-well plate while varying others systematically. The dispense-list code therefore had to create gradients of components along the two axes of the plate and also handle components that stay constant. Either way, the final output is a tab-delimited set of volumes that the robot uses to carry out the dispense. Imagine a 96-element array holding a volume for each well, with one array per component.
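A minimal sketch of how such a design might look, assuming a standard 8 x 12 plate, linear gradients, wells listed in row-major order (A1..A12, B1..B12, ...) and an output layout of one tab-delimited line per component. All class names are hypothetical; this is not the code posted on GitHub, and the real Formulatrix file format may differ. The `FillComponent` class shows the kind of feature that becomes easy to add once the other components share a common interface.

```python
ROWS, COLS = 8, 12  # standard 96-well plate: rows A-H, columns 1-12


def linspace(start, stop, n):
    """n evenly spaced values from start to stop, inclusive."""
    if n == 1:
        return [start]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


class Component:
    """One stock solution and the volume (uL) it contributes to each well."""
    def __init__(self, name):
        self.name = name

    def volume_at(self, row, col):
        raise NotImplementedError

    def volumes(self):
        # Row-major order: A1..A12, B1..B12, ..., H1..H12 (an assumption).
        return [self.volume_at(r, c) for r in range(ROWS) for c in range(COLS)]


class ConstantComponent(Component):
    def __init__(self, name, volume):
        super().__init__(name)
        self.volume = volume

    def volume_at(self, row, col):
        return self.volume


class GradientComponent(Component):
    """Linear gradient along one plate axis: 'row' (A->H) or 'col' (1->12)."""
    def __init__(self, name, start, stop, axis="col"):
        super().__init__(name)
        if axis not in ("row", "col"):
            raise ValueError("axis must be 'row' or 'col'")
        self.axis = axis
        self.steps = linspace(start, stop, COLS if axis == "col" else ROWS)

    def volume_at(self, row, col):
        return self.steps[col] if self.axis == "col" else self.steps[row]


class FillComponent(Component):
    """Tops each well up to a fixed total volume (e.g. water)."""
    def __init__(self, name, total, others):
        super().__init__(name)
        self.total, self.others = total, others

    def volume_at(self, row, col):
        used = sum(o.volume_at(row, col) for o in self.others)
        if used > self.total:
            raise ValueError(f"well ({row}, {col}) is over-filled")
        return self.total - used


def dispense_list(components):
    """One tab-delimited line per component: its name followed by 96 volumes."""
    lines = []
    for comp in components:
        vols = "\t".join(f"{v:.1f}" for v in comp.volumes())
        lines.append(f"{comp.name}\t{vols}")
    return "\n".join(lines)


peg = GradientComponent("PEG400", 10, 32, axis="col")
salt = GradientComponent("NaCl", 5, 40, axis="row")
buffer = ConstantComponent("HEPES", 10)
water = FillComponent("water", 100, [peg, salt, buffer])
print(dispense_list([peg, salt, buffer, water])[:60])
```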

As every object-oriented approach will tell you, it helps to model the problem space appropriately. This involves defining all the actors in the problem you are trying to solve and writing appropriate classes. For me that was the hardest part. Object-oriented analysis and design is hard! But I persisted, in good faith that object orientation would pay off and that once I modeled my classes appropriately, the code would just write itself.

As my code grew, adding new features became easier than it ever had been when I was writing imperative, linear scripts. Also, thanks to some timely and excellent help from Michael Foord and Chris Lasher, delivered over Twitter and a blog post, I could use the object-oriented nature of the code to quite some advantage. Incidentally, Michael Foord writes an excellent Pythonic blog at Voidspace, is one of the main developers at Resolver Systems and is the co-author of IronPython in Action; Chris Lasher is on the Biopython team and is an avid bioinformatician whom I was introduced to thanks to Atom.

The code that creates these dispense lists, along with the examples, is posted on GitHub. Take a look and let me know how I did with the design of my classes. The most exciting moment in writing object-oriented code is when you realize how much more easily functionality can be added than with procedural code. I am thinking of writing a GUI to wrap the functionality of the code, in addition to a web app/form-based CGI frontend. Any help and suggestions are welcome.

The icing on the cake was when the code actually worked, and I now have a few custom screens with multiple gradients, hopefully growing giant membrane protein crystals.

Now onto writing some GUI code.

Almost two years ago, I started writing the codeitch weblog about my code ambitions on WordPress (codeitch.wordpress.com). This blog was driven by a strong desire to get better at writing code and, importantly, at building solutions that enable better science. To do that, I had decided to pick up Python, Java and JavaScript, and also to master Excel and Matlab. Since a new year is always a time for reflections, resolutions and such, I figured I would write down this progress report.

Python: Python has clearly become my first choice for tackling any problem. Whether it involved analysing the many trajectories generated by the channel-finding program HOLE, writing an HTTP-based API for Bioscreencast, or building an HTTP-based web service for Bioscreencastwiki, Python has ensured that I can get it done. There is no problem I don't attempt to solve using Python. Python has even led me to jettison my plan to pick up Excel and adopt Resolver One, the Pythonic spreadsheet, instead. My only problem with Python is the learning curve for a new API. Dynamic typing makes it hard to unravel code just by reading the source, something I was used to doing, having learnt to code in Java. Further, tools like NetBeans and Eclipse with code completion make it easier to familiarize yourself with a new API in Java. In Python, I still go about it using the dir builtin function and reading the various forms of Python documentation available on the web. In some ways I really cannot wait for something like nbPython to become fully usable.
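The dir-based exploration can be made a little less tedious by pairing `dir()` with `inspect.getdoc()` to get a one-line summary of every public name. A small sketch; `public_api` is a hypothetical helper, and `json` is just a convenient standard-library module to point it at:

```python
import inspect
import json


def public_api(obj):
    """Map each public name from dir(obj) to the first line of its docstring."""
    summary = {}
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and dunder attributes
        doc = inspect.getdoc(getattr(obj, name)) or ""
        summary[name] = doc.splitlines()[0] if doc else ""
    return summary


for name, first_line in sorted(public_api(json).items()):
    print(f"{name:20} {first_line}")
```

It is no substitute for code completion, but it gives a quick map of an unfamiliar module before diving into `help()` or the online docs.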

Java: Since I am still developing code to solve problems for myself, or web-centric problems for an Apache-based web server, I have written almost no Java in the last year. Python has ensured that Java takes a back seat.

JavaScript: To me, JavaScript is mainly a UI language, since I regard the browser as the ultimate UI. Since I have yet to get into UI-writing mode, my JavaScript usage over the last year was restricted to troubleshooting the few glitches we had at Bioscreencast. Again, tool support (like the amazing Firebug utility, or even NetBeans) ensures that I don't abandon picking up JavaScript in my spare time, and I definitely plan to use JavaScript and a browser as my UI of choice when the need arises.

Excel: Having got on the Python horse, I can hardly justify the headaches that mastering Excel syntax gave me. And for most simple spreadsheets, I have gotten used to Google Spreadsheets.

Matlab: I wanted to master Matlab in order to understand the many algorithms I encounter in X-ray crystallography. In many ways, Python and Matplotlib have let me get a lot of that done without learning the Matlab way of doing things. I have also attended a few Mathematica seminars and am extremely impressed with the Mathematica 6 (and 7) platform. For understanding algorithms and the behavior of functions, I have been turning to Mathematica and Python more than Matlab.

So, in summary, Python and the Pythonic way are what I embraced in 2008, and in 2009 I really want to write Python code that builds on this foundation. I am also starting to use Mathematica and to pick up Processing as a Java-centric visualization platform, and I hope to write about these in the new year.
