
The Berkeley synchrotron from the Brandeis campus: Remote synchrotron data collection courtesy of the tubes

I don't know why I feel so ecstatic at the thought of running an experiment remotely. Maybe it's all the NASA TV I watched as a graduate student, or the several telemicroscopy talks I attended next door at the NCMI. This post is a little about the wonders of robotics and the great things engineers do that make it possible for scientists to do better fundamental research, and mostly about the little pieces that fit together to enable good science.

Last week we test-ran a remote data collection at the Berkeley synchrotron. Normally we would have flown all the way to Berkeley, manually mounted our crystals on the diffraction setup, and then spent 24 to 48 hours collecting diffraction data. This time, however, we shipped our dewar to the synchrotron and had the extremely helpful beamline scientists load our crystals, stored in specially designed pucks, onto the crystal-mounting robot. After that we controlled the entire experiment remotely, sitting in the comfort of our lab at Brandeis, or even at home over the weekend.

The amazing part of the experience was the real-time nature of the control. In the video you will see us align a crystal by clicking in a window of an NX client session. The robot responds almost immediately to our click. The video also shows the robot opening the dewar and mounting crystals. It's quite something when you realize that the video is a screen capture of our NX client session, so the footage of the crystal moving during centering, and of the robot's motions, is all pushed through in near real time.

Now I am sure any network guru or video-delivery specialist is saying that this technology has existed for a while and there is nothing magical about it. But I think it's worth recognizing technology like this for its enabling power. Just as I am amazed when I conduct a three-way video conference with my family in three separate countries over iChat, I am even more amazed that I can seamlessly control an experiment all the way across the country from the comfort of my home.

(The CCP4 wiki has a list of synchrotrons offering remote data collection services.)

Using git for keeping track of crystallographic refinements

The protein crystallization grid project I am undertaking has convinced me of the virtues of version control. Knowing I can revert to an older version has ensured that I spend more time being adventurous than being paranoid about going down the wrong track and not being able to trace my path back.

On Coast to Coast Bio, Atom and I have often talked about the many ways people are using git: blog posts, publication manuscripts, and database entries, to name a few. Since crystallographic refinement occupies a significant portion of a crystallographer's time, I decided to see how my personal git workflow would adapt to it.

In crystallographic refinement, most of the routines are scripted, and the scripts typically manipulate binary data files and ASCII coordinate files (the Protein Data Bank format). Each step spits out a new coordinate file and a text log file that serves as a record of that operation. For example, a particular refinement step that calls on the PHENIX refinement routine would be run as

"phenix.refine myinput.pdb mydata.mtz > run1_myinput_mydata.log"

Here the input PDB file and the output log are text files, and the data is a binary file (the MTZ format). When finished, this step outputs a PDB file whose name is often automatically "versioned" by the program, say "myinput_001.pdb". Versioning is thus handled by keeping a series of input, output, and PDB files in the project directory. Retracing your path is easy if you know which version you want to go back to, based on a file name and some scribbled notes in a Readme or log file.
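
To make that concrete, here is a sketch of two consecutive rounds under this scheme (the file names are illustrative, following the example above; the names your refinement program actually produces will differ):

phenix.refine myinput.pdb mydata.mtz > run1_myinput_mydata.log
# the refined model comes back under an auto-versioned name, say myinput_001.pdb
phenix.refine myinput_001.pdb mydata.mtz > run2_myinput_mydata.log
# ...and so on; every round adds another model and another log to the directory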

Now you would think this works well enough. But imagine coming back to your refinement directory a few weeks or months later. All you see is a directory full of tens of PDB files and log files and, hopefully, a single Readme file detailing all the steps along the way. This can be quite difficult to follow. Stepping back is possible using an old model file, but once this is done I have to come up with a new naming system to understand the history of the refinement, or worse still rely on timestamps. Also, and very importantly, stepping back to a previous step is only possible for files whose names changed every time their contents changed.

After using git for just one project, I am quite convinced that it has a lot to offer crystallographic refinement. Git allows me to return my directory at any point to its state at an earlier commit. Say I used a series of refinement steps that generated tens of log files and then suddenly decided I was getting nowhere. With the non-git setup I could revert to an earlier model file, but that still leaves tens of log files cluttering up my work directory. With git, checking out a previous snapshot returns my working directory to its clutter-free earlier state without deleting the record of all my failed approaches.
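
A minimal sketch of that snapshot-and-rewind cycle (the commit message is made up, and you would substitute a real hash from git log for the placeholder):

# commit after each refinement round
git add .
git commit -m "round 4: rigid-body plus individual B-factor refinement"

# weeks later, decide an earlier round was the last good state
git log --oneline                  # find the hash of the commit you want
git checkout <hash-of-good-round>  # the working directory now matches that
                                   # snapshot; the later commits stay in history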

Also, during most refinements I tend to use similar-sounding names for my model files, which can quickly get messy. With git, even if I accidentally reuse a name like myfinalmodel.pdb, the file is still versioned, without descriptive suffixes. Just as importantly, git preserves the history of commits as a commit tree; a flat directory hierarchy does not capture this nearly as well. Another big plus is that git lets me work on multiple machines and merge my work between them. Without it, I am left moving files back and forth and making sure their contents didn't change while their names stayed the same.
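
For instance, git can follow one file's history even though its name never changed, restore an older copy of just that file, and keep a second machine in sync (the paths and the remote address here are hypothetical):

git log --oneline -- myfinalmodel.pdb    # every commit that touched this file
git checkout HEAD~2 -- myfinalmodel.pdb  # restore the copy from two commits back

# on a second machine: clone once, then merge work as you go
git clone user@labmachine:refinement.git
git pull    # pick up refinement rounds committed on the other machine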

At the present moment I use the CCP4 and PHENIX GUIs extensively as front ends to manage my refinement "workflow". When I use git, it simply sits on top of these files, versioning them as the GUIs go along. If I had a few months of spare time (yeah, right) I would love to create a backend for ccp4i that builds SHA-1-based versioning into all the files handled by the refinement methods. It's quite nice to use git alongside the GUIs and watch my commit tree grow as I keep track of my refinement. I have just begun using git this way and hope to have screencasts detailing my git workflow soon.
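
Retrofitting git onto an existing refinement directory is cheap. A minimal sketch, assuming the GUIs keep everything under one project directory (nothing here is required by ccp4i or phenix):

cd myrefinement    # the directory the GUIs already manage
git init
git add .
git commit -m "starting point: model, data, scripts and logs"
# from here on, commit after every GUI-driven round; the commit tree
# becomes the record of the whole refinement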

I just realized I need to revert to my PDB files from two build sessions back, so it's time to

"git checkout 3ac94e79552c11025d7bb01f9a98b7afc1637e60 myfinalpdb.pdb"