Why Bioinformaticians have to grin and bear it!
Nov 7th, 2008 by harijay
If anyone feels this post is provocatively titled, I can only offer sagging traffic to this blog as my defence.
But seriously! I am writing in defence of biologists, against what I perceive as a tendency among bioinformaticians to complain about them and their failure to respect things like structured data, file formats, data portability and various other concepts at the centre of data management.
The bottom line is that experimentalists do what they are trained to do best: experiment. In most cases (and, sadly, I agree) experiment design does not extend to the data storage or management level. People tend to store data in ways that make sense to them, and mostly only to them. We are all fortunately wired differently, and most things make sense to an experimenter only when ordered or “structured” in a way that he or she likes.
To offer a simple example, I have often heard others comment that they cannot make sense of the table layout I have chosen or the order in which I load my protein samples on an electrophoresis gel. In most cases I disagree with the suggested alternative and persist with my “bad” ways!
In my defence I offer this statement: “It is never difficult to reformat data; it is always difficult to repeat an experiment.”
To elaborate: forcing formats or structure on an experimenter is often far more difficult than reformatting the data computationally. An experimental workflow can incorporate structure all the way down to the data storage level, but this often increases the experimenter's workload and may even complicate the experiment itself. Experimentation is difficult; bioinformatics is easier!
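To make the point concrete, here is a minimal Python sketch of the kind of reformatting script I have in mind. It assumes two tables exported as CSV whose headers say the same thing in different words and which contain stray blank rows. The alias table, the canonical column names and the function names are all hypothetical, made up for illustration; a real script would grow its alias list as new variants turn up.

```python
import csv
import io

# Hypothetical mapping from header variants an experimenter might use
# to one canonical column name. Extend as new variants turn up.
HEADER_ALIASES = {
    "sample": "sample_id",
    "sample id": "sample_id",
    "sample_id": "sample_id",
    "conc": "concentration_ng_ul",
    "conc (ng/ul)": "concentration_ng_ul",
    "concentration": "concentration_ng_ul",
}

# Hypothetical output schema, in a fixed column order.
CANONICAL_COLUMNS = ["sample_id", "concentration_ng_ul"]


def normalize_header(name):
    """Lower-case a header, collapse whitespace, and map it to its canonical name."""
    key = " ".join(name.strip().lower().split())
    if key not in HEADER_ALIASES:
        # Fail loudly rather than guess, so a new variant gets noticed.
        raise KeyError(f"unrecognised column header: {name!r}")
    return HEADER_ALIASES[key]


def reformat_table(text):
    """Read one CSV table and return its rows keyed by canonical column names."""
    reader = csv.reader(io.StringIO(text))
    header = [normalize_header(h) for h in next(reader)]
    rows = []
    for record in reader:
        if not any(cell.strip() for cell in record):
            continue  # skip blank spacer rows
        row = dict(zip(header, (cell.strip() for cell in record)))
        rows.append({col: row.get(col, "") for col in CANONICAL_COLUMNS})
    return rows


# Two "sheets" recording the same kind of result in different layouts.
sheet_a = "Sample,Conc (ng/ul)\nS1,12.5\n\nS2,8.0\n"
sheet_b = "concentration, Sample ID\n3.2,S3\n"

merged = reformat_table(sheet_a) + reformat_table(sheet_b)
for row in merged:
    print(row)
```

The column order differs between the two inputs, but because each row is looked up by its canonical name, both tables come out in the same shape and can simply be concatenated.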
Therefore I say that bioinformatics will have to live with the burden of data munging! The only way out is to catch young experimenters and teach them some aspects of data mining or electronic record management at the same time that we teach them how to conduct an experiment.
Don't get me wrong, I am not condoning poor experimental record keeping or unstructured data! In most cases a simple rethink can make our spreadsheets more comprehensible or structured. But it will always be easier to write a thousand-line reformatting script than to force an experiment to output its data in a format that will make the resident bioinformatician happy.
Until a generation of structured-data-aware scientists takes to the bench in a big way, I am sorry to say that bioinformaticians will just have to grin and bear it!
Refs: “The Saunders Principle”; “Comment on the Saunders principle from Chris A. Miller”.
I think we have something to talk about this weekend
I _completely_ agree with your remark that “experiment design does not extend to the data storage or management level”.
Apart from that (and speaking as a bioinformatician who has to do all the data massaging afterwards): we cannot always ask wet-lab experimenters to provide their results in a fixed format, because the type of results is not known yet. Often it is (e.g. when running standard PCRs or resequencing), but not always.
What I mainly want from my data providers (read: the lab people generating the data) is that they are *consistent* in reporting. As long as the files I get are nicely structured, I can get things done. But if they give me an Excel file with, for example, one sheet of results per chromosome, and each sheet has a different format (even though it represents exactly the same type of results), I know I’ll have a bad day/week.
I wholeheartedly agree, Jandot; it's very annoying to see inconsistencies, and many a time I too find myself “improving” my data reporting with ad hoc changes to the format or level of detail. Somehow I think these problems are more plentiful in academic research. But I am convinced that the way out is to gently coax everyday experimentalists into appreciating the importance of structured data. At the risk of stating the obvious, if experimentalists were aware of the purpose of XML, JSON, RDF and the like, collaborations would definitely benefit.
[...] and the Saunders principle, wherein bioinformaticians struggle with inconsistent formats, prompting code-itch (Hari) to complain that bioinformaticians have to grin and bear it! [...]
Jan and I agree on the consistency part. It’s the most important part. No consistency, no workflows.
I’ve been meaning to comment on this for ages, but wanted to get through the podcast first. I think you guys covered it though; in particular I agree with Deepak that “everyone is right, everyone is at fault”.
I don’t expect experimentalists to become computer programmers, or to care very much about data formats. What I did expect, for a while, is that they’d be interested in anything that made their lives easier or their research more efficient. Experience has shown that if this involves learning even a modicum of new computer skills, it’s usually a vain expectation, with rare exceptions. When I “complain” about biologists, it’s only out of sadness that they are missing out on all the wonderful tools that make my own work easier and more enjoyable. And I speak as a wet lab guy turned bioinformatician, so I’ve seen both sides of the coin.
I don’t know how to overcome this reluctance to consider anything that lies outside a narrow band of education and skills. It’s extremely prevalent in academia, at least in Australia, and it leaves me quite depressed at times. But as you say, we’re all wired differently; I just have to accept that people make their own choices as to what’s a good use of their time.
That said, the next person who sends me sequences as a Word file is going to find it back in their inbox with a short note explaining why this is inappropriate.
Thanks, Neil, for your excellent comment. I couldn’t have said it better. In the end we all benefit from consistency and standardization.
What we do is difficult enough, and no skill that makes our lives easier should be underestimated, whether it comes from better-designed experiments and data formats or from more lucid specs and bioinformatics workflows.
I really respect bench scientists turned bioinformaticians, and I think it is fortunate that you bear the burden of bridging the divide. Here's to more conversations that help educate both sides! And I am with you 100% about Word attachments: every carrot does need a stick at the other end. That's the only way we will all learn and benefit.
Oh, and one more thing: I think the “Saunders principle” is a great meme. I plan to use it the next time I have to talk about the importance of structured data.
The biologists are always ahead of informatics because they’re in the business of creating innovative procedures that won’t necessarily conform to existing data structures. They do know spreadsheets, which we can deal with fairly easily and which they can use as an external data format until we can integrate it (or its abstraction) with our system. The essential elements are good communications and fostering team unity.