Tuesday, March 13, 2007

Beginner's Guide to Bioinformatics

As a computer scientist coming into Bioinformatics, I was faced with the heavy task of catching up on my Biology and Chemistry (I was a Physics minor in undergrad, but that wasn't applicable to my Bioinformatics catch-up). This meant two semesters of General Chemistry, a semester of Organic Chemistry, and a semester of Cell Biology. Though all this coursework was very educational and useful for my degree, I don't think it's all that necessary for someone who may be interested in fooling around with Bioinformatics problems on the side.

Here is a very general overview of cell biology for Non-Biologists wanting to get involved in Bioinformatics:

  1. Proteins are an essential part of all living organisms. Proteins have a variety of functions and are involved in every process within our cells. [Wikipedia]

  2. DNA is the blueprint for proteins. Segments of DNA (genes) are transcribed into RNA, which is then translated into proteins. For more detail, look into the transcription of DNA to RNA and the translation of RNA to proteins.

  3. Cell function is determined by which proteins are expressed and in what quantity. This means that some kind of gene regulation must take place. One can also argue that if you know the expression levels of the genes in a cell, you can possibly infer that cell's function.
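
To make point 2 a bit more concrete, here is a minimal Python sketch of transcription and translation. It is only an illustration: the codon table is deliberately partial (the real genetic code has 64 codons), the gene is a made-up sequence, and the function names `transcribe` and `translate` are my own, not taken from any library.

```python
# Partial codon table (RNA codon -> one-letter amino acid code).
# Illustrative only: the full genetic code has 64 codons.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "UUU": "F", "UUC": "F",
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(dna: str) -> str:
    """Turn a coding-strand DNA string into mRNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three letters at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # Codons missing from the partial table are marked as "X".
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGTTTGGCGCAAAATAA"  # made-up example gene
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna)     # AUGUUUGGCGCAAAAUAA
print(protein)  # MFGAK
```

Real translation has more going on (finding the start codon, reading frames, introns in eukaryotes), so treat this as a toy model of the idea that a gene is a string that gets converted into another string.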



For a more specific overview, the following are some of the key points for biology and bioinformatics:

  1. Genome - all the DNA in a cell.

  2. DNA - a string of nucleotides (e.g. GATCACTT…ATCG).

  3. Gene - a substring of DNA that encodes proteins.

  4. Proteins - a string of amino acids (e.g. ACDEF…RSTY).

  5. Gene expression is regulated by the products of other genes. It is a network of interactions.

  6. Post-translational modifications (chemical changes made to a protein after it is built) are an important regulatory mechanism for gene expression.



You may notice that the above deals quite a bit with string manipulation, hence the strong emphasis on Perl experience in Bioinformatics job postings. You will find, however, that string manipulation is not the only driving force for computer science in Bioinformatics. I will try to explain other topics in subsequent posts.
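
To give a feel for the kind of string manipulation I mean, here is a small sketch of three common sequence operations. I am using Python rather than Perl purely for illustration; the helper names (`reverse_complement`, `gc_content`, `find_motif`) are my own, and the input sequences are short made-up strings.

```python
# Translation table that maps each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def gc_content(dna: str) -> float:
    """Return the fraction of bases that are G or C."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def find_motif(dna: str, motif: str) -> list:
    """Return the 0-based start of every (possibly overlapping) match."""
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)
    return positions


print(reverse_complement("GATCACTT"))  # AAGTGATC
print(gc_content("GATCACTT"))          # 0.375
print(find_motif("ATATAT", "ATA"))     # [0, 2]
```

None of this is specific to any one language; the same operations are a few lines of Perl as well.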

As for Biologists wanting to do Bioinformatics, I cannot provide the best advice since I didn't come into Bioinformatics from that direction, but I would imagine that you may want to look into the following:

  1. Learn how to program. You want to know how to use a scripting language (preferably Perl) for smaller everyday tasks and, for larger projects, a compiled language such as C or an object-oriented one such as C++ or Java.

  2. Learn how to use databases. Bioinformatics deals with very large datasets. At some point you are going to have to deal with either retrieving information from databases or building your very own database, so you might as well begin playing with them now.

  3. Install and run a Unix/Linux OS (Optional). This might be my personal bias, but I believe that if you are going to be working in Bioinformatics with its large datasets, you will eventually find yourself either maintaining a server or SSHing into one, so you might as well become familiar with that type of environment. At the very least, XP users should install Cygwin.
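
If you have never touched a database, the second point in the list above can be tried out without installing anything beyond Python, using its built-in `sqlite3` module. The sketch below is only a starting point under my own assumptions: the `genes` table, its columns, the gene and organism names, and the `genes_for` helper are all hypothetical and exist only for this example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genes (name TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
)

# Hypothetical rows; real data would come from a file or a public database.
rows = [
    ("geneA", "organism 1", "ATGTTTGGC"),
    ("geneB", "organism 2", "ATGGCAAAATAA"),
    ("geneC", "organism 1", "ATGAAA"),
]
conn.executemany("INSERT INTO genes VALUES (?, ?, ?)", rows)
conn.commit()


def genes_for(organism: str) -> list:
    """Return (name, sequence length) for every gene of one organism."""
    cursor = conn.execute(
        "SELECT name, LENGTH(sequence) FROM genes "
        "WHERE organism = ? ORDER BY name",
        (organism,),
    )
    return cursor.fetchall()


print(genes_for("organism 1"))  # [('geneA', 9), ('geneC', 6)]
```

The `?` placeholders let the database driver handle quoting, which is a good habit to pick up from the start.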



Useful Links:

  • Bioinformatics intro offered at my university.
  • Graduate level of the Bioinformatics intro course.

  • Library of videos that cover a wide range of biological topics (theoretical and practical).

  • RT-PCR, a common molecular biology method practiced in the lab.

  • Virtual lab that lets non-biologists work through basic molecular biology techniques.



Finally, I must say that I am far from an expert, so any constructive suggestions to help clarify or expand the above are welcome and appreciated.
