Tuesday, October 31, 2006

American Dr. Frankensteins

Today in my Scientific Integrity class we discussed the ethics of human experimentation.  I thought it was fitting that we happened to talk about this on Halloween, so I decided to share one specific case, the Tuskegee Experiment.

In 1932 a clinical study began in Tuskegee, Alabama, concerning the disease syphilis.  The study was done on African American sharecroppers to examine the effects of syphilis on different ethnicities.  For their participation these sharecroppers were assured free treatment with mercury, which was of course toxic but was one of the few treatments available for syphilis at the time.

By 1947, penicillin had become the standard, non-toxic treatment for syphilis.  Knowing this, the scientists running the study still decided to press on with toxic mercury as the treatment.  They even went as far as preventing local hospitals from treating the sharecroppers with penicillin, arguing that an alternative treatment would ruin all their previous data.

The study was finally terminated in 1972, not because of any ethical or moral consideration but because of a leak to the press.

Well, that's my scary story for the evening. Happy Halloween.

Disclaimer: I have nothing against doctors and admire the work they have to go through to become clinicians. In fact, my girlfriend is a medical student and I believe she is going to be a great doctor because she is ethically sound. I just don't approve of the character of some medical students and find the thought of them handling people's lives in the future rather disturbing.

Monday, October 30, 2006

Average CEL Perl Script

Today I wrote a Perl script to average out Affymetrix .cel files. In the middle of my Perl hacking I ran into some issues concerning Affymetrix's file format.

In a .cel file, under a section marked [INTENSITY], Affymetrix stores one probe intensity per row. Each row holds the x-coordinate, y-coordinate, mean (aka intensity), standard deviation, and number of pixels.

Originally, I tried using the split function to split the data on a line in their intensity section. What I tried was as follows:
($dummy, $dummy, $dummy, $total_intensity[$i], , ) =
split(/\s+/, $line);

Notice the three dummy variables that come before the actual intensity capture. I did this to account for the space before the coordinate values: with that leading space, split produces six fields (the first one empty) instead of the five mentioned above. This caused me a problem, because once you get to coordinates three digits long (e.g. y=100) that leading space is no longer there and every field shifts by one. What I ended up doing was using a regular expression instead, as follows:
if ( $line =~
m/^\s*(\d+)\s+(\d+)\s+(\d+\.\d+)/){$total_intensity[$i]= $3;}

I know this is a "hack" but it works. I'm sure there is a way to get rid of the leading spaces but none came to mind. I tried the chomp function, but all it does is remove trailing newlines. Does anyone have an idea how to get rid of these leading spaces?
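
For what it's worth, here are two approaches that should handle the leading spaces, as a minimal sketch. The sample rows are made up to mimic the x, y, mean, standard deviation, pixel-count layout and are not from a real .cel file, and the subroutine names mean_via_strip and mean_via_split are just placeholders. The first strips leading whitespace with a substitution before splitting; the second relies on Perl's special split pattern ' ' (a single space in quotes), which skips leading whitespace on its own.

```perl
use strict;
use warnings;

# Made-up rows in the x, y, mean, stdev, npixels layout (not real data).
my @rows = ("  0   0 141.0 20.6  25", "100  12 9643.5 1623.2  25");

# Option 1: remove leading whitespace, then split on /\s+/ as before.
sub mean_via_strip {
    my ($line) = @_;
    $line =~ s/^\s+//;
    my ($x, $y, $mean) = split /\s+/, $line;
    return $mean;
}

# Option 2: split on the special pattern ' ', which ignores
# leading whitespace and splits on runs of whitespace.
sub mean_via_split {
    my ($line) = @_;
    my ($x, $y, $mean) = split ' ', $line;
    return $mean;
}

for my $row (@rows) {
    print mean_via_strip($row), " ", mean_via_split($row), "\n";
}
```

Either way the third field is the intensity whether or not the line starts with a space, so no dummy variable is needed for the empty first field.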