Mapping the Genome of Physarum polycephalum
The worldwide Physarum community was delighted when, in August 2004, the National Human Genome Research Institute announced that Physarum was one of 18 organisms selected for addition to the sequencing pipeline. For the NHGRI News Release, Click Here.
A Physarum Genome Coordinating Group has been formed to facilitate collaboration among interested workers. For a summary of the initial meeting of this Genome Coordinating Group, and an invitation to participate, Click Here.
This site is currently serving as a repository of "progress statements & information requests" from the Group; these appear below, in reverse chronological order. An initial report by Jonatha Gott, published in the 2004 Physarum Newsletter, may be downloaded by clicking here.
*********
November 1, 2006. Jonatha Gott files another update on the overall status of the Physarum Genome Project, asking for feedback, and the names of any others who wish to be included on her Email list. To download this update, as a WORD document, Click Here.
*********
June 26, 2006. Jonatha Gott files another update on the overall status of the Physarum Genome Project, asking for feedback. To download this update, as a WORD document, Click Here.
*********
April 28, 2006. Jonatha Gott files an additional update on the overall status of the Physarum Genome Project, focussing on some cost factors. To download this update, as a WORD document, Click Here.
*********
April 25, 2006. Jonatha Gott files an update on the overall status of the Physarum Genome Project. To download this update, as a WORD document, Click Here. To download the progress report by Gerard Pierron (referred to in the update) as a WORD document, Click Here.
*********
February 1, 2006. Jonatha Gott files an update on the overall status of the Physarum Genome Project. To download this update, as a WORD document, Click Here.
*********
February 1, 2006. Gerard Pierron posts his fourth progress report. To download this fourth report, as a WORD document, Click Here. To download the associated spreadsheet, Click Here.
*********
January 5, 2006. Gerard Pierron posts his third progress report. To download this third report, as a WORD document, Click Here.
*********
December 20, 2005. Ernst Werner reports that he "found Gerard's 'how to' instructions very useful. I successfully used them in the first try, a rare event with these programs! I found his progress reports very interesting!" To download Ernst Werner's report, as a WORD document, Click Here.
*********
December 12, 2005. Gerard Pierron posts his second progress report. To download this second report, as a WORD document, Click Here.
*********
December 1, 2005. Gerard Pierron has begun to work with the genomic data available. He proposes to post a series of progress reports. To download his first progress report, as a WORD document, Click Here.
*********
August 15, 2005 - Message from Jonatha Gott:
I could use some help on the web page. Any takers? Thanks, Jonatha
Message from Sandy Clifton:
I have not forgotten your request for access to the survey sequence
data. The person who can do that is on maternity leave, and she is only
on line periodically. I have contacted her again. I will let you know
as soon as she sets up the data on an ftp site for access.
I have another favor to ask of you. We are working on our project
web pages. If you will go to
http://genome.wustl.edu/ISAgenome.cgi?GENOME=Physarum%20polycephalum
you will see that we have some text we are using as a placeholder until
we get the info that we really need. We are going to standardize the
format so that there are 3 sections: HABITAT, BIOLOGY, and SEQUENCING
PLANS. I can handle the sequencing plans (essentially that the survey
sequence is complete and the plan is being formulated to submit to the
NHGRI), but it would be good if you or some of your colleagues, who
really know the organism, would write some text for the habitat and
biology sections. We are trying to keep the text to a single web page,
so that might be good to keep in mind.
Let me know if there is anyone whom you would like for me to contact
regarding this request.
*******************
August 4, 2005 - Message from Jonatha Gott:
This is a second request for information that is to be used in generating
the sequence plan for the Physarum genome project. Please take time to
respond to this message, as it may make the difference between having a
draft vs. a finished Physarum genome sequence.
IF YOU DON'T HAVE TIME TO SEND ME A DETAILED DESCRIPTION IMMEDIATELY,
PLEASE AT LEAST LET ME KNOW WHAT YOU DO HAVE SO THAT I CAN SEND THAT
INFORMATION TO SANDY CLIFTON - DETAILS CAN BE FILLED IN LATER.
Sandy has asked me to assemble a list of resources available within the
Physarum community that would be useful in generating as complete a picture
of the Physarum genome as possible. In particular, please let me know if
you (or anyone you know) have any of the following:
1. genetic map of Physarum
2. physical map of Physarum genome
3. libraries:
BAC or other libraries with large DNA fragments
genomic libraries
cDNA libraries
4. other resources that might be useful in genome assembly and gene annotation
*** If you do have any of the above, it would be particularly helpful if
you briefly described how it was generated (eg. life cycle stage, vector,
average insert size, etc.). Please note that the Wash U group may be
willing to sequence other available libraries as part of this project, so
this could potentially be a good way to assess the quality of your current
libraries at no charge! ***
For instance, I have Tim Burland's genomic libraries, one with inserts of
~1kb, the other with 1-5 kb inserts, made by Stratagene in lambda Zap, as
well as two of his cDNA libraries (ClonTech "capfinder"), one from
prophase, one from S phase. I have never used any of these, and would
appreciate hearing if anyone has experience with them.
My intent is to make as complete a list as possible for Sandy, and to make
this list available to the entire mailing list. If your reagents are not
yet published and you do not wish to "share" them, let me know and that
information will remain confidential. Ultimately, I would like to have a
section in our database/website that contains this information to
facilitate the sharing of valuable resources between labs. Any thoughts on
this? Again, please feel free to send this message to anyone that isn't
already on the mailing list.
*******************
June 21, 2005 - Jonatha Gott, forwarding a message from Rex Chisholm:
I'm currently writing a letter in support of the Dictybase grant renewal
and thought I'd pass on some of Rex Chisholm's thoughts on how the Physarum
database might be organized. I think that it will be a great
collaboration!
From Rex Chisholm:
We have been thinking a lot about the comparative genomics possibilities.
What we are envisioning is first establishing a Physarum database that
basically looks like dictyBase but with different colors and logo. We
need to think of a name and register the appropriate domain name,
probably something like physarumgenome.org or maybe even physarum.org if
it is available. The database could be called "Physarum genome database
(PGD)" or anything else you guys like.
But, in addition, on each dictyBase gene page and on every physarum gene
page we'd like to integrate reciprocal links between orthologs/homologs.
Also we are implementing technology that would allow us to show regions
of synteny (if they exist) between Dicty and Physarum. We are also
toying with the idea of creating something like "AmoebaBase" that could
provide a single portal of access to dictyBase, Physarum, other
Dictyostelids (we have requested sequence of the related species) and
Acanthamoeba if the political issues can be resolved. These are just
some starting ideas. Obviously we seek your input as well as that of
anyone else who is interested.
Let me know how this sounds. In the meanwhile thanks for agreeing to provide a letter of collaboration. Also please let me know what I can do to help make the argument for finishing the Physarum sequence.
*******************
June 17, 2005 - Message from Jonatha Gott:
After my last, rather long, email, you may not want to hear from me again,
but please take time to respond to this message, as it may make the
difference between having a draft vs. a finished Physarum genome sequence.
Sandy Clifton has asked me to assemble a list of resources available within the
Physarum community that would be useful in generating as complete a picture
of the Physarum genome as possible. In particular, please let me know if
you (or anyone you know) have any of the following:
1. genetic map of Physarum
2. physical map of Physarum genome
3. libraries:
BAC or other libraries with large DNA fragments
genomic libraries
cDNA libraries
4. other resources that might be useful in genome assembly and gene annotation
*** If you do have any of the above, it would be particularly helpful if
you briefly described how it was generated (eg. life cycle stage, vector,
average insert size, etc.). Please note that the Wash U group may be
willing to sequence other available libraries as part of this project, so
this could potentially be a good way to assess the quality of your current
libraries at no charge! ***
For instance, I have Tim Burland's genomic libraries, one with inserts of
~1kb, the other with 1-5 kb inserts, made by Stratagene in lambda Zap, as
well as two of his cDNA libraries (ClonTech "capfinder"), one from
prophase, one from S phase. I have never used any of these, and would
appreciate hearing if anyone has experience with them.
My intent is to make as complete a list as possible for Sandy, and to make
this list available to the entire mailing list. If your reagents are not
yet published and you do not wish to "share" them, let me know and that
information will remain confidential. Ultimately, I would like to have a
section in our database/website that contains this information to
facilitate the sharing of valuable resources between labs. Any thoughts on
this? Again, please feel free to send this message to anyone that isn't
already on the mailing list.
*******************
June 17, 2005 - Message from Jonatha Gott:
Just spent about an hour on the phone with Sandy Clifton getting a
"translation" of information that I sent you earlier (copied below). She
answered my questions and the ones that Gerard had sent me, plus we talked
a bit about their plans and what happens next. Will try to summarize our
conversation below. Please read through this carefully and feel free to
send me comments, questions, additional information, etc.
**ALSO, I NEED EVERYONE'S HELP IN ASSEMBLING A RESOURCE LIST, which I will
discuss in my next email once I've put things in context.
SUMMARY:
What has been done: The original intent was to sequence 3 x 384 = 1152
clones in both directions, but in their experience, it is hard to assess a
genome project with so few reads. They discussed it with NHGRI and got the
go ahead to sequence more. Accordingly, DNA from 13,440 clones (i.e., 35
384-well plates rather than 3) was sequenced from both ends. This process
is largely automated, so not all of them generated useful sequence
(cross-contamination, no DNA in some wells, vector sequence, etc.). Final
result after a couple of rounds of trimming and assessment (26880 -> 22367
-> 20780) was 20,780 trimmed traces containing roughly 14 million base
pairs, or nearly 10% of the genome. She wasn't sure of the details of the
cloning for this particular project, but expected the plasmids to contain
roughly 3-5 kb inserts. Each was sequenced from each end, generating
650-700 bp per trace. When they do the "real" sequencing, they will most
likely use fosmids with inserts in the 40 kb range, which are more useful
for assembling contigs, particularly since the known Tp1 retrotransposons
(see below) are ~8.9 kb in length.
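The counts in this paragraph can be cross-checked with a few lines of arithmetic. The following is only a sketch added for illustration, not part of the original message: the variable names are invented, and the "roughly 14 million base pairs" figure is treated as an approximation taken from the text.

```python
# Cross-check of the survey-sequencing numbers quoted in the summary.
# All names are illustrative; the figures come from the message itself.

WELLS_PER_PLATE = 384      # clones per plate
READS_PER_CLONE = 2        # each clone is sequenced from both ends

PLATES_PLANNED = 3         # original intent
PLATES_SEQUENCED = 35      # what was actually done

planned_clones = PLATES_PLANNED * WELLS_PER_PLATE
sequenced_clones = PLATES_SEQUENCED * WELLS_PER_PLATE
total_traces = sequenced_clones * READS_PER_CLONE

PASSED_TRACES = 20_780            # after two rounds of trimming and assessment
APPROX_TOTAL_BP = 14_000_000      # "roughly 14 million base pairs" (approximate)

pass_rate_pct = 100.0 * PASSED_TRACES / total_traces
mean_trimmed_length_bp = APPROX_TOTAL_BP / PASSED_TRACES

print(f"planned clones:      {planned_clones}")
print(f"sequenced clones:    {sequenced_clones}")
print(f"total traces:        {total_traces}")
print(f"traces passing:      {pass_rate_pct:.1f} %")
print(f"mean trimmed length: {mean_trimmed_length_bp:.0f} bp")
```

Under these assumptions the mean trimmed read length falls within the 650-700 bp per trace mentioned above, and the trace total matches the 26880 figure in the trimming sequence.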
GC content: The G+C content of these sequences was ~40%, which is similar
to the genome. However, since genes tend to be more GC rich than the
intervening sequences and repeat elements, they are still not sure if the
sequences they have are biased.
Repeats: Thus far they have seen ~8.84% simple repeats, which is higher
than they like. They were not aware of the retrotransposons that had been
reported in Physarum (nor was I), which Gerard alerted us to (see his
comments below). They will look for these as well and expect the number of
repeats to go up upon further analysis.
From Gerard Pierron: It is true that information on repeated elements in
Physarum is missing in the form of a specific database. However, nothing
is mentioned about rDNA, nor about Norman Hardman's retrotransposon-like
sequence Tp1, which is an 8.9 kb element related to LTR retrotransposons
and appears to be a significant part of the repetitive DNA of Physarum. The
sequence is known (accession number X53558). A second retrotransposon, Tp2,
is also known (X52770). I wonder whether all the sequences were blasted, or
if these retrotransposons correspond to the reported 12 LTR elements
ERV-classI that are mentioned for a total of 1087 bp? (Sorry Gerard, I
forgot to ask this question specifically!)
Contamination screen: Looks quite good. Contaminating sequences included
4 general cloning vectors, 4 of their cloning vector, 4 E. coli sequences, 9
? (she thinks GBBCT are chloroplast sequences, but will check on that), and
28 mitochondrial sequences - not bad with 20,780 traces! Many thanks to
Gerard Pierron for providing such high quality DNA (he has more in the
freezer ready to send when they are ready for it).
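To put "not bad" into numbers, the flagged traces can be tallied against the number screened. This is a minimal sketch added for illustration; the category labels are paraphrased from the paragraph above, and the identity of the nine GBBCT hits was still unconfirmed at the time of the message.

```python
# Tally of traces flagged by the contamination screen (counts from the text).
TRACES_SCREENED = 20_780

flagged = {
    "general cloning vectors": 4,
    "project cloning vector": 4,
    "E. coli": 4,
    "GBBCT (identity unconfirmed)": 9,
    "mitochondrial": 28,
}

total_flagged = sum(flagged.values())
flagged_pct = 100.0 * total_flagged / TRACES_SCREENED

for category, count in flagged.items():
    print(f"{category:30s} {count:3d}")
print(f"{'total flagged':30s} {total_flagged:3d} ({flagged_pct:.2f} % of screened traces)")
```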
Gene discovery: They haven't looked for individual genes yet, but have
received EST sequences and will be doing that next. Thanks to Mike Gray
and Gernot Gloeckner/Wolfgang Marwan for providing unpublished data for
this analysis. (NB: Expect both sets of data to be published reasonably
soon - still being analyzed)
Next steps:
1. compare with EST data
2. they prepare a sequencing plan to present to NHGRI in consultation with us
3. NHGRI decides how much they are willing to spend on the project and
either approves or modifies sequencing plan (no set time frame - this may
take a few months)
4. cloning, sequencing, assembly, and auto-finishing
5. hopefully "finishing" (i.e. actual human involvement)
Other items of note:
1. They would like to do a finished sequence. At a minimum can expect a
draft sequence with 6x coverage, but think it is likely that we will end up
with better than that.
2. Much of the process is automated, with computers doing most of the data
analysis, including making the calls regarding sequence quality, gap
filling, primer design, etc. If the data go through two rounds of
"auto-finishing", usually end up with very large contigs of high quality
sequence. Think it is reasonably likely that the Physarum genome will get
this far. The next (and last) level of finishing is the expensive part -
paying someone to sit in front of the computer to assess the sequence ~1000
bp at a time. The chances of this are considerably lower, but that's what
we're trying to work towards.
3. Sandy was very glad to hear that we plan to partner with Dictybase.
She felt that having a data sharing plan would make their case for a
finished sequence stronger.
4. Expected error rate in the range of 1 mistake per 100,000 bases,
potentially substantially better.
Data availability:
1. Everything they do is public and all data will be made available
promptly.
The individual sequence traces are added to an archive that I believe
should be available to everyone through their website. Note that some of
the traces that were ultimately rejected may be included in this set (eg.
all 22367 traces that made the first cut may be in the archive). I'm not
sure when or how to access these raw data, but if anyone is interested, I'd
be happy to try to find out. You might check out their homepage first:
http://genome.wustl.edu/
2. Assemblies can be posted using fasta format on their ftp site so that
collaborators can use the data shortly after they are generated. I
envision ALL of us having access to that site. Please check the mailing
list and let me know if anyone is missing that should be on it; I
certainly don't want to exclude anyone that would be interested. MARK
ADELMAN, would you be willing to check my mailing list against your master
Physarum mailing list and let me know who is missing?
3. The final assembly is not made available until after publication.
However, she warned that lately bioinformaticists have been doing their own
analysis of the data in the individual traces and publishing first. They
would prefer to be the ones that publish their own data and would like to
work with experts in the field to write this up in a timely manner. Please
let me know if you'd like to be included in such an effort.
Finally, as I will address in a second email, the chances of getting a
finished sequence will be enhanced if there are other resources freely
available to them, such as genetic and/or physical maps of the genome,
other libraries of all types (both genomic and cDNA). BACs would be
particularly helpful. She has asked me to assemble a list of what is
available, who has it, etc.
Well, enough for now. Feedback welcome, as always.
*******************
June 15, 2005 - Message from Jonatha Gott:
Here's what I have from Sandy Clifton so far. I don't know what this means
for the genome project as yet, but thought I'd send it on in case any of
you were interested in seeing the raw numbers thus far. I hope to talk
with Sandy later this week, and will urge them to look at the EST data as
soon as possible. I'll write again once I've talked with her.
Want to keep the entire process open throughout the project; please feel
free to contact me with questions at any time and I'll try to get answers.
From Sandy Clifton:
I finally ran down the analysis results for the ~20K passed survey
sequence reads that we performed. Note that the repeat content is 8.84
%, but this may be low, reflecting the difficulty in cloning A/T rich
regions (~60% A/T, so far). We might have to use special methods to try
to capture more of the A/T rich areas. Also note that this repeat count
reflects only simple repeats and areas of low complexity, since we do
not have a database of repeats specific for Physarum. We have not had a
chance to look at the ESTs yet. We had about 28 mitochondrial reads, as
well.
And here are the results for Physarum polycephalum:
Physarum polycephalum (slime mold):
-26880 traces- (20780 passed, trimmed traces were screened)
POAA-aaa01001
1) GC content- (based on phrap assembly contigs & singlets)
Physarum_polycephalum 40%
2) Repeat content- (based on phrap assembly contigs & singlets...then
used RepeatMasker -w -e)
NOTE: since we don't have specific repeat libraries, these stats
represent simple repeats and regions of low complexity
Physarum_polycephalum 8.84 % masked
==================================================
file name: contigs_and_singlets.fas
sequences: 22367
total length: 14035327 bp (14023066 bp excl N-runs)
GC level: 40.15 %
bases masked: 1239673 bp ( 8.84 %)
==================================================
                 number of       length   percentage
                 elements*     occupied  of sequence
--------------------------------------------------
SINEs:                  13      1557 bp     0.01 %
   ALUs                  0         0 bp     0.00 %
   MIRs                 13      1557 bp     0.01 %
LINEs:                  51      9078 bp     0.06 %
   LINE1                41      8222 bp     0.06 %
   LINE2                 6       659 bp     0.00 %
   L3/CR1                2       102 bp     0.00 %
LTR elements:           12      1087 bp     0.01 %
   MaLRs                 0         0 bp     0.00 %
   ERVL                  0         0 bp     0.00 %
   ERV_classI           12      1087 bp     0.01 %
   ERV_classII           0         0 bp     0.00 %
DNA elements:            3       207 bp     0.00 %
   MER1_type             2       164 bp     0.00 %
   MER2_type             1        43 bp     0.00 %
Unclassified:            0         0 bp     0.00 %
Total interspersed repeats:    11929 bp     0.09 %
Small RNA:              13       906 bp     0.01 %
Satellites:              1        66 bp     0.00 %
Simple repeats:       6677    447325 bp     3.19 %
Low complexity:      12548    780477 bp     5.57 %
==================================================
3) Contamination screen- (trace flagged as having contaminant if alignment of 200bp at 90% id was found)
Physarum polycephalum (slime mold):
20780 passed, trimmed traces were screened
4 UNIVEC
4 POTW13
4 ECOLI
9 GBBCT
28 MITO (Physarum polycephalum complete mito genome)
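The percentages in the RepeatMasker summary above can be recomputed from the base counts. This is a sketch added for illustration, with invented variable names. Using the length excluding N-runs as the denominator is an inference from the arithmetic (it reproduces the reported 8.84 % and 5.57 %), not something stated in the message.

```python
# Recompute the RepeatMasker summary percentages from the reported base counts.
TOTAL_LENGTH_BP = 14_035_327        # "total length"
LENGTH_EXCL_N_BP = 14_023_066       # "excl N-runs"; assumed to be the denominator
BASES_MASKED_BP = 1_239_673         # "bases masked"

# Top-level interspersed repeat classes and the bases each occupies.
interspersed_bp = {
    "SINEs": 1557,
    "LINEs": 9078,
    "LTR elements": 1087,
    "DNA elements": 207,
    "Unclassified": 0,
}

# Categories that account for nearly all of the masked sequence.
low_information_bp = {
    "Simple repeats": 447_325,
    "Low complexity": 780_477,
}


def percent(part_bp: int, whole_bp: int) -> float:
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100.0 * part_bp / whole_bp, 2)


total_interspersed_bp = sum(interspersed_bp.values())
masked_pct = percent(BASES_MASKED_BP, LENGTH_EXCL_N_BP)
interspersed_pct = percent(total_interspersed_bp, LENGTH_EXCL_N_BP)

print(f"total interspersed repeats: {total_interspersed_bp} bp ({interspersed_pct} %)")
print(f"bases masked: {masked_pct} %")
for name, bases in low_information_bp.items():
    print(f"{name}: {percent(bases, LENGTH_EXCL_N_BP)} %")
```

The interspersed classes sum to the reported 11929 bp, which shows how little of the masked sequence comes from recognized repeat families; almost all of it is simple repeats and low-complexity sequence, as Sandy Clifton's note says.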
*********
December, 2004. Jonatha Gott contributed an initial status report in the Physarum Newsletter, issue 36. To download Jonatha's report, as a WORD document, Click Here.
Last modified: Saturday, November 4, 2006