Update as of April 1, 2010
Apr 1, 2010 - We have begun to analyze the terabytes of results that have been generated through the generous efforts of the volunteers.
Now comes the difficult part: sifting through the data to find the best models. The folding algorithm is noisy, so many of the models will be inaccurate, and we need to find the best ones among the almost 7 billion models generated. This should take approximately 3-6 months using our fastest methods. After identifying the most accurate models, we will use that information to figure out what functions these proteins perform in the rice organism. This involves comparing their structures and sequences to those of known proteins, which is also a time-consuming process. Plant genomes are not nearly as well studied as the human and other mammalian genomes, which makes the process all the more difficult.
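To give a feel for the scale, here is a back-of-the-envelope sketch of the average rate at which models would have to be evaluated to get through almost 7 billion of them in 3-6 months. The 30-day month, the assumption of non-stop processing, and the function name `required_rate` are ours for illustration only; this is not a description of the actual analysis pipeline.

```python
# Rough estimate only. The model count and time frame come from the update;
# everything else here is an illustrative assumption.
TOTAL_MODELS = 7_000_000_000      # "almost 7 billion models"
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30               # assumption: a 30-day month


def required_rate(total_models, months):
    """Average models per second needed to finish within `months`,
    assuming processing runs around the clock."""
    seconds = months * DAYS_PER_MONTH * SECONDS_PER_DAY
    return total_models / seconds


for months in (3, 6):
    rate = required_rate(TOTAL_MODELS, months)
    print(f"{months} months: about {rate:.0f} models per second")
```

Under these assumptions the required average works out to roughly 900 models per second for a 3-month run and roughly 450 for a 6-month run.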
We are also developing faster and more accurate technologies to examine the data. As we have mentioned in the forums, a GPU-accelerated version of the simulation process has already been developed which is several orders of magnitude faster and more accurate. We have extended, and are continuing to extend, that technology to the analysis of the model structures. We have also developed sophisticated techniques that recognise structure and sequence patterns, or signatures, to identify the function of a protein.
We are applying for funding to support these and other efforts to analyze the mountain of data that has been generated during this process. We too are volunteers, and it is our hope that our combined efforts in the NRW project will help develop rice strains that will make a difference in fighting malnutrition and feeding the world’s people. Finally, as the project comes to an end, we want to thank everyone for their generous contributions to this endeavor, especially those who volunteered their computers and time to generate the data. We really appreciate it.
Our tentative future plan is to resubmit an application to IBM to apply the Protinfo algorithm to proteins encoded by the 1000 plant transcriptomes generated by the 1KP Project. This is a work in progress. The efforts of the WCG volunteers and the results of this study will thus have a broader impact beyond rice proteomics.
Sep 15, 2009 - Most of our efforts in the past few months have been spent trying to tease out more domains from the rice proteome to increase the size of the project. These domains have been packaged into work units and are now crunching. This raises the number of protein structure predictions from roughly 40,000 initially to about 65,000 once all the larger sequences have been processed. Of these, roughly 35,000 are completed, so we still have about 30,000 to go (it looks as though we're about halfway done now).
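For anyone who wants to check the arithmetic, here is a minimal sketch using only the rounded figures quoted above. The function name `progress` is just for illustration, and the real counts are approximate.

```python
# Rounded figures from the update; the actual counts are approximate.
TOTAL_PREDICTIONS = 65_000   # target after adding the extra domains
COMPLETED = 35_000           # predictions finished so far


def progress(total, completed):
    """Return (remaining predictions, fraction completed)."""
    remaining = total - completed
    fraction_done = completed / total
    return remaining, fraction_done


remaining, fraction_done = progress(TOTAL_PREDICTIONS, COMPLETED)
print(f"remaining: {remaining}, completed: {fraction_done:.0%}")
```

With these numbers, 30,000 predictions remain and about 54% of the enlarged project is complete, which matches the "about halfway" estimate.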
The logic here is that the more comprehensive our picture of the individual protein domains in rice, the more we can use it to inform us about the structures of other unknown domains in rice as well as in other food crops. That is, partial information is much better than no information. It also gives us a better understanding of the pathways involved, in atomic-level detail.