Project results

#1 Post by krahulik

Progress reports and the project status can be found here.

Apr 8, 2009 - We have begun to analyze the protein models generated by Nutritious Rice for the World volunteers. The next step is to use sophisticated methods to select the top protein models for each gene. This will let us focus on a more manageable number of protein structures out of the billions generated so far. Rice proteins are very different from those that have been studied previously, and only 1% of the proteins we're working on have segments that are significantly similar to proteins of known structure. That is why computer modeling is necessary and why this project is important. It also means that we still have a lot of hard work ahead of us!

In general, when proteins have similar amino acid sequences, they also have similar structures. The few cases in which at least part of a protein's sequence is similar to that of a protein whose structure is known are therefore very useful. We have a good idea of what those regions of the structure model should look like, which allows us to optimize and validate the tools we use to pick the best models. That is what we are currently doing. Once we finish, we will start processing the data and publish the best structures for each gene online.
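
To make the validation idea concrete, here is a minimal Python sketch. It assumes, hypothetically, that for each benchmark protein (one whose structure is known) every candidate model already carries a selector score and its RMSD to the known structure. The function name, the data layout and the protein names are illustrative only and are not the project's actual tools. The sketch measures how much worse, on average, the top-scoring model is than the closest model that was available.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout: (selector score, RMSD to the known structure in angstroms).
Candidate = Tuple[float, float]


def selection_loss(benchmark: Dict[str, List[Candidate]]) -> float:
    """Mean extra RMSD paid by picking the top-scoring model instead of the
    truly closest one, over proteins whose real structure is known.

    Assumes a higher selector score means the model is predicted to be better.
    """
    losses = []
    for candidates in benchmark.values():
        # RMSD of the model the selector would pick.
        picked_rmsd = max(candidates, key=lambda c: c[0])[1]
        # RMSD of the best model that was actually available (the "oracle" pick).
        best_rmsd = min(rmsd for _, rmsd in candidates)
        losses.append(picked_rmsd - best_rmsd)
    return mean(losses)


if __name__ == "__main__":
    demo = {
        "protA": [(0.9, 2.0), (0.5, 1.5), (0.1, 6.0)],  # selector misses by 0.5 A
        "protB": [(0.8, 3.0), (0.7, 4.0)],              # selector picks the best
    }
    print(selection_loss(demo))  # 0.25
```

A loss near zero means the selector almost always picks the best model available; comparing this number across candidate selectors is one simple way to choose between them before applying the winner to proteins with no known structure.
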

Nov 12, 2008 - We continue to receive excellent results from you! Storing the predicted structures of 100 proteins requires about 10 GB of bz2-compressed files. So far we have amassed over a terabyte of this data, and there's a lot more to come. We are making room to store it and adjusting our clustering code to handle this large number of results. Stay tuned for more as this develops.

The National Science Foundation, a significant source of funding for us, and World Community Grid separately interviewed us about our research and this project. Take a look!

In addition, we've turned the project status image above into an animation that shows progress over time. Each frame represents a moment when a significant number of work units was submitted or results were returned, which gives you an idea of how things are moving along.

Aug 28, 2008 - Since the project began, you've sampled a space of about 3 billion potential structures for each protein and been credited for turning in the best ones. We've received structures for 6,800 proteins so far, which amounts to over a billion structures. There are about 40,000 proteins to generate structures for, so we still have a while to go!

While you continue to generate these structures, we'll be examining them in more detail using clustering techniques, which will reveal the structures that most resemble real proteins. We've applied this iterative process successfully at smaller scales, and the larger pool of data will improve the accuracy with which we identify good structure predictions.
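
The clustering idea can be illustrated with a deliberately simplified Python sketch. It assumes each model is a list of (x, y, z) atom coordinates already superimposed on the others; a real pipeline would fit each pair first (for example with the Kabsch algorithm) and might use a similarity score other than RMSD. The sketch picks the model with the most neighbors within a distance cutoff, i.e. the center of the most populated region of structure space, on the reasoning that many independent simulations converging on the same fold is evidence that the fold is realistic. The function names and the cutoff value are hypothetical; this is a generic illustration, not the project's own clustering code.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equal-length models.

    Assumes the models are already superimposed (no fitting is done here).
    """
    if len(a) != len(b) or not a:
        raise ValueError("models must be non-empty and have the same number of atoms")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def densest_model(models: List[Sequence[Coord]], cutoff: float) -> int:
    """Index of the model with the most other models within `cutoff` RMSD.

    This all-pairs O(n^2) loop is only for illustration; large model sets
    would need a far more scalable approach.
    """
    neighbors = [0] * len(models)
    for i in range(len(models)):
        for j in range(i + 1, len(models)):
            if rmsd(models[i], models[j]) <= cutoff:
                neighbors[i] += 1
                neighbors[j] += 1
    # max() returns the first maximal index, so ties go to the lowest index.
    return max(range(len(models)), key=lambda k: neighbors[k])


if __name__ == "__main__":
    models = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],
        [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)],
        [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)],  # outlier far from the others
    ]
    print(densest_model(models, cutoff=0.12))  # 0
```

In this toy run, model 0 is within the cutoff of both models 1 and 2, while models 1 and 2 are slightly too far from each other, so model 0 is chosen as the cluster representative and the outlier is ignored.
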

And yes, the status image has been updated. :)

#2 Post by krahulik

Update as of April 1, 2010
Apr 1, 2010 - We have begun to analyze the terabytes of results generated through the generous efforts of the volunteers.

Now comes the difficult part: sifting through the data to find the best models. The folding algorithm is noisy, so there will be many inaccurate models, and we need to find the best ones among the almost 7 billion generated. This should take approximately 3 to 6 months using our fastest methods. After identifying the most accurate models, we will use that information to work out what functions these proteins perform in rice. This involves comparing their structures and sequences to those of known proteins and is also time-consuming. Plant genomes are not nearly as well studied as the human and other mammalian genomes, which makes the process all the more difficult.

We are also developing faster and more accurate technologies to examine the data. As we have mentioned in the forums, we have already developed a GPU-accelerated version of the simulation process that is several orders of magnitude faster, as well as more accurate. We have extended, and are continuing to extend, that technology to the analysis of the model structures. We have also developed sophisticated techniques that recognize structure and sequence patterns, or signatures, to identify the function of a protein.

We are applying for funding to support these and other efforts to analyze the mountain of data generated during this process. We too are volunteers, and we hope that our combined efforts in the NRW project will help develop rice strains that make a difference in fighting malnutrition and feeding the world’s people. Finally, as the project comes to an end, we want to thank everyone for their generous contributions to this endeavor, especially those who volunteered their computers and time to generate the data. We really appreciate it.

Our tentative future plan is to resubmit an application to IBM to apply the Protinfo algorithm to proteins encoded by 1,000 plant transcriptomes generated by the 1KP Project. This is a work in progress. In this way, the efforts of the WCG volunteers and the results of this study will have an impact beyond rice proteomics.

Sep 15, 2009 - Most of our effort in the past few months has been spent trying to tease more domains out of the rice proteome to increase the size of the project. These domains have been packaged into work units and are now being crunched. As a result, the number of protein structure predictions will rise from roughly 40,000 initially to about 65,000 once all the larger sequences have been processed. Of these, roughly 35,000 are complete, so we still have about 30,000 to go (it looks as though we're about halfway done now).

The reasoning here is that the more comprehensive our picture of the individual protein domains in rice, the more it can tell us about the structures of other unknown domains, both in rice and in other food crops. In other words, partial information is much better than no information. This enables us to gain a better understanding of the pathways involved at atomic-level detail.

#3 Post by krahulik

A paper has been published describing the procedure developed to speed up post-processing of the project's results from WCG. GPUs (ATI) play a part in it as well ;-)

The paper is available here or in the attached file.
1756-0500-4-97.pdf