[bip] [OT] Revision control and databases

Brent Pedersen bpederse at gmail.com
Fri Oct 24 08:51:44 PDT 2008


On Fri, Oct 24, 2008 at 2:08 AM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
>
>
> On Thu, Oct 23, 2008 at 3:55 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>
>> Giovanni Marco Dall'Olio wrote:
>>>
>>> Hi,
>>> I have a question (well, it's not directly related to biopython or pygr,
>>> but to scientific computing).
>>>
>>> I always used flat files to store results and data for my bioinformatics
>>> analyses, but now (as I was saying in another thread) I would like to start
>>> using a database to do that.
>>
>> Of course Biopython's BioSQL interface may provide a starting point.
>
> The problem is that BioSQL doesn't yet support Population Genetics records
> (see another thread on the biopython mailing list), so I would have to implement
> something like that in BioSQL or wait for the developers to do it.
> Maybe I will do this later, but right now I don't have the time.
>
>>>
>>> The problem is I don't know if databases do revision control.
>>> When I used flat files, I used to save all the results in a git
>>> repository, and every time something was changed or calculated again, I
>>> committed it.
>>> Do you know how to do this with databases? Does MySQL provide support for
>>> revision control?
>>> Thanks :)
>>
>> I think you are asking the wrong questions because it depends on what you
>> want to do and what you actually store. There are a number of questions that
>> you need to ask yourself about what you really need to do (knowing you have
>> used git helps refine these). Examples include:
>> How often do you use the old versions in your git repository?
>> How do you use the old revisions in your git repository?
>> Do you even use the information of an older version if a newer version
>> exists?
>> How many users can make changes?
>> How often do you have conflicts?
>> Are the conflicts hard to solve?
>
> These are all very good questions.
> The problem is that I consider revision control a 'good practice': I
> remember that when I wasn't in the habit of keeping a history of the changes
> to my data, it was a mess. I would like to have at least a 'version' field,
> to know how old my data is.
>
> I have found this:
> - http://pgfoundry.org/projects/tablelog/
> which seems interesting.
> I think this is a big issue for bioinformatics. How is it possible that
> nobody has ever tried to implement such functionality for databases?
> Version control could be difficult to implement, but not that difficult.
> There must be something that I can reuse...
>
>
>
>> Do you actually determine when 'something was changed or calculated again',
>> or is this partly determined by an external source like a GenBank or UniProt
>> update? (At least in a database approach you could automate this.)
>
> Well, it could be useful to
>
>>
>> Revision control may be overkill for your use because it aims to
>> handle many tasks and change conflicts related to multiple users rather than
>> a single user. If you don't need all these fancy features then you can use
>> a database. If you just want to store and retrieve a version then you can
>> use a database, but you need to at least force the inclusion of date and
>> comment fields for it to be useful.
>
>
> Maybe there are other similar tools.
> This is a big issue for bioinformatics. I think it is a good, when working
> with
>
> Unfortunately I think revision control would be very useful for me.
> The data in the database will be used and uploaded by 4 or 5 people.
> It will also be used to store the results from some scripts:
>
>>
>>
>>
>> Regards
>> Bruce
>
>
> Thank you very much for all the replies. I didn't expect so many of them.
>
>
> --
> -----------------------------------------------------------
>
> My Blog on Bioinformatics (italian): http://bioinfoblog.it
>
> _______________________________________________
> biology-in-python mailing list - bip at lists.idyll.org.
>
> See http://bio.scipy.org/ for our Wiki.
>
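The two ideas in the quoted message above, a tablelog-style log of changes and Bruce's suggestion of forcing a version, date and comment on every record, can be roughly approximated with plain database triggers. The sketch below uses SQLite through Python's standard sqlite3 module rather than PostgreSQL, and it is not tablelog's own interface; the table and column names (results, results_history, fst, ...) are hypothetical. Before any UPDATE or DELETE, a trigger copies the old row into a history table with a timestamp.

```python
import sqlite3

# Hypothetical schema: 'results' holds current values, 'results_history'
# receives a copy of every row version that is overwritten or deleted.
SCHEMA = """
CREATE TABLE results (
    id      INTEGER PRIMARY KEY,
    locus   TEXT NOT NULL,
    fst     REAL,
    version INTEGER NOT NULL DEFAULT 1,
    comment TEXT NOT NULL               -- a comment is mandatory
);
CREATE TABLE results_history (
    id        INTEGER,
    locus     TEXT,
    fst       REAL,
    version   INTEGER,
    comment   TEXT,
    operation TEXT,                     -- 'UPDATE' or 'DELETE'
    logged_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Save the old row before it is overwritten.
CREATE TRIGGER results_log_update BEFORE UPDATE ON results
BEGIN
    INSERT INTO results_history (id, locus, fst, version, comment, operation)
    VALUES (OLD.id, OLD.locus, OLD.fst, OLD.version, OLD.comment, 'UPDATE');
END;
-- Save the old row before it is deleted.
CREATE TRIGGER results_log_delete BEFORE DELETE ON results
BEGIN
    INSERT INTO results_history (id, locus, fst, version, comment, operation)
    VALUES (OLD.id, OLD.locus, OLD.fst, OLD.version, OLD.comment, 'DELETE');
END;
"""


def open_db(path=":memory:"):
    """Open (or create) a database and install the schema and triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def update_result(conn, result_id, fst, comment):
    """Change a value, bump its version and record why it changed."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE results SET fst = ?, comment = ?, version = version + 1 "
            "WHERE id = ?",
            (fst, comment, result_id),
        )


if __name__ == "__main__":
    conn = open_db()
    with conn:
        conn.execute(
            "INSERT INTO results (locus, fst, comment) VALUES (?, ?, ?)",
            ("locus1", 0.12, "first run"),
        )
    update_result(conn, 1, 0.15, "re-run with new samples")
    print(conn.execute("SELECT version, fst FROM results").fetchall())
    print(conn.execute(
        "SELECT version, fst, operation FROM results_history").fetchall())
```

Because the logging lives in the database, it also applies to rows changed by other people's scripts, which matters with four or five users writing to the same tables.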

If you're willing to rely on an ORM, then this is pretty nice:
http://elixir.ematia.de/apidocs/elixir.ext.versioned.html
It uses Elixir, a declarative layer over SQLAlchemy.
I haven't used it much since SQLAlchemy came out with its own
declarative layer, but it offers methods like get_as_of(), revert_to(),
etc.
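To show roughly what "as of" lookups and reverts mean, here is a hand-rolled sketch on an append-only versions table. It only mimics the idea behind the get_as_of() and revert_to() methods mentioned above; it is not Elixir's implementation or API, and the table and function names are hypothetical. It assumes timestamps are ISO 8601 strings saved in increasing order, so they sort chronologically as text.

```python
import sqlite3

# Append-only: every save adds a row, nothing is ever overwritten.
VERSIONS_SCHEMA = """
CREATE TABLE record_versions (
    record_id TEXT    NOT NULL,
    version   INTEGER NOT NULL,
    saved_at  TEXT    NOT NULL,   -- ISO 8601, assumed increasing per record
    data      TEXT    NOT NULL,
    PRIMARY KEY (record_id, version)
);
"""


def save_version(conn, record_id, data, saved_at):
    """Store a new version of record_id and return its version number."""
    with conn:
        (latest,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM record_versions "
            "WHERE record_id = ?",
            (record_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO record_versions VALUES (?, ?, ?, ?)",
            (record_id, latest + 1, saved_at, data),
        )
    return latest + 1


def get_as_of(conn, record_id, when):
    """Return the data as it was at time `when`, or None if it did not exist."""
    row = conn.execute(
        "SELECT data FROM record_versions "
        "WHERE record_id = ? AND saved_at <= ? "
        "ORDER BY version DESC LIMIT 1",
        (record_id, when),
    ).fetchone()
    return row[0] if row else None


def revert_to(conn, record_id, version, saved_at):
    """Re-save an old version as the newest one, keeping the full history."""
    row = conn.execute(
        "SELECT data FROM record_versions WHERE record_id = ? AND version = ?",
        (record_id, version),
    ).fetchone()
    if row is None:
        raise KeyError(f"{record_id} has no version {version}")
    return save_version(conn, record_id, row[0], saved_at)
```

Usage: after `conn.executescript(VERSIONS_SCHEMA)`, save "ACGT" on 2008-10-01 and "ACGA" on 2008-10-20; `get_as_of(conn, "seq1", "2008-10-15T00:00:00")` then returns "ACGT". Reverting adds a new version instead of deleting later ones, so nothing is lost.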


I've also just used git to hold SQLite databases of up to 500 MB. It feels
wrong, and I commit the db sparingly, but it doesn't seem to be a
problem for git itself.
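A variation on keeping SQLite files in git, assuming you can afford the extra disk space, is to also commit a plain-text SQL dump so that `git diff` shows which rows changed. This is not the workflow described above; it is a sketch using the standard sqlite3 module's `Connection.iterdump()`, and the function names and paths are hypothetical.

```python
import sqlite3
from pathlib import Path


def dump_to_sql(db_path, out_path):
    """Write a plain-text SQL dump of an SQLite database, one statement per line."""
    conn = sqlite3.connect(db_path)
    try:
        with open(out_path, "w", encoding="utf-8") as fh:
            for statement in conn.iterdump():
                fh.write(statement + "\n")
    finally:
        conn.close()


def load_from_sql(sql_path, db_path):
    """Rebuild a database in a fresh file from a dump, e.g. an old git revision."""
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(Path(sql_path).read_text(encoding="utf-8"))
        conn.commit()
    finally:
        conn.close()
```

You would run `dump_to_sql()` before each commit and add the `.sql` file to the repository; `load_from_sql()` should target a new, empty file, since the dump recreates the tables.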
-b


