Generate representative datasets

Discussion in 'Databases' started by truster, May 28, 2009.

  1. #1
    Hey all,

    We have decided to hand the development of our commercial website over to another company, and they are asking for a copy of our databases so they can run some performance tests.
    However, my boss doesn't want any data to leak…
    I would like to generate datasets whose volume is representative of the amount of data in our databases.

    Thanks.
     
    truster, May 28, 2009
  2. T.Guru

    #2
    Hi,

    What I would propose is "data anonymising": you keep exactly the same volume of data in your database(s), but you change some of the fields in your data and keep the others.

    This approach gives you a database that is as representative of the original as it can be, since you choose which changes to make according to the level of confidentiality you need.
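
    To make the idea concrete, here is a minimal Python sketch (plain standard library, not Talend) of one way to do it: sensitive fields are replaced by a keyed hash, and everything else is copied as is, so the row count stays the same. The column names, the sample rows and the secret key are all made up for illustration; swap in your own schema.

```python
import hashlib
import hmac

# Hypothetical column names: replace with the sensitive columns of your own schema.
SENSITIVE_COLUMNS = {"name", "email"}


def pseudonymise(value, secret):
    """Return a stable 16-character token derived from a keyed hash of the value."""
    digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymise_rows(rows, secret, sensitive=SENSITIVE_COLUMNS):
    """Mask the sensitive columns of every row and copy the other columns unchanged."""
    return [
        {
            col: pseudonymise(val, secret) if col in sensitive else val
            for col, val in row.items()
        }
        for row in rows
    ]


if __name__ == "__main__":
    # Made-up sample rows standing in for a real table.
    customers = [
        {"id": 1, "name": "Alice", "email": "alice@example.com", "total": 10.5},
        {"id": 2, "name": "Bob", "email": "bob@example.com", "total": 3.0},
    ]
    masked = anonymise_rows(customers, secret=b"keep-this-key-private")
    for row in masked:
        print(row)
```

    Because the same input always maps to the same token, values that matched before still match afterwards, so joins between tables keep working. Whether a keyed hash is confidential enough depends on your data; low-variety fields may need to be generalised or replaced with random values instead.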

    I would suggest using a free, open source ETL tool such as Talend, which can anonymise your data. Since it is a full ETL tool, you will be able to do much more than data anonymisation.

    There is an active community that can help solve problems, and an R&D team that implements new features and fixes bugs. The tool is called Talend Open Studio and is freely downloadable: http://www.talend.com/products-data-integration/talend-open-studio.php
     
    T.Guru, May 28, 2009
  3. truster

    #3
    Hi. Thanks for your answer. I will have a look at the software you suggested. The method sounds good.
     
    truster, Jun 1, 2009