Hey all, we have decided to hand the development of our commercial website over to another company, and they are asking for a copy of our databases so they can run performance tests. However, my boss doesn't want any data to leak… I would like to generate datasets whose volume is representative of the amount of data in our databases. Thanks.
Hi, what I would propose is "data anonymisation": you keep exactly the same volume of data in your database(s) but change some of the fields in your data (and keep the others). This approach gives you a database that is about as representative of the original as you can get, since you decide which changes to make depending on the level of confidentiality you need. I would suggest using a free, open source ETL tool like Talend, which can anonymise your data. In fact you will be able to do much more than data anonymisation, since you will be working with a real ETL tool. There is also an active community that can help with problems, and an R&D team that implements new features and fixes bugs. The tool is called Talend Open Studio and is freely downloadable: http://www.talend.com/products-data-integration/talend-open-studio.php
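To make the idea concrete, here is a minimal sketch of that kind of anonymisation in plain Python. It is not how Talend does it, only an illustration of the principle: the row count and the non-sensitive columns stay as they are, while the sensitive columns are replaced by keyed hashes. The column names, the sample rows, `SECRET_KEY`, and the helpers `pseudonymise` and `anonymise_rows` are all made up for the example.

```python
import hashlib
import hmac

# Hypothetical secret used to key the hash; it must NOT be shipped with the copy.
SECRET_KEY = b"change-me"


def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable token for a value; the same input always gives the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]  # shortened for readability


def anonymise_rows(rows, sensitive_columns):
    """Return a copy of rows with the sensitive columns replaced by tokens."""
    anonymised = []
    for row in rows:
        new_row = dict(row)  # copy, so the original rows are left untouched
        for column in sensitive_columns:
            if new_row.get(column) is not None:
                new_row[column] = pseudonymise(str(new_row[column]))
        anonymised.append(new_row)
    return anonymised


# Made-up sample data standing in for a real table.
customers = [
    {"id": 1, "name": "Alice Martin", "email": "alice@example.com", "order_total": 120.50},
    {"id": 2, "name": "Bob Durand", "email": "bob@example.com", "order_total": 75.00},
    {"id": 3, "name": "Alice Martin", "email": "alice@example.com", "order_total": 19.99},
]

masked = anonymise_rows(customers, sensitive_columns=["name", "email"])
for row in masked:
    print(row)
```

Because the token is deterministic, two rows that shared a customer name before still share the same token afterwards, so joins and value distributions stay realistic for performance tests. Whether hashing alone is confidential enough depends on your data, so treat this as a starting point.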
Hi. Thanks for your answer. I will have a look at the software you suggested. The method sounds good.