A startup called Gretel wants to build a “GitHub for data” so developers can safely access sensitive data.
Often, developers don’t need full access to a bank of user data — they just need a portion or a sample to work with. In many cases, developers could suffice with data that looks like real user data.
This so-called “synthetic data” is essentially artificial data that looks and works just like regular sensitive user data. Gretel uses machine learning to categorize the data — like names, addresses and other customer identifiers — and classify as many labels to the data as possible. Once that data is labeled, it can be applied access policies. Then, the platform applies differential privacy — a technique used to anonymize vast amounts of data — so that it’s no longer tied to customer information.
Check It Out: This Startup Wants to Build a “GitHub for Data”