FLEXIBLE INTEROPERABLE DATA PACKAGES
INFINITE POTENTIAL: THE DATA REVOLUTION
A PARADIGM SHIFT
DATA: REIMAGINED
If you can't find the data, you can't reproduce the analysis.
Why build data packages?
Data is more powerful with context
Data without context is not reusable
LINKED DATA IS REUSABLE DATA
What are Data Packages
Not only do data packages keep track of their own versions, they keep track of the versions of underlying data as well, giving teams the ability to review every version of every document contained in a data package. Teams can grab a version of a package and run it through pipelines to test repeatability, or can link a historical package to their colleague to ensure they’re iterating on the same data.
Data Packages Defined
Data with context
Data packages offer a streamlined approach to data management by storing both data and metadata together in object storage. This integrated method contrasts with traditional practices where metadata and versioning information are stored separately in databases, while the actual data resides in object stores. By consolidating data and its contextual information within the same storage unit, data packages enhance the integrity and coherence of data management, ensuring that context and content are always aligned and readily accessible.
Deeply Versioned
One of the significant advantages of data packages is their support for deep versioning, which is facilitated by the use of SHA-256 checksums for each revision. This cryptographic hash function ensures the integrity of data by providing a unique identifier for every version, allowing users to track changes, verify data accuracy, and revert to previous versions if necessary. This robust versioning capability enhances data reliability and facilitates meticulous data management across different iterations.
Flexible Metadata
Data packages excel in accommodating diverse metadata needs through their flexible metadata schema. Unlike rigid systems that enforce a single, uniform metadata schema across all datasets, data packages allow teams to capture and manage metadata in a way that best suits their specific requirements. This flexibility ensures that relevant details are preserved and easily accessible without the need for a one-size-fits-all approach, thus supporting a wide range of data use cases and applications.
Interoperable Data
The interoperability of data packages is a crucial feature, as it allows seamless integration and sharing of data across different platforms. By storing data in an open-source, customer-owned data storage system, data packages enable the attachment of data from various platforms (Platform A and Platform B) while maintaining compatibility. This open approach ensures that data can be effectively utilized across diverse systems and environments, fostering greater collaboration and data exchange.
Easily Accessed
Data packages are designed to be easily accessible, with features like SQL querying and faceted search to enhance data retrieval. These functionalities allow teams to efficiently locate and extract the data packages they need, regardless of the cloud environment they are using. The ability to perform advanced searches and queries ensures that users can quickly find relevant datasets and integrate them into their workflows, significantly improving data accessibility and usability.