Welcome to the exciting world of data migration!
Migrating your data into the Dataverse is a new chapter and clean slate for your business and can no doubt boost productivity, reporting, and user experience in your business applications. In this series, I will be using SharePoint as an example as it is the most common data source for the Power Platform, however, the process is very similar to most data sources. Before we embark on this journey we have to make sure this is the right path for your data.
If you have sensitive data or your data management is regulated, check out Microsoft's Trust Center and individualized industry documentation for details on HIPAA, GDPR, and government compliance standards.
Once your business data is in the Dataverse, you can take advantage of its robust data security, data integrity through granular data types and relationships, model-driven apps, Dataverse plugins or business rules for complex business logic, and more. As far as data storage goes, the Dataverse is flexible, but it can't do everything. Let's talk about the Common Data Model (CDM).
The CDM is a sort of Rosetta Stone for business data, putting everything in an identical structure so that there is no misinterpretation of values. It doesn't just make your data readable, it also acts as a guardian of data integrity. By using predefined schemas and relationships, you're essentially setting up a rulebook that your data has to follow. This ensures that you're not just tossing dirty, mismatched data into your system.
For example, if you have an Order table and a Customer table, the CDM would enforce that every order is associated with a customer (assuming it's a required field, the server validates the data as it is saved by checking the GUID of the Order to check for a valid Customer) through a LookUp field. So, not only does it make your data more understandable, but it also keeps it clean and reliable.
An example implementation of the CDM within Dataverse would be as follows:
Let's say you are a manufacturer and you want to track your suppliers along with their supplied raw materials, your products, your warehouses, and customers. All of these individual data points can be captured and recorded in the Dataverse. This data model would look like this:
In this example, we are using the standard Account table for Suppliers and Customers with the out-of-the-box Relationship Type choice field to indicate which account type the record is. We then have three custom tables: Material, Product, and Warehouse. The beauty of the CDM is that given the same outline of your data, most people would design your Dataverse tables in this exact way.
The relationships work in a way that we can now track all material and products to a warehouse as well as the supplier and customers for all this data. The relationships in Dataverse are built into the data, not simply an identifier that points to another data record through joins or merges. This is called a Lookup column. Relationships can be one-to-many or many-to-many. More information on relationship types can be found here.
Note that we are not storing granular data from producing products such as machine diagnostics, batch outputs, or uptime/downtime. This data does not typically live in the Dataverse, as it doesn't necessarily conform to the CDM.
Let's talk about the type of data that does not belong in the Dataverse
High-Volume Data: Think IoT devices spewing out real-time data points like temperature, humidity, or GPS coordinates. CDM may not be the best fit for this time-series data.
Binary Large Objects (BLOBs): Large multimedia files like videos, audio files, or even high-res images are not only ideal for Dataverse as there is a 128MB* limit to files stored in a Dataverse file column not to mention storing them there could get expensive at scale (see capacity documentation here). This data should be stored in SharePoint or OneDrive with a reference to the file in Dataverse.
Unstructured Data: Things like email text, meeting notes, or free-form notes. These aren't easily broken down into neat entities and attributes, these again should have a URL reference to these large abstract notes. Notes about a record (such as customer notes) that are specific to a record can be stored in the Dataverse.
Complex Data: Consider cases where the data is multidimensional arrays or matrices, like in some types of research data.
Real-Time Streaming Data: High-frequency trading data, real-time analytics, etc. are not suitable for the batch-oriented nature of most CDM implementations.
Specialized Formats: Industry-specific data formats like DICOM in healthcare or STEP files in manufacturing may not easily conform to CDM. These files can once again be stored in SharePoint or OneDrive with a link to the file.
There are two main table types we'll be discussing in this series: Standard and Activity. These types are selected under Advanced options when creating a custom table and determine the default columns and the behavior of the table.
One way of conceptualizing what belongs in a standard table is that they store the "What" of your data. In our manufacturing example above, we are storing the Product, Warehouse, and Material, but not the actions that bring those together such as production runs, transportation, labor, etc. Those actions, however, may be stored in other tables such as a Production Run with start/end timestamps, a Shipment with sent/received timestamps, or an Employee with clock-in/out timestamps related to a specific Production Line. Those can then be related back to our other data for reporting.
Activity tables can be used for the "How", "Why", and "When" of your data. Activities such as phone calls, emails, and appointments are available out of the box and can be used to explain or track decisions that lead to your data. Custom activity tables can be created to capture other relevant actions that should be stored and related to your data. Activity tables come with a different set of default columns such as the Subject (primary name of the record), start time, end time, and due date. These can be used for actionable items or capturing data specific to a point in time.
Tables contain not only columns, but also their interface objects such as Forms, Views, and Charts which are used to build the interface for the table data in model-driven apps. Tables also hold their metadata such as relationships to other tables, business rules that can validate data or automatically populate columns, and keys to enforce unique identifiers.
Beyond understanding the Common Data Model, we must understand our data as well as the business processes that utilize it. In my next article, we'll talk about planning the migration and creating migration documents. Through the rest of this series we will put that plan into action by creating the custom tables and relationships from our requirements, create a Dataflow that transforms our data, build the relationships after import, and then discuss security, training, and go-live strategy.
At this stage, you should have a general understanding of the Dataverse data structure. Spend some time familiarizing yourself with the core tables (Accounts, Contacts, Activities) as well as the default columns of a custom table. Begin meeting with stakeholders to identify all of the SharePoint lists you are migrating. Come back for part two where we'll begin mapping our data from SharePoint to Dataverse.
*While the API can handle files up to 10 GB in size, Power Apps client controls currently only support files up to 128 MB. Exceeding the 128 MB value when using these controls will result in errors uploading or downloading files. Source