Table of Contents
How to create a Hub table in SQL
How to connect multiple Hubs in a Link table
How to handle property changes in Satellite tables
Data loading and deduplication strategies

SQL for Data Vault Modeling

Jul 29, 2025 02:35 AM

To create a Hub table for a Data Vault model in SQL, define a structure that contains a unique hash key, the business ID, the load time, and the source system. This ensures each business key is stored only once, together with when and where it was loaded. For example: CREATE TABLE hub_customer (customer_hash_key CHAR(32) PRIMARY KEY, customer_id VARCHAR(50) NOT NULL, load_date DATE NOT NULL, record_source VARCHAR(255) NOT NULL); Adding an index on customer_id is recommended to improve query performance.

A Link table connects multiple Hubs. It contains the hash keys of the related Hubs, the load time, and the source system, and it uses the combination of those hash keys as its primary key. For example: CREATE TABLE link_order (order_hash_key CHAR(32), customer_hash_key CHAR(32), product_hash_key CHAR(32), load_date DATE NOT NULL, record_source VARCHAR(255) NOT NULL, PRIMARY KEY (order_hash_key, customer_hash_key, product_hash_key)); Each hash key must be non-null and reference the corresponding Hub as a foreign key. The Link stores no business attributes.

A Satellite table stores the descriptive attributes of a Hub or Link and their change history. For example: CREATE TABLE sat_customer_detail (customer_hash_key CHAR(32), load_date DATE NOT NULL, customer_name VARCHAR(100), address VARCHAR(255), phone VARCHAR(20), is_current BOOLEAN DEFAULT TRUE, PRIMARY KEY (customer_hash_key, load_date)); Changes are tracked by inserting new records, and the latest record can be selected with a window function.

Data must be deduplicated as it is loaded, using either NOT EXISTS or a MERGE statement. Use one hash function consistently across all loads, and keep the deduplication logic efficient.

If you work on data warehouse modeling, or with data models that must be easy to extend and fully traceable, you have probably heard of the Data Vault model. Unlike the traditional star and snowflake schemas, it emphasizes flexibility, history tracking, and scalability. SQL plays a central role in it. SQL is used to build the core structures (Hubs, Links, and Satellites), and it is also used to load, maintain, and query them.

Here are some common SQL usage scenarios and suggestions to help you better understand and apply Data Vault modeling.


How to create a Hub table in SQL

A Hub is one of the core components of Data Vault. It stores the unique identifiers (business keys) of business entities, such as customer IDs or product IDs. When creating a Hub table, the key points are to guarantee uniqueness and to record the load time.

A typical Hub table structure is as follows:

CREATE TABLE hub_customer (
    customer_hash_key CHAR(32) PRIMARY KEY,
    customer_id VARCHAR(50) NOT NULL,
    load_date DATE NOT NULL,
    record_source VARCHAR(255) NOT NULL
);

A few notes:

  • customer_hash_key is usually a unique key generated by hashing customer_id
  • load_date records when the business key was first loaded, which makes later tracing easier
  • record_source identifies the source system this record came from

Suggestion: in practice, consider adding an index on customer_id to improve query performance. customer_id is not the primary key, but lookups by business key are common.
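The following is a minimal sketch of the Hub described above. The Python script runs the same DDL in an in-memory SQLite database and derives customer_hash_key as the MD5 hex digest of customer_id. MD5 is an assumption, chosen because its 32-character hex output fits CHAR(32). The unique index name, the source name 'CRM', and the helper functions are hypothetical.

```python
import hashlib
import sqlite3

# Hub DDL from the article, plus a unique index on the business key.
# The index (and its name) is an assumption for illustration.
HUB_DDL = """
CREATE TABLE hub_customer (
    customer_hash_key CHAR(32) PRIMARY KEY,
    customer_id VARCHAR(50) NOT NULL,
    load_date DATE NOT NULL,
    record_source VARCHAR(255) NOT NULL
);
CREATE UNIQUE INDEX ix_hub_customer_id ON hub_customer (customer_id);
"""


def hash_key(business_key: str) -> str:
    """Hypothetical helper: 32-character MD5 hex digest, matching CHAR(32)."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


def add_customer(conn, customer_id, load_date, record_source):
    """Hypothetical helper: insert one business key into the Hub."""
    conn.execute(
        "INSERT INTO hub_customer VALUES (?, ?, ?, ?)",
        (hash_key(customer_id), customer_id, load_date, record_source),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(HUB_DDL)
add_customer(conn, "C001", "2025-07-29", "CRM")
row = conn.execute("SELECT customer_hash_key, customer_id FROM hub_customer").fetchone()
print(len(row[0]), row[1])  # 32 C001
```

The hash is computed from the business key alone. Loading the same customer_id again therefore yields the same key, and the primary key rejects the duplicate.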


How to connect multiple Hubs in a Link table

A Link table establishes relationships between multiple Hubs. For example, an order involves several entities, such as a customer, a product, and a salesperson, and a Link records how they are connected.

Key design points for a Link table:

  • It contains the hash keys of all related Hubs
  • It contains no other attribute information
  • Its primary key is composed of those hash keys

Sample SQL:

CREATE TABLE link_order (
    order_hash_key CHAR(32),
    customer_hash_key CHAR(32),
    product_hash_key CHAR(32),
    load_date DATE NOT NULL,
    record_source VARCHAR(255) NOT NULL,
    PRIMARY KEY (order_hash_key, customer_hash_key, product_hash_key)
);

A few details to note:

  • Each hash key must be non-null and act as a foreign key to the corresponding Hub table
  • The Link table itself holds no business attributes; it only records the relationship

Common mistake: putting descriptive or transactional attributes in the Link table. This violates Data Vault principles; such attributes belong in a Satellite.
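The sketch below also assumes SQLite and MD5 hash keys. It loads the link_order table from hypothetical (order, customer, product) source rows. Only business keys are hashed into the Link; descriptive values such as quantity or price would belong in a Satellite. The Hub tables and foreign key constraints are omitted for brevity. INSERT OR IGNORE is SQLite-specific syntax, used here to skip combinations that already exist.

```python
import hashlib
import sqlite3


def hash_key(business_key: str) -> str:
    # Assumed to be the same hash function the Hubs use (MD5 hex, 32 chars).
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE link_order (
    order_hash_key CHAR(32) NOT NULL,
    customer_hash_key CHAR(32) NOT NULL,
    product_hash_key CHAR(32) NOT NULL,
    load_date DATE NOT NULL,
    record_source VARCHAR(255) NOT NULL,
    PRIMARY KEY (order_hash_key, customer_hash_key, product_hash_key)
)""")

# Hypothetical source rows: (order_id, customer_id, product_id).
# The third row repeats the first one.
source_rows = [("O1", "C001", "P10"), ("O1", "C001", "P20"), ("O1", "C001", "P10")]

for order_id, customer_id, product_id in source_rows:
    # SQLite's INSERT OR IGNORE skips rows whose composite key already exists.
    conn.execute(
        "INSERT OR IGNORE INTO link_order VALUES (?, ?, ?, ?, ?)",
        (hash_key(order_id), hash_key(customer_id), hash_key(product_id),
         "2025-07-29", "ORDERS"),
    )

link_count = conn.execute("SELECT COUNT(*) FROM link_order").fetchone()[0]
print(link_count)  # 2
```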


How to handle property changes in Satellite tables

A Satellite stores the descriptive attributes of a Hub or Link, such as customer name, address, and phone number, together with the history of changes to those attributes.

A basic Satellite table structure is as follows:

CREATE TABLE sat_customer_detail (
    customer_hash_key CHAR(32),
    load_date DATE NOT NULL,
    customer_name VARCHAR(100),
    address VARCHAR(255),
    phone VARCHAR(20),
    is_current BOOLEAN DEFAULT TRUE,
    PRIMARY KEY (customer_hash_key, load_date)
);

Key points:

  • Each time an attribute changes, a new row is inserted instead of overwriting the old one
  • The is_current field marks the latest record
  • The latest record can also be selected with a window function

For example, to get the currently valid customer information:

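SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_hash_key ORDER BY load_date DESC) AS rn
    FROM sat_customer_detail
) sub
-- rn = 1 already selects the latest row per key; if is_current is reliably
-- maintained, WHERE is_current = TRUE alone returns the same rows.
WHERE rn = 1;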
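The following sketch shows the window-function query at work. It selects the latest row by load_date alone and runs against a small, made-up Satellite history in SQLite, which supports ROW_NUMBER() from version 3.25. The shortened hash keys 'k1' and 'k2' are placeholders for real 32-character keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sat_customer_detail (
    customer_hash_key CHAR(32),
    load_date DATE NOT NULL,
    customer_name VARCHAR(100),
    address VARCHAR(255),
    phone VARCHAR(20),
    is_current BOOLEAN DEFAULT TRUE,
    PRIMARY KEY (customer_hash_key, load_date)
)""")
# Hypothetical history: "k1" moved once, "k2" never changed.
conn.executemany(
    "INSERT INTO sat_customer_detail VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("k1", "2025-07-01", "Ann", "1 Main St", "555-0100", 0),
        ("k1", "2025-07-29", "Ann", "2 Oak Ave", "555-0100", 1),
        ("k2", "2025-07-10", "Bob", "9 Elm Rd", "555-0199", 1),
    ],
)

# Latest row per hash key, chosen by load_date only (needs SQLite 3.25+).
LATEST_SQL = """
SELECT customer_hash_key, customer_name, address
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_hash_key
                              ORDER BY load_date DESC) AS rn
    FROM sat_customer_detail
) sub
WHERE rn = 1
ORDER BY customer_hash_key
"""
latest = conn.execute(LATEST_SQL).fetchall()
print(latest)  # [('k1', 'Ann', '2 Oak Ave'), ('k2', 'Bob', '9 Elm Rd')]
```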

Recommendation: some systems manage the is_current field automatically in the ETL layer, but controlling it manually is more flexible, especially while debugging.
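Below is a sketch of one way an ETL step might maintain is_current. It inserts a new Satellite row only when the attributes differ from the current row. In the same transaction, it sets the previous row's is_current to FALSE. The sketch assumes SQLite, which stores TRUE/FALSE as 1/0. The function name load_satellite, the placeholder hash key, and the sample data are hypothetical.

```python
import sqlite3

SAT_DDL = """
CREATE TABLE sat_customer_detail (
    customer_hash_key CHAR(32),
    load_date DATE NOT NULL,
    customer_name VARCHAR(100),
    address VARCHAR(255),
    phone VARCHAR(20),
    is_current BOOLEAN DEFAULT TRUE,
    PRIMARY KEY (customer_hash_key, load_date)
)"""


def load_satellite(conn, hk, load_date, attrs):
    """Hypothetical loader: append a row if (name, address, phone) changed.

    Returns True if a row was inserted, False if nothing changed.
    """
    current = conn.execute(
        """SELECT customer_name, address, phone FROM sat_customer_detail
           WHERE customer_hash_key = ? AND is_current = 1""",
        (hk,),
    ).fetchone()
    if current == tuple(attrs):
        return False  # no change: the history stays as it is
    with conn:  # one transaction: retire the old row, then insert the new one
        conn.execute(
            "UPDATE sat_customer_detail SET is_current = 0 "
            "WHERE customer_hash_key = ? AND is_current = 1",
            (hk,),
        )
        conn.execute(
            """INSERT INTO sat_customer_detail
               (customer_hash_key, load_date, customer_name, address, phone)
               VALUES (?, ?, ?, ?, ?)""",  # is_current takes its DEFAULT TRUE
            (hk, load_date, *attrs),
        )
    return True


conn = sqlite3.connect(":memory:")
conn.execute(SAT_DDL)
hk = "a" * 32  # placeholder hash key
changed = [
    load_satellite(conn, hk, "2025-07-01", ("Ann", "1 Main St", "555-0100")),
    load_satellite(conn, hk, "2025-07-15", ("Ann", "1 Main St", "555-0100")),
    load_satellite(conn, hk, "2025-07-29", ("Ann", "2 Oak Ave", "555-0100")),
]
history = conn.execute(
    "SELECT load_date, address, is_current FROM sat_customer_detail ORDER BY load_date"
).fetchall()
print(changed)  # [True, False, True]
print(history)  # [('2025-07-01', '1 Main St', 0), ('2025-07-29', '2 Oak Ave', 1)]
```

Comparing the full attribute tuple keeps the sketch simple. The unchanged second load adds no row, so the history contains only real changes.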


Data loading and deduplication strategies

One of the difficulties of the Data Vault model is deduplication during loading. Every load has to determine whether a record already exists, so that the record is not inserted twice.

A common approach is to check whether the hash key already exists before inserting:

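-- HASH() stands for your platform's hash function; use the same one in every load.
-- DISTINCT removes duplicates inside the batch; NOT EXISTS only checks the Hub.
INSERT INTO hub_customer (...)
SELECT DISTINCT ...
FROM source_table s
WHERE NOT EXISTS (
    SELECT 1
    FROM hub_customer h
    WHERE h.customer_hash_key = HASH(s.customer_id)
);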
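The NOT EXISTS pattern can run almost verbatim in SQLite if a Python function is registered under the name HASH. In this sketch, HASH is assumed to be an MD5 hex digest, and the load date and the 'CRM' source name are hypothetical values. Running the load twice shows that it is idempotent. DISTINCT handles the duplicate key inside the batch.

```python
import hashlib
import sqlite3


def md5_hex(value):
    # Assumed stand-in for HASH(): one hash function shared by every load.
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("HASH", 1, md5_hex)  # makes HASH() callable from SQL
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hash_key CHAR(32) PRIMARY KEY,
    customer_id VARCHAR(50) NOT NULL,
    load_date DATE NOT NULL,
    record_source VARCHAR(255) NOT NULL
);
CREATE TABLE source_table (customer_id VARCHAR(50));
INSERT INTO source_table VALUES ('C001'), ('C002'), ('C002');
""")

LOAD_SQL = """
INSERT INTO hub_customer (customer_hash_key, customer_id, load_date, record_source)
SELECT DISTINCT HASH(s.customer_id), s.customer_id, '2025-07-29', 'CRM'
FROM source_table s
WHERE NOT EXISTS (
    SELECT 1
    FROM hub_customer h
    WHERE h.customer_hash_key = HASH(s.customer_id)
)
"""
conn.execute(LOAD_SQL)
conn.execute(LOAD_SQL)  # re-running the load inserts nothing new
hub_rows = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
print(hub_rows)  # 2
```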

You can also use a MERGE (or UPSERT) statement, depending on what your database supports:

MERGE INTO hub_customer AS target
USING (
    -- deduplicate the source so that each hash key appears only once
    SELECT DISTINCT HASH(customer_id) AS hash_key, ...
    FROM source_table
) AS source
ON target.customer_hash_key = source.hash_key
WHEN NOT MATCHED THEN
    INSERT (...) VALUES (...);
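SQLite has no MERGE statement. Its UPSERT clause (INSERT ... ON CONFLICT DO NOTHING, available since SQLite 3.24) expresses the same "insert only when not matched" rule. The sketch below uses it as a stand-in for the MERGE above. It assumes an MD5-based HASH function and hypothetical sample data.

```python
import hashlib
import sqlite3


def md5_hex(value):
    # Assumed shared hash function: MD5 hex digest (32 characters).
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("HASH", 1, md5_hex)
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hash_key CHAR(32) PRIMARY KEY,
    customer_id VARCHAR(50) NOT NULL,
    load_date DATE NOT NULL,
    record_source VARCHAR(255) NOT NULL
);
CREATE TABLE source_table (customer_id VARCHAR(50));
INSERT INTO source_table VALUES ('C001'), ('C002'), ('C002');
""")

# "WHEN NOT MATCHED THEN INSERT" expressed as an SQLite UPSERT.
# An INSERT ... SELECT with ON CONFLICT needs a WHERE clause (here "WHERE true")
# to avoid a parsing ambiguity in SQLite.
UPSERT_SQL = """
INSERT INTO hub_customer (customer_hash_key, customer_id, load_date, record_source)
SELECT DISTINCT HASH(customer_id), customer_id, '2025-07-29', 'CRM'
FROM source_table
WHERE true
ON CONFLICT (customer_hash_key) DO NOTHING
"""
conn.execute(UPSERT_SQL)
conn.execute(UPSERT_SQL)  # second run: every key matches, nothing is inserted
hub_rows = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
print(hub_rows)  # 2
```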

Note:

  • Use the same hash function in every load; otherwise the same business key can produce different hash keys, leading to duplicate or orphaned records
  • Keep the deduplication logic as efficient as possible, especially for large data volumes

Tip: aggregate the data once in the staging layer to remove duplicates, then write it to the target table.
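Here is a minimal sketch of the staging-layer tip, assuming SQLite and an MD5-based HASH function. A hypothetical raw_customer extract is collapsed once with GROUP BY into stg_customer, keeping the earliest extract date per customer. Only then is it loaded into the Hub. The table names raw_customer and stg_customer and the constant source name are illustrative.

```python
import hashlib
import sqlite3


def md5_hex(value):
    # Assumed shared hash function: MD5 hex digest of the business key.
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("HASH", 1, md5_hex)
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hash_key CHAR(32) PRIMARY KEY,
    customer_id VARCHAR(50) NOT NULL,
    load_date DATE NOT NULL,
    record_source VARCHAR(255) NOT NULL
);

-- Hypothetical raw extract in which customers repeat.
CREATE TABLE raw_customer (customer_id VARCHAR(50), extract_date DATE);
INSERT INTO raw_customer VALUES
    ('C001', '2025-07-28'), ('C001', '2025-07-29'),
    ('C002', '2025-07-29'), ('C002', '2025-07-29');

-- Staging: collapse duplicates once, keeping the earliest extract date.
CREATE TABLE stg_customer AS
SELECT HASH(customer_id) AS customer_hash_key,
       customer_id,
       MIN(extract_date) AS load_date,
       'CRM' AS record_source
FROM raw_customer
GROUP BY customer_id;

-- Target load: staging rows are already unique, so only the
-- "already in the Hub?" check is left.
INSERT INTO hub_customer (customer_hash_key, customer_id, load_date, record_source)
SELECT s.customer_hash_key, s.customer_id, s.load_date, s.record_source
FROM stg_customer s
WHERE NOT EXISTS (
    SELECT 1 FROM hub_customer h
    WHERE h.customer_hash_key = s.customer_hash_key
);
""")

stg_rows = conn.execute("SELECT COUNT(*) FROM stg_customer").fetchone()[0]
hub = dict(conn.execute("SELECT customer_id, load_date FROM hub_customer").fetchall())
print(stg_rows, hub)  # 2 {'C001': '2025-07-28', 'C002': '2025-07-29'}
```

Hashing once in staging also means every later step reuses the same stored key instead of recomputing it.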


That covers the basics. In Data Vault, SQL is used for more than creating tables and running queries; it runs through the entire process of modeling, loading, and maintenance. Understanding what each object is responsible for, and applying SQL techniques appropriately, will make your data model both stable and efficient.
