누눕's blog

Hbase Data Model 본문

IT 기록/Hadoop

Hbase Data Model

누워서눕기 2020. 3. 25. 11:53

Data Model

In HBase, data is stored in tables, which have rows and columns. This is a terminology overlap with relational databases (RDBMSs), but this is not a helpful analogy. Instead, it can be helpful to think of an HBase table as a multi-dimensional map.

HBase Data Model Terminology

 

Table

An HBase table consists of multiple rows.

 

Row

A row in HBase consists of a row key and one or more columns with values associated with them. Rows are sorted alphabetically by the row key as they are stored. For this reason, the design of the row key is very important. The goal is to store data in such a way that related rows are near each other. A common row key pattern is a website domain. If your row keys are domains, you should probably store them in reverse (org.apache.www, org.apache.mail, org.apache.jira). This way, all of the Apache domains are near each other in the table, rather than being spread out based on the first letter of the subdomain.

 

Column

A column in HBase consists of a column family and a column qualifier, which are delimited by a : (colon) character.

 

Column Family

Column families physically colocate a set of columns and their values, often for performance reasons. Each column family has a set of storage properties, such as whether its values should be cached in memory, how its data is compressed or its row keys are encoded, and others. Each row in a table has the same column families, though a given row might not store anything in a given column family.

 

Column Qualifier

A column qualifier is added to a column family to provide the index for a given piece of data. Given a column family content, a column qualifier might be content:html, and another might be content:pdf. Though column families are fixed at table creation, column qualifiers are mutable and may differ greatly between rows.

 

Cell

A cell is a combination of row, column family, and column qualifier, and contains a value and a timestamp, which represents the value’s version.

 

Timestamp

A timestamp is written alongside each value, and is the identifier for a given version of a value. By default, the timestamp represents the time on the RegionServer when the data was written, but you can specify a different timestamp value when you put data into the cell.

 

참조

http://hbase.apache.org/book.html#datamodel

 

Apache HBase ™ Reference Guide

Data comes in many sizes, and saving all of your data in HBase, including binary data such as images and documents, is ideal. While HBase can technically handle binary objects with cells that are larger than 100 KB in size, HBase’s normal read and write pa

hbase.apache.org

 

'IT 기록 > Hadoop' 카테고리의 다른 글

하둡(Hadoop) 설치  (0) 2020.03.31
HBase 명령어 모음  (0) 2020.03.26