In-memory databases are a fascinating field for programmers, architects, and
technology leaders. In this article, let us try to understand what an in-memory
database is and why such systems are so useful.
As a programmer, designer or architect, whenever I work with traditional
databases, one major area to be careful about is the design of database access.
This is because traditional database systems store everything on disk, and hence
any read or update of this data can be one of the slowest operations in the
application flow. Undoubtedly, a lot of research has gone into improving the
performance of disk-based database systems with better storage, search and
retrieval logic. Still, disk access, being a mechanical operation, always has
limitations. This is one of the important reasons to be fascinated by in-memory
database systems.
Visualize a system where all data (or all data required for the current context)
is sitting in memory. This means we can expect much faster and more efficient
access to data. The database software can have much simpler logic to manage the
data, as there is no need to manage the loading and unloading of pages in memory.
There is no need for time-consuming locking mechanisms designed around slow disk
access, as access in memory is much faster and the chances of conflict are
extremely small, which changes the way we design database algorithms. It also
means that we can exploit the best available CPU power (which is increasing
continuously) to process the data in memory, without being limited by disk
access. There may also be no need for a separate caching layer; the database
itself works like an in-memory cache. There are many more benefits if we design
the system with an in-memory database in mind.
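
To get a feel for working with data that lives entirely in memory, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is used only because it ships with Python; it is not one of the IMDS products discussed in this article, and the table and values are made up for illustration.

```python
import sqlite3

# ":memory:" asks SQLite to keep the whole database in RAM; no file is created.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO account (owner, balance) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 75.5), ("carol", 300.0)],  # illustrative rows
)
conn.commit()

# Ordinary SQL works as usual; the data is simply served from memory.
rows = conn.execute(
    "SELECT owner, balance FROM account WHERE balance > ? ORDER BY owner",
    (100,),
).fetchall()
print(rows)  # [('alice', 120.0), ('carol', 300.0)]
conn.close()
```

Note that such a database disappears when the connection is closed, so this shows only the access side, not durability.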
Let us understand what an In-Memory Database System (IMDS) is. It is a DBMS that
maintains all data primarily in main memory. Data is loaded into memory even if
it runs to gigabytes or terabytes. A few highlights:
• With 64-bit computer architecture, systems are capable of addressing 16 EB of
data (roughly 16 million TB). 82% of enterprise application databases are below
1 TB and grow at an average rate of 10% per year, which means that in-memory
database systems can cater to most applications now and in the foreseeable
future.
• Does not need to read from or write to disk to serve data, hence no dependency
on mechanical parts and their performance limitations. When all data is stored
in a single address space, storage algorithms become simpler, since pages no
longer need to be loaded into and unloaded from memory.
• Much faster than a traditional disk-based DBMS. Having all data in memory means
that data is at our fingertips, only microseconds or nanoseconds away.
• It supports the ACID properties of a database, including D (durability).
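
The 64-bit addressing figure is easy to check with a little arithmetic. The sketch below uses binary units (EiB and TiB) and takes the 1 TB starting size and 10% yearly growth from the highlight above. The constant names are my own, and real processors and operating systems currently expose far fewer than 64 address bits, so treat this as the theoretical ceiling only.

```python
import math

ADDRESS_BITS = 64
TIB = 2 ** 40  # bytes in one tebibyte
EIB = 2 ** 60  # bytes in one exbibyte

addressable_bytes = 2 ** ADDRESS_BITS
limit_in_eib = addressable_bytes // EIB  # theoretical limit in EiB
limit_in_tib = addressable_bytes // TIB  # the same limit in TiB

# Years for a 1 TiB database growing 10% per year to reach that limit.
GROWTH_PER_YEAR = 1.10
years_to_limit = math.ceil(math.log(limit_in_tib) / math.log(GROWTH_PER_YEAR))

print(limit_in_eib, limit_in_tib, years_to_limit)  # 16 16777216 175
```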
With the above in mind, it is also important to understand what IMDS is not, to
clear up various myths.
· IMDS is not a traditional disk-based DBMS with all of its data simply loaded
in memory (the way a cache works). Its internal design and algorithms are quite
different, and hopefully much improved over traditional database systems,
because they can assume that the whole data set is in memory and that there is
no disk access overhead. Hence IMDS is not a caching technology either, but a
full-fledged DBMS solution.
· IMDS are not volatile. They can support the ‘D’ of ACID with full durability,
and offer a choice of flexible durability options.
· IMDS are not only embedded databases. They work well for embedded applications
thanks to their small footprint, but they are equally efficient, and in some
aspects better, for large time-critical applications, and they can work in a
client-server architecture.
· A common myth is that IMDS need a long time to populate the in-memory store on
startup. In practice, this is not the case.
To give a better perspective on how the technology works, here are a few of the
architectural attributes of IMDS.
• IMDS can work either as an embedded single process or in a client-server
architecture.
• IMDS are partition aware. Normally, vertical partitioning is achieved through
design strategies such as normalization. IMDS also support horizontal
partitioning, which splits the data of a table by rows. For example, rows can be
split on demographic data.
• IMDS support a ‘shared nothing’ architecture, which is useful for high
availability and fault-tolerant design.
• IMDS mostly use data structures better suited to memory, such as T-trees
rather than B-trees.
• They need simpler concurrency control, as locks are held for a shorter time
when all data is in memory.
• With IMDS, data is stored on several nodes, with multiple copies of the same
data. This enables a scalable infrastructure model: new nodes can be added
easily, and data is replicated to a new node according to the data design.
• Disaster recovery is also easier, as multiple copies of the data are available
on different nodes by design.
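
Horizontal partitioning and multi-node replication can be sketched in a few lines. This is a toy illustration, not how any particular IMDS places data: the node names, the replica count, the hash-based placement and the customer rows are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names
REPLICAS = 2                             # copies kept of every row (assumed)

def owners(partition_key):
    """Pick REPLICAS distinct nodes for a partition key (simple hash placement)."""
    start = zlib.crc32(partition_key.encode("utf-8")) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]

def place(rows):
    """Horizontally partition rows by region and copy each row to its owners."""
    stores = defaultdict(dict)  # node name -> {row id -> row}
    for row in rows:
        for node in owners(row["region"]):
            stores[node][row["id"]] = row
    return stores

def read(stores, row_id, region, failed=()):
    """Read a row from the first owner that is not marked as failed."""
    for node in owners(region):
        if node not in failed and row_id in stores[node]:
            return stores[node][row_id]
    return None

customers = [
    {"id": 1, "name": "Asha", "region": "north"},
    {"id": 2, "name": "Ben", "region": "south"},
    {"id": 3, "name": "Chen", "region": "east"},
]
stores = place(customers)

# Simulate losing the first owner of the "north" partition.
primary = owners("north")[0]
survivor = read(stores, 1, "north", failed={primary})
print(survivor["name"])  # Asha
```

Because every row lives on two nodes that share nothing, the read still succeeds after one node is lost, which is the property the high availability and disaster recovery points above rely on.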
Considering all of the above, here are a few of the advantages of IMDS:
· IMDS can provide extremely fast transaction processing by overcoming the
limitation of traditional databases, which read and write data through
mechanical operations on disk. According to a proof of concept (POC) done by
McObject, reads are 420 times faster and writes are 4 times faster than
disk-based operations.
· They are highly scalable, both horizontally and vertically.
· They can ensure high availability by replicating data among multiple nodes.
· They can also support highly fault-tolerant designs with active-active or
active-standby strategies.
· They provide support for SQL standards.
· They support most database connectivity standards, e.g. JDBC.
With so many good things to write about, the article would not be complete
without mentioning a few of the challenges with IMDS, which need more research
in the coming years. These are:
• Durability
  • Although IMDS designs have already evolved enough to support durability,
  this comes at the cost of synchronizing some data with persistent storage
  using checkpoints or transaction logging, or of transferring a high volume of
  data across the nodes.
  • A future direction could be to improve SSD (solid-state drive) technologies
  and use them to store data for durability. SSDs are slower than RAM, but much
  faster than disk.
• The ‘one more bit’ problem
  • In-memory data storage is limited by the total memory space that the
  computer architecture can address. As of now, that limit is 16 EB. If the data
  grows even one bit beyond this limit, it will pose a challenge, and this is
  bound to happen in the future.
  • SSD or PCM (NVRAM) technologies can be used to store data and would help to
  avoid ‘one more bit’ problems in the future.
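
The cost of durability mentioned above can be made concrete with a small sketch of transaction logging plus checkpoints. TinyStore, its file names and its JSON format are invented for this illustration; real IMDS products use their own, far more sophisticated mechanisms.

```python
import json
import os
import tempfile

class TinyStore:
    """In-memory dict made durable with an append-only log and checkpoints."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "store.log")
        self.snap_path = os.path.join(directory, "store.snapshot")
        self.data = {}
        self._recover()
        self.log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self):
        # Load the last checkpoint, then replay the changes logged after it.
        if os.path.exists(self.snap_path):
            with open(self.snap_path, encoding="utf-8") as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        # Log first and force it to storage, then update memory.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # this sync is the price paid for durability
        self.data[key] = value

    def checkpoint(self):
        # Write a full snapshot, swap it in atomically, then empty the log.
        tmp_path = self.snap_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snap_path)
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")

    def close(self):
        self.log.close()

with tempfile.TemporaryDirectory() as directory:
    store = TinyStore(directory)
    store.put("a", 1)
    store.checkpoint()
    store.put("b", 2)   # lives only in memory and in the log
    store.close()       # simulate a shutdown

    recovered = TinyStore(directory)  # a restart rebuilds memory from disk
    result = dict(recovered.data)
    recovered.close()

print(result)  # {'a': 1, 'b': 2}
```

Reads never touch the disk here; only writes pay for the log sync, and the checkpoint keeps the log (and therefore recovery time) short.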
I hope this article has helped you get started with IMDS by giving an overview.
Please refer to a detailed paper published by us here.
Let us close the article by listing a few of the IMDS solutions available in the
market, as a reference for further exploration:
• Commercial Solutions
  • eXtremeDB by McObject
  • TimesTen by Oracle
  • SQLFire by VMware
  • solidDB by IBM
  • HANA by SAP
  • BigMemory by Terracotta
  • Altibase HDB/XDB by Altibase
• Open Source Solutions
  • MemSQL
  • CSQL
  • MonetDB
  • H2Database