Saturday 26 October 2013

Database Sharding

DATABASE SHARDING 


=) Database Sharding -> Share Nothing partitioning scheme for large database. 
=) Breaking your database into small chunks called and spreading them across a number of distributed servers.
=) Vertical scaling*adds more CPU and storage resources to increase capacity. Scaling by adding capacity has limitations: high performance systems with large numbers of CPUs and large amount of RAM are disproportionately more expensive than smaller systems.



=) Acc to Wikipedia Sharding refers to Horizontal Partioning . Which Means keeping the rows of database in separate table rather then creating a separate column. As table are distributed and kept in separate db total no of rows in table are reduced.
=) Concept : take a large database, and break it into a number of smaller databases across servers.
=) Improved Scalablity.

The general purpose database requirements that are a fit for sharding include:
  • High-transaction database applications
  • Mixed workload database usage
    • Frequent reads, including complex queries and joins
    • Write-intensive transactions (CRUD statements, including INSERT, UPDATE, DELETE)
    • Contention for common tables and/or rows
  • General Business Reporting
    • Typical "repelating segment" report generation
    • Some data analysis (mixed with other workloads)
* Steps to determine if we need sharding :
1)Identify Transection intensive tables.
2)Determine volume of Txn
3)Determine Volume associated with different Squeries : Select,Insert,Update
4)Understand Hierarchy
5)Understand Schema : to check if evenly Spread.






In the Bookstore example, the Primary Shard Table is the 'customer' entity. This is the table that is used to shard the data. The 'customer' table is the parent of the shard hierarchy, with the 'customer_order' and 'order_item' entities as child tables. The data is sharded by the 'customer.id' attribute, and all related rows in the child tables associated with a given 'customer.id' are sharded as well. The Global Tables are the common lookup tables, which have relatively low activity, and these tables are replicated to all shards to avoid cross-shard joins.

Another general example  :
Suppose you have an application where no of user counts is huge around 10 users per sec or even more .. Due to single entity update rate too fast and so some serialised write might stack up and timeout.
How to reduce contention :
Break the counters into N counters
when you want to increment a counter you pick one of the shard at random and increment. If you want to know total counter add all values.




Reference : 1)MangoDB
2)Wikipedia
3)Codefeatures


No comments:

Post a Comment