Tuesday 29 October 2013

F1 : A Distributed Database



The paper describes Google's F1, a massive relational database management system that helps run the company's online ad system. Though the paper calls the database a hybrid, I feel it is much more relational than NoSQL-style. The system is built atop Spanner and adds new key features, including distributed SQL queries, transactionally consistent secondary indexes, asynchronous schema changes (including database reorganizations), optimistic transactions, and automatic change history recording and publishing. The paper is beautifully written and maintains its flow. As Google's full AdWords business runs on F1, I feel it surely is a powerful system.
Though technical issues remain, as with any new technology, the future seems quite upbeat for the new database.

Summary:

Google's F1 is a distributed relational database management system that helps run the company's online ad system. F1 combines the high availability and scalability of NoSQL systems with the consistency and usability of traditional SQL databases. F1 is built atop Spanner; it is fault tolerant, globally distributed, and supports both OLAP and OLTP. F1 has a database schema similar to an RDBMS, with some extensions like support for hierarchical data, protocol buffers, and both local and global indexing. The paper highlights the problems with traditional ORM tools and uses a new stripped-down API that avoids the antipatterns of ORM. F1 supports both NoSQL and SQL interfaces, giving users the flexibility to select the one suitable for their application. Spanner stores the records and data, and F1 gives you access to those records through query processing. In short, F1 is a powerful system that combines the best of both worlds (SQL and NoSQL) and removes the negatives.
Some Points :
1) Global indexes cause scaling problems for large transactions.
2) There is no mention of security.
3) Single-row transactions usually avoid 2PC. What happens if the connection is lost?
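
Since optimistic transactions are one of the F1 features mentioned above, here is a minimal, hypothetical Python sketch of the general idea (the read phase records a per-row timestamp, the commit phase re-checks it); the class and field names are invented for illustration and this is not F1's actual implementation.

class Row:
    def __init__(self, value, ts=0):
        self.value = value
        self.ts = ts                      # last-modified timestamp

class Store:
    def __init__(self):
        self.rows = {}
        self.clock = 0

class OptimisticTxn:
    def __init__(self, store):
        self.store = store
        self.read_ts = {}                 # key -> timestamp seen in read phase
        self.writes = {}                  # key -> new value, buffered to commit

    def read(self, key):
        row = self.store.rows[key]
        self.read_ts[key] = row.ts
        return row.value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Write phase: verify nothing we read has changed, then apply writes.
        for key, ts in self.read_ts.items():
            if self.store.rows[key].ts != ts:
                return False              # conflict -> abort, caller may retry
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.rows[key] = Row(value, ts=self.store.clock)
        return True

store = Store()
store.rows["ad:42"] = Row("old budget")
txn = OptimisticTxn(store)
txn.write("ad:42", txn.read("ad:42") + " (updated)")
print("committed:", txn.commit())         # True unless another txn changed the row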

Saturday 26 October 2013

Database Sharding



=) Database sharding -> a shared-nothing partitioning scheme for large databases.
=) Breaking your database into small chunks called shards and spreading them across a number of distributed servers.
=) Vertical scaling adds more CPU and storage resources to increase capacity. Scaling by adding capacity has limitations: high-performance systems with large numbers of CPUs and large amounts of RAM are disproportionately more expensive than smaller systems.



=) According to Wikipedia, sharding refers to horizontal partitioning, which means keeping the rows of a database in separate tables rather than splitting off separate columns. As the tables are distributed and kept in separate databases, the total number of rows in each table is reduced.
=) Concept: take a large database and break it into a number of smaller databases across servers.
=) Improved scalability.

The general purpose database requirements that are a fit for sharding include:
  • High-transaction database applications
  • Mixed workload database usage
    • Frequent reads, including complex queries and joins
    • Write-intensive transactions (CRUD statements, including INSERT, UPDATE, DELETE)
    • Contention for common tables and/or rows
  • General Business Reporting
    • Typical "repeating segment" report generation
    • Some data analysis (mixed with other workloads)
* Steps to determine if we need sharding:
1) Identify transaction-intensive tables.
2) Determine the volume of transactions on them.
3) Determine the volume associated with the different queries: SELECT, INSERT, UPDATE.
4) Understand the table hierarchy.
5) Understand the schema, to check whether the data is evenly spread.






In the Bookstore example, the Primary Shard Table is the 'customer' entity. This is the table that is used to shard the data. The 'customer' table is the parent of the shard hierarchy, with the 'customer_order' and 'order_item' entities as child tables. The data is sharded by the 'customer.id' attribute, and all related rows in the child tables associated with a given 'customer.id' are sharded as well. The Global Tables are the common lookup tables, which have relatively low activity, and these tables are replicated to all shards to avoid cross-shard joins.
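
To make the bookstore example concrete, here is a minimal Python sketch of routing by 'customer.id' so that child rows follow their parent onto the same shard; the in-memory dictionaries stand in for real shard databases, and the names and shard count are illustrative only.

NUM_SHARDS = 4
shards = [{"customer": [], "customer_order": [], "order_item": []}
          for _ in range(NUM_SHARDS)]

def shard_for(customer_id):
    # Hash-based placement; a range-based scheme would work the same way.
    return customer_id % NUM_SHARDS

def insert_order(customer_id, order_row, item_rows):
    db = shards[shard_for(customer_id)]   # child rows follow the parent row
    db["customer_order"].append(order_row)
    db["order_item"].extend(item_rows)

insert_order(17, {"id": 1, "customer_id": 17},
             [{"order_id": 1, "book": "Some Title"}])
print(shard_for(17), shards[shard_for(17)])

Global tables would simply be copied into every entry of the shards list, which is what lets queries join against them without leaving the shard.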

Another general example:
Suppose you have an application where the number of users is huge, with around 10 updates per second or even more to a single counter. Because the update rate on a single entity is too high, serialized writes can stack up and time out.
How to reduce contention:
Break the counter into N counters.
When you want to increment the counter, you pick one of the shards at random and increment it. If you want to know the total count, you add up the values of all the shards.
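
A minimal sketch of this sharded-counter pattern, with an in-memory list standing in for the N counter rows and the shard count chosen arbitrarily:

import random

NUM_COUNTER_SHARDS = 20
counter_shards = [0] * NUM_COUNTER_SHARDS

def increment():
    # Spreading writes over N shards cuts contention roughly by a factor of N.
    counter_shards[random.randrange(NUM_COUNTER_SHARDS)] += 1

def total():
    # A read has to touch every shard and sum the partial counts.
    return sum(counter_shards)

for _ in range(1000):
    increment()
print(total())    # 1000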




References:
1) MongoDB
2) Wikipedia
3) CodeFutures


Saturday 19 October 2013

Eventual Consistency


What I got from the post by Werner Vogels on consistency

Distribution Transparency : To the user of the system it appears that there is only one system.

Consistency from the client's point of view
4 components:
1) Storage system
2) Process A: a process that writes to and reads from the storage system
3) Processes B & C: two processes independent of process A that also read from and write to the storage system
Client-side consistency is about how and when a client sees an update made to a data object in the storage system.
If A updates a data object:
1) Strong consistency: once the update is complete, any subsequent access will return the updated value.
2) Weak consistency: the system does not guarantee that subsequent accesses will return the updated value; a number of conditions need to be met before the value will be returned. Often this condition is the passing of time. The period between the update and the moment when it is guaranteed that any observer will always see the updated value is called the inconsistency window.
3) Eventual consistency: the storage system guarantees that if no new updates are made to the object, then eventually (after the inconsistency window closes) all accesses will return the last updated value. The most popular system that implements eventual consistency is DNS, the Domain Name System. Updates to a name are distributed according to a configured pattern and, in combination with time-controlled caches, eventually all clients will see the update.

Server side:
N: the number of nodes that store a replica of the data
W: the number of nodes that need to acknowledge receipt of the update before the update completes
R: the number of nodes that are contacted when a data object is accessed through a read operation
If W + R > N, the write set and the read set always overlap, so a read is guaranteed to see the latest write.
The problem is availability: if writes must be acknowledged by 3 nodes (W = 3) and 2 of them fail, the system has to fail the write, because it can no longer collect W acknowledgements.

In distributed storage systems that need to address high-performance and high-availability the number of replicas is in general higher than 2. Systems that focus solely on fault-tolerance often use N=3 (with W=2 and R=2 configurations). Systems that need to serve very high read loads often replicate their data beyond what is required for fault-tolerance, where N can be tens or even hundreds of nodes and with R configured to 1 such that a single read will return a result. For systems that are concerned about consistency they set W=N for updates.

R = 1, W = N: optimizes reads.
R = N, W = 1: very fast writes.
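
A tiny sketch of the N/W/R arithmetic above; it only checks the overlap condition (W + R > N) and does not model real replication, and the configuration values are just illustrative:

def overlap_guaranteed(n, w, r):
    # If W + R > N, the read set and write set must intersect,
    # so a read always sees the latest acknowledged write.
    return w + r > n

configs = [
    ("fault-tolerant default", 3, 2, 2),   # N=3, W=2, R=2
    ("read-optimised, W=N",    3, 3, 1),
    ("write-optimised, W=1",   3, 1, 3),
]
for name, n, w, r in configs:
    print(name, "-> strong reads:", overlap_guaranteed(n, w, r))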

Thursday 17 October 2013

Dynamo: Amazon's Highly Available Key-Value Store

Justification:
This paper describes Amazon's key/value storage system called Dynamo. The author compares Dynamo with many existing systems and then shows how Dynamo differs from the rest. The primary theme of the paper is the tradeoff between availability and consistency; Amazon has chosen an always-on, eventually consistent model. The paper covers several techniques, such as consistent hashing, vector clocks, hinted handoff, and quorum systems, at a brisk pace. Having been deployed and run in varied application environments at Amazon, Dynamo, I feel, is highly successful and meets essentially all the requirements and expectations it was intended to meet.

Summary:
Dynamo is a highly available, highly scalable key-value store. In order to provide high availability, the system adopts an eventual consistency model and leverages several techniques, like consistent hashing for load balancing, preference lists for data replication, hinted handoff for availability, Merkle trees for synchronization, and vector clocks for versioning. Along with this, the paper also provides a performance analysis of the real production system for issues like data replication and load balancing.
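
Since vector clocks are one of the techniques listed, here is a minimal, hypothetical sketch of how version vectors detect conflicting siblings; the node names and dict representation are invented for illustration and this is not Dynamo's actual code.

def increment(clock, node):
    # The replica that coordinates a write bumps its own entry.
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def descends(a, b):
    # True if version 'a' has seen everything version 'b' has.
    return all(a.get(n, 0) >= c for n, c in b.items())

def conflict(a, b):
    # Neither version dominates the other -> siblings, client must reconcile.
    return not descends(a, b) and not descends(b, a)

v1 = increment({}, "nodeA")      # write coordinated by nodeA
v2 = increment(v1, "nodeB")      # later write coordinated by nodeB
v3 = increment(v1, "nodeC")      # concurrent write coordinated by nodeC
print(conflict(v2, v3))          # True: concurrent siblings
print(conflict(v1, v2))          # False: v2 supersedes v1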

Pros:
a) The paper gives insight into how various different techniques can be combined to form a complex system.
b) Dynamo is a very practical system and has many applications for real-world systems.
c) Dynamo is a highly scalable system following an always-on, eventually consistent model.
d) The system is optimized for the 99.9th percentile, not the average; being always writable gives a better customer experience.

Cons:
a) ACID properties are compromised.
b) There is no security in the system, and all the nodes are assumed to be trusted.
c) In order to keep vector clocks scalable, the authors truncate the oldest entries, which can lead to reconciliation problems that the paper brushes aside, saying the issue has not surfaced in production.
d) Dynamo lists heterogeneity as one of its design principles, but the experiments were carried out on homogeneous machines only.

Monday 14 October 2013

SPAR Review

Paper Link :
SPAR 


This paper presents a new system, SPAR, for partitioning social networks. SPAR helps minimize replication overhead while maintaining data locality. When experiments were conducted on Twitter and Facebook data sets using various algorithms and compared against each other, it was clearly seen that SPAR outperforms the other algorithms. So, given the huge demand for social networks, I think using SPAR will bring a significant improvement in data replication. The paper is also well written and conveys the idea clearly.

8. Detailed comments
A. Summary
1) Scalability of real systems is a complex field, and it becomes even more difficult for social networks because the data is not disjoint. The paper proposes SPAR, a middleware for partitioning and replication of social network data. SPAR works on joint partitioning and replication. The authors explain replication with a graph of 10 nodes on 2 servers and give an overview of replication using DHT, full replication, and SPAR. In the case of SPAR, queries are resolved locally on the server, and as a result throughput is high. SPAR also gives the user flexibility to select the data store. SPAR is an online algorithm, so it is highly useful for dynamic social graphs, since such graphs require recomputation of partitions.
The authors describe the minimum-replication problem and give a solution based on greedy optimization. The algorithm is triggered by the addition or removal of a node, server, or edge, and all six cases are analyzed. Addition of a user happens at the partition with the fewest replicas. When a user is deleted, its master and slaves are deleted. In the case of a new relation, an edge is created between two users; the algorithm checks whether the two masters are co-located, and if so no action is required. If not, it calculates the minimum number of replicas to be created. The authors explain edge addition with various cases, and it is illustrated that minimizing node movement is not the only condition for minimizing replication. The cases of server addition and removal are also discussed in the paper; for server addition it was seen that SPAR was able to reach a stable state irrespective of how servers were added.
Extensive experiments were conducted on data from Facebook and Twitter, and SPAR was compared against various other algorithms like METIS, MO+, and random partitioning. The experiments were conducted with 0 and 2 replicas, and movement cost was computed from 4 servers up to 512 servers. It was seen that the overhead was minimal in the case of SPAR. The coefficient of variation (COV) for read and write operations in the case of SPAR was 0.37 and 0.0019 respectively, indicating SPAR's efficient handling of reads and writes in terms of balancing them across servers. Using the Twitter clone StatusNet, the performance of SPAR was studied on top of MySQL and Cassandra, and it was seen that SPAR reduces network traffic by a factor of 8.
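
To illustrate just the master co-location check from the edge-addition case above, here is a deliberately over-simplified, hypothetical sketch; the real SPAR algorithm also compares the replica cost of moving either master before choosing a configuration, which this toy version skips.

master_of = {"alice": 0, "bob": 1}       # user -> partition holding the master
replicas = {0: set(), 1: set()}          # partition -> users replicated there

def add_edge(u, v):
    pu, pv = master_of[u], master_of[v]
    if pu == pv:
        return                           # masters co-located: nothing to do
    # Otherwise make sure each partition holds a copy of the missing neighbour
    # so that reads of a user's neighbourhood stay local.
    replicas[pu].add(v)
    replicas[pv].add(u)

add_edge("alice", "bob")
print(replicas)                          # {0: {'bob'}, 1: {'alice'}}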

B. Some points:
1) It is not very clear from the paper how new edges are added, or how edges are handled if there is metadata attached to them.
2) Local load balancing is not addressed.
3) The formulation of the solution is vague and could have been explained in a better way.

Sunday 13 October 2013

Consistent Hashing : In a nutshell

The idea of Consistent Hashing dates back to the 90's when David Karger and others presented a paper on Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web . 
In Amazon Dynamo the database partition scheme relies on Consistent Hashing.
So what is Consistent Hashing ?

Suppose you have a simple web application, your traffic suddenly increases, and your database starts to slow down, so you decide to cache some results so that your web pages load quickly. If you have N cache machines, the common way to load balance is to put URL U in the cache machine given by a hash function such as (U + 5) mod N. If N = 23, the URL mapping will be (U + 5) mod 23. Since we are using mod N, all objects will be distributed roughly equally, and new objects keep being placed the same way. The problem occurs when you decide to remove or add a cache machine.
Add a machine and N increases, so almost all of your cached mappings become invalid: the new mapping is (U + 5) mod 24.

What Can be the best Way ?

I would say, and it is also advisable, that when a machine is added it should take a share of objects from all the remaining machines so the load stays equally distributed, and when a cache is removed, its objects should be shared out among the remaining machines. That is exactly what consistent hashing does: it consistently maps objects to the same cache machine, as far as possible. Each item should be mapped to only a small number of machines, and in such a way that all machines get roughly the same load of items.

Choose a constant hash function that maps URLs to the range [0, 1]. Treat this range as a unit circle, and map the caches onto the same unit circle.


A URL is assigned to the closest cache going clockwise. Items 1, 2 and 3 are mapped to cache A, and cache B gets items 4 and 5. When a new cache is added, the only URLs that are reassigned are those closest to the new cache going clockwise around the circle: you start where the resource is and head clockwise on the ring until you hit a server; if that server is down, you go to the next one, and so forth. Suppose cache C is added: only the items adjacent to C are moved, in this case items 1 and 2 now go to C. If C is later removed, items 1 and 2 move back to A again.
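
A small sketch of the ring described above; md5 and Python's bisect module are implementation choices rather than part of the original scheme, and virtual nodes are omitted to keep the idea visible.

import bisect
import hashlib

def h(key):
    # Map any string to a point on the (very large) circle.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, caches):
        self.nodes = sorted((h(c), c) for c in caches)

    def lookup(self, url):
        keys = [k for k, _ in self.nodes]
        i = bisect.bisect(keys, h(url)) % len(self.nodes)   # wrap the circle
        return self.nodes[i][1]           # first cache clockwise from the URL

ring = Ring(["cacheA", "cacheB"])
print(ring.lookup("http://example.com/page1"))

# Adding cacheC only re-maps the URLs that now fall just before it on the ring;
# everything else keeps hitting the same cache as before.
ring = Ring(["cacheA", "cacheB", "cacheC"])
print(ring.lookup("http://example.com/page1"))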

Saturday 12 October 2013

Map Reduce : Short Summary


This paper talks about MapReduce, a parallel data processing framework. The framework is fairly simple, can accommodate a wide variety of tasks, and hides almost all of the complexity of running code in a distributed environment from the user. The paper considers all kinds of failures that can occur and how they are tackled in the framework. In short, it is a small, simple, yet powerful framework capable of performing tasks on large amounts of data.

8. Summary:
1) A framework that allows easy large-scale distributed computing. The programmer has to define just two functions, map and reduce, without bothering about internal details like load balancing, fault tolerance, data locality, etc. These two functions run in a parallel manner: the map function performs data-parallel operations and the reduce function aggregates the results from map to produce the output (a tiny word-count sketch follows this list).
2) The programming model is simple and yet powerful; a small amount of code can do powerful things. Some of the applications of MapReduce are distributed grep, counting URL access frequency, reverse web-link graphs, etc.
3) The fault tolerance mechanism is quite effective. A single master manages all the workers. Failure of a worker is handled by re-executing its tasks, making the system resilient to large-scale worker failures. As there is only a single master, its failure is considered unlikely, and the MapReduce computation is aborted in that case. The paper takes care of straggler workers by chopping off the latency tail, starting duplicate (backup) tasks near the end of the job.
4) Many extensions are provided with the framework to make it more convenient for the user, like the partitioning function, combiner function, skipping bad records, counters, etc.
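
As referenced in point 1, here is a tiny, sequential word-count sketch of the programming model; the plain Python loops stand in for the distribution, shuffling and fault tolerance that the real framework provides.

from collections import defaultdict

def map_fn(doc):                  # map: emit (word, 1) for every word
    for word in doc.split():
        yield word, 1

def reduce_fn(word, counts):      # reduce: aggregate all values for one key
    return word, sum(counts)

docs = ["the quick brown fox", "the lazy dog", "the fox"]

groups = defaultdict(list)        # "shuffle": group intermediate pairs by key
for doc in docs:
    for word, count in map_fn(doc):
        groups[word].append(count)

print(dict(reduce_fn(w, c) for w, c in groups.items()))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}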

9. Limitations:
There are cases where MapReduce programming is not a suitable choice:
1) Real-time processing.
2) Cases where computation of the next value depends on the previous value.
3) If the data size is small and can be processed on a single machine, it is better to just do that rather than run a MapReduce job.
4) If you don't want to use Java, you need to use the streaming API, which is itself a complex process.

In short, MapReduce is a simple and powerful framework if you are dealing with huge amounts of data.