Apache Spark is a general-purpose, lightning-fast cluster computing system. It provides high-level APIs in Java, Scala, Python and R. For running Spark applications, it can be up to 100 times faster than Hadoop MapReduce when processing data in memory, and up to 10 times faster when accessing data from disk.
Necessity of Apache Spark:
In industry, everyone needed a general-purpose cluster computing tool, because the existing systems each handled only one kind of workload:
- MapReduce (limited to batch processing)
- Storm (limited to stream processing)
- Impala (limited to interactive processing)
- Neo4j (limited to graph processing)
So each of these tools handles only a single processing model. Apache Spark, by contrast, provides real-time stream processing, interactive processing, graph processing, in-memory processing, and batch processing in one engine, with high speed, ease of use, and a standard interface.
Components of Apache Spark:
- Spark Core
- Spark SQL
- Spark Streaming
RDDs – Resilient Distributed Datasets:
It is the fundamental unit of data in Spark: a distributed collection of elements spread across cluster nodes, on which parallel operations can be performed.
RDDs are immutable, but a new RDD can be generated by transforming an existing one.
There are two ways to create RDDs:
- By invoking the parallelize method on an existing collection in the driver program.
- By calling the textFile method, which takes the URI of a file and reads it as a collection of lines.
Article source: Geoinsyssoft