Apache Spark | Best institute for Big Data



Dataz, a unit of Geoinsyssoft, presents Big Data training in Chennai.

An example with RDDs:

Creating a new RDD:

scala> case class Customer(name: String, product_name: String, price: Int, country: String, state: String)
defined class Customer

scala> val Customerrdd = sc.textFile("/home/geouser/Documents/Bigdata/Spark/sales.csv")
Customerrdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[17] at textFile at <console>:27

scala> Customerrdd.collect
res5: Array[String] = Array(carolina,Product1,1200,Basildon,England, Betina,Product1,1200,Parkville ,MO, Federica e Andrea,Product1,1200,Astoria ,OR, Gouya,Product1,1200,Echuca,Victoria, Gerd W ,Product2,3600,Cahaba Heights ,AL, carolina,Product1,1200,Mickleton ,NJ, Fleur,Product1,1200,Peoria ,IL, adam,Product1,1200,Martin ,TN, Renee Elisabeth,Product1,1200,Tel Aviv,Tel Aviv, Aidan,Product1,1200,Chatou,Ile-de-France)

scala> Customerrdd.count()
res7: Long = 10
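The Customer case class defined above is not yet used: each element of Customerrdd is still one raw CSV line. A minimal sketch of a parser that turns a line into a Customer record is shown below; the parseCustomer name and the trimming of padded city fields are illustrative assumptions, not part of the original session.

```scala
// Sketch only: parse one CSV line into the Customer case class.
// Assumes the field order shown above: name,product_name,price,country,state.
case class Customer(name: String, product_name: String, price: Int,
                    country: String, state: String)

def parseCustomer(line: String): Customer = {
  val f = line.split(",").map(_.trim) // trim the padding around city names
  Customer(f(0), f(1), f(2).toInt, f(3), f(4))
}

// In spark-shell this could be applied across the RDD:
//   val customers = Customerrdd.map(parseCustomer)   // RDD[Customer]
```

Mapping through the case class this way gives each record a real schema, which becomes useful when converting to a DataFrame below.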

DataFrames in Apache Spark:

A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or an R/Python data frame. Along with DataFrames, Spark also introduced the Catalyst optimizer, which leverages advanced programming-language features to build an extensible query optimizer.

DataFrame Features:

  • Distributed collection of Row objects
  • Data processing
  • Optimization via the Catalyst optimizer
  • Hive compatibility
  • Tungsten execution engine
  • Multiple supported programming languages (Scala, Java, Python, R)
  • Coding with DataFrames

Creating a DataFrame from an existing RDD:

scala> val df = Customerrdd.toDF
df: org.apache.spark.sql.DataFrame = [_1: string]

scala> df.collect
res8: Array[org.apache.spark.sql.Row] = Array([carolina,Product1,1200,Basildon,England], [Betina,Product1,1200,Parkville ,MO], [Federica e Andrea,Product1,1200,Astoria ,OR], [Gouya,Product1,1200,Echuca,Victoria], [Gerd W ,Product2,3600,Cahaba Heights ,AL], [carolina,Product1,1200,Mickleton ,NJ], [Fleur,Product1,1200,Peoria ,IL], [adam,Product1,1200,Martin ,TN], [Renee Elisabeth,Product1,1200,Tel Aviv,Tel Aviv], [Aidan,Product1,1200,Chatou,Ile-de-France])

Applying transformations and actions on a DataFrame:

scala> df.count
res12: Long = 10

scala> val df = Customerrdd.toDF("line")
df: org.apache.spark.sql.DataFrame = [line: string]

scala> df.take(3)
res29: Array[org.apache.spark.sql.Row] = Array([carolina,Product1,1200,Basildon,England], [Betina,Product1,1200,Parkville ,MO], [Federica e Andrea,Product1,1200,Astoria ,OR])

scala> df.collect
res15: Array[org.apache.spark.sql.Row] = Array([carolina,Product1,1200,Basildon,England], [Betina,Product1,1200,Parkville ,MO], [Federica e Andrea,Product1,1200,Astoria ,OR], [Gouya,Product1,1200,Echuca,Victoria], [Gerd W ,Product2,3600,Cahaba Heights ,AL], [carolina,Product1,1200,Mickleton ,NJ], [Fleur,Product1,1200,Peoria ,IL], [adam,Product1,1200,Martin ,TN], [Renee Elisabeth,Product1,1200,Tel Aviv,Tel Aviv], [Aidan,Product1,1200,Chatou,Ile-de-France])

scala> df.filter(col("line").like("%adam%")).count()
res27: Long = 1

scala> df.filter(col("line").like("%adam%")).collect
res28: Array[org.apache.spark.sql.Row] = Array([adam,Product1,1200,Martin ,TN])
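Because this df has only the single "line" column, the filter above has to pattern-match the whole string, so "%adam%" would also match a product or city name containing "adam". With records parsed into the Customer case class, the same query can target one field. The sketch below mimics that column-level filter on a local collection (the sample rows are taken from the output above; the local stand-in itself is an assumption); in spark-shell the DataFrame equivalent would be customers.toDF.filter(col("name") === "adam").

```scala
// Sketch: field-level filtering on parsed records, a local stand-in
// for what a schema-aware DataFrame filter would do.
case class Customer(name: String, product_name: String, price: Int,
                    country: String, state: String)

val records = Seq(
  Customer("adam", "Product1", 1200, "Martin", "TN"),
  Customer("carolina", "Product1", 1200, "Basildon", "England")
)

// Filter on the name field only, not on the whole line.
val adams = records.filter(_.name == "adam")
```

Filtering a named column also lets Catalyst push the predicate down, which a like-match over a single concatenated string column cannot benefit from.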


Source: Geoinsyssoft

