Results and discussion

This section outlines the runtime environment used for the performance comparison of the four MapReduce-based HLQL. The metrics used for a systematic performance comparison are: increasing the input size, scaling out the number of nodes, and controlling the number of reducer tasks. The performance baseline for each benchmark is a direct MR implementation, which allows developers to assess the overhead of each MapReduce-based HLQL. WordCount is one of the canonical benchmarks in the Hadoop MR literature. The Web Log benchmark is another standard processing workload: it runs queries over a set of web log files in textual format to compute the average time each visitor spends on the website's pages. The Join benchmark consists of two sub-tasks that select data from two tables. These three benchmarks are the traditional ones found in the literature on MR and HLQL [5, 22, 23, 24].
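For illustration, a minimal HiveQL sketch of the WordCount benchmark could look as follows; the table and column names (docs, line) are placeholders for this sketch, not the exact queries used in the experiments.

  -- Hypothetical HiveQL version of the WordCount benchmark:
  -- split each input line into words, then count occurrences of each word.
  SELECT w.word, COUNT(*) AS freq
  FROM (
    SELECT explode(split(line, ' ')) AS word   -- split each line on spaces
    FROM docs                                  -- docs: one text line per row (placeholder name)
  ) w
  GROUP BY w.word;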

All experiments use Hadoop MR v2.7.2 (the latest stable version at the time) as the execution engine; Hive v2.3.2, Pig v0.17.0, JAQL v0.6 and Big SQL v4.2.4 were run on a cluster consisting of one dedicated NameNode and ten DataNodes. Each node was equipped with an Intel Core i5-5300U CPU at 2.30 GHz and 8 GB of RAM. We used 10 datasets x1,…,x10, where dataset xi is 2i GB in size; that is, the largest dataset, x10, is 20 GB. The input format used for our experiments in HDFS is a plain text file. Our experiments do not include data preparation and loading time.
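As a sketch of how such textual HDFS files can be exposed to the SQL-like languages without a separate loading step, an external Hive table can be declared over the files in place; the path, table and column names below are assumptions for illustration, not the schema used in the paper.

  -- Hypothetical Hive external table over the plain-text input already in HDFS;
  -- creating it is only a metadata operation, so no data is copied or loaded.
  CREATE EXTERNAL TABLE docs (line STRING)
  STORED AS TEXTFILE
  LOCATION '/data/wordcount/x10';   -- placeholder HDFS path to the 20 GB dataset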

Increasing input size metric

To measure how each MapReduce-based HLQL scales with input size, the cluster size is fixed at one NameNode and ten DataNodes, and the dataset size is increased for each experiment. This experiment analyses the processing time of the WordCount, web log processing and Join programs. All programs are written in the four MapReduce-based HLQL, namely JAQL, ANSI SQL (Big SQL), HiveQL and Pig Latin.
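As an illustration of the web log workload, a minimal HiveQL sketch might look like the following; the weblog table and its columns are assumed for illustration, since the paper does not give the exact schema.

  -- Hypothetical HiveQL version of the web log benchmark:
  -- average time spent on the site's pages, per visitor.
  SELECT visitor_id, AVG(time_spent) AS avg_time_on_pages
  FROM weblog                 -- weblog: parsed textual log, placeholder schema
  GROUP BY visitor_id;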


A prominent characteristic of these three measurements is that the total running time increases with the input size. It can be observed that Pig and JAQL achieve similar performance, while both MR Java and Hive perform considerably better, as can be seen clearly from the results shown in Figs. 8 and 9. Big SQL achieves the shortest processing time in all three experiments (Figs. 8, 9, 10). In the Join benchmark, from the smallest (x1) to the largest input size (x10), JAQL has the highest running time of all the MapReduce-based HLQL and MR Java (Fig. 10). Big SQL achieves the lowest running time among all the languages, 42% quicker than JAQL (Fig. 8).

Fig. 8 WordCount benchmark metric in increasing input size

Fig. 9 Web log processing benchmark metric in increasing input size

Fig. 10 Join benchmark metric in increasing input size

As shown in Figs. 9 and 10, Pig delivers weak running time performance on the Join benchmark compared to its high performance on web log processing. Comparing these MapReduce-based HLQL in terms of running time, we find that MR Java, Hive and Pig take almost the same execution time on the Join benchmark (Fig. 10). JAQL, on the other hand, takes a long running time, while Big SQL is characterized by the lowest running time on the increasing input size metric.
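A minimal HiveQL sketch of a two-table join of the kind measured here is shown below; the table and key names are illustrative assumptions rather than the benchmark's actual schema.

  -- Hypothetical HiveQL join: select data from two tables on a shared key,
  -- as in the two sub-tasks of the Join benchmark.
  SELECT u.user_id, u.name, o.order_total
  FROM users u
  JOIN orders o
    ON u.user_id = o.user_id;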


Scale-out number of nodes

We carried out another experiment focusing on the number of DataNodes. This number is increased from a single combined NameNode/DataNode to one NameNode and ten DataNodes. The size of the input dataset is fixed at x10 (20 GB). The results of these scale-out experiments are presented in Figs. 11 and 12.


Fig. 11 WordCount benchmark for scale-out nodes

Fig. 12 Join benchmark for scale-out nodes

All of these results illustrate a common design challenge for parallel systems. In the WordCount benchmark, there is a notable improvement for Pig, Hive and JAQL beyond the first added node (Fig. 11). These three MapReduce-based HLQL are all capable of using the additional processing capacity up to two nodes; beyond that point, we notice no further improvement in their running time. However, Big SQL does not benefit as much from adding nodes: its running time decreases by 43% from one to ten nodes (Fig. 11).


Fig. 12, which shows the corresponding running times for the Join benchmark, reveals a different pattern of improvement compared to the WordCount benchmark. This scale-out experiment does not demonstrate that adding nodes is beneficial for JAQL, Hive and Pig, which all show similarly small changes. Big SQL, in contrast, responds well to the additional nodes, managing to decrease its running time by 60% (Fig. 12).

Controlling number of reducers

All MR jobs are split into Map and Reduce tasks. Reduce tasks take the set of intermediate values associated with each intermediate key generated by the map function and reduce it to a smaller set of values. A reducer writes its output (key, value) pairs to both disk and memory via the namespace abstraction for further processing. The number of reducers is decided by the developers. If they use too many reducers, or cannot determine the optimal number, the context-switching overhead of the reducer tasks has a significant impact on performance.

The number of mappers, in contrast, is determined automatically from the split size, which is 64 MB per block. By default, Hadoop is configured with a single reducer, so investigating and controlling the number of reducer tasks has a direct performance impact on MR jobs. Previous work has focused on controlling and tuning the number of reducers for Hive, Pig and MR, but not for Big SQL and JAQL; JAQL and Big SQL are therefore included in this benchmark. Figures 13 and 14 depict the results of controlling the number of reducers. All MapReduce-based HLQL, including JAQL and Big SQL, are able to exploit an increasing number of reducer tasks to reduce runtime up to 40 reducers; beyond 40 reducer tasks, as the parameter is increased further towards 100, the running time increases for all of them. The WordCount results are shown in Fig. 13.
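For example, in Hive the number of reduce tasks can be fixed explicitly for a session before running a query; the sketch below uses the standard Hadoop 2 property name and the 40-reducer setting discussed above, with the placeholder docs table from the earlier WordCount sketch.

  -- Fix the number of reduce tasks for subsequent queries in this Hive session.
  SET mapreduce.job.reduces=40;

  -- The WordCount query then runs its reduce phase with 40 reducer tasks.
  SELECT w.word, COUNT(*) AS freq
  FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w
  GROUP BY w.word;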


Fig. 13 Controlling the number of reducers: WordCount benchmark

Fig. 14 Controlling the number of reducers: Join benchmark

An important contrast between the WordCount and the Join benchmarks, shown in Figs. 13 and 14 respectively, is that three of the languages, Big SQL, Pig and Hive, achieve their fastest running time at 50 reducers on the Join benchmark, after which the running time grows gradually (Fig. 14). To summarize this experiment, explicitly tuning the number of reducers can improve performance by 40% (Fig. 14); conversely, a poor choice of reducer count incurs a performance penalty of up to a 50% increase in runtime (Fig. 13).


Article source: Springer Open
