Amazon Elastic MapReduce


What is Amazon Elastic MapReduce?

Amazon Elastic MapReduce (Amazon EMR) is a web service provided by Amazon Web Services (AWS). It offers a managed platform for running data processing frameworks such as Apache Hadoop, Apache Spark, and Presto. The service is cost effective, easy to set up, and secure. Amazon EMR is used for data analysis, data warehousing, web indexing, and many other workloads.

How to implement Amazon EMR?

  1. Sign in to your AWS account and select Amazon EMR in the AWS Management Console.
  2. Create an Amazon S3 bucket for cluster logs and output data.
  3. Launch an Amazon EMR cluster. Use the following link to open the console:
    https://console.aws.amazon.com/elasticmapreduce/home
  4. Run the Hive script using the following steps:
    • Open the Amazon EMR console and select your cluster.
    • Go to the Steps section and click Add step.
    • The Add step dialog box opens. Fill in the required fields, then click Add.
  5. To view the output of the script, follow these steps:
    • Open the Amazon S3 console and select the S3 bucket used for output.
    • Select the output folder.
    • The script writes its output to a separate folder.
    • The output is stored in a separate file, which you can download.
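
Steps 4 and 5 can also be done from code instead of the console. The following is a minimal sketch in Python, assuming the boto3 library is installed and AWS credentials are configured. The helper names (build_hive_step, submit_hive_step, list_output_keys), the cluster ID, and all S3 paths are hypothetical placeholders, not values from this tutorial. The step arguments assume a cluster release that ships command-runner.jar with the hive-script command, so check them against your own EMR release before relying on them.

```python
"""Sketch: submit a Hive script as an EMR step and list its output in S3.

Assumptions (not from the tutorial): boto3 is used as the AWS SDK, and the
cluster supports running Hive scripts through command-runner.jar.
"""


def build_hive_step(script_uri, input_uri, output_uri, name="Run Hive script"):
    """Build a step definition in the shape add_job_flow_steps expects."""
    return {
        "Name": name,
        # Keep the cluster running if the step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hive-script", "--run-hive-script", "--args",
                "-f", script_uri,               # location of the Hive script
                "-d", f"INPUT={input_uri}",     # exposed to the script as ${INPUT}
                "-d", f"OUTPUT={output_uri}",   # exposed to the script as ${OUTPUT}
            ],
        },
    }


def submit_hive_step(emr_client, cluster_id, step):
    """Add one step to a running cluster and return the new step ID."""
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


def list_output_keys(s3_client, bucket, prefix):
    """Return the object keys under the output prefix (first page only)."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


# Example usage (placeholders; uncomment and replace with your own values):
#
# import boto3
# emr = boto3.client("emr")
# s3 = boto3.client("s3")
# step = build_hive_step(
#     "s3://my-example-bucket/scripts/query.q",
#     "s3://my-example-bucket/input/",
#     "s3://my-example-bucket/output/",
# )
# step_id = submit_hive_step(emr, "j-XXXXXXXXXXXXX", step)
# print(list_output_keys(s3, "my-example-bucket", "output/"))
```

The client objects are passed in as arguments so the helpers can be tried against a stub before touching a real account. Note that list_objects_v2 returns at most 1,000 keys per call, so a large output folder would need pagination.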

Benefits of Amazon EMR

  1. Ease of use: Amazon EMR is easy to configure and set up.
  2. Reliability: Amazon EMR is reliable and can be trusted to run your workloads.
  3. Elasticity: Amazon EMR is elastic, meaning you can scale the cluster up or down to handle any volume of data.
  4. Security: Amazon EMR is secure. It uses Amazon EC2 firewall settings to control network access to the cluster.
  5. Cost effectiveness: Amazon EMR is cost effective because its service rates are low.