Saturday, 7 September 2013

Not able to add spark job on EC2 cluster

I am new to Spark. I can launch, manage, and shut down Spark
clusters on Amazon EC2 using the scripts described at
http://spark.incubator.apache.org/docs/0.7.3/ec2-scripts.html,
but I am not able to run the job below on the cluster.
package spark.examples

import spark.SparkContext
import SparkContext._

object SimpleJob {
  def main(args: Array[String]) {
    val logFile = "< Amazon S3 file url>"

    // Connect to the standalone master; the jar listed here is shipped to the workers.
    val sc = new SparkContext("spark://<Host Name>:7077", "Simple Job",
      System.getenv("SPARK_HOME"), Seq("<Jar Address>"))

    // Count lines containing "a" and "b" in the log file.
    val logData = sc.textFile(logFile)
    val numsa = logData.filter(line => line.contains("a")).count
    val numsb = logData.filter(line => line.contains("b")).count
    println("total a : %s, total b : %s".format(numsa, numsb))
  }
}
I created SimpleJob.scala in the spark.examples package of my
local Spark directory. After that I ran the command:
./spark-ec2 -k <keypair> -i <key-file> login <cluster-name>
The cluster starts and I am able to log in to it, but I don't know how
to add and run this job on the EC2 cluster.
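For what it's worth, the Spark 0.7.3 quick start runs standalone jobs by packaging them into a jar with sbt and launching the main class against the cluster. A minimal build sketch following that guide (the file name simple.sbt is a convention from the quick start; the version numbers are assumptions that may need adjusting for your setup):

```scala
// simple.sbt — minimal sbt build for a standalone Spark 0.7.3 job.
// A sketch based on the 0.7.3 quick start, not a verified build.
name := "Simple Job"

version := "1.0"

scalaVersion := "2.9.3"

// Spark 0.7.x artifacts were published under the org.spark-project group.
libraryDependencies += "org.spark-project" %% "spark-core" % "0.7.3"

// The quick start adds the Akka repository for transitive dependencies.
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
```

Running `sbt package` then produces a jar under target/ that can be passed as the <Jar Address> in the SparkContext constructor; one approach is to copy that jar to the master node and run the job's main class from there.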
Can anyone help me resolve this?
Thanks in advance.
Regards, Ayush
