Open-Source Big Data Cluster Deployment (21): Spark on YARN Deployment

云的事随心讲 2024-04-19 07:21:57
1 Spark on YARN Installation (every node)

Author: 櫰木

```shell
cd /root/bigdata/
tar -xzvf spark-3.3.1-bin-hadoop3.tgz -C /opt/
ln -s /opt/spark-3.3.1-bin-hadoop3 /opt/spark
chown -R spark:spark /opt/spark-3.3.1-bin-hadoop3
```

2 Configure Environment Variables and Modify Configuration

```shell
cat /etc/profile.d/bigdata.sh
export SPARK_HOME=/opt/spark
export SPARK_CONF_DIR=/opt/spark/conf
```

Load the variables:

```shell
source /etc/profile
```

Modify YARN's capacity-scheduler.xml so that resource scheduling accounts for both CPU and memory (every YARN node):

```xml
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <!-- <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value> -->
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```
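The `DominantResourceCalculator` applies the idea of dominant-resource fairness: a container request is compared against cluster capacity on both memory and vcores, and whichever fraction is larger (the dominant share) drives scheduling, instead of memory alone. A rough illustration of the comparison in Python (the function and field names are illustrative, not Hadoop's actual API):

```python
# Illustrative sketch of dominant-share computation; not Hadoop's real code.
def dominant_share(request, cluster):
    """Return the request's dominant share: the largest fraction it
    consumes of any single resource (memory MB or vcores)."""
    return max(request["memory_mb"] / cluster["memory_mb"],
               request["vcores"] / cluster["vcores"])

cluster = {"memory_mb": 102400, "vcores": 32}

# A memory-heavy container: 8 GB, 1 core -> memory is dominant.
heavy_mem = {"memory_mb": 8192, "vcores": 1}
# A CPU-heavy container: 1 GB, 4 cores -> vcores are dominant.
heavy_cpu = {"memory_mb": 1024, "vcores": 4}

print(dominant_share(heavy_mem, cluster))  # 0.08  (8192/102400)
print(dominant_share(heavy_cpu, cluster))  # 0.125 (4/32)
```

With the default `DefaultResourceCalculator` only memory is considered, so the CPU-heavy request above would look "cheap" even while exhausting cores; Spark executors request both resources, which is why this switch matters.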

Enable log aggregation in yarn-site.xml:

```xml
<property>
  <description>Whether to enable log aggregation</description>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://master:19888/jobhistory/logs</value>
</property>
```

Modify mapred-site.xml (every YARN node):

```xml
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hd1.dtstack.com:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hd1.dtstack.com:19888</value>
</property>
```

```shell
cd /opt/spark/conf
```

Spark configuration file (every Spark node):

```shell
cat spark-defaults.conf
spark.eventLog.dir=hdfs:///user/spark/applicationHistory
spark.eventLog.enabled=true
spark.yarn.historyServer.address=http://hd1.dtstack.com:18018
spark.history.kerberos.enabled=true
spark.history.kerberos.principal=hdfs/hd1.dtstack.com@DTSTACK.COM
spark.history.kerberos.keytab=/etc/security/keytab/hdfs.keytab
```
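spark-defaults.conf is a plain key/value file ('=' separated in this article's style, '#' for comments). A small sanity-check sketch for verifying the event-log keys above before restarting anything; `load_spark_defaults` is an illustrative helper, not part of Spark:

```python
def load_spark_defaults(lines):
    """Parse spark-defaults.conf-style 'key=value' lines into a dict.
    Blank lines and '#' comments are skipped. Illustrative helper only."""
    props = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

# The sample mirrors the settings from this article; in practice you
# would read /opt/spark/conf/spark-defaults.conf instead.
sample = """\
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs:///user/spark/applicationHistory
spark.yarn.historyServer.address=http://hd1.dtstack.com:18018
""".splitlines()

props = load_spark_defaults(sample)
print(props["spark.eventLog.enabled"])  # true
```

The key point is that `spark.eventLog.dir` (written by applications) and the history server's `spark.history.fs.logDirectory` (read at startup, set below in spark-env.sh) must point at the same HDFS path.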

Spark environment configuration file (every Spark node):

```shell
cat spark-env.sh
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18018 -Dspark.history.fs.logDirectory=hdfs:///user/spark/applicationHistory"
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
```

Since the history server needs to read the event log files, the hdfs keytab is used.

Create the corresponding HDFS directory and change its ownership:

```shell
hdfs dfs -mkdir -p /user/spark/applicationHistory
hdfs dfs -chown -R spark /user/spark/
```

Submit a test job:
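The standard smoke test is the bundled SparkPi example, which estimates π by Monte Carlo sampling: random points are thrown into the unit square, and the fraction landing inside the quarter circle approaches π/4. Stripped of Spark's parallelism, the idea is roughly this single-machine sketch:

```python
import random

def estimate_pi(num_samples, seed=42):
    """Monte Carlo estimate of pi -- the same computation SparkPi
    spreads across YARN executors, here on one machine."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # Count points falling inside the unit quarter circle.
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))  # close to 3.14
```

The real job partitions the sampling loop across executor tasks and sums the counts, so it needs no input data at all, which makes it a convenient end-to-end check of the YARN setup.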

```shell
cd /opt/spark
./bin/spark-submit --master yarn --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.12-3.3.1.jar
```

3 Start the Spark History Server

```shell
cd /opt/spark
```

Start the history server:

```shell
./sbin/start-history-server.sh
```

![Screenshot: history server startup output](https://img-blog.csdnimg.cn/direct/3426d355c12547a2999941894d... =600x)

4 Verify the Result

  1) First, open the YARN web UI, find the Spark on YARN application, and click History as shown in the figure:

Alternatively, access the history server directly:

http://ip:18018
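Besides the web UI, the history server exposes a REST API under /api/v1 on the same port; for example, GET /api/v1/applications returns the application list as JSON. A small sketch of consuming that list; the canned payload stands in for a live server (which you would fetch with e.g. `urllib.request.urlopen("http://ip:18018/api/v1/applications")`), and `summarize_applications` is an illustrative helper, not a Spark API:

```python
import json

def summarize_applications(payload):
    """Reduce the JSON from the history server's GET /api/v1/applications
    endpoint to (id, name) pairs. Illustrative helper only."""
    return [(app["id"], app["name"]) for app in json.loads(payload)]

# Canned response in the shape of Spark's monitoring REST API.
sample = '[{"id": "application_1700000000000_0001", "name": "Spark Pi", "attempts": []}]'
print(summarize_applications(sample))  # [('application_1700000000000_0001', 'Spark Pi')]
```

Scripting against this endpoint is a quick way to confirm that event logs from the test job actually landed in /user/spark/applicationHistory and were picked up by the server.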
