Hadoop (Part 1): Getting Started
  Ft2RVYcwxBRK, 2023-11-02

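I. Concepts

1. What Hadoop is

Hadoop is an open-source software framework for distributed storage and computation. It has a shared-nothing architecture, high availability (HA), and elastic scalability, which makes it well suited to storing massive datasets and running analytical computations over them.

  • Hadoop is an open-source software framework
  • Hadoop is suited to processing large-scale data
  • Hadoop is deployed on a scalable cluster of servers

In a broader sense, "Hadoop" usually refers to the wider Hadoop ecosystem.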

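2. Advantages of Hadoop

(1) Strong scalability

Hadoop is a highly scalable storage platform: it stores and distributes datasets across clusters of hundreds of inexpensive servers operating in parallel. Unlike traditional relational databases, which cannot scale to very large data volumes, Hadoop lets enterprises run applications on nodes that together hold hundreds or thousands of terabytes of data.

(2) Low cost

Hadoop gives enterprises a cost-effective storage solution. Data is distributed and processed on clusters built from ordinary, inexpensive machines, so costs stay low, and an individual user can easily set up a Hadoop environment on a PC.

(3) High efficiency

Hadoop processes data in parallel and can move data between nodes dynamically to keep the nodes balanced, so processing is fast.

(4) Reliability

Hadoop automatically maintains multiple replicas of the data, and when a computation task fails it reassigns the work from the failed node.

(5) High fault tolerance

A key advantage of Hadoop is its fault tolerance: when data is written to one node it is also replicated to other nodes in the cluster, so another copy is available if a failure occurs.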

3. Components of Hadoop

[Figure: Hadoop components]

3.1 HDFS overview

[Figure: HDFS overview]

3.2 YARN overview

3.3 MapReduce overview

6. The big data technology ecosystem

[Figure: big data technology ecosystem]

II. Environment Preparation

1. Virtual machines

No.  Name       IP             Other
1    hadoop001  192.168.30.50
2    hadoop002  192.168.30.51
3    hadoop003  192.168.30.52

2. Single-node Hadoop deployment

Environment setup

On server 192.168.30.50

Install the JDK
[root@node50 opt]# mkdir software
[root@node50 opt]# cd software/
[root@node50 software]# ls
jdk-8u261-linux-x64.tar.gz
[root@node50 software]# mkdir /opt/module
[root@node50 software]# tar -zxvf jdk-8u261-linux-x64.tar.gz -C /opt/module/

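Configure environment variables

/etc/profile sources every /etc/profile.d/*.sh file, so environment variables can be set in a custom .sh file there.

sudo vim /etc/profile.d/my_env.sh

Contents:

#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_261
export PATH=$PATH:$JAVA_HOME/bin

Run the following command to load the variables and complete the installation:

[root@node50 jdk1.8.0_261]# source /etc/profile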
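
The profile.d step above can also be scripted, which helps when the same file has to be created on several servers. The sketch below writes the same three lines to a file of your choice, so it can be tried against a scratch path first. The function name `write_java_env` and its two arguments are illustrative assumptions, not part of the original setup.

```shell
# Write the JAVA_HOME settings to a profile.d-style file.
# $1: target file, $2: JDK install directory.
# write_java_env is a hypothetical helper name used only for this sketch.
# Note: the target file is overwritten, not appended to.
write_java_env() {
    target=$1
    jdk_dir=$2
    {
        echo '#JAVA_HOME'
        echo "export JAVA_HOME=$jdk_dir"
        # Single quotes keep $PATH and $JAVA_HOME unexpanded in the file,
        # so they are resolved when the file is sourced, not when it is written.
        echo 'export PATH=$PATH:$JAVA_HOME/bin'
    } > "$target"
}
```

A possible invocation is `write_java_env /etc/profile.d/my_env.sh /opt/module/jdk1.8.0_261`, followed by `source /etc/profile` and `java -version` to confirm that the JDK is picked up.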

Install Hadoop

[root@node50 software]# wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
[root@node50 software]# tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/
[root@node50 software]# cd /opt/module/hadoop-3.1.3/
[root@node50 hadoop-3.1.3]# sudo vim /etc/profile.d/my_env.sh

Append the following to the file:

#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Run the following command to load the variables and complete the installation:

[root@node50 jdk1.8.0_261]# source /etc/profile
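
After sourcing the profile, it is worth confirming that the new PATH entries took effect before moving on. A minimal sketch, assuming nothing beyond a POSIX shell; `check_on_path` is a made-up helper name.

```shell
# Report which of the given commands cannot be found on PATH.
# Returns 0 when all are present, 1 otherwise.
# check_on_path is a hypothetical helper name used only for this sketch.
check_on_path() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "not on PATH: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

For this installation the call would be `check_on_path java hadoop hdfs yarn mapred`. If it prints nothing, `hadoop version` should then report 3.1.3.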

III. Building a Production Hadoop Cluster

1. Server preparation

No.  Name       IP             Other
1    hadoop001  192.168.30.50
2    hadoop002  192.168.30.51
3    hadoop003  192.168.30.52

Following the single-node deployment above, install the JDK and Hadoop on each of hadoop001, hadoop002, and hadoop003.
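
The configuration files later in this article refer to the servers by name (hadoop001, hadoop002, hadoop003), so every server has to be able to resolve those names. The original does not show how this was done. One common way, assumed here, is to add the name-to-IP mappings from the table to /etc/hosts on each server. The helper name `print_cluster_hosts` is hypothetical.

```shell
# Print /etc/hosts lines for the three nodes, using the IPs from the table.
# print_cluster_hosts is a hypothetical helper name used only for this sketch.
print_cluster_hosts() {
    cat <<'EOF'
192.168.30.50 hadoop001
192.168.30.51 hadoop002
192.168.30.52 hadoop003
EOF
}
```

On each server the output could be appended with `print_cluster_hosts | sudo tee -a /etc/hosts`, after checking that the entries are not already there.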

2. Cluster deployment plan

      hadoop001           hadoop002                     hadoop003
HDFS  NameNode, DataNode  DataNode                      SecondaryNameNode, DataNode
YARN  NodeManager         ResourceManager, NodeManager  NodeManager

Notes:

  • Do not place the NameNode and the SecondaryNameNode on the same server.
  • The ResourceManager also consumes a lot of memory; do not place it on the same server as the NameNode or the SecondaryNameNode.
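
The two placement rules amount to "the three master daemons each get their own host". If the plan is kept in a script or inventory, a small guard can catch a violation before anything is started. This is only an illustration of the rule; `check_master_placement` and its argument order are assumptions.

```shell
# Succeed only if the hosts chosen for the NameNode, SecondaryNameNode and
# ResourceManager are pairwise different.
# $1: NameNode host, $2: SecondaryNameNode host, $3: ResourceManager host.
# check_master_placement is a hypothetical helper name.
check_master_placement() {
    nn=$1
    snn=$2
    rm_host=$3
    if [ "$nn" = "$snn" ] || [ "$nn" = "$rm_host" ] || [ "$snn" = "$rm_host" ]; then
        echo "placement conflict: NameNode=$nn SecondaryNameNode=$snn ResourceManager=$rm_host"
        return 1
    fi
    return 0
}
```

For the plan in the table the call is `check_master_placement hadoop001 hadoop003 hadoop002`, which passes.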

3. Configuration files

Default configuration files

[Figure: default configuration files]

Custom (site-specific) configuration files:

  • core-site.xml
  • hdfs-site.xml
  • yarn-site.xml
  • mapred-site.xml

Path: $HADOOP_HOME/etc/hadoop

[root@node50 hadoop]# pwd
/opt/module/hadoop-3.1.3/etc/hadoop
[root@node50 hadoop]# ls
capacity-scheduler.xml            httpfs-log4j.properties     mapred-site.xml
configuration.xsl                 httpfs-signature.secret     shellprofile.d
container-executor.cfg            httpfs-site.xml             ssl-client.xml.example
core-site.xml                     kms-acls.xml                ssl-server.xml.example
hadoop-env.cmd                    kms-env.sh                  user_ec_policies.xml.template
hadoop-env.sh                     kms-log4j.properties        workers
hadoop-metrics2.properties        kms-site.xml                yarn-env.cmd
hadoop-policy.xml                 log4j.properties            yarn-env.sh
hadoop-user-functions.sh.example  mapred-env.cmd              yarnservice-log4j.properties
hdfs-site.xml                     mapred-env.sh               yarn-site.xml
httpfs-env.sh                     mapred-queues.xml.template

Edit the configuration files

All nodes (hadoop001, hadoop002, and hadoop003) need the following configuration files.

core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- NameNode address -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop001:8020</value>
  </property>
  <!-- Hadoop data storage directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-3.1.3/data</value>
  </property>
  <!-- Static user for the HDFS web UI: hadoop -->
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
  </property>
</configuration>

hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- NameNode web UI address -->
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop001:9870</value>
  </property>
  <!-- SecondaryNameNode web UI address -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop003:9868</value>
  </property>
</configuration>

yarn-site.xml:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <!-- Use the MapReduce shuffle auxiliary service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- ResourceManager host -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop002</value>
  </property>
  <!-- Environment variable inheritance; required on version 3.1.1, no longer needed from 3.2 onward -->
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- Run MapReduce jobs on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
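
Since every node needs identical copies of these files, one option is to edit them once on hadoop001 and copy the directory to the other nodes. The sketch below assumes passwordless SSH between the nodes and that rsync is installed on all of them, neither of which is covered in this article. The function name `sync_hadoop_conf` and the `DRY_RUN` switch are made up for the example. By default it only prints the commands it would run.

```shell
# Copy the Hadoop configuration directory from this node to the given hosts.
# With DRY_RUN=1 (the default) the rsync commands are printed, not executed.
# Assumes passwordless SSH and rsync on every node.
# sync_hadoop_conf and DRY_RUN are hypothetical names used only for this sketch.
sync_hadoop_conf() {
    conf_dir=${HADOOP_CONF_DIR:-/opt/module/hadoop-3.1.3/etc/hadoop}
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "rsync -av $conf_dir/ $host:$conf_dir/"
        else
            rsync -av "$conf_dir/" "$host:$conf_dir/"
        fi
    done
}
```

Running `sync_hadoop_conf hadoop002 hadoop003` on hadoop001 shows the two commands; `DRY_RUN=0 sync_hadoop_conf hadoop002 hadoop003` performs the copy.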

4. Starting the cluster

Configure workers

Configure the workers file on each of hadoop001, hadoop002, and hadoop003.

Path: /opt/module/hadoop-3.1.3/etc/hadoop/workers

hadoop001
hadoop002
hadoop003

Note: entries in this file must not have trailing spaces, and the file must not contain blank lines.
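
Trailing spaces and blank lines are easy to miss in an editor. A small check like the one below can flag them before the cluster is started. It is a sketch using only grep; the function name `check_workers_file` is an assumption.

```shell
# Check a workers file for the two problems named in the note above:
# blank (or whitespace-only) lines and trailing whitespace.
# Prints the offending lines with their line numbers and returns 1 if any exist.
# check_workers_file is a hypothetical helper name used only for this sketch.
check_workers_file() {
    file=$1
    bad=$(grep -n -E '^[[:space:]]*$|[[:space:]]+$' "$file")
    if [ -n "$bad" ]; then
        echo "invalid lines in $file:"
        echo "$bad"
        return 1
    fi
    echo "$file looks clean"
}
```

For this installation the call would be `check_workers_file /opt/module/hadoop-3.1.3/etc/hadoop/workers`, run on each node.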

Start the cluster

(1) Format the NameNode

If this is the first start, format the NameNode on hadoop001:

[root@node50 hadoop-3.1.3]# pwd
/opt/module/hadoop-3.1.3
[root@node50 hadoop-3.1.3]# hdfs namenode -format

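Notes:

  • Formatting the NameNode generates a new cluster ID. If the DataNodes still hold data from the old cluster, their cluster ID no longer matches the NameNode's and the cluster cannot find its historical data.
  • If the cluster fails while running and the NameNode has to be reformatted, first stop the NameNode and DataNode processes and delete the data and logs directories on every machine, and only then format.

After a successful format, a data directory is created under hadoop-3.1.3.

[Screenshot: data directory after formatting]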
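
Because reformatting over leftover data causes the cluster ID mismatch described above, it can help to wrap the format command in a guard that refuses to run while old data or logs directories are present. The sketch below checks only the local node, so the same cleanup still has to be done on the other machines. `safe_format` and the `FORMAT_CMD` override are made-up names; the override exists only so the guard can be tried without a Hadoop installation.

```shell
# Refuse to format the NameNode while old data or logs directories exist.
# $1: Hadoop home (defaults to the path used in this article).
# FORMAT_CMD defaults to the real format command and can be overridden for a dry run.
# safe_format and FORMAT_CMD are hypothetical names used only for this sketch.
safe_format() {
    hadoop_home=${1:-/opt/module/hadoop-3.1.3}
    for dir in "$hadoop_home/data" "$hadoop_home/logs"; do
        if [ -e "$dir" ]; then
            echo "refusing to format: $dir exists; stop the daemons and remove it on every node first"
            return 1
        fi
    done
    # Left unquoted on purpose so the command string splits into command and arguments.
    ${FORMAT_CMD:-hdfs namenode -format}
}
```

On hadoop001, `safe_format` runs `hdfs namenode -format` only when neither directory exists under /opt/module/hadoop-3.1.3.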

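(2) Start HDFS on hadoop001

[root@node50 hadoop-3.1.3]# sbin/start-dfs.sh

Note: if startup reports an error (the scripts are run as root here), add the following lines to hadoop-env.sh:

[Screenshot: start-dfs.sh error]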

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
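
These five lines have to be present in hadoop-env.sh on every node. A sketch of adding them with a script that is safe to run more than once: it appends only the variables that are not already exported in the file. The function name `add_root_daemon_users` is an assumption.

```shell
# Append the *_USER exports to a hadoop-env.sh file, skipping any variable
# that the file already exports, so repeated runs do not add duplicates.
# $1: path to hadoop-env.sh.
# add_root_daemon_users is a hypothetical helper name used only for this sketch.
add_root_daemon_users() {
    env_file=$1
    for var in HDFS_NAMENODE_USER HDFS_DATANODE_USER HDFS_SECONDARYNAMENODE_USER \
               YARN_RESOURCEMANAGER_USER YARN_NODEMANAGER_USER; do
        if ! grep -q "^export $var=" "$env_file" 2>/dev/null; then
            echo "export $var=root" >> "$env_file"
        fi
    done
}
```

Here the call would be `add_root_daemon_users /opt/module/hadoop-3.1.3/etc/hadoop/hadoop-env.sh`. An existing export of one of these variables is left untouched, even if its value is not root.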

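(3) Start YARN on hadoop002

sbin/start-yarn.sh

Note: if startup reports an error

[Screenshot: start-yarn.sh error]

the fix is the same as in the previous step: add the variables to hadoop-env.sh.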

(4) View the HDFS web UI

hadoop001

http://192.168.30.50:9870/

[Screenshot: HDFS web UI]

(5) View the YARN ResourceManager

hadoop002

http://192.168.30.51:8088/

[Screenshot: YARN ResourceManager web UI]

(6) Check the running daemons on each node with jps

hadoop001

[Screenshot: jps output on hadoop001]

hadoop002

[Screenshot: jps output on hadoop002]

hadoop003

[Screenshot: jps output on hadoop003]
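
The jps output on each node should match the cluster deployment plan: NameNode, DataNode and NodeManager on hadoop001; ResourceManager, DataNode and NodeManager on hadoop002; SecondaryNameNode, DataNode and NodeManager on hadoop003. A sketch of checking this automatically follows. It reads jps output on standard input and relies on jps printing one `<pid> <name>` pair per line. The function name `missing_daemons` is an assumption.

```shell
# Read `jps` output on stdin and print every expected daemon that is absent.
# Arguments: the daemon names expected on that node. Returns 1 if any is missing.
# missing_daemons is a hypothetical helper name used only for this sketch.
missing_daemons() {
    jps_out=$(cat)
    status=0
    for daemon in "$@"; do
        # Compare the second field exactly, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_out" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon"
            status=1
        fi
    done
    return "$status"
}
```

With SSH access to the nodes, a check for hadoop003 would look like `ssh hadoop003 jps | missing_daemons SecondaryNameNode DataNode NodeManager`.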

(7) (Optional) Configure the history server

On hadoop001, hadoop002, and hadoop003, extend mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- Run MapReduce jobs on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>


  <!-- History server address -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop001:10020</value>
  </property>


  <!-- History server web UI address -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop001:19888</value>
  </property>
</configuration>

Start the history server on hadoop001:

mapred --daemon start historyserver

Test: http://hadoop001:19888/jobhistory

(8) (Optional) Configure log aggregation

[Screenshots: log aggregation configuration]

5. Cluster testing

(1) Upload a file

[root@node50 hadoop-3.1.3]# hadoop fs -put /opt/software/jdk-8u261-linux-x64.tar.gz  /software

This uploads the local file /opt/software/jdk-8u261-linux-x64.tar.gz to /software in HDFS.

[Screenshots: the uploaded file in the HDFS web UI]
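
One detail to watch with the upload: if /software does not yet exist in HDFS, `-put` will, as far as I know, create a file named /software instead of placing the archive inside a directory. A sketch that creates the directory first and lists it afterwards follows. `upload_to_hdfs` and the `HADOOP_BIN` override are made-up names; the override only makes it possible to substitute a stub and inspect the command sequence.

```shell
# Upload a local file into an HDFS directory, creating the directory first
# and listing it afterwards. Stops at the first step that fails.
# $1: local file, $2: HDFS directory.
# upload_to_hdfs and HADOOP_BIN are hypothetical names used only for this sketch.
upload_to_hdfs() {
    local_file=$1
    hdfs_dir=$2
    hadoop_bin=${HADOOP_BIN:-hadoop}
    "$hadoop_bin" fs -mkdir -p "$hdfs_dir" &&
        "$hadoop_bin" fs -put "$local_file" "$hdfs_dir/" &&
        "$hadoop_bin" fs -ls "$hdfs_dir"
}
```

For the file above: `upload_to_hdfs /opt/software/jdk-8u261-linux-x64.tar.gz /software`.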

(2) Common commands

[Figure: common commands]

(3) Common scripts

Unified start/stop script

Create the script myhadoop.sh in /opt/module/custombin:

#!/bin/bash
# Start or stop the whole Hadoop cluster from a single node.
if [ $# -lt 1 ]
then
	echo "No Args Input..."
	exit 1
fi

case $1 in
"start")
	echo "=================== Starting the Hadoop cluster ==================="

	echo "=================== Starting HDFS ================================="
	ssh hadoop001 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
	echo "=================== Starting YARN ================================="
	# YARN is started on hadoop002, where the ResourceManager runs
	ssh hadoop002 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
	echo "=================== Starting historyserver ========================"
	ssh hadoop001 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
;;
"stop")
	echo "=================== Stopping the Hadoop cluster ==================="

	echo "=================== Stopping historyserver ========================"
	ssh hadoop001 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
	echo "=================== Stopping YARN ================================="
	ssh hadoop002 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
	echo "=================== Stopping HDFS ================================="
	ssh hadoop001 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
;;
*)
	echo "Input Args Error..."
;;
esac

Make it executable

chmod +x myhadoop.sh

Run

./myhadoop.sh start
./myhadoop.sh stop

Script to list the Java processes on every node

#!/bin/bash
# Run jps on every node and print the result under a per-host header
for host in hadoop001 hadoop002 hadoop003
do
	echo "=========== $host ================="
	ssh "$host" jps
done

Make it executable

chmod +x jpsall

Run

./jpsall

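(4) Common port numbers

[Figure: common port numbers]

Ports that appear in this article: 8020 (NameNode, fs.defaultFS), 9870 (NameNode web UI), 9868 (SecondaryNameNode web UI), 8088 (ResourceManager web UI), 10020 (history server), 19888 (history server web UI).

(5) Common configuration files

[Figure: common configuration files]

(6) Time synchronization (when the cluster has no internet access)

[Figures: time synchronization steps]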

Last edited on 2023-11-08
