Hadoop Pseudo-Distributed Setup
Lab environment: all of the steps below are carried out in this environment.
Edit the configuration files (every file below lives under etc/hadoop in the Hadoop install directory, and every change is an addition of configuration entries)
- hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/data/module/jdk1.8.0_144 # JDK install path
- core-site.xml
# Add the following between the <configuration></configuration> tags
<!-- Address of the HDFS NameNode (older releases default to port 8020) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://chensy:9000</value> # chensy is this machine's hostname
</property>
<!-- Directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/data/module/hadoop-2.7.3/tmp</value> # under the Hadoop install path
</property>
- hdfs-site.xml
# Insert in the same place as above, between the <configuration></configuration> tags
<!-- One replica per block is enough on a single node (the default is 3) -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Disable HDFS permission checks to simplify testing -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
- mapred-site.xml (this file does not exist by default; rename the template file mapred-site.xml.template to mapred-site.xml)
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
- yarn-site.xml
<!-- How reducers fetch map output (the shuffle service) -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Hostname of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>chensy</value>
</property>
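The file edits above can be paired with a couple of shell commands; a minimal sketch, assuming the install path /data/module/hadoop-2.7.3 used throughout this document:

```shell
# Assumption: HADOOP_HOME matches the install path used in the configs above.
HADOOP_HOME=/data/module/hadoop-2.7.3

# mapred-site.xml is not shipped by default; create it from the template.
cp "$HADOOP_HOME/etc/hadoop/mapred-site.xml.template" \
   "$HADOOP_HOME/etc/hadoop/mapred-site.xml"

# After editing, confirm the values Hadoop actually resolves.
hdfs getconf -confKey fs.defaultFS       # expect hdfs://chensy:9000
hdfs getconf -confKey dfs.replication    # expect 1
```

If `hdfs getconf` still prints defaults (e.g. `file:///`), the edits landed outside the `<configuration>` tags or in the wrong directory.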
Once the configuration files are correct, format HDFS:
[root@chensy hadoop]# hdfs namenode -format
The format succeeded here. If it fails, read the error output carefully — the cause is almost always a mistake in one of the configuration files above.
Next, we can test Hadoop's distributed computation with a small job. Start HDFS and YARN:
[root@chensy hadoop]# start-dfs.sh
Starting namenodes on [chensy]
The authenticity of host 'chensy (192.168.10.137)' can't be established.
ECDSA key fingerprint is SHA256:B71Ahv1kM/Dx0hPnwDZ/w49v4lqw/+B4RGqM2iwiW+4.
ECDSA key fingerprint is MD5:cb:16:e4:06:1b:71:4c:d9:4e:bd:47:0c:b6:e5:24:87.
Are you sure you want to continue connecting (yes/no)? yes
chensy: Warning: Permanently added 'chensy,192.168.10.137' (ECDSA) to the list of known hosts.
root@chensy's password:
chensy: starting namenode, logging to /data/module/hadoop-2.7.3/logs/hadoop-root-namenode-chensy.out
root@localhost's password:
localhost: starting datanode, logging to /data/module/hadoop-2.7.3/logs/hadoop-root-datanode-chensy.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:B71Ahv1kM/Dx0hPnwDZ/w49v4lqw/+B4RGqM2iwiW+4.
ECDSA key fingerprint is MD5:cb:16:e4:06:1b:71:4c:d9:4e:bd:47:0c:b6:e5:24:87.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
root@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /data/module/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-chensy.out
[root@chensy hadoop]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /data/module/hadoop-2.7.3/logs/yarn-root-resourcemanager-chensy.out
root@localhost's password:
localhost: starting nodemanager, logging to /data/module/hadoop-2.7.3/logs/yarn-root-nodemanager-chensy.out
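Before moving on, it is worth confirming that all the daemons actually came up. `jps` (shipped with the JDK) lists running JVM processes; in a healthy pseudo-distributed setup it should show the five daemons below (PIDs will differ):

```shell
jps
# Expected processes (besides Jps itself):
#   NameNode
#   DataNode
#   SecondaryNameNode
#   ResourceManager
#   NodeManager
```

If one of the five is missing, check the corresponding .log file under the logs directory shown in the startup messages above.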
You can now open the HDFS web UI in a browser (the server IP on port 50070).
Create a file and fill it with some arbitrary text:
[root@chensy software]# touch sample-wordcount.txt
[root@chensy software]# vi sample-wordcount.txt
I Love China
I Love Guangdong
I Love Qingyuan
I Love Lingnan
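If you prefer not to use an interactive editor, the same file can be created non-interactively with a heredoc:

```shell
# Create the sample input in one shot (same content as the vi session above).
cat > sample-wordcount.txt <<'EOF'
I Love China
I Love Guangdong
I Love Qingyuan
I Love Lingnan
EOF
wc -l sample-wordcount.txt    # 4 lines
```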
Upload the file to HDFS:
[root@chensy software]# hadoop fs -mkdir /sample
[root@chensy software]# hadoop fs -put /data/software/sample-wordcount.txt /sample
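A quick way to confirm the upload landed where expected (sizes and timestamps will differ on your machine):

```shell
# List the HDFS directory and print the file back out of HDFS.
hadoop fs -ls /sample
hadoop fs -cat /sample/sample-wordcount.txt
```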
Change into the directory holding Hadoop's example jars and run a word count over the file:
[root@chensy software]# cd /data/module/hadoop-2.7.3/share/hadoop/mapreduce
[root@chensy mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /sample/sample-wordcount.txt /sample/output
21/03/12 15:44:37 INFO client.RMProxy: Connecting to ResourceManager at chensy/192.168.10.137:8032
21/03/12 15:44:38 INFO input.FileInputFormat: Total input paths to process : 1
21/03/12 15:44:38 INFO mapreduce.JobSubmitter: number of splits:1
21/03/12 15:44:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1615534786776_0001
21/03/12 15:44:39 INFO impl.YarnClientImpl: Submitted application application_1615534786776_0001
21/03/12 15:44:39 INFO mapreduce.Job: The url to track the job: http://chensy:8088/proxy/application_1615534786776_0001/
21/03/12 15:44:39 INFO mapreduce.Job: Running job: job_1615534786776_0001
21/03/12 15:44:54 INFO mapreduce.Job: Job job_1615534786776_0001 running in uber mode : false
21/03/12 15:44:54 INFO mapreduce.Job: map 0% reduce 0%
21/03/12 15:45:03 INFO mapreduce.Job: map 100% reduce 0%
21/03/12 15:45:11 INFO mapreduce.Job: map 100% reduce 100%
21/03/12 15:45:11 INFO mapreduce.Job: Job job_1615534786776_0001 completed successfully
21/03/12 15:45:11 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=82
FILE: Number of bytes written=237453
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=172
HDFS: Number of bytes written=52
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5994
Total time spent by all reduces in occupied slots (ms)=5143
Total time spent by all map tasks (ms)=5994
Total time spent by all reduce tasks (ms)=5143
Total vcore-milliseconds taken by all map tasks=5994
Total vcore-milliseconds taken by all reduce tasks=5143
Total megabyte-milliseconds taken by all map tasks=6137856
Total megabyte-milliseconds taken by all reduce tasks=5266432
Map-Reduce Framework
Map input records=4
Map output records=12
Map output bytes=109
Map output materialized bytes=82
Input split bytes=111
Combine input records=12
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=82
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=143
CPU time spent (ms)=1380
Physical memory (bytes) snapshot=299139072
Virtual memory (bytes) snapshot=4162605056
Total committed heap usage (bytes)=165810176
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=61
File Output Format Counters
Bytes Written=52
View the result:
[root@chensy mapreduce]# hadoop fs -cat /sample/output/part-r-00000
China 1
Guangdong 1
I 4
Lingnan 1
Love 4
Qingyuan 1
Of course, you can also browse the output file through the web UI.
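As a sanity check, the same counts can be reproduced locally with plain shell tools — this is essentially what the wordcount job computes, minus the distribution across a cluster:

```shell
# Split the input into words, sort, and count duplicates: a one-machine wordcount.
printf 'I Love China\nI Love Guangdong\nI Love Qingyuan\nI Love Lingnan\n' \
  | tr ' ' '\n' | sort | uniq -c | awk '{print $2, $1}'
# China 1
# Guangdong 1
# I 4
# Lingnan 1
# Love 4
# Qingyuan 1
```

The output matches part-r-00000 above (MapReduce likewise emits reducer output sorted by key).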