Hadoop Pseudo-Distributed Setup
Lab environment: all of the steps below are carried out in this environment.
Edit the configuration files (every file below lives under etc/hadoop in the Hadoop install directory, and every change is an addition of configuration entries)
- hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/data/module/jdk1.8.0_144 # JDK install path
- core-site.xml
# Add the following between the <configuration></configuration> tags
<!-- Address of the HDFS NameNode (older releases default to port 8020) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://chensy:9000</value> # chensy is this machine's hostname
</property>
<!-- Directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/data/module/hadoop-2.7.3/tmp</value> # under the Hadoop install path
</property>
- hdfs-site.xml
# Insert in the same place as above, between the <configuration></configuration> tags
<!-- One replica per block is enough on a single node (the default is 3) -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Disable HDFS permission checks to simplify testing -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
- mapred-site.xml (this file does not exist by default; rename the template file mapred-site.xml.template to mapred-site.xml)
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
- yarn-site.xml
<!-- How reducers fetch map output (the shuffle service) -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Hostname of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>chensy</value>
</property>
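The file edits above can be paired with a couple of shell commands; a minimal sketch, assuming the install path /data/module/hadoop-2.7.3 used throughout this document:

```shell
# Assumption: HADOOP_HOME matches the install path used in the configs above.
HADOOP_HOME=/data/module/hadoop-2.7.3

# mapred-site.xml is not shipped by default; create it from the template.
cp "$HADOOP_HOME/etc/hadoop/mapred-site.xml.template" \
   "$HADOOP_HOME/etc/hadoop/mapred-site.xml"

# After editing, confirm the values Hadoop actually resolves.
hdfs getconf -confKey fs.defaultFS       # expect hdfs://chensy:9000
hdfs getconf -confKey dfs.replication    # expect 1
```

If `hdfs getconf` still prints defaults (e.g. `file:///`), the edits landed outside the `<configuration>` tags or in the wrong directory.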
Once the configuration files are correct, format HDFS:
[root@chensy hadoop]# hdfs namenode -format
The format succeeded here. If it fails, read the error output carefully — the cause is almost always a mistake in one of the configuration files above.
Next, we can test Hadoop's distributed computation with a small job. Start HDFS and YARN:
[root@chensy hadoop]# start-dfs.sh
Starting namenodes on [chensy]
The authenticity of host 'chensy (192.168.10.137)' can't be established.
ECDSA key fingerprint is SHA256:B71Ahv1kM/Dx0hPnwDZ/w49v4lqw/+B4RGqM2iwiW+4.
ECDSA key fingerprint is MD5:cb:16:e4:06:1b:71:4c:d9:4e:bd:47:0c:b6:e5:24:87.
Are you sure you want to continue connecting (yes/no)? yes
chensy: Warning: Permanently added 'chensy,192.168.10.137' (ECDSA) to the list of known hosts.
root@chensy's password:
chensy: starting namenode, logging to /data/module/hadoop-2.7.3/logs/hadoop-root-namenode-chensy.out
root@localhost's password:
localhost: starting datanode, logging to /data/module/hadoop-2.7.3/logs/hadoop-root-datanode-chensy.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:B71Ahv1kM/Dx0hPnwDZ/w49v4lqw/+B4RGqM2iwiW+4.
ECDSA key fingerprint is MD5:cb:16:e4:06:1b:71:4c:d9:4e:bd:47:0c:b6:e5:24:87.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
root@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /data/module/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-chensy.out
[root@chensy hadoop]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /data/module/hadoop-2.7.3/logs/yarn-root-resourcemanager-chensy.out
root@localhost's password:
localhost: starting nodemanager, logging to /data/module/hadoop-2.7.3/logs/yarn-root-nodemanager-chensy.out
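Before moving on, it is worth confirming that all the daemons actually came up. `jps` (shipped with the JDK) lists running JVM processes; in a healthy pseudo-distributed setup it should show the five daemons below (PIDs will differ):

```shell
jps
# Expected processes (besides Jps itself):
#   NameNode
#   DataNode
#   SecondaryNameNode
#   ResourceManager
#   NodeManager
```

If one of the five is missing, check the corresponding .log file under the logs directory shown in the startup messages above.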
You can now open the HDFS web UI in a browser (the server IP on port 50070).
Create a file and fill it with some arbitrary text:
[root@chensy software]# touch sample-wordcount.txt
[root@chensy software]# vi sample-wordcount.txt
I Love China
I Love Guangdong
I Love Qingyuan
I Love Lingnan
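If you prefer not to use an interactive editor, the same file can be created non-interactively with a heredoc:

```shell
# Create the sample input in one shot (same content as the vi session above).
cat > sample-wordcount.txt <<'EOF'
I Love China
I Love Guangdong
I Love Qingyuan
I Love Lingnan
EOF
wc -l sample-wordcount.txt    # 4 lines
```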
Upload the file to HDFS:
[root@chensy software]# hadoop fs -mkdir /sample
[root@chensy software]# hadoop fs -put /data/software/sample-wordcount.txt /sample
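A quick way to confirm the upload landed where expected (sizes and timestamps will differ on your machine):

```shell
# List the HDFS directory and print the file back out of HDFS.
hadoop fs -ls /sample
hadoop fs -cat /sample/sample-wordcount.txt
```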
Change into the directory holding Hadoop's example jars and run a word count over the file:
[root@chensy software]# cd /data/module/hadoop-2.7.3/share/hadoop/mapreduce
[root@chensy mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /sample/sample-wordcount.txt /sample/output
21/03/12 15:44:37 INFO client.RMProxy: Connecting to ResourceManager at chensy/192.168.10.137:8032
21/03/12 15:44:38 INFO input.FileInputFormat: Total input paths to process : 1
21/03/12 15:44:38 INFO mapreduce.JobSubmitter: number of splits:1
21/03/12 15:44:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1615534786776_0001
21/03/12 15:44:39 INFO impl.YarnClientImpl: Submitted application application_1615534786776_0001
21/03/12 15:44:39 INFO mapreduce.Job: The url to track the job: http://chensy:8088/proxy/application_1615534786776_0001/
21/03/12 15:44:39 INFO mapreduce.Job: Running job: job_1615534786776_0001
21/03/12 15:44:54 INFO mapreduce.Job: Job job_1615534786776_0001 running in uber mode : false
21/03/12 15:44:54 INFO mapreduce.Job: map 0% reduce 0%
21/03/12 15:45:03 INFO mapreduce.Job: map 100% reduce 0%
21/03/12 15:45:11 INFO mapreduce.Job: map 100% reduce 100%
21/03/12 15:45:11 INFO mapreduce.Job: Job job_1615534786776_0001 completed successfully
21/03/12 15:45:11 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=82
FILE: Number of bytes written=237453
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=172
HDFS: Number of bytes written=52
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5994
Total time spent by all reduces in occupied slots (ms)=5143
Total time spent by all map tasks (ms)=5994
Total time spent by all reduce tasks (ms)=5143
Total vcore-milliseconds taken by all map tasks=5994
Total vcore-milliseconds taken by all reduce tasks=5143
Total megabyte-milliseconds taken by all map tasks=6137856
Total megabyte-milliseconds taken by all reduce tasks=5266432
Map-Reduce Framework
Map input records=4
Map output records=12
Map output bytes=109
Map output materialized bytes=82
Input split bytes=111
Combine input records=12
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=82
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=143
CPU time spent (ms)=1380
Physical memory (bytes) snapshot=299139072
Virtual memory (bytes) snapshot=4162605056
Total committed heap usage (bytes)=165810176
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=61
File Output Format Counters
Bytes Written=52
View the result:
[root@chensy mapreduce]# hadoop fs -cat /sample/output/part-r-00000
China 1
Guangdong 1
I 4
Lingnan 1
Love 4
Qingyuan 1
Of course, you can also browse the output file through the web UI.
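As a sanity check, the same counts can be reproduced locally with plain shell tools — this is essentially what the wordcount job computes, minus the distribution across a cluster:

```shell
# Split the input into words, sort, and count duplicates: a one-machine wordcount.
printf 'I Love China\nI Love Guangdong\nI Love Qingyuan\nI Love Lingnan\n' \
  | tr ' ' '\n' | sort | uniq -c | awk '{print $2, $1}'
# China 1
# Guangdong 1
# I 4
# Lingnan 1
# Love 4
# Qingyuan 1
```

The output matches part-r-00000 above (MapReduce likewise emits reducer output sorted by key).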