Hadoop 2.4 Compilation, Pseudo-Distributed Installation, and Cluster Installation Notes


Every time I reinstall Hadoop there are configuration details I can't quite remember, and looking them up each time is a hassle, so this time I took notes. I'm sharing them here; if anything is wrong, please point it out. The screenshots were lost when copying out of Word, but the original Word document can be downloaded from the link below. I haven't had the habit of blogging before; I'll gradually share more of my old notes.

http://pan.baidu.com/s/1gdKdEcb

Credit where it's due: much of the cluster-configuration content here draws on Wu Chao's blog. His blog address is posted below; many of the articles there are very well written, and I recommend them to anyone interested.

http://www.superwu.cn/author/admin/


1.2.1.1. Check the hostname

[root@hadoop1 ~]# hostname

hadoop1


1.2.1.2. Modify the hostname

[root@hadoop1 ~]# vi /etc/sysconfig/network


NETWORKING=yes

HOSTNAME=hadoop1
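A side note of mine (not in the original): on CentOS, editing /etc/sysconfig/network only takes effect after a reboot. To change the hostname of the running system right away as well, you can run:

hostname hadoop1    # takes effect immediately, but is lost on reboot
hostname            # confirm the change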


Use the ifconfig command to view the current network configuration.

ping ip: check whether the current host and the machine at the target ip can reach each other.

telnet ip port: check whether the current host can reach the given port on the target machine.

wget http://www.baidu.com: the wget command downloads files; fetching Baidu's address directly is a handy test of whether the current host can reach the external network, because the machine must have Internet access when compiling the Hadoop source code.
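A minimal run of these checks, assuming a second node at 192.168.1.202 (an address used later in this note):

ping -c 3 192.168.1.202        # can we reach the other machine?
telnet 192.168.1.202 22        # is port 22 (sshd) on it reachable?
wget -q -O /dev/null http://www.baidu.com && echo "Internet OK"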

1.3.1.1. Configure the virtual machine's network


Select bridged mode, and set the "Bridged to" option to the network adapter the physical machine uses to go online. I connect over wireless and my wired network configuration is disabled, so there is only one option here.

vi /etc/sysconfig/network-scripts/ifcfg-eth0

The configuration is as follows. I recommend commenting out all the IPv6-related settings (an error may be reported when starting the resourcemanager; I got one on startup, and turning off the IPv6-related settings fixed it).

DEVICE="eth0"
BOOTPROTO=static          # static IP
#IPV6INIT="yes"
NM_CONTROLLED="yes"
ONBOOT="yes"              # bring the network up automatically at startup
TYPE="Ethernet"
#UUID="2d678a8b-6c40-4ebc-8f4e-245ef6b7a969"
NETMASK=255.255.255.0     # subnet mask, same as the physical machine
GATEWAY=192.168.1.1       # gateway IP, same as the physical machine
IPADDR=192.168.1.201      # IP address, in the same subnet as the physical machine
PREFIX=24
DNS1=101.226.4.6          # DNS server address, same as the physical machine
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
NAME="System eth0"
HWADDR=00:0C:29:92:E5:B7  # the MAC address must match the /etc/udev/rules.d/70-persistent-net.rules file
#IPV6_PEERDNS=yes
#IPV6_PEERROUTES=yes
LAST_CONNECT=1397965330

sudo service network restart (if the sudo command cannot be used, see Appendix 1 for how to set it up)


1.3.4.1. Solution 1

Delete the file:

rm /etc/udev/rules.d/70-persistent-net.rules

Then reboot the virtual machine and restart the network interface.

1.3.4.2. Solution 2

Check the contents of the /etc/udev/rules.d/70-persistent-net.rules file. Sometimes this file contains two SUBSYSTEM entries; delete the NAME="eth*" ones and keep only the NAME="eth0" entry, then restart the network interface.

1.3.4.3. Solution 3

Check the contents of the /etc/udev/rules.d/70-persistent-net.rules file. If there is only one entry, NAME="eth1", and the configuration in /etc/sysconfig/network-scripts/ifcfg-eth0 is all fine, use the following command:

mv /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth1

Then restart the network interface.

PS: these are all the problems I have run into. If you hit something else that none of the methods above can fix, ask a search engine; I'm out of ideas.

This file (/etc/hosts) configures the mapping between IP addresses and hostnames; the IP-to-hostname mappings of all machines in the cluster need to be added to it.
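For the three-node cluster described later in this note, the entries would look like this (IPs as in the cluster table below):

192.168.1.201 hadoop1
192.168.1.202 hadoop2
192.168.1.203 hadoop3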


Generate the public and private keys:

ssh-keygen -t rsa

Go into the ~/.ssh directory:

cat id_rsa.pub >> authorized_keys

Then use the ssh localhost command to test whether ssh is configured correctly.

Sometimes passwordless ssh login still fails to take effect; in that case the permissions can be fixed as follows:

sudo chmod 644 ~/.ssh/authorized_keys

sudo chmod 700 ~/.ssh
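For the cluster installation later in this note, each node also needs passwordless ssh to the other nodes. A sketch of one way to do that (my addition, assuming the hostnames hadoop2 and hadoop3 from the cluster section):

ssh-copy-id hadoop2      # run on hadoop1; appends hadoop1's public key to hadoop2's authorized_keys
ssh-copy-id hadoop3
ssh hadoop2 hostname     # should print "hadoop2" without prompting for a password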

Testing ssh on Ubuntu with the ssh localhost command may fail with the error: connect to host localhost port 22: Connection refused

The cause is probably that ssh-server is not installed or not started. Ubuntu 11.10 installs openssh-client by default, but not the server.

Run ps -e | grep ssh to see whether there is an sshd process.

If there is none, the server has not been started. Start it with /etc/init.d/ssh start; if that reports that ssh does not exist, then the server is not installed.

Install it with the sudo apt-get install openssh-server command.

Download the latest JDK from Oracle's official website, choosing the Linux version (note the choice between the 64-bit and 32-bit builds, depending on your server).

JDK download address:

http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html


I personally prefer installing from the tar.gz package.

After downloading, upload the JDK file to the server, or download it directly on the server:

wget http://download.oracle.com/otn-pub/java/jdk/8u5-b13/jdk-8u5-linux-x64.tar.gz

1.6.2.1. Extract

tar -zxvf jdk-8u5-linux-x64.tar.gz

1.6.2.2. Configure environment variables

(A real production environment has many users, and each user may need a different JDK version, so the JDK configuration only needs to take effect for the current user.)

vi ~/.bashrc

export JAVA_HOME=/home/hadoop/software/jdk1.7.0_51

CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$JAVA_HOME/bin:$PATH
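After saving the file, reload it so the variables take effect in the current shell:

source ~/.bashrc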

1.6.2.3. Verify the installation

java -version

If the version number is displayed correctly, the installation succeeded.

The official documentation notes that Maven must be version 3.0 or above; install the newest version you can. The software required to compile Hadoop is:

* JDK 1.6+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer

Download Maven:

wget http://mirrors.hust.edu.cn/apache/maven/maven-3/3.2.1/binaries/apache-maven-3.2.1-bin.tar.gz

1.7.2.1. Change the repository address

In the ${MAVEN_HOME}/conf/settings.xml file, find the mirrors element and add the configuration below inside it. (By default, the overseas central repository is used; Maven downloads a great many jar files when compiling Hadoop, and the overseas central repository is slow to download from, so the Maven repository provided by OSChina is used here, which is a bit faster.)

<mirror>
  <id>nexus-osc</id>
  <mirrorOf>*</mirrorOf>
  <name>Nexus osc</name>
  <url>http://maven.oschina.net/content/groups/public/</url>
</mirror>

1.7.2.2. Change the local repository directory in settings.xml

Find the localRepository element and change the path inside it to the directory where Maven's downloaded files should be stored (the directory must be created in advance; if this is not configured, downloaded files are stored by default in the .m2 folder under the current user's home directory):

<localRepository>/home/Hadoop/repo</localRepository>

1.7.2.3. Configure environment variables

Add the following configuration to the ~/.bashrc file:

export MAVEN_HOME=/home/hadoop/software/apache-maven-3.2.1

export PATH=$MAVEN_HOME/bin:$PATH

1.7.2.4. Verify the installation

mvn -version

If the version number is displayed correctly, the installation succeeded.

1.8.1.1. Download and extract

Download address: https://code.google.com/p/protobuf/downloads/list

wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz

tar -zxvf protobuf-2.5.0.tar.gz

1.8.1.2. Install

./configure --prefix=/home/hadoop/software/protobuf

The --prefix parameter specifies the installation path.

make

make install

1.8.1.3. Configure environment variables

export PROTOC_HOME=/home/hadoop/software/protobuf

PATH=$PROTOC_HOME/bin:$PATH

1.8.1.4. Verify the installation

protoc --version

If the version number is displayed correctly, the installation succeeded.

Download address: http://www.cmake.org/cmake/resources/software.html

wget http://221.178.162.232/data4/06/f1/4c/96/df/b6/c5/4e/70/41/4c/48/c4/05/aa/88/cmake-2.8.12.2.tar.gz

Extract:

tar -zxvf cmake-2.8.12.2.tar.gz

./bootstrap --prefix=/home/hadoop/software/cmake

make

make install

1.9.2.1. Configure environment variables

export CMAKE_HOME=/home/hadoop/software/cmake

PATH=$CMAKE_HOME/bin:$PATH

1.9.2.2. Verify the installation

cmake --version

If the version number is displayed correctly, the installation succeeded.

yum install openssl-devel

yum install ncurses-devel


Download address: http://apache.fayea.com/apache-mirror/hadoop/common/hadoop-2.4.0/

wget http://apache.fayea.com/apache-mirror/hadoop/common/hadoop-2.4.0/hadoop-2.4.0-src.tar.gz

Extract:

tar -zxvf hadoop-2.4.0-src.tar.gz

In the directory /usr/local/hadoop-2.4.0-src, run the command:

mvn package -DskipTests -Pdist,native
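As an aside (not in the original note): the BUILDING.txt that ships with the Hadoop source also documents a variant with -Dtar, which additionally packages the build as an installable tarball under hadoop-dist/target:

mvn package -Pdist,native -DskipTests -Dtar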

After a long wait comes the moment of truth (when Maven finishes and reports that the build succeeded). The compiled code ends up under /usr/local/hadoop-2.4.0-src/hadoop-dist/target.

Verify whether the build is 64-bit. My virtual machine is 32-bit, so running the command shows 32-bit:

file lib/native/libh


Enter the directory: cd hadoop-2.4.0/etc/hadoop/

Give all the .sh files executable permission:

chmod +x *.sh

vi hadoop-env.sh

Find:

export JAVA_HOME=${JAVA_HOME}

and change it to:

export JAVA_HOME=/home/hadoop/software/jdk1.7.0_51

In core-site.xml, add the following inside the <configuration> element:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop1:9000</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/hadoop/hadoop-2.4.0/data/tmp</value>
</property>

mv mapred-site.xml.template mapred-site.xml

vi mapred-site.xml

Add the following inside the <configuration> element:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

In hdfs-site.xml, add the following inside the <configuration> element:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

In yarn-site.xml, add the following inside the <configuration> element:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

<property>
  <name>yarn.resourcemanager.address</name>
  <value>hadoop1:8032</value>
</property>

<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>hadoop1:8031</value>
</property>

<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>hadoop1:8033</value>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>hadoop1:8030</value>
</property>

<property>
  <name>yarn.web-proxy.address</name>
  <value>hadoop1:8888</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>


1.13.7.1. Format the namenode

bin/hadoop namenode -format

1.13.7.2. Start the pseudo-distributed cluster

sbin/start-all.sh

Check whether everything started successfully:

jps
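If everything came up, jps should list roughly the following daemons (process IDs omitted); this is the expected set for this pseudo-distributed configuration, not output recovered from the lost screenshot:

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps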


1.13.7.3. Run common shell commands to test the cluster

bin/hadoop fs -ls /

1.13.7.4. Check the cluster status from a browser

http://hadoop1:50070

Here hadoop1 is the hostname of the namenode; the IP address can be used directly instead.

Check the running state of the resourcemanager cluster:

http://hadoop1:8088/cluster

Here hadoop1 is the hostname of the resourcemanager; the IP can also be used directly (this is pseudo-distributed mode, where the namenode and the resourcemanager run on the same machine, so the current virtual machine's hostname reaches both).

The NameNode in Hadoop is like a person's heart: it is extremely important and absolutely must not stop working. In the Hadoop 1 era there was only one NameNode. If that NameNode lost its data or stopped working, the whole cluster could not be recovered. This is the single point of failure in Hadoop 1, and the reason Hadoop 1 is unreliable, as shown in Figure 1.

(Figure 1)

Hadoop 2 solves this problem. The high reliability of HDFS in Hadoop 2.2.0 means that two NameNodes can be started at the same time. One is in the working (active) state, and the other is on standby at all times. That way, when the server hosting one NameNode goes down, service can be switched, manually or automatically, to the other NameNode without losing data.

These NameNodes keep their data state consistent by sharing data. Multiple NameNodes can share data through the Network File System or through Quorum Journal Nodes. The former is a file system shared via Linux and belongs to operating-system configuration; the latter is part of Hadoop itself and belongs to software configuration.

Here we describe the configuration that uses Quorum Journal Nodes, with manual failover.

When the cluster starts, both NameNodes can be started at the same time. Only one of these NameNodes is active; the other is in the standby state. Active means it provides service; standby means it lies dormant, only synchronizing data, ready to provide service at any moment, as shown in Figure 2.

(Figure 2)

In a typical HA cluster, each NameNode is an independent server. At any moment, only one NameNode is in the active state and the other is in standby. The active NameNode handles all client operations, while the standby NameNode is subordinate: it maintains the data state, ready to take over at any time.

To keep their data synchronized, the two NameNodes communicate with each other through a group of independent processes called JournalNodes. Whenever the namespace of the active NameNode is modified, the majority of the JournalNodes are informed. The standby NameNode is able to read the change information from the JNs, and it continuously monitors the edit log for changes and applies them to its own namespace. The standby can thus ensure that, if the cluster fails, the namespace state is already fully synchronized, as shown in Figure 3.

(Figure 3)

To ensure fast failover, the standby NameNode needs to know the location of every data block in the cluster. To achieve this, all datanodes must be configured with the addresses of both NameNodes, and send block location information and heartbeats to both of them.

For an HA cluster, it is essential that only one NameNode is active at any given moment. Otherwise the data state of the two NameNodes would diverge, and data could be lost or wrong results produced. To guarantee this, the JNs must ensure that only one NameNode at a time can write to them.

Hostname    IP               NameNode    ResourceManager    JournalNode    DataNode    NodeManager
hadoop1     192.168.1.201    yes         -                  yes            yes         yes
hadoop2     192.168.1.202    yes         -                  yes            yes         yes
hadoop3     192.168.1.203    -           yes                yes            yes         yes

vi hadoop-env.sh

Find:

export JAVA_HOME=${JAVA_HOME}

and change it to:

export JAVA_HOME=/home/hadoop/software/jdk1.7.0_51

Add the following inside the <configuration> element of core-site.xml.

This method is simple to configure and is recommended.

fs.defaultFS: the default path prefix used by clients when connecting to HDFS. If the nameservice ID configured later is mycluster, that value can be used here as part of the authority. It can be configured in core-site.xml as follows:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/hadoop/hadoop-2.4.0/data/tmp</value>
</property>

dfs.journalnode.edits.dir: the path where the JournalNode process keeps its state. This is an absolute path on the Linux server's file system. Configure it as follows:

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/hadoop/hadoop/hadoop-2.4.0/data/journal</value>
</property>

mv mapred-site.xml.template mapred-site.xml

vi mapred-site.xml

Add the following inside the <configuration> element:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

In hdfs-site.xml, add the following inside the <configuration> element.

Configure the number of replicas in the cluster:

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

dfs.nameservices: the logical name of the namespace. If HDFS Federation is used, the names of multiple namespaces can be configured, separated by commas.

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>

dfs.ha.namenodes.[nameservice ID]: the unique identifiers of all the NameNodes in the namespace. Several can be configured, separated by commas. These names let the DataNodes know all of the cluster's NameNodes. At present, each cluster can have at most two NameNodes configured.

<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>hadoop1,hadoop2</value>
</property>

dfs.namenode.rpc-address.[nameservice ID].[name node ID]: the RPC address each namenode listens on. As shown below:

<property>
  <name>dfs.namenode.rpc-address.mycluster.hadoop1</name>
  <value>hadoop1:8020</value>
</property>

<property>
  <name>dfs.namenode.rpc-address.mycluster.hadoop2</name>
  <value>hadoop2:8020</value>
</property>

dfs.namenode.http-address.[nameservice ID].[name node ID]: the HTTP address each namenode listens on. As shown below:

<property>
  <name>dfs.namenode.http-address.mycluster.hadoop1</name>
  <value>hadoop1:50070</value>
</property>

<property>
  <name>dfs.namenode.http-address.mycluster.hadoop2</name>
  <value>hadoop2:50070</value>
</property>

dfs.namenode.shared.edits.dir: the URI of the JournalNode group that the NameNodes read from and write to. Through this URI, the NameNodes can read and write the edit log contents. The URI format is "qjournal://host1:port1;host2:port2;host3:port3/journalId", where host1, host2, and host3 are the addresses of the JournalNodes; there must be an odd number of them, at least 3. journalId is the unique identifier of the cluster; multiple federated namespaces also use the same journalId. Configure it as follows:

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/mycluster</value>
</property>

dfs.client.failover.proxy.provider.[nameservice ID]: the Java class that HDFS clients use to connect to the active NameNode.

<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
  <name>dfs.ha.automatic-failover.enabled.mycluster</name>
  <value>false</value>
</property>

dfs.ha.fencing.methods: the handler used when the active namenode fails. When the active namenode fails, its process usually needs to be killed. The method can be ssh or shell.

If ssh is used, configure as follows:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_dsa</value>
</property>

In yarn-site.xml, add the following inside the <configuration> element:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop3</value>
</property>

Below are some scripts I wrote to make starting the cluster quick and convenient. They are not very smart yet; adding nodes or using different hostnames means editing the scripts again.

initmycluster.sh mainly implements cluster initialization (it syncs hadoop1's configuration files to hadoop2 and hadoop3, deletes the log files, the namenode metadata, and the datanode block data, formats the two namenodes, and then starts the cluster; it is fairly dangerous, use with caution).

startmycluster.sh mainly starts the services in the cluster in order.

stopmycluster.sh mainly stops all the services in the cluster.
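The contents of stopmycluster.sh were in a lost screenshot. Below is my own reconstruction of what such a script plausibly looks like, mirroring the structure of initmycluster.sh (same paths and hostnames), not the author's original:

#!/bin/sh
# stop yarn (the ResourceManager on hadoop3 and the nodemanagers)
ssh hadoop3 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/stop-yarn.sh'
# stop the datanodes
ssh hadoop1 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/hadoop-daemons.sh stop datanode'
# stop both namenodes
ssh hadoop1 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/hadoop-daemon.sh stop namenode'
ssh hadoop2 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/hadoop-daemon.sh stop namenode'
# stop the journalnodes
ssh hadoop1 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/hadoop-daemon.sh stop journalnode'
ssh hadoop2 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/hadoop-daemon.sh stop journalnode'
ssh hadoop3 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/hadoop-daemon.sh stop journalnode'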


Detailed contents of the initmycluster.sh file:

# sync all the configuration files on hadoop1 to hadoop2 and hadoop3

ssh hadoop1 'scp /home/hadoop/hadoop/hadoop-2.4.0/etc/hadoop/* hadoop2:/home/hadoop/hadoop/hadoop-2.4.0/etc/hadoop/'

ssh hadoop1 'scp /home/hadoop/hadoop/hadoop-2.4.0/etc/hadoop/* hadoop3:/home/hadoop/hadoop/hadoop-2.4.0/etc/hadoop/'

# clear the initialization data and log files

ssh hadoop1 'rm -rf /home/hadoop/hadoop/hadoop-2.4.0/data/tmp/*'

ssh hadoop2 'rm -rf /home/hadoop/hadoop/hadoop-2.4.0/data/tmp/*'

ssh hadoop3 'rm -rf /home/hadoop/hadoop/hadoop-2.4.0/data/tmp/*'

ssh hadoop1 'rm -rf /home/hadoop/hadoop/hadoop-2.4.0/logs/*'

ssh hadoop2 'rm -rf /home/hadoop/hadoop/hadoop-2.4.0/logs/*'

ssh hadoop3 'rm -rf /home/hadoop/hadoop/hadoop-2.4.0/logs/*'

# start the journalnodes in the cluster

ssh hadoop1 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/hadoop-daemon.sh start journalnode'

ssh hadoop2 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/hadoop-daemon.sh start journalnode'

ssh hadoop3 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/hadoop-daemon.sh start journalnode'

# format and start namenode1 on hadoop1

ssh hadoop1 '/home/hadoop/hadoop/hadoop-2.4.0/bin/hdfs namenode -format -clusterId mycluster'

ssh hadoop1 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/hadoop-daemon.sh start namenode'

# start namenode2 on hadoop2

ssh hadoop2 '/home/hadoop/hadoop/hadoop-2.4.0/bin/hdfs namenode -bootstrapStandby'

sleep 10

ssh hadoop2 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/hadoop-daemon.sh start namenode'

sleep 10

# put the namenode on hadoop1 into the active state

ssh hadoop1 '/home/hadoop/hadoop/hadoop-2.4.0/bin/hdfs haadmin -failover --forceactive hadoop2 hadoop1'

# start the datanodes

ssh hadoop1 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/hadoop-daemons.sh start datanode'

# start yarn (the ResourceManager and nodemanager on hadoop3, plus the nodemanagers on hadoop1 and hadoop2)

ssh hadoop3 '/home/hadoop/hadoop/hadoop-2.4.0/sbin/start-yarn.sh'

ssh hadoop1 '/home/hadoop/hadoop/hadoop-2.4.0/bin/hdfs haadmin -failover --forceactive hadoop2 hadoop1'

This switches the namenode on hadoop1 to the active state.

Kill the namenode process on hadoop1 and then run:

ssh hadoop1 '/home/hadoop/hadoop/hadoop-2.4.0/bin/hdfs haadmin -failover --forceactive hadoop1 hadoop2'

This switches the namenode on hadoop2 to the active state.

Then restart the namenode on hadoop1; it can still rejoin the cluster automatically.
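To check which namenode is currently active, the standard hdfs haadmin subcommand can be used (not part of the original scripts):

bin/hdfs haadmin -getServiceState hadoop1    # prints "active" or "standby"
bin/hdfs haadmin -getServiceState hadoop2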

visudo -f /etc/sudoers

After the line root ALL=(ALL) ALL, add:

hadoop ALL=(ALL) ALL

# hadoop is the username that needs to use the sudo command

Defaults:hadoop timestamp_timeout=-1,runaspw

// grant the ordinary account (tom in the original example) sudo permission

// timestamp_timeout=-1: the password only has to be verified once; after that the system remembers it

// runaspw: require the root password; without it, the default is to prompt for the ordinary account's own password

Ubuntu users can instead add the user that needs sudo to the sudo group:

sudo adduser hadoop sudo

useradd -d /home/hadoop -s /bin/bash -m hadoop

passwd hadoop

chmod +x *.sh

:%s/vivian/sky/ (equivalent to :g/vivian/s//sky/) replaces the first vivian on each line with sky

:%s/vivian/sky/g (equivalent to :g/vivian/s//sky/g) replaces every vivian on each line with sky

1. Run the command: sudo apt-get remove vim-common
2. Run the command: sudo apt-get install vim

#/sbin/iptables -I INPUT -p tcp --dport 80 -j ACCEPT

#/sbin/iptables -I INPUT -p tcp --dport 22 -j ACCEPT

#/etc/rc.d/init.d/iptables save
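Following the same pattern, the Hadoop web UI ports used earlier in this note (50070 for the namenode, 8088 for the resourcemanager) could be opened as well; this is my extrapolation, not from the original:

/sbin/iptables -I INPUT -p tcp --dport 50070 -j ACCEPT
/sbin/iptables -I INPUT -p tcp --dport 8088 -j ACCEPT
/etc/rc.d/init.d/iptables save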
