[Repost] Supplementary notes on deploying Hadoop 0.20

I recently did a fresh install of Hadoop 0.20 on my LAN, and it turns out there are quite a few changes from the 0.19 release.

The 0.20 distribution drops the single hadoop-default.xml configuration file; in its place there are three configuration files:

core-site.xml
mapred-site.xml
hdfs-site.xml


By default, all three files are empty. In other words, the global default values are now hard-coded in the source; what we write in these files are only the options that differ from the defaults, and they override them.

Each option must go into its corresponding file; putting it in the wrong file will not work.

The official Hadoop 0.20 documentation explains what goes where. Note: use the English documentation. The 0.20 release also ships Chinese documentation, but its content is outdated, and following it cost me a lot of wasted time. See: http://hadoop.apache.org/common/docs/r0.20.2/cluster_setup.html

Excerpted from the original documentation:

This section deals with important parameters to be specified in the following:

conf/core-site.xml:

Parameter: fs.default.name
Value: URI of NameNode.
Notes: hdfs://hostname/

conf/hdfs-site.xml:

Parameter: dfs.name.dir
Value: Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.
Notes: If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.

Parameter: dfs.data.dir
Value: Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.
Notes: If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.

conf/mapred-site.xml:

Parameter: mapred.job.tracker
Value: Host or IP and port of JobTracker.
Notes: host:port pair.

Parameter: mapred.system.dir
Value: Path on the HDFS where the Map/Reduce framework stores system files, e.g. /hadoop/mapred/system/.
Notes: This is in the default filesystem (HDFS) and must be accessible from both the server and client machines.

Parameter: mapred.local.dir
Value: Comma-separated list of paths on the local filesystem where temporary Map/Reduce data is written.
Notes: Multiple paths help spread disk i/o.

Parameter: mapred.tasktracker.{map|reduce}.tasks.maximum
Value: The maximum number of Map/Reduce tasks, which are run simultaneously on a given TaskTracker, individually.
Notes: Defaults to 2 (2 maps and 2 reduces), but vary it depending on your hardware.

Parameter: dfs.hosts / dfs.hosts.exclude
Value: List of permitted/excluded DataNodes.
Notes: If necessary, use these files to control the list of allowable DataNodes.

Parameter: mapred.hosts / mapred.hosts.exclude
Value: List of permitted/excluded TaskTrackers.
Notes: If necessary, use these files to control the list of allowable TaskTrackers.

Parameter: mapred.queue.names
Value: Comma-separated list of queues to which jobs can be submitted.
Notes: The Map/Reduce system always supports at least one queue with the name "default". Hence, this parameter's value should always contain the string "default". Some job schedulers supported in Hadoop, like the Capacity Scheduler, support multiple queues. If such a scheduler is being used, the list of configured queue names must be specified here. Once queues are defined, users can submit jobs to a queue using the property name mapred.job.queue.name in the job configuration. There could be a separate configuration file for configuring properties of these queues, managed by the scheduler; refer to the scheduler's documentation.

Parameter: mapred.acls.enabled
Value: Specifies whether ACLs are supported for controlling job submission and administration.
Notes: If true, ACLs are checked while submitting and administering jobs. ACLs can be specified using configuration parameters of the form mapred.queue.queue-name.acl-name, defined below.

Parameter: mapred.queue.queue-name.acl-submit-job
Value: List of users and groups that can submit jobs to the specified queue-name.
Notes: The lists of users and groups are both comma-separated lists of names, and the two lists are separated by a blank. Example: user1,user2 group1,group2. If you wish to define only a list of groups, provide a blank at the beginning of the value.

Parameter: mapred.queue.queue-name.acl-administer-job
Value: List of users and groups that can change the priority of, or kill, jobs that have been submitted to the specified queue-name.
Notes: The lists of users and groups are both comma-separated lists of names, and the two lists are separated by a blank. Example: user1,user2 group1,group2. If you wish to define only a list of groups, provide a blank at the beginning of the value. Note that an owner of a job can always change the priority of, or kill, his/her own job, irrespective of the ACLs.

Typically all the above parameters are marked as final to ensure that they cannot be overridden by user applications.
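
For example, a property is locked down by adding a <final> element inside its definition; a minimal sketch (the value shown is only illustrative):

<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
  <final>true</final>
</property>

With <final>true</final>, any value a user job configuration tries to set for this property is ignored.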



On a LAN, machines are often named after their owners, e.g. John-desktop. In a distributed system, however, we usually want a naming scheme like master, slave001, slave002. To achieve this, edit /etc/hosts on every machine and add the desired name for each host, e.g.:

192.168.1.10 John-desktop

192.168.1.10 master

192.168.1.11 Peter-desktop

192.168.1.11 slave001

and so on. This matters because Hadoop automatically picks up the current machine name (via hostname), and if that name is not one of master, slave001, etc., network communication between the nodes will break.
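
To double-check, run hostname on each box; it should print the cluster name, not the user-style name. A sketch of fixing it, assuming a Debian/Ubuntu-style system where the persistent name lives in /etc/hostname:

hostname                                # prints the current name, e.g. John-desktop
sudo hostname master                    # rename for the running session
echo master | sudo tee /etc/hostname    # keep the new name across reboots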

Below are my configuration files.

Note: I have two machines. Master IP: 192.168.1.10; slave IP: 192.168.1.11.
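
For completeness: the start-up scripts also read conf/slaves (one worker hostname per line) and conf/masters (the host that runs the secondary NameNode). I don't reproduce mine below, but with this layout they would simply be:

conf/masters:
master

conf/slaves:
slave001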

core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master/</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hdfs</value>
  </property>
</configuration>
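
Since the fs.default.name URI has no port, clients fall back to the NameNode's default port, 8020. Once the daemons are running (see the start commands at the end), a quick sanity check from any node is:

bin/hadoop fs -ls /    # resolves paths against fs.default.name, i.e. hdfs://master/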

hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>${hadoop.tmp.dir}/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node
    should store the name table (fsimage). If this is a comma-delimited list
    of directories then the name table is replicated in all of the
    directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>${hadoop.tmp.dir}/dfs/data</value>
    <description>Determines where on the local filesystem a DFS data node
    should store its blocks. If this is a comma-delimited
    list of directories, then data will be stored in all named
    directories, typically on different devices.
    Directories that do not exist are ignored.</description>
  </property>
</configuration>
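
One thing to remember: a fresh dfs.name.dir must be formatted before the NameNode can start. On the master, once only:

bin/hadoop namenode -format    # initializes ${hadoop.tmp.dir}/dfs/name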

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.10:9001</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.</description>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>${hadoop.tmp.dir}/mapred/system</value>
    <description>The shared directory where MapReduce stores control files.</description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>${hadoop.tmp.dir}/mapred/local</value>
    <description>The local directory where MapReduce stores intermediate
    data files. May be a comma-separated list of
    directories on different devices in order to spread disk i/o.
    Directories that do not exist are ignored.</description>
  </property>
</configuration>
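
With all three files in place and copied to both machines, bringing the cluster up from the master looks roughly like this (run from the Hadoop installation directory):

bin/start-dfs.sh      # NameNode here, DataNodes on the hosts in conf/slaves
bin/start-mapred.sh   # JobTracker here, TaskTrackers on the slaves
jps                   # the master should now show NameNode and JobTracker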

One more note: in conf/hadoop-env.sh, point the JAVA_HOME environment variable at your JDK path. Even if it is already set in .profile, set it here as well; otherwise you may occasionally get a "JAVA_HOME is not set" error.
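
The line to add (or uncomment) there looks like the following; the JDK path is only an example, point it at your own installation:

export JAVA_HOME=/usr/lib/jvm/java-6-sun    # adjust to your actual JDK path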




