[Repost] Supplementary notes on deploying Hadoop 0.20

I recently did a fresh install of Hadoop 0.20 on my own LAN, and it does feel noticeably different from 0.19 in a few ways.

The 0.20 distribution drops the hadoop-default.xml configuration file and replaces it with three configuration files:

core-site.xml
mapred-site.xml
hdfs-site.xml


By default, all three files are empty. In other words, the global default values are now hard-coded into Hadoop itself; what we write in these site files are only the options whose values differ from the defaults, and they override the built-in values.

Each configuration option must go into its corresponding file; do not put it in the wrong place.
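
Incidentally, the built-in defaults can still be read: they ship as core-default.xml, hdfs-default.xml and mapred-default.xml inside the Hadoop core jar, which also tells you which site file each option belongs to. A quick way to inspect them, assuming the standard 0.20.2 tarball layout (adjust the jar name to your release):

# Print the bundled defaults straight out of the jar, without unpacking it.
unzip -p hadoop-0.20.2-core.jar core-default.xml | less
unzip -p hadoop-0.20.2-core.jar hdfs-default.xml | less
unzip -p hadoop-0.20.2-core.jar mapred-default.xml | less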

The official English documentation for Hadoop 0.20 tells us what goes where (note: the English documentation. 0.20 also ships Chinese documentation, but its content is outdated; I took quite a few wrong turns because I followed the Chinese docs). See: http://hadoop.apache.org/common/docs/r0.20.2/cluster_setup.html

Excerpted from the original:

This section deals with important parameters to be specified in the following:
conf/core-site.xml:

Parameter: fs.default.name
Value: URI of NameNode.
Notes: hdfs://hostname/

conf/hdfs-site.xml:

Parameter: dfs.name.dir
Value: Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.
Notes: If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.

Parameter: dfs.data.dir
Value: Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.
Notes: If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.

conf/mapred-site.xml:

Parameter: mapred.job.tracker
Value: Host or IP and port of JobTracker.
Notes: host:port pair.

Parameter: mapred.system.dir
Value: Path on the HDFS where the Map/Reduce framework stores system files, e.g. /hadoop/mapred/system/.
Notes: This is in the default filesystem (HDFS) and must be accessible from both the server and client machines.

Parameter: mapred.local.dir
Value: Comma-separated list of paths on the local filesystem where temporary Map/Reduce data is written.
Notes: Multiple paths help spread disk i/o.

Parameter: mapred.tasktracker.{map|reduce}.tasks.maximum
Value: The maximum number of Map/Reduce tasks, which are run simultaneously on a given TaskTracker, individually.
Notes: Defaults to 2 (2 maps and 2 reduces), but vary it depending on your hardware.

Parameter: dfs.hosts/dfs.hosts.exclude
Value: List of permitted/excluded DataNodes.
Notes: If necessary, use these files to control the list of allowable DataNodes.

Parameter: mapred.hosts/mapred.hosts.exclude
Value: List of permitted/excluded TaskTrackers.
Notes: If necessary, use these files to control the list of allowable TaskTrackers.

Parameter: mapred.queue.names
Value: Comma-separated list of queues to which jobs can be submitted.
Notes: The Map/Reduce system always supports at least one queue with the name "default". Hence, this parameter's value should always contain the string "default". Some job schedulers supported in Hadoop, like the Capacity Scheduler, support multiple queues. If such a scheduler is being used, the list of configured queue names must be specified here. Once queues are defined, users can submit jobs to a queue using the property name mapred.job.queue.name in the job configuration. There could be a separate configuration file for configuring properties of these queues that is managed by the scheduler. Refer to the documentation of the scheduler for more information.

Parameter: mapred.acls.enabled
Value: Specifies whether ACLs are supported for controlling job submission and administration.
Notes: If true, ACLs would be checked while submitting and administering jobs. ACLs can be specified using the configuration parameters of the form mapred.queue.queue-name.acl-name, defined below.

Parameter: mapred.queue.queue-name.acl-submit-job
Value: List of users and groups that can submit jobs to the specified queue-name.
Notes: The lists of users and groups are both comma-separated lists of names. The two lists are separated by a blank. Example: user1,user2 group1,group2. If you wish to define only a list of groups, provide a blank at the beginning of the value.

Parameter: mapred.queue.queue-name.acl-administer-job
Value: List of users and groups that can change the priority of, or kill, jobs that have been submitted to the specified queue-name.
Notes: The lists of users and groups are both comma-separated lists of names. The two lists are separated by a blank. Example: user1,user2 group1,group2. If you wish to define only a list of groups, provide a blank at the beginning of the value. Note that the owner of a job can always change the priority of, or kill, his/her own job, irrespective of the ACLs.

Typically all the above parameters are marked as final to ensure that they cannot be overridden by user applications.
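
For instance, to pin a property so that job submitters cannot override it, add a <final> element to it (a minimal sketch, reusing mapred.system.dir from the table above):

<property>
  <name>mapred.system.dir</name>
  <value>${hadoop.tmp.dir}/mapred/system</value>
  <!-- "final" prevents user jobs from overriding this value. -->
  <final>true</final>
</property>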



On a LAN, machines are often named after their users, e.g. John-desktop. In a distributed system, however, we usually want to name machines according to a convention like master, slave001, slave002.

To do this, edit the /etc/hosts file and add the desired name for every machine, e.g.:

192.168.1.10    John-desktop
192.168.1.10    master
192.168.1.11    Peter-desktop
192.168.1.11    slave001

And so on for the rest. Hadoop determines the current machine's name automatically (via hostname), so if the hostname is not a name like master or slave001, network communication will break.
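
A minimal sketch of checking and, if needed, fixing a node's hostname (assuming a Debian/Ubuntu-style system; "master" is the same example name used above):

# Show the name Hadoop will pick up for this machine.
hostname

# If it prints e.g. John-desktop instead of master, change it for the current session...
sudo hostname master

# ...and persist the change across reboots (Debian/Ubuntu read /etc/hostname at boot).
echo master | sudo tee /etc/hostname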

Below are my configuration files.

Note: I have two machines. Master IP: 192.168.1.10; slave IP: 192.168.1.11.

core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master/</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hdfs</value>
  </property>
</configuration>

hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>${hadoop.tmp.dir}/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node
    should store the name table (fsimage). If this is a comma-delimited list
    of directories then the name table is replicated in all of the
    directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>${hadoop.tmp.dir}/dfs/data</value>
    <description>Determines where on the local filesystem a DFS data node
    should store its blocks. If this is a comma-delimited
    list of directories, then data will be stored in all named
    directories, typically on different devices.
    Directories that do not exist are ignored.</description>
  </property>
</configuration>

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.10:9001</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.</description>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>${hadoop.tmp.dir}/mapred/system</value>
    <description>The shared directory where MapReduce stores control
    files.</description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>${hadoop.tmp.dir}/mapred/local</value>
    <description>The local directory where MapReduce stores intermediate
    data files. May be a comma-separated list of
    directories on different devices in order to spread disk i/o.
    Directories that do not exist are ignored.</description>
  </property>
</configuration>

One more thing:

In conf/hadoop-env.sh, point the JAVA_HOME environment variable at your JDK path. Even if it is already set in .profile, you still need to set it here, otherwise you may occasionally get a "JAVA_HOME is not set" error.
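For example (a sketch only; the JDK path below is hypothetical and depends on where your JDK is actually installed):

# conf/hadoop-env.sh
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun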



