1. Set up a Hadoop pseudo-distributed cluster
http://hadoop.apache.org/docs/stable/single_node_setup.html
core-site.xml (the dfs.* properties below conventionally belong in hdfs-site.xml, but every daemon loads both files, so this single-file setup also works):
```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/user/data/temp</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/user/data/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/user/data/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```
Note: set the JDK path in conf/hadoop-env.sh, e.g. export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.26
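Hadoop reads these *-site.xml files as flat name/value property lists. As a minimal illustration of that format (using the JDK's DOM parser, not Hadoop's own Configuration class), the mapping from `<property>` elements to key/value pairs looks like this:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ConfParseSketch {
    // Parse a Hadoop-style <configuration> document into a name -> value map.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList list = doc.getElementsByTagName("property");
            for (int i = 0; i < list.getLength(); i++) {
                Element p = (Element) list.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>"
                + "<property><name>dfs.replication</name><value>1</value></property>"
                + "</configuration>";
        System.out.println(parse(xml));
    }
}
```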
2. Open a remote debug port
Add one of the following lines to bin/hadoop or conf/hadoop-env.sh. Only one daemon can be debugged at a time, so uncomment exactly one; the machine's IP here is 10.13.249.132.
```sh
HADOOP_NAMENODE_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y"
#HADOOP_SECONDARYNAMENODE_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8789,server=y,suspend=y"
#HADOOP_DATANODE_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8790,server=y,suspend=y"
#HADOOP_BALANCER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8791,server=y,suspend=y"
#HADOOP_JOBTRACKER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8792,server=y,suspend=y"
#HADOOP_TASKTRACKER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8793,server=y,suspend=y"
```
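Each daemon gets its own JDWP port (8788 through 8793 above), and the agent string varies only in the address field. A small helper (hypothetical, just to make the pattern explicit) can generate these lines; note it emits the newer -agentlib:jdwp syntax, which is equivalent to the older -Xdebug -Xrunjdwp form used above:

```java
public class JdwpOpts {
    // Build a JDWP agent string like those in hadoop-env.sh; suspend=y makes
    // the JVM block at startup until a debugger attaches on the given port.
    static String forPort(int port, boolean suspend) {
        return String.format(
            "-agentlib:jdwp=transport=dt_socket,address=%d,server=y,suspend=%s",
            port, suspend ? "y" : "n");
    }

    public static void main(String[] args) {
        System.out.println("HADOOP_NAMENODE_OPTS=\"" + forPort(8788, true) + "\"");
        System.out.println("HADOOP_DATANODE_OPTS=\"" + forPort(8790, true) + "\"");
    }
}
```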
Add the following to conf/mapred-site.xml. Because the child JVM suspends on a single fixed debug port (suspend=y), effectively only one task child can run at a time:
```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-agentlib:jdwp=transport=dt_socket,address=8883,server=y,suspend=y</value>
  </property>
</configuration>
```
Now start Hadoop; the NameNode can be seen listening on the debug port (screenshot not reproduced here).
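Since the screenshot is not reproduced here, a quick way to confirm that the debug port is actually listening is a plain TCP socket probe, sketched below with no Hadoop dependency (8788 is the NameNode debug port configured above):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Return true if something accepts TCP connections on host:port.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("8788 listening: " + isListening("localhost", 8788, 500));
    }
}
```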
3. Remote Java debugging in Eclipse
Install Eclipse and the Hadoop plugin. The plugin can be built from source (Hadoop 1.0.3/src/contrib/eclipse-plugin), or you can use the prebuilt package from yongboy (download link in the original post, not preserved here). A known-working combination is Eclipse 3.7 + hadoop-eclipse-plugin-1.0.2 + Hadoop 1.0.3.
1) Configure the Hadoop plugin
Open the Map/Reduce perspective in Eclipse and add a Hadoop location:
General tab:
Map/Reduce Master: Host localhost, Port 9001
DFS Master: Host localhost, Port 9000
User name: user
Advanced tab:
Set dfs.data.dir, dfs.name.dir, dfs.tmp.dir and so on to the values from core-site.xml.
mapred.child.java.opts = -Xmx200m -Xdebug -Xrunjdwp:transport=dt_socket,address=8883,server=y,suspend=y
2) Enable remote debugging in Eclipse and connect to the IP and port where Hadoop is running (screenshot not reproduced here).
Once configured, set a breakpoint in the NameNode's main(), click Debug, and after a moment Eclipse attaches to the NameNode, which can then be stepped through. The DataNode, JobTracker and TaskTracker are debugged the same way.
3) Debugging a MapReduce program
WordCount.java (takes no arguments; imports added for completeness):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emit (word, 1) for every whitespace-separated token in the line.
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    // Sum the counts for each word.
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path("/wordcount/input"));
    FileOutputFormat.setOutputPath(job, new Path("/wordcount/out"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```
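The map/reduce logic above can also be exercised without a cluster. This plain-Java sketch reproduces the same tokenize-and-sum behaviour in memory, which is handy as a sanity check before stepping through the real job under the debugger:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountSketch {
    // Same tokenization as TokenizerMapper, summed in memory like IntSumReducer.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello hadoop hello world"));
        // → {hello=2, hadoop=1, world=1}
    }
}
```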
If the program takes arguments, open "Run Configurations", switch to the "Arguments" tab, and enter them under "Program arguments", for example:
hdfs://master:9000/user/root/input2 hdfs://master:9000/user/root/output2
Then right-click and choose "Run on Hadoop" to run the job.
Set breakpoints, then right-click -> Debug As -> Java Application to debug.
note1: delete the output directory before each run.
note2: if HDFS reports insufficient permissions on the output directory, grant them with hadoop fs -chmod 770 /out (on Linux).
note3: on a SafeModeException, leave safe mode on the master node: bin/hadoop dfsadmin -safemode leave
In addition, the plugin automatically generates a jar file and other files (including some concrete Hadoop configuration) under workspace\.metadata\.plugins\org.apache.hadoop.eclipse in the Eclipse workspace.
References
Debugging Hadoop source code in Eclipse http://hi.baidu.com/shenh062326/blog/item/b04a810cb48315f8aa645713.html
Hadoop study notes: remote-debugging Hadoop in Eclipse http://www.blogjava.net/yongboy/archive/2012/04/26/376486.html
Debugging MapReduce jobs with Eclipse http://rdc.taobao.com/team/jm/archives/1761