The road to better programming: Chapter 12. Using the perledit: section to edit files

Posted by: 谁是天蝎

Posted: 2011-8-25 16:42:36

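The cfperl syntax uses pipes to handle the input and output files of an edit. No temporary files, external Perl invocations, or other tricks are involved.

cfperl's Perl file edits are parsed in the usual cfperl way: a host-specific "perledit" section is parsed first, reassembled, and then handed to the top-level parser.

What is cfperl's Perl-based file editing?

Perl is best known as a text-processing language. Its syntax is flexible and concise, so a complex edit often takes only a few lines. (Two articles, One-liners 101 and One-liners 102, describe Perl's flexible and powerful editing features.)

cfengine's file-editing syntax is simpler and more direct than Perl's. That is usually a good thing, but some people who know Perl would rather use it than cfengine's built-in file editing. I tried to make cfengine's shell commands and editfiles do what I wanted, and concluded in the end that they cannot replace Perl.

Consider, for example, the common task of matching an IP address. With cfengine, you have to define a pattern that matches an IP address every time you need one. With cfperl, you can do this:

Listing 1. Matching IP addresses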

perledit:
any::
filter in place /etc/hosts 'use Regexp::Common qw/net/; next unless m/$RE{net}{IPv4}/;'

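This one line makes cfperl open and read /etc/hosts, then write back only the lines that match an IPv4 address. cfengine, by comparison, has the DeleteLines functions (which delete lines that contain a word, match a regular expression, or start with a word). The catch is that you have to write the IPv4 regular expression yourself. Easy, you might say: just \d+\. repeated four times without the final period, right? Not quite. Each octet of an IP address ranges from 0 to 255. And what if you want special handling for every address in your class B subnet? With cfperl, Net::Netmask can do the job. With cfengine, the task is very hard and you are likely to get it wrong.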
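To make the pitfall concrete, here is a minimal sketch in plain Perl that uses no CPAN modules. The names is_ipv4 and in_class_b are made up for this illustration, and the prefix comparison is a simplification of what Net::Netmask does; treat it as a demonstration of why the naive pattern is not enough, not as a replacement for Regexp::Common.

```perl
use strict;
use warnings;

# One octet in the range 0-255, spelled out by digit ranges.
my $octet = qr/(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/;
my $ipv4  = qr/\A$octet(?:\.$octet){3}\z/;

# True if $text is a dotted quad with every octet in 0-255.
# (Hypothetical helper, not part of cfperl.)
sub is_ipv4 {
    my ($text) = @_;
    return $text =~ $ipv4 ? 1 : 0;
}

# True if $ip is valid and its first two octets equal $prefix,
# for example in_class_b('172.16.9.1', '172.16').
# (Hypothetical helper, not part of cfperl.)
sub in_class_b {
    my ($ip, $prefix) = @_;
    return 0 unless is_ipv4($ip);
    my @octets = split /\./, $ip;
    return "$octets[0].$octets[1]" eq $prefix ? 1 : 0;
}

# The naive pattern from the text happily accepts out-of-range octets.
my $naive = qr/\A\d+\.\d+\.\d+\.\d+\z/;
print "naive pattern accepts 999.1.1.1\n"  if '999.1.1.1' =~ $naive;
print "strict pattern rejects 999.1.1.1\n" unless is_ipv4('999.1.1.1');
```

Even this small version needs four alternatives per octet, which is the kind of detail that is easy to get wrong when it has to be retyped for every edit.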

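I strongly recommend that you look at the Regexp::Common module on CPAN (see the Resources section for a link). It is relevant to more than cfperl: it is a basic tool for any serious Perl programmer.

cfperl has four file-editing commands, one for each of the four basic editing functions:

write a file ( write )
write a file using another file as the data source ( write from A to B )
filter a file through a regular expression ( filter in place )
filter one file into another ( filter from A to B )

The filter commands behave like perl -p: the result of the command is printed automatically. The write commands behave like perl -n: it is up to the user to call print() at the right moment. If you are familiar with -p and -n, cfperl's file-editing commands will feel natural. If you are not, see the "perldoc perlrun" manual page.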
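The difference between the two styles can be sketched without cfperl at all. In the example below, run_like_p and run_like_n are hypothetical helpers written for this illustration (they are not cfperl functions), and the $emit callback stands in for print() so that the output can be collected in a list.

```perl
use strict;
use warnings;

# perl -p style: run $code once per line with $_ set to the line,
# then emit whatever is left in $_ automatically.
sub run_like_p {
    my ($code, @lines) = @_;
    my @out;
    for my $line (@lines) {
        local $_ = $line;
        $code->();
        push @out, $_;
    }
    return @out;
}

# perl -n style: run $code once per line, but emit nothing unless
# the code itself hands something to the $emit callback.
sub run_like_n {
    my ($code, @lines) = @_;
    my @out;
    my $emit = sub { push @out, @_ };
    for my $line (@lines) {
        local $_ = $line;
        $code->($emit);
    }
    return @out;
}

my @input = ("root:x:0:0\n", "daemon:x:1:1\n");

# Filter style: every line comes back, edited or not.
my @filtered = run_like_p(sub { s/root/toor/ }, @input);

# Write style: only the lines the code chooses to emit come back.
my @written = run_like_n(sub { my $emit = shift; $emit->($_) if /daemon/ }, @input);

print @filtered, @written;
```

The filter style suits substitutions that touch every line; the write style suits code that decides for itself what to output, possibly only once at the end.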

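Syntax and parser implementation

Specifically, the perledit syntax is just the command name followed by the input file (if one is used), the output file, and the Perl command to run for the edit. The write and filter in place commands take no input file. The other two commands, write from A to B and filter from A to B, require one. Input and output files can also be pipes to other processes. For example, "|perl" sends the output to the Perl interpreter found in the path, while "ls /etc|" lists files (although normally you should not do that, and should use the File::Find module or the opendir() / readdir() functions instead).

The cfperl parser handles "unquoted" files (such as /etc/passwd), "quoted" files (such as "First Chapter.doc"), and pipes to and from other processes. The following rules are defined:

Listing 2. Filename rules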

input_filename: input_pipe |
input_regular_filename
output_filename: output_pipe |
output_regular_filename
input_pipe: /"[^"]+?\|"/
{
       $item[1] =~ s/"([^"]*)"/$1/;
       chop $item[1]; # cut the last (pipe) character
       $return = {
             mode => ::EDIT_INPUT(),
             type => ::EDIT_FILETYPE_PIPE(),
             file => $item[1],
       };
       1;
}
output_pipe: /"\|[^"]+?"/
{
       $item[1] =~ s/"([^"]*)"/$1/;
       $item[1] = substr $item[1], 1; # cut the first (pipe) character
       $return = {
             mode => ::EDIT_OUTPUT(),
             type => ::EDIT_FILETYPE_PIPE(),
             file => $item[1],
       };
       1;
}
output_regular_filename: regular_filename { $return =
       {
             mode => ::EDIT_OUTPUT(),
             type => ::EDIT_FILETYPE_FILE(),
             file => $item[1],
       };
       1;
}
input_regular_filename: regular_filename { $return =
       {
             mode => ::EDIT_INPUT(),
             type => ::EDIT_FILETYPE_FILE(),
             file => $item[1],
       };
       1;
}
regular_filename: unquoted_regular_filename | quoted_regular_filename
quoted_regular_filename: /"[^"|]+"/
{
       $item[1] =~ s/^"(.*)"/$1/;
       $return = $item[1];
       1;
}
unquoted_regular_filename: /[^'"\s|]+/ # no pipes, space, or quotes

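As you can see, the cfperl parser builds a data structure that holds the file information, in particular the file mode (input or output) and type (file or pipe). In addition, because the IO::Pipe module requires it, the parser strips the quotes and the pipe ( | ) operator from the file name when a pipe command is used.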
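The string handling in the two pipe rules can be tried in isolation. The following sketch reuses the regular expressions and substitutions from the grammar, but outside any parser; the function names and the plain strings 'input', 'output', and 'pipe' are stand-ins chosen for this example, where the real rules use the ::EDIT_INPUT(), ::EDIT_OUTPUT(), and ::EDIT_FILETYPE_PIPE() constants.

```perl
use strict;
use warnings;

# Mirror of the input_pipe rule: a quoted command ending in |.
# Returns a hash reference, or undef if the token is not an input pipe.
sub parse_input_pipe {
    my ($token) = @_;
    return unless $token =~ /\A"[^"]+?\|"\z/;
    (my $command = $token) =~ s/"([^"]*)"/$1/;   # strip the quotes
    chop $command;                               # cut the last (pipe) character
    return { mode => 'input', type => 'pipe', file => $command };
}

# Mirror of the output_pipe rule: a quoted command starting with |.
sub parse_output_pipe {
    my ($token) = @_;
    return unless $token =~ /\A"\|[^"]+?"\z/;
    (my $command = $token) =~ s/"([^"]*)"/$1/;   # strip the quotes
    $command = substr $command, 1;               # cut the first (pipe) character
    return { mode => 'output', type => 'pipe', file => $command };
}

my $in  = parse_input_pipe('"ls /etc |"');
my $out = parse_output_pipe('"|cat > /var/tmp/listing2"');
print "input command:  '$in->{file}'\n";
print "output command: '$out->{file}'\n";
```

What remains in the file field is a bare command line, which is the form that the reader() and writer() methods of IO::Pipe accept.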

With the filename rules in place, the rest is easy:

Listing 3. The rest of the perledit parser

input: filter_in_place | write_in_place |
filter_from_to | write_from_to |
<error>
filter_in_place: /filter/ /in/ /place/ output_regular_filename command
{::edit_op(::EDIT_IN_PLACE, $item{command}, undef, $item{output_regular_filename}); 1; }
filter_from_to: /filter/ /from/ input_filename /to/ output_filename command
{::edit_op(::EDIT_FROM_TO, $item{command}, $item{input_filename},
$item{output_filename});
1; }
write_in_place: /write/ output_filename command
{::edit_op(::WRITE_IN_PLACE, $item{command}, undef, $item{output_filename}); 1; }
write_from_to: /write/ /from/ input_filename /to/ output_filename command
{::edit_op(::WRITE_FROM_TO, $item{command}, $item{input_filename}, $item{output_filename});
1; }
command: /'.*'/
{
$item[1] =~ s/^'(.*)'/$1/;
$return = $item[1];
1;
}

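The command rule strips the enclosing single quotes. Because the match is greedy, it runs all the way to the last single quote, so single or double quotes inside the command itself are left intact.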
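A quick way to convince yourself of the greedy behavior is to run the same substitution on a command that contains quotes of its own. The wrapper name strip_command_quotes is invented for this sketch; only the substitution comes from the grammar.

```perl
use strict;
use warnings;

# Apply the substitution used by the command rule to one token.
# (Hypothetical wrapper; only the s/// itself is taken from the grammar.)
sub strip_command_quotes {
    my ($token) = @_;
    (my $command = $token) =~ s/^'(.*)'/$1/;
    return $command;
}

# A command with a double-quoted string and embedded single quotes.
my $token = q{'print "it's here" if m/'/;'};
print strip_command_quotes($token), "\n";
```

Only the first and the last single quote disappear, because .* first consumes the whole token and then backs off just far enough to find a closing quote.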

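At the top, the general input rule is defined in terms of the four basic commands. The only rule that does not use input_filename or output_filename is filter_in_place. It cannot use a pipe, because it has to read the file and then write the filtered contents back to the same place.

The sample configuration that ships with cfperl includes the following examples; try to work out what each one does. Keep in mind that they are not necessarily useful, just interesting.

Listing 4. Sample perledit configuration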

perledit:
any::
filter in place /var/tmp/passwd 's/root/toor/g'
filter from /var/tmp/passwd to /var/tmp/passwd.rewrite 's/0/0wn J00/g'
filter from "ls /etc |" to /var/tmp/listing 's/s/ss/g'
filter from "ls /etc |" to "|cat > /var/tmp/listing2" 's/s/ss/g'
# this one should be an error
filter in place "|nmap" 's/s/ss/g'
# this one should make edit_op return unhappily
filter from /var/tmp/passwd to /var/tmp/passwd 's/0/0wn J00/g'
write /var/tmp/environment 'use Data::Dumper; print Dumper \%ENV'
write /var/tmp/environment-error 'use Data::Dumper; print Dumper \%ENV'
# the following two lines of text are one line of code -
# remove the (CONTINUED) text and merge them!
write from /etc/passwd to /var/tmp/passwd.rewrite2 (CONTINUED)
'use Data::Dumper; push @a, $_; print Dumper map { $_, $i++ } @a;'
# the following two lines of text are one line of code -
# remove the (CONTINUED) text and merge them!
write from "ls /etc|" to /var/tmp/listing-env (CONTINUED)
'use Data::Dumper; push @a, $_; print Dumper map { $_, $i++ } @a;'

Opening files flexibly: the edit_open_file() function

The edit_open_file() function opens files on behalf of edit_op().

Listing 5. The edit_open_file() function

sub edit_open_file
{
       my $data = shift @_;
       my $mode = shift @_ || $data->{mode}; # optional override of the parsed mode
       my $to_open;

       if ($data->{type} eq EDIT_FILETYPE_FILE)
       {
             $to_open = $data->{file};
             $data->{object} = IO::File->new;

             # if the output file has not been prepended with a > already...
             if ($mode eq EDIT_OUTPUT && $to_open !~ m/^>/)
             {
                   $to_open = ">$to_open";
             }
             $data->{object}->open($to_open);
             out(5, "edit_open_file: opened file '$to_open'");
       }
       elsif ($data->{type} eq EDIT_FILETYPE_PIPE)
       {
             $to_open = $data->{file};
             $data->{object} = IO::Pipe->new;
             if ($mode eq EDIT_INPUT)
             {
                   $data->{object}->reader($to_open);
             }
             elsif ($mode eq EDIT_OUTPUT)
             {
                   $data->{object}->writer($to_open);
             }
       }
       else
       {
             warn "edit_open_file: Invalid input/output description passed in " . Dumper($data);
             return undef; # no object was created, so there is nothing to test below
       }

       # report failure if the file or pipe did not actually open
       unless ($data->{object}->opened())
       {
             out(0,
                 "edit_open_file: could not open " . $data->{type} .
                 " '$to_open', editing operation will fail");
             return undef;
       }
       return $data;
}

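edit_open_file() does not use Perl's built-in filehandles. It uses the IO::File and IO::Pipe modules instead, which makes it simpler to hand the file object back to edit_op().

It is important to understand why edit_open_file() is a separate function. First, it is fairly complex and logically distinct from the rest of edit_op(). Second, it is used in several places and has to be callable in a flexible way. Third, it can be reused elsewhere in cfperl later on.

The optional second parameter of edit_open_file() is used by filter_in_place to override the mode recorded for the file. Remember that filter_in_place uses a single file as both input and output, while the filename rules tag input and output files differently. A file that the grammar defined as output therefore has to be opened as input first, and that is exactly what happens.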
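The effect of the override can be sketched with IO::File alone. Everything below is a simplified stand-in written for this illustration: open_with_mode and upcase_in_place are invented names, the modes are plain strings where cfperl uses constants, and pipes are left out entirely.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Simplified stand-in for edit_open_file(): $data->{mode} is the mode
# recorded by the parser; the optional second argument overrides it.
sub open_with_mode {
    my ($data, $override) = @_;
    my $mode = $override || $data->{mode};
    return IO::File->new($data->{file}, $mode eq 'output' ? 'w' : 'r');
}

# In-place edit of a file that was tagged as output: open it as input
# first (override), read everything, then reopen it with its own mode.
sub upcase_in_place {
    my ($data) = @_;
    my $in = open_with_mode($data, 'input') or return 0;
    my @lines = map { uc $_ } $in->getlines;
    $in->close;
    my $out = open_with_mode($data) or return 0;
    print {$out} @lines;
    return $out->close ? 1 : 0;
}

# Demonstration on a temporary file.
my ($fh, $name) = tempfile(UNLINK => 1);
print {$fh} "first line\nsecond line\n";
close $fh;
upcase_in_place({ file => $name, mode => 'output' })
    or die "in-place edit failed";
print IO::File->new($name, 'r')->getlines;
```

Keeping the override as an optional argument means the three other commands can call the opener with the parsed description unchanged.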

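The heart of the matter: the edit_op() function

The edit_op() function is the core of cfperl's Perl-based editing. All four parser functions call it, each passing a different constant ( EDIT_IN_PLACE , EDIT_FROM_TO , WRITE_IN_PLACE , WRITE_FROM_TO ) to select the specific operation. The actual Perl editing command is executed by edit_op(). The input and output file information (produced by the filename rules) is passed to edit_op() as well, so it knows which files it is working on.

First, edit_op() checks that it was invoked with the right parameters and that the input and output files are not the same file:

Listing 6. Initial file handling in edit_op()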

out (0, "edit_op: invoked with insufficient parameters")
unless defined $command;
out (0, "edit_op: invoked with insufficient parameters")
unless defined $input || defined $output;
if (defined $output && defined $input &&
$input->{file} eq $output->{file})
{
out(0, "edit_op: input and output file '" . $input->{file} .
"' is the same, exiting");
return 1;
}

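Next, edit_op() sets up the logical operation modes it will use from then on. This may seem counterintuitive. Why not simply use the constants in the code, you might ask? Because they are too opaque for the average reader, and when every condition takes extra thought it is easy for the programmer (possibly you) to make a mistake.

Listing 7. Setting the operation modes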

my $in_place_mode = ($op == EDIT_IN_PLACE) || ($op == WRITE_IN_PLACE);
my $from_to_mode = ($op == EDIT_FROM_TO) || ($op == WRITE_FROM_TO);
my $edit_mode = ($op == EDIT_IN_PLACE) || ($op == EDIT_FROM_TO);
my $write_mode = ($op == WRITE_IN_PLACE) || ($op == WRITE_FROM_TO);
my $write_once_mode = 0;
if ($write_mode && $in_place_mode)
{
$write_once_mode = 1; # there is no loop for this case
}

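Then edit_op() runs the main file-editing loop. The current output filehandle is saved in $old_handle, so that print() calls inside the editing loop can be sent to another file instead of the default STDOUT. As you can see, the detailed modes defined earlier are easier to follow than the constants they stand for. The underlying problem they solve is that the calling constants are top-level operation mode parameters, while what the program really needs is what they imply (operation sub-modes, if you like). For example, we do not actually care that WRITE_IN_PLACE mode is in use. We care that $write_once_mode is switched on.

Listing 8. The main file-editing loop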

my $old_handle = select; # save the old output destination
eval
{
       no strict;
       no warnings;
       if ($edit_mode && $in_place_mode) # filter in place
       {
             ...
       }
       elsif ($write_once_mode) # "write" command
       {
             ...
       }
       else # "write from to" or "edit from to"
       {
             ...
       }
};
select $old_handle; # restore output in case it's needed
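The select() save-and-restore pattern used around the eval can be tried on its own. In this sketch the helper name with_selected is invented, and an in-memory filehandle takes the place of the output file so that the result is easy to inspect.

```perl
use strict;
use warnings;

# Run $code with bare print() calls going to $fh, then restore the
# previous default output handle even if $code dies.
# (Hypothetical helper, not part of cfperl.)
sub with_selected {
    my ($fh, $code) = @_;
    my $old_handle = select $fh;     # save the old output destination
    my $ok    = eval { $code->(); 1 };
    my $error = $@;
    select $old_handle;              # restore it no matter what happened
    die $error unless $ok;
    return 1;
}

# Capture the output of an "editing command" in a string.
my $buffer = '';
open my $capture, '>', \$buffer or die "cannot open in-memory handle: $!";
with_selected($capture, sub { print "edited line\n" });
close $capture;

print "captured: $buffer";
```

Restoring the handle after the eval, rather than inside it, is what keeps later output from landing in the edited file when the user's command fails.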

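The rest of edit_op() does the work for each mode. For filtering in place, the input file is read in and processed as a list of lines. The reason is that using temporary files, for instance through File::Temp, did not turn out to be as reliable as I had hoped: it was either error-prone or awkward to use.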
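A minimal sketch of that read-everything-then-write-back approach follows. It is not the cfperl implementation: the function name filter_lines_in_place is invented, the command is passed as a code reference where cfperl evaluates a string, and File::Temp appears only to create a scratch file for the demonstration.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Read the whole file into a list, run $command on each line with $_
# set to that line, then write the resulting lines back to the file.
sub filter_lines_in_place {
    my ($file, $command) = @_;

    my $in = IO::File->new($file, 'r') or return 0;
    my @lines = $in->getlines;
    $in->close;

    my @edited;
    for my $line (@lines) {
        local $_ = $line;
        $command->();          # the command edits $_ in place
        push @edited, $_;      # filter mode: every line is printed back
    }

    my $out = IO::File->new($file, 'w') or return 0;
    print {$out} @edited;
    return $out->close ? 1 : 0;
}

# Demonstration on a scratch copy rather than a real system file.
my ($fh, $name) = tempfile(UNLINK => 1);
print {$fh} "root:x:0:0\ndaemon:x:1:1\n";
close $fh;
filter_lines_in_place($name, sub { s/root/toor/g })
    or die "filter failed";
print IO::File->new($name, 'r')->getlines;
```

The price of avoiding a temporary file is that the whole input sits in memory, and that the file is truncated before the new contents are written, which is acceptable for small configuration files but worth keeping in mind for anything large.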

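The write mode, flagged by the $write_once_mode variable, is another special case. There is no loop here; the output of the command is simply written to the output file.

Last come the two from-A-to-B modes. The only difference between them is that in filter from A to B mode an extra print() is executed after each line has been processed.

Conclusion

cfperl's Perl-based file editing is one more piece of the system administration puzzle. I hope it helps you!

The joy of programming lies in exploration and discovery, and cfperl's editing features should give you plenty of both. Editing tasks that are hard or outright impossible with the cfengine editfiles syntax can all be done with cfperl. What is more, editing becomes almost incidental: in the end it is just one routine step for arbitrary Perl code running inside the cfperl process. Dive in, and have fun!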
