To those unfamiliar with matching languages they do look very cryptic at first, but don't worry: the idea is really very simple. Certain characters, often called wildcards or meta characters, are given special meaning. Each of these characters matches part of the original text only if that text meets certain conditions. The text that has been matched can then be replaced by something else.
For example, the asterisk "*" will match any run of characters, no matter what they are. It's normally used to match a section of text you're not sure about. For instance, say you were trying to match any word that ends with the letters "ko". Using "*ko" would match "Naoko" or "Atsuko", but it wouldn't match a ko-less "Michie". Likewise, something like "john*smith" would match "John W Smith", "John 'Bubba' Smith", not to mention plain ol' "John Smith".
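The asterisk examples above can be tried out with a small Python sketch. This is not how the Proxomitron implements matching; it only translates "*" into the regular expression ".*" to illustrate the idea. The helper names `wildcard_to_regex` and `wildcard_match` are made up for this illustration, and the case-insensitive, whole-string comparison is an assumption based on the examples in the text.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern in which '*' means 'any run of characters'."""
    # Escape each literal piece, then join the pieces with '.*'.
    parts = [re.escape(piece) for piece in pattern.split("*")]
    # Assumption: matching ignores case, as "john*smith" matching
    # "John W Smith" in the text suggests.
    return re.compile(".*".join(parts), re.IGNORECASE | re.DOTALL)

def wildcard_match(pattern: str, text: str) -> bool:
    """Return True when the whole text fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(text) is not None

names = ["Naoko", "Atsuko", "Michie"]
ko_names = [name for name in names if wildcard_match("*ko", name)]
print(ko_names)  # ['Naoko', 'Atsuko']

smiths = ["John W Smith", "John 'Bubba' Smith", "John Smith"]
print(all(wildcard_match("john*smith", s) for s in smiths))  # True
```

Because "*" may also match nothing at all, "john*smith" would accept "JohnSmith" in this sketch as well.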
Applying the idea to HTML, say you wanted to match all image tags. An image tag always begins with "<img" and ends with ">", but it can have any number of things in between. A matching expression like "<img * >" could be used. It's like saying...
Match anything that starts with "<img", possibly has some other stuff here, then ends with ">".
In the replacement text you could then rewrite the image tag to say exactly what you want it to. It's even possible to "capture" parts of the original text you may want to keep (the URL of the image, for instance) and use them in the replacement text. Look at the following...
Matching:    <img * src=(\w)\1 * >
Replacement: <img src=\1 border=1 >
This introduces a few new ideas. First of all, the "\w" (or word match) will match any continuous string of text unbroken by a space, which makes it useful for matching URLs. The parentheses "( ... )" followed by "\1" basically say "take whatever is matched between ( and ) and place it into variable number one". The "\1" in the replacement text then just inserts the contents of variable number one at that location. The Proxomitron matching language features ten such variables, numbered 0-9.
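When it runs, the rule above would rewrite an image tag like this:
Original tag:
<img align=left src="bison.gif" alt="My pet bison Phil" >
Rewritten as:
<img src="bison.gif" border=1 >
The first "*" matched: align=left
"(\w)\1" matched: "bison.gif"
The last "*" matched: alt="My pet bison Phil"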
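For readers who know regular expressions, the same capture-and-rewrite step can be approximated in Python. This is only a rough analogue and not the Proxomitron's own engine: here `(\S+)` stands in for "(\w)\1", `[^>]*` stands in for "*" (an assumption that keeps the match inside one tag), and the names `IMG_TAG` and `rewrite_img` are invented for the sketch.

```python
import re

# Rough regex counterpart of the rule "<img * src=(\w)\1 * >".
# Group 1 plays the role of variable number one.
IMG_TAG = re.compile(r"<img\s[^>]*?src=(\S+)[^>]*>", re.IGNORECASE)

def rewrite_img(html: str) -> str:
    """Keep only the captured src value and add a border attribute."""
    # \1 in the replacement inserts whatever group 1 captured.
    return IMG_TAG.sub(r"<img src=\1 border=1 >", html)

original = '<img align=left src="bison.gif" alt="My pet bison Phil" >'
rewritten = rewrite_img(original)
print(rewritten)  # <img src="bison.gif" border=1 >
```

The text swallowed by the two wildcard parts (the align and alt attributes) is dropped because nothing in the replacement refers to it.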
Notice that the text matched by the two asterisks never appears in the replacement text. Only the part we decided to keep by using the numbered variable does. By deciding what to keep and what to throw away, we can completely rework a bit of HTML. For example, say we wanted to change the above image so that instead of showing us our bison, it gave us a link we could click to see it. If we changed the replacement text to read...
Replacement: <a href=\1 > a picture </a>
then
<img align=left src="bison.gif" alt="My pet bison Phil" >
would be rewritten as
<a href="bison.gif" > a picture </a>
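The image-to-link variant can be sketched the same way in Python. As before, this is a regular-expression approximation under assumptions rather than the Proxomitron's actual matcher: `(\S+)` stands in for "(\w)\1", `[^>]*` for "*", and `IMG_TAG`, `img_to_link`, and the sample `page` string are hypothetical names chosen for the illustration.

```python
import re

# Same rough counterpart of "<img * src=(\w)\1 * >" as a regex.
IMG_TAG = re.compile(r"<img\s[^>]*?src=(\S+)[^>]*>", re.IGNORECASE)

def img_to_link(html: str) -> str:
    """Replace each image tag with a link to the captured image URL."""
    return IMG_TAG.sub(r"<a href=\1 > a picture </a>", html)

page = '<p>Phil: <img align=left src="bison.gif" alt="My pet bison Phil" ></p>'
linked = img_to_link(page)
print(linked)  # <p>Phil: <a href="bison.gif" > a picture </a></p>
```

Only the replacement text changed between the two rules; the matching expression and the captured variable stay the same.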
Hopefully this gives you a brief idea of how pattern matching works. Read about the other meta characters to learn more about putting these ideas into action.
