字符串匹配算法

基本思想是从源字符串的起始位置起每个字符匹配。如果中间出现失配,则回溯到某个位置再行匹配。

1.最简单的就是回溯到前一次失配串的第二个字符再行匹配,并将目标串的起始位置重置。


int Index( const char * src, const char * des, int start)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedBloc
kStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/
ContractedBlock.gif) … {

int lengthSrc = strlen(src);

int lengthDes = strlen(des);

int i = start,j = 0 ;

for (; i < lengthSrc && j < lengthDes; i ++ ,j ++ )
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

if (src[i] == des[j])
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

continue ; // 这个位置匹配成功
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

else
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

// 这个位置失配,将i复位到前一次匹配串的起始位置的下一位,目标串重置

// 这里因为马上要进入下一个循环,并且i++,j++,所以i = start,j = -1.

// 标记起始位置的start也要加一。

i = start;

start += 1 ;

j =- 1 ;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

if (j == lengthDes)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

return start;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

else
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

return - 1 ;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedBloc
kEnd.gif) }

2.使用KMP算法。KMP算法的思想只是将刚才算法优化了一下。在好的情况下,可以不用从上一个匹配串的start+1位置重新匹配。而是可以从某个可能的位置重新
匹配。

但是这样要先消耗一个n的时间来得到目标串的每一个位置失配后将从哪个位置开始重新匹配。当然这样好处还有源串的匹配位置不用改变。


int * Get_Next( const char * a)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedBloc
kStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/
ContractedBlock.gif) … {

int length = strlen(a);

int * next = new int [length];

next[ 0 ] = - 1 ;

// 该算法中约定第一个位置的重匹配要从目标串起始位置的下标减一开始。

// 当然这只是为了后面的匹配函数好些罢了。

for ( int i = 0 ,j = - 1 ; i < length;)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

if (j == - 1 || a[j] == a[i])
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

j ++ ;

i ++ ;

if (a[i] != a[j])
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

next[i] = j;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

else
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … { //
如果目标串出现“aaaab”这样的情况,可以使得后面的’a’重用前面a的匹配位置。

// 类似于动态规划的思想。

next[i] = next[j];
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

else
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

j = next[j];
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

return next; // 这里并没有返回数组的个数…
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedBloc
kEnd.gif) }

真正的功能函数:


int IndexKMP( const char * src, const char * des, int start)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedBloc
kStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/
ContractedBlock.gif) … {

int length1 = strlen(src);

int length2 = strlen(des);

int * next = Get_Next(des);

int i = start,j = 0 ;

while (i < length1 && j < length2)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

if (j == - 1 || src[i] == des[j])
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

i ++ ;

j ++ ;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

else
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

j = next[j]; // 只不过这里有所改变。
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

if (j == length2)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

return i - length2;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

else
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

return - 1 ;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedBloc
kEnd.gif) }

解决了一下内存泄露的问题,同时找到一个数组越界的bug。在Get_Next()中next数组的下标有可能超过其长度。改进后的算法如下:


int * GetNext( const char * a, int * next)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedBloc
kStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/
ContractedBlock.gif) … {

int length = strlen(a);


next[ 0 ] = - 1 ;

for ( int i = 0 ,j = - 1 ; i < length;)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

if (j == - 1 || a[j] == a[i])
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

j ++ ;

i ++ ;


if (a[i] != a[j])
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

next[i] = j;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

else
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

next[i] = next[j];
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

else
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

j = next[j];
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

本着谁生成谁释放的原则,在IndexKMP中生成next数组:注意这里的数组长度是length2+1,很奇怪的是,当初值为length2时,本该越界处是在G
etNext()中。但是那里并没有抛出异常。倒是在delete中抛出来Heap被破坏的异常。所以如果你写的程序抛出Heap Corruption
Exception,那么很有可能你的数组已经在前面越界。



int IndexKMP( const char * src, const char * des, int start)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedBloc
kStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/
ContractedBlock.gif) … {

int length1 = strlen(src);

int length2 = strlen(des);

int * next = new int [length2 + 1 ];

GetNext(des,next);

int i = start,j = 0 ;

while (i < length1 && j < length2)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

if (j == - 1 || src[i] == des[j])
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

i ++ ;

j ++ ;

![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

else
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

j = next[j];

![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }


delete [] next;

if (j == length2)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

return i - length2;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

else
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

return - 1 ;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedBloc
kEnd.gif) }

继续优化代码:在开始处判断目标串和源串的长度


int IndexKMP( const char * src, const char * des, int start)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedBloc
kStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/
ContractedBlock.gif) … {

int length1 = strlen(src);

int length2 = strlen(des);

if (length2 > length1)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

return - 1 ;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

int * next = new int [length2 + 1 ];

GetNext(des,next);

int i = start,j = 0 ;

while (i < length1 && j < length2)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

if (j == - 1 || src[i] == des[j])
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

i ++ ;

j ++ ;

![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

else
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

j = next[j];

![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }


delete [] next;

if (j == length2)
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

return i - length2;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

else
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockStart.gif) ![](http://images.csdn.net/syntaxhighlighting/OutliningIndicato
rs/ContractedSubBlock.gif) … {

return - 1 ;
![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedSubB
lockEnd.gif) }

![](http://images.csdn.net/syntaxhighlighting/OutliningIndicators/ExpandedBloc
kEnd.gif) }