alglib::string Namespace Reference

Functions

size_t naive_str_matching (const std::string &text, const std::string &pattern)
 
size_t rabin_karp (const std::string &text, const std::string &pattern)
 
size_t horspool (const std::string &text, const std::string &pattern)
 
size_t kmp (const std::string &text, const std::string &pattern)
 Performs substring search in a given string. More...
 

Function Documentation

size_t alglib::string::horspool ( const std::string &  text,
const std::string &  pattern 
)
109  {
110 
111  int skip_table[UCHAR_MAX + 1];
112  int n = text.length(), m = pattern.length();
113 
114  // Initialize each of the index of the shift table to m.
115  std::fill_n(skip_table, UCHAR_MAX + 1, m);
116 
117  // Compute the skip table for the given pattern.
118  for(int i = 0; i < m - 1; i++) {
119  skip_table[pattern[i]] = m - i - 1;
120  }
121 
122  // Do the string comparison
123 
124  // align the pattern shifted with s in [0, n-m] with the text
125  for(int shift = 0; shift <= n - m;) {
126 
127  // For every shift, start comparing from back
128  int i = m - 1;
129 
130  // Keep moving to the right until a mismatch is detected.
131  while(i >= 0 && (text[shift + i] == pattern[i]))
132  i--;
133 
134  if(i == -1)
135  // if the pattern matched completely with the text
136  return shift;
137 
138  shift += skip_table[text[shift + m - 1]];
139  }
140 
141  return std::string::npos;
142 
143 }
size_t alglib::string::kmp ( const std::string &  text,
const std::string &  pattern 
)

Performs substring search in a given string.

Parameters
textThe text in which the pattern is to be searched
patternThe string to be searched
Returns
The starting position of the first occurence of pattern in text
151  {
152 
153 
154 }
size_t alglib::string::naive_str_matching ( const std::string &  text,
const std::string &  pattern 
)

The naive algorithm checks for the equivalence of pattern P with the substring of the text T[s:s+m-1] starting with the shift s = 0 and going all the way to s = n-m.

Performance The best case occurs when the substring with shift = 0 matches the pattern giving a time complexity of O(m). The worst case occurs when the pattern matches all but the last character of the m-substring produced by each of the shift. In this case, the time complexity is O((n-m)m). In the average case, the expected succesful consecutive character matches while comparing a random pattern with a random m-substring is given by: Summation of [(1/len(Σ))^k * (len(Σ)-1)/len(Σ)] * (k+1) from k = 0 to m where the quantity in [] is the probability of having k successful matches followed by one unsuccesful match. For english alphabet, the summation is close to 1. Hence, in average case the time complexity is O(n-m).

Proof of Correctness Supposing that T[s, s+m-1] = P, the match will be detected in the sth iteration of the loop. If however a match doesn't exist, the test will fail in each of the iteration and a no-match-found identifier will be returned.

27  {
34  int s = 0;
35  int n = text.length(), m = pattern.length();
36  for(s = 0; s <= n - m; s++)
37  // Loop Invariant : The pattern doesn't exist in T[0:s-1] but may exist
38  // in T[s, n].
39  if(text.substr(s, m) == pattern)
40  return s;
41 
42  return std::string::npos;
43 
66 }
size_t alglib::string::rabin_karp ( const std::string &  text,
const std::string &  pattern 
)

The Rabin-Karp algorithm computes a hash of the m-substring with shift s and then compares it with the pattern. It gains on the naive algorithm by using a rolling hash function which allows one to compute hash value of T[s+1:s+m] from T[s:s+m-1] in constant time.

78  {
86  int m = pattern.length();
87  int n = text.length();
88  const long int R = 10;
89  long int pattern_hash = poly_hash(pattern, m);
90  long int text_hash = poly_hash(text, m);
91  long int R_m_minus_1 = std::pow(R, m - 1);
92 
93  if(pattern_hash == text_hash)
94  if(pattern == text.substr(0, m))
95  return 0;
96  for(int s = 1; s <= n - m; s++) {
97  text_hash = (text_hash - R_m_minus_1 * text[s-1]) * R + text[s + m - 1];
98  if(pattern_hash == text_hash)
99  if(pattern == text.substr(s, m))
100  return s;
101  }
102 
103  return std::string::npos;
104 }