Bài giảng Phân tích và thiết kế thuật toán gồm 5 chương cung cấp cho các bạn các kiến thức về giải thuật so khớp chuỗi. Tham khảo nội dung bài giảng để nắm bắt nội dung chi tiết. | Pattern Matching Pattern Matching Text processing Pattern Matching 5/14/2020 3:44:57 AM Pattern Matching Outline and Reading Strings (§) Pattern matching algorithms Brute-force algorithm (§) Boyer-Moore algorithm (§) Knuth-Morris-Pratt algorithm (§) Pattern Matching Strings A string is a sequence of characters Examples of strings: Java program HTML document DNA sequence Digitized image An alphabet S is the set of possible characters for a family of strings Example of alphabets: ASCII Unicode {0, 1} {A, C, G, T} Let P be a string of size m A substring P[i j] of P is the subsequence of P consisting of the characters with ranks between i and j A prefix of P is a substring of the type P[0 i] A suffix of P is a substring of the type P[i m - 1] Given strings T (text) and P (pattern), the pattern matching problem consists of finding a substring of T equal to P Applications: Text editors Search engines Biological research Pattern Matching Brute-Force Algorithm The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift of P relative to T, until either a match is found, or all placements of the pattern have been tried Brute-force pattern matching runs in time O(nm) Example of worst case: T = aaa ah P = aaah may occur in images and DNA sequences unlikely in English text Algorithm BruteForceMatch(T, P) Input text T of size n and pattern P of size m Output starting index of a substring of T equal to P or -1 if no such substring exists for i 0 to n - m { test shift i of the pattern } j 0 while j Boyer-Moore Heuristics The Boyer-Moore’s pattern matching algorithm is based on two heuristics Looking-glass heuristic: Compare P with a subsequence of T moving backwards Character-jump heuristic: When a mismatch occurs at T[i] = c If P . | Pattern Matching Pattern Matching Text processing Pattern Matching 5/14/2020 4:57:42 AM Pattern Matching Outline and Reading Strings (§) Pattern matching algorithms Brute-force algorithm (§) Boyer-Moore algorithm (§) Knuth-Morris-Pratt algorithm (§) Pattern Matching Strings A string is a sequence of characters Examples of strings: Java program HTML document DNA sequence Digitized image An alphabet S is the set of possible characters for a family of strings Example of alphabets: ASCII Unicode {0, 1} {A, C, G, T} Let P be a string of size m A substring P[i j] of P is the subsequence of P consisting of the characters with ranks between i and j A prefix of P is a substring of the type P[0 i] A suffix of P is a substring of the type P[i m - 1] Given strings T (text) and P (pattern), the pattern matching problem consists of finding a substring of T equal to P Applications: Text editors Search engines Biological research Pattern Matching .