Thursday, October 8, 2015

LeetCode - Wildcard Matching


Implement wildcard pattern matching with support for '?' and '*'.

'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false


Approaches to the Problem


The general approach is to loop through each character in s and p and compare them.  Break out of the loop once the end is reached for either s or p.

- If the characters match, then move on to the next set of characters.

- If the pattern character is '?', then move on to the next set of characters.

- If the pattern character is a wildcard '*', then call the function recursively with the wildcard character removed.  If the recursive call returns true, then the candidate string and pattern matches.  If the recursive call returns false, then move on to the next character in the candidate string.

- If the characters don't match, then return false.

Finally, return true if the end has been reached for both s and p.  Otherwise, return false if the end has not been reached for either s or p.


The first approach works, but has an incredibly inefficient runtime complexity of O(2^n).  This is caused by the fact that whenever a wildcard is encountered, it performs a recursive call and possibly recomputes the same exact s and p multiple times.  Due to the O(2^n) time complexity, LeetCode threw a 'Time limit exceeded" error.

This problem is analogous to the fibonacci problem where the same computation is performed many times.

To solve the issue of duplicate computations, we can use a technique called memoization.  By creating a lookup table to store failed matches, we can avoid going down the same rabbit hole by checking if s and p has already failed to match.


The second approach significantly improves runtime complexity, but LeetCode still threw a 'Time limit exceeded" error on the inputs:

s = "aaaaabbbbbababbabbbbaaabbaabaaabaabbbbabaaaabaaabbbbbbabbaaaaabbbaaababbaaabaabbaabbbbaaabaabbbaabbbbabbabaabbabbbbaabbaabbabaababababbaabbabbbaaabaaababaaaabbaaaaabbbaaaabbbababbabaaaabbbaabaabbaaaabaaab"

p = "*aab***aaab**a*b**b*aa***baabababb*abbbbb**a**ba******b**b*a*ab*b***aa*****a*aba*a***aa*b****a*a*****"

Upon closer inspection, we can infer that the contiguous wildcards was causing the algorithm to branch out wildly.  To solve this issue, we can remove contiguous wildcards from p so it looks like

p = "*aab*aaab*a*b*b*aa*baabababb*abbbbb*a*ba*b*b*a*ab*b*aa*a*aba*a*aa*b*a*a*"


By combining the three approaches above, LeetCode accepted my submission.

This was one of the more challenging problems I have encountered, and justifiably so because the solution acceptance rate is 15.8% (13th lowest out of 273 problems on the site) and its difficulty is rated 'Hard'.

The real world application of the wildcard matching problem includes anything that uses text search, some of which are: Google search, Microsoft Word, text editors, etc.