Problem Statement
The following is the description of substring with concatenation of words:
You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that is a concatenation of each word in words exactly once and without any intervening characters.
For example, given:
s: "barfoothefoobarman"
words: [“foo”, “bar”]
You should return the indices: [0,9].
Thoughts
Given the problem, we can easily come up with the following solution:
- Compose all possible string combinations from the
words
array and compare them with target string: the time and memory complexity can be ; - Or, Find index of every single word and try to connect them;
- Or, Combine words using prefix tree and compare with target;
In both cases, we need to compare string effectively, so we can use KMP
algorithm.
How to KMP
KMP is a somewhat complicated algorithm to write, but very efficient. It can compare string with the time efficiency – .
If you don’t understand KMP, try reading my basic introduction and/or more interactive version.
Application
I choose the second thought because it can avoid composing duplicate1 connected-string. This is often the case that: we met a problem, which have two basic directions
- Generate many possible candidates, then filter them;
- Or, more efficiently, try to generate what we need directly.
Another reason is that the first thought will not use the condition of all words are same length
as far as I can see.
s: "aaa"
words: [“a”, “a”, “b”]
Problem Analysis Again – Same Length
Now, suppose we get the index of every word, how to combine them?
We notice the condition – all words have same length. What make a difference if they are not?
Right, we can’t use the first index and plus the length to get the next possible index if they are not.
// So I sort the indices in order
sort(mapping:index -> word)
// and connect indices if the differences between them is length(word) -- which will make many linked-lists.
for( index : indcies )
if(mapping.has(index + length(word)))
connect two nodes of (index, word)
Then, I can iterate every linked-list to put word in a deque. If this word appear too many times, I pop some element from head to adjust. If the size of deque equals to the count of words, we find a match.
while(node != NULL)
if(deque.count(node.word) >= words.count(node.word))
pop in front until count(node.word) decrease
deque.add(word)
if(deque.size() == words.size())
remember start of front node's index
node = node->next;
Finally, the implementation of upper pseudo-code
Written with StackEdit.
For example, the following will generate duplicate string “ab” ↩︎
评论
发表评论