The Longest Common Subsequence (LCS) problem is a classic question in computer science. Given two strings, the goal is to find the longest sequence of characters that appears in both strings in the same order (though not necessarily consecutively). This problem is common in fields like bioinformatics for DNA sequence analysis and in version control systems for tracking code changes.
Algorithm Choice
For LCS, a dynamic programming (DP) approach is both efficient and intuitive, offering the lowest practical time and space complexity for typical cases. We use a 2D array to store intermediate results for subproblems, reducing redundant computations.
• Optimal Substructure: The LCS of two strings can be constructed from the LCS of smaller subproblems.
• Overlapping Subproblems: Many LCS subproblems overlap, making dynamic programming suitable as it stores and reuses results.
• Recursive Nature: LCS calculations break down into smaller LCS computations.
• Top-Down or Bottom-Up Feasibility: Both top-down memoization and bottom-up DP work well.
Steps of Algorithm
1. Initialize a 2D DP array dp of size (m+1) \times (n+1) to store subproblem results.
2. Iterate over each character pair from the two strings:
• If characters match, update the cell as dp[i][j] = dp[i-1][j-1] + 1.
• If not, take the maximum value between the previous subproblems: dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
3. The LCS length will be stored in dp[m][n] at the end.
public class LongestCommonSubsequence {
public int longestCommonSubsequence(String text1, String text2) {
int m = text1.length();
int n = text2.length();
// Initialize the DP array with dimensions (m+1) x (n+1)
int[][] dp = new int[m + 1][n + 1];
// Fill the DP array by comparing characters
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1] + 1; // Matching characters
} else {
dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]); // Non-matching characters
}
}
}
// The result is the length of the longest common subsequence
return dp[m][n];
}
}
Line-By-Line Breakdown
- Store Lengths
int m = text1.length();
int n = text2.length();
Retrieve lengths of text1 and text2 and store them in variables. These lengths will determine the dimensions of our DP array, allowing us to iterate over each character in both strings.
2. Create 2D Integer Array
int[][] dp = new int[m + 1][n + 1];
- Create a 2D integer array dp with dimensions (m +1) x (n +1). The extra row and column allows for easier boundary conditions when indexing, so dp[0][*] and dp[*][0] can represent comparisons with an “empty” substring.
- Each dp[i][j] will store the length of the LCS between the substrings text1[0…i-1] and text2[0…j-1]. This array will help us break down the problem into smaller subproblems and store intermediate solutions, key aspects of dynamic programming.
3. Nested For Loop
// Fill the DP array by comparing characters
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
- These nested for loops iterate over each character pair in text1 and text2, where i and j range from 1 to m and n respectively.
- By looping over each character in both strings, we are able to build solutions for progressively larger substrings, which is typical in dynamic programming.
4. Check Character Match With If Statement
if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1] + 1; // Matching characters
- This if statement checks if the characters at text1[i-1] and text2[j-1] match. If they do, dp[i][j] is set to dp[i-1][j-1] + 1, indicating that the LCS length increases by 1, as we have found a common character.
- This captures the essence of the LCS problem: when two characters match, we extend the subsequence. The +1 adds this character to the previously computed LCS length stored at dp[i-1][j-1].
5. Check If Characters Don’t Match
} else {
dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]); // Non-matching characters
- If the characters do not match, dp[i][j] takes the maximum of dp[i-1][j] or dp[i][j-1]. This means we disregard one of the characters (from text1 or text2) to maintain the longest sequence possible.
- This part of the DP logic is key in handling non-matching characters. By taking the maximum, we ensure that only valid subsequences are considered and that the length is maximized.
6. Return
// The result is the length of the longest common subsequence
return dp[m][n];
}
}
- Finally, we return dp[m][n], which holds the length of the longest common subsequence of text1 and text2.
Why This Algorithm Works Well
This code efficiently uses the dynamic programming pattern to break down the LCS problem into manageable subproblems. Each dp[i][j] cell leverages previously computed solutions, which reduces unnecessary recalculations. The approach is much more efficient than brute force, which would involve examining all possible subsequences directly, resulting in exponential time complexity.