Added Python Implementation of Suffix Arrays and LCP Arrays
This pull request provides an implementation of Suffix Arrays and Longest Common Prefix (LCP) Arrays in Python, contributing to the open-source community as part of Hacktoberfest 2024.
Overview:
A suffix array is a fundamental data structure used in various text processing algorithms. This project implements:
- Suffix Array Construction: Efficiently builds a suffix array by sorting all suffixes of a string and storing their starting indices.
- LCP Array Construction: Constructs the LCP array that records the lengths of the longest common prefixes between consecutive suffixes in the sorted suffix array.
Key Features:
- Efficient Construction: Both the suffix array and LCP array are computed efficiently, ensuring linear-time construction for the LCP array after suffix sorting.
- User-friendly Display: Clearly shows both the suffix and LCP arrays for any input string, aiding in visualization.
Why this Contribution?
As part of Hacktoberfest 2024, this contribution aims to assist developers working with text-processing algorithms. This implementation serves as a foundation for more advanced algorithms in fields such as bioinformatics, data compression, and natural language processing.
Example Output:
For the input string "banana", the program generates:
- Suffix Array: [5, 3, 1, 0, 4, 2]
- LCP Array: [0, 1, 3, 0, 0, 2]
Why Suffix Arrays and LCP Arrays Matter:
- Text Searching: Essential for fast substring searching.
- Pattern Detection: Highlights repeated patterns within the text, useful in data compression.
- Bioinformatics: Critical for genome sequencing and alignment algorithms.
References:
For more information on suffix arrays and LCP arrays:
Describe your change:
- [x] Add an algorithm.
- [ ] Fix a bug or typo in an existing algorithm.
- [ ] Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
- [x] Documentation change.
Checklist:
- [x] I have read CONTRIBUTING.md.
- [x] This pull request is all my own work -- I have not plagiarized.
- [x] I know that pull requests will not be merged if they fail the automated tests.
- [x] This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
- [x] All new Python files are placed inside an existing directory.
- [x] All filenames are in all lowercase characters with no spaces or dashes.
- [x] All functions and variable names follow Python naming conventions.
- [x] All function parameters and return values are annotated with Python type hints.
- [x] All functions have doctests that pass the automated testing.
- [x] All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
- [ ] If this pull request resolves one or more open issues then the description above includes the issue number(s) with a closing keyword: "Fixes #ISSUE-NUMBER".
I would greatly appreciate your feedback on my Python implementation of Suffix Arrays and LCP Arrays—your insights would mean a lot!