I tried to enroll in ITMO Academy: pilot course on Codeforces, but
By registering for this course, you confirm that you will not in any way distribute solutions to the original problems of the course or other original course materials (video, etc.).
Does it mean I cannot write a solution here? I remember that it is advised not to share solutions online for non-temporary CTF contests, and it is also the case for this course? Interesting…
Suffix Array and Suffix Tree in Data Structures & Applications
Q1. What are some of the applications of Suffix Arrays?
Pattern Searching, Longest Common Substring, Data Compression, Data Mining, Bioinformatics, etc.
A suffix array is a sorted array of all suffixes of a given string. The sophisticated versions of the algorithms to build a suffix array are too diffcult to understand for me right now, so I will stick to an easy implementation.
def build_suffix_array(s: str) -> list[int]:
list[tuple[str, int]] = []
suffixes: for i in range(len(s)):
suffixes.append((s[i:], i))
# O(n^2logn): O(nlogn) sorting and O(n) string comparision
list[tuple[str, int]] = sorted(suffixes, key=lambda x: x[0])
sorted_suffixes:
list[int] = [suffix[1] for suffix in sorted_suffixes]
suffix_array: return suffix_array
= "ababba"
s """
5: a
0: ababba
2: abba
4: ba
1: babba
3: bba
"""
print(build_suffix_array(s)) # [5, 0, 2, 4, 1, 3]
= "banana"
s """
5: a
3: ana
1: anana
0: banana
4: na
2: nana
"""
print(build_suffix_array(s)) # [5, 3, 1, 0, 4, 2]
I gradually got the concept of the suffix array, but this simple approach is not what the competitive programming world accepts. I need to understand the better way to build a suffix array.
There seem to be Manber-Myers algorith whose time complexity is O(nlogn) but I could not find helpful resources online except for this https://github.com/benfulton/Algorithmic-Alley/blob/master/AlgorithmicAlley/SuffixArrays/sa.py
It seems the Prefix doubling algorithms are the ones I should check out.
def build_suffix_array(s: str) -> list[int]:
# Append the special character
+= "$" # $ is smaller than any other characters
s = len(s)
n
# Initial setup
list[int] = list(range(n))
suffix_array:
# Initial ranking based on single characters
# Rank based on ASCII values
# Values don't matter as long as they give a unique ordering
= [ord(c) for c in s]
rank = [0] * n
temp_rank
= 0
k while (1 << k) < n: # While 2^k < n
# Sort based on pairs (rank[i], rank[i + 2^k])
# to increment `k` and get the result of 2^(k + 1),
# we split 2 ^ (k + 1) into two groups of 2^k
# Because we already know the ordering of 2 ^ k in the previous iteration
# Comparing the strings with 2^k length can be done in O(1)
# rank[i]: the rank of the suffix starting at `i` (left half)
# rank[i + 2 ^ k]: the rank of the suffix starting at `i + 2^k` (right half)
# rank[(i + 2 ^ k) % n]: Cycling does not break the order, so
# to make things simple, we make all the suffix the same length
# 0: "abbab$"
# 3: "ab$abb"
=lambda i: (rank[i], rank[(i + (1 << k)) % n]))
suffix_array.sort(key
# Reassign ranks based on sorted order
0]] = 0
temp_rank[suffix_array[for i in range(1, n):
= suffix_array[i - 1]
prev = suffix_array[i]
curr # Compare pairs (rank[prev], rank[prev + 2^k]) and (rank[curr], rank[curr + 2^k])
if (rank[prev], rank[(prev + (1 << k)) % n]) == (
rank[curr],+ (1 << k)) % n],
rank[(curr
):= temp_rank[prev]
temp_rank[curr] else:
= temp_rank[prev] + 1
temp_rank[curr] = temp_rank[:] # Update rank array for next iteration
rank += 1
k
return suffix_array
print(" ".join([str(x) for x in build_suffix_array(input())]))
This seems to be the one I'm looking for (This is very basic and should not violate the course rule not to share a solution). I left comments as I decoded the code, and I think I understand the algorithm to some extent. Though, I don’t think I can memorize all the details. Let's try implementing it by myself later.
I tried to implement it without any references, but I missed the fact
that in Python, shift operators (>>
and
<<
) have lower precedence than arithmetic operators.
Also I missed to access rank[suffix_array[i]] in some places. I can't
say I completely understand the algorithm, but I'm getting there.
I'll take a look at a problem that can be solved with a suffix array (if I'm not exhausted)
This is a classic problem using DFS/BFS, as I remember this problem was on a famous Japanese book about competitive programming. Since recursive approach exceeds the maximum stack size limit when using Python, BFS is the most straightforward approach.
I accidentally updated row_i
and col_i
in
the for-loop, and it resulted in some weird bugs. I could not find how
that bug had happened, but it was a good experience to fall into the
pitfall.
from collections import deque
for _ in range(int(input())):
= map(int, input().split())
n, m list[list[int]] = []
a: for _ in range(n):
list(map(int, input().split())))
a.append(
list[list[bool]] = []
visited: for _ in range(n):
False] * m)
visited.append([
int = 0
max_volume: = deque()
queue
for row_i in range(n):
for col_i in range(m):
if a[row_i][col_i] == 0 or visited[row_i][col_i]:
continue
int = 0
lake_volume: = True
visited[row_i][col_i]
queue.append((row_i, col_i))while queue:
= queue.popleft()
ri, ci += a[ri][ci]
lake_volume
# up
if ri > 0 and a[ri - 1][ci] > 0 and not visited[ri - 1][ci]:
- 1][ci] = True
visited[ri - 1, ci))
queue.append((ri # down
if ri < n - 1 and a[ri + 1][ci] > 0 and not visited[ri + 1][ci]:
+ 1][ci] = True
visited[ri + 1, ci))
queue.append((ri # right
if ci < m - 1 and a[ri][ci + 1] > 0 and not visited[ri][ci + 1]:
+ 1] = True
visited[ri][ci + 1))
queue.append((ri, ci # left
if ci > 0 and a[ri][ci - 1] > 0 and not visited[ri][ci - 1]:
- 1] = True
visited[ri][ci - 1))
queue.append((ri, ci
= max(lake_volume, max_volume)
max_volume
print(max_volume)
Protein bar 150 Sandwich 150 Yogurt 300 Pancake & sausage 300 Yakisoba 500 Rice 400
Total 1800 kcal
MUST:
TODO: