Does it mean I cannot write a solution here? I remember that it is advised not to share solutions online for non-temporary CTF contests, and it is also the case for this course? Interesting…

Suffix Array

A suffix array is a sorted array of all suffixes of a given string. The sophisticated versions of the algorithms to build a suffix array are too diffcult to understand for me right now, so I will stick to an easy implementation.

def build_suffix_array(s: str) -> list[int]:
    suffixes: list[tuple[str, int]] = []
    for i in range(len(s)):
        suffixes.append((s[i:], i))

    # O(n^2logn): O(nlogn) sorting and O(n) string comparision
    sorted_suffixes: list[tuple[str, int]] = sorted(suffixes, key=lambda x: x[0])

    suffix_array: list[int] = [suffix[1] for suffix in sorted_suffixes]
    return suffix_array


s = "ababba"
"""
5: a
0: ababba
2: abba
4: ba
1: babba
3: bba
"""
print(build_suffix_array(s))  # [5, 0, 2, 4, 1, 3]

s = "banana"
"""
5: a
3: ana
1: anana
0: banana
4: na
2: nana
"""
print(build_suffix_array(s))  # [5, 3, 1, 0, 4, 2]

I gradually got the concept of the suffix array, but this simple approach is not what the competitive programming world accepts. I need to understand the better way to build a suffix array.

def build_suffix_array(s: str) -> list[int]:
    # Append the special character
    s += "$"  # $ is smaller than any other characters
    n = len(s)

    # Initial setup
    suffix_array: list[int] = list(range(n))

    # Initial ranking based on single characters
    # Rank based on ASCII values
    # Values don't matter as long as they give a unique ordering
    rank = [ord(c) for c in s]
    temp_rank = [0] * n

    k = 0
    while (1 << k) < n:  # While 2^k < n
        # Sort based on pairs (rank[i], rank[i + 2^k])
        # to increment `k` and get the result of 2^(k + 1),
        # we split 2 ^ (k + 1) into two groups of 2^k
        # Because we already know the ordering of 2 ^ k in the previous iteration
        # Comparing the strings with 2^k length can be done in O(1)
        # rank[i]: the rank of the suffix starting at `i` (left half)
        # rank[i + 2 ^ k]: the rank of the suffix starting at `i + 2^k` (right half)
        # rank[(i + 2 ^ k) % n]: Cycling does not break the order, so
        # to make things simple, we make all the suffix the same length
        # 0: "abbab$"
        # 3: "ab$abb"
        suffix_array.sort(key=lambda i: (rank[i], rank[(i + (1 << k)) % n]))

        # Reassign ranks based on sorted order
        temp_rank[suffix_array[0]] = 0
        for i in range(1, n):
            prev = suffix_array[i - 1]
            curr = suffix_array[i]
            # Compare pairs (rank[prev], rank[prev + 2^k]) and (rank[curr], rank[curr + 2^k])
            if (rank[prev], rank[(prev + (1 << k)) % n]) == (
                rank[curr],
                rank[(curr + (1 << k)) % n],
            ):
                temp_rank[curr] = temp_rank[prev]
            else:
                temp_rank[curr] = temp_rank[prev] + 1
        rank = temp_rank[:]  # Update rank array for next iteration
        k += 1

    return suffix_array


print(" ".join([str(x) for x in build_suffix_array(input())]))

This seems to be the one I'm looking for (This is very basic and should not violate the course rule not to share a solution). I left comments as I decoded the code, and I think I understand the algorithm to some extent. Though, I don’t think I can memorize all the details. Let's try implementing it by myself later.

I tried to implement it without any references, but I missed the fact that in Python, shift operators (>> and <<) have lower precedence than arithmetic operators. Also I missed to access rank[suffix_array[i]] in some places. I can't say I completely understand the algorithm, but I'm getting there.

I'll take a look at a problem that can be solved with a suffix array (if I'm not exhausted)

1829E. The Lakes - 1100 (BFS)

This is a classic problem using DFS/BFS, as I remember this problem was on a famous Japanese book about competitive programming. Since recursive approach exceeds the maximum stack size limit when using Python, BFS is the most straightforward approach.

I accidentally updated row_i and col_i in the for-loop, and it resulted in some weird bugs. I could not find how that bug had happened, but it was a good experience to fall into the pitfall.

from collections import deque


for _ in range(int(input())):
    n, m = map(int, input().split())
    a: list[list[int]] = []
    for _ in range(n):
        a.append(list(map(int, input().split())))

    visited: list[list[bool]] = []
    for _ in range(n):
        visited.append([False] * m)

    max_volume: int = 0
    queue = deque()

    for row_i in range(n):
        for col_i in range(m):
            if a[row_i][col_i] == 0 or visited[row_i][col_i]:
                continue
            lake_volume: int = 0
            visited[row_i][col_i] = True
            queue.append((row_i, col_i))
            while queue:
                ri, ci = queue.popleft()
                lake_volume += a[ri][ci]

                # up
                if ri > 0 and a[ri - 1][ci] > 0 and not visited[ri - 1][ci]:
                    visited[ri - 1][ci] = True
                    queue.append((ri - 1, ci))
                # down
                if ri < n - 1 and a[ri + 1][ci] > 0 and not visited[ri + 1][ci]:
                    visited[ri + 1][ci] = True
                    queue.append((ri + 1, ci))
                # right
                if ci < m - 1 and a[ri][ci + 1] > 0 and not visited[ri][ci + 1]:
                    visited[ri][ci + 1] = True
                    queue.append((ri, ci + 1))
                # left
                if ci > 0 and a[ri][ci - 1] > 0 and not visited[ri][ci - 1]:
                    visited[ri][ci - 1] = True
                    queue.append((ri, ci - 1))

            max_volume = max(lake_volume, max_volume)

    print(max_volume)

Protein bar 150 Sandwich 150 Yogurt 300 Pancake & sausage 300 Yakisoba 500 Rice 400

20241107 ◎

Suffix Array

1829E. The Lakes - 1100 (BFS)