20241107 ◎


I tried to enroll in ITMO Academy: pilot course on Codeforces, but

By registering for this course, you confirm that you will not in any way distribute solutions to the original problems of the course or other original course materials (video, etc.).

Does it mean I cannot write a solution here? I remember that it is advised not to share solutions online for non-temporary CTF contests, and it is also the case for this course? Interesting…

Suffix Array

Suffix Array and Suffix Tree in Data Structures & Applications

Q1. What are some of the applications of Suffix Arrays?

Pattern Searching, Longest Common Substring, Data Compression, Data Mining, Bioinformatics, etc.

A suffix array is a sorted array of all suffixes of a given string. The sophisticated versions of the algorithms to build a suffix array are too diffcult to understand for me right now, so I will stick to an easy implementation.

def build_suffix_array(s: str) -> list[int]:
    suffixes: list[tuple[str, int]] = []
    for i in range(len(s)):
        suffixes.append((s[i:], i))

    # O(n^2logn): O(nlogn) sorting and O(n) string comparision
    sorted_suffixes: list[tuple[str, int]] = sorted(suffixes, key=lambda x: x[0])

    suffix_array: list[int] = [suffix[1] for suffix in sorted_suffixes]
    return suffix_array


s = "ababba"
"""
5: a
0: ababba
2: abba
4: ba
1: babba
3: bba
"""
print(build_suffix_array(s))  # [5, 0, 2, 4, 1, 3]

s = "banana"
"""
5: a
3: ana
1: anana
0: banana
4: na
2: nana
"""
print(build_suffix_array(s))  # [5, 3, 1, 0, 4, 2]

I gradually got the concept of the suffix array, but this simple approach is not what the competitive programming world accepts. I need to understand the better way to build a suffix array.

There seem to be Manber-Myers algorith whose time complexity is O(nlogn) but I could not find helpful resources online except for this https://github.com/benfulton/Algorithmic-Alley/blob/master/AlgorithmicAlley/SuffixArrays/sa.py

It seems the Prefix doubling algorithms are the ones I should check out.

def build_suffix_array(s: str) -> list[int]:
    # Append the special character
    s += "$"  # $ is smaller than any other characters
    n = len(s)

    # Initial setup
    suffix_array: list[int] = list(range(n))

    # Initial ranking based on single characters
    # Rank based on ASCII values
    # Values don't matter as long as they give a unique ordering
    rank = [ord(c) for c in s]
    temp_rank = [0] * n

    k = 0
    while (1 << k) < n:  # While 2^k < n
        # Sort based on pairs (rank[i], rank[i + 2^k])
        # to increment `k` and get the result of 2^(k + 1),
        # we split 2 ^ (k + 1) into two groups of 2^k
        # Because we already know the ordering of 2 ^ k in the previous iteration
        # Comparing the strings with 2^k length can be done in O(1)
        # rank[i]: the rank of the suffix starting at `i` (left half)
        # rank[i + 2 ^ k]: the rank of the suffix starting at `i + 2^k` (right half)
        # rank[(i + 2 ^ k) % n]: Cycling does not break the order, so
        # to make things simple, we make all the suffix the same length
        # 0: "abbab$"
        # 3: "ab$abb"
        suffix_array.sort(key=lambda i: (rank[i], rank[(i + (1 << k)) % n]))

        # Reassign ranks based on sorted order
        temp_rank[suffix_array[0]] = 0
        for i in range(1, n):
            prev = suffix_array[i - 1]
            curr = suffix_array[i]
            # Compare pairs (rank[prev], rank[prev + 2^k]) and (rank[curr], rank[curr + 2^k])
            if (rank[prev], rank[(prev + (1 << k)) % n]) == (
                rank[curr],
                rank[(curr + (1 << k)) % n],
            ):
                temp_rank[curr] = temp_rank[prev]
            else:
                temp_rank[curr] = temp_rank[prev] + 1
        rank = temp_rank[:]  # Update rank array for next iteration
        k += 1

    return suffix_array


print(" ".join([str(x) for x in build_suffix_array(input())]))

This seems to be the one I'm looking for (This is very basic and should not violate the course rule not to share a solution). I left comments as I decoded the code, and I think I understand the algorithm to some extent. Though, I don’t think I can memorize all the details. Let's try implementing it by myself later.

I tried to implement it without any references, but I missed the fact that in Python, shift operators (>> and <<) have lower precedence than arithmetic operators. Also I missed to access rank[suffix_array[i]] in some places. I can't say I completely understand the algorithm, but I'm getting there.

I'll take a look at a problem that can be solved with a suffix array (if I'm not exhausted)


1829E. The Lakes - 1100 (BFS)

This is a classic problem using DFS/BFS, as I remember this problem was on a famous Japanese book about competitive programming. Since recursive approach exceeds the maximum stack size limit when using Python, BFS is the most straightforward approach.

I accidentally updated row_i and col_i in the for-loop, and it resulted in some weird bugs. I could not find how that bug had happened, but it was a good experience to fall into the pitfall.

from collections import deque


for _ in range(int(input())):
    n, m = map(int, input().split())
    a: list[list[int]] = []
    for _ in range(n):
        a.append(list(map(int, input().split())))

    visited: list[list[bool]] = []
    for _ in range(n):
        visited.append([False] * m)

    max_volume: int = 0
    queue = deque()

    for row_i in range(n):
        for col_i in range(m):
            if a[row_i][col_i] == 0 or visited[row_i][col_i]:
                continue
            lake_volume: int = 0
            visited[row_i][col_i] = True
            queue.append((row_i, col_i))
            while queue:
                ri, ci = queue.popleft()
                lake_volume += a[ri][ci]

                # up
                if ri > 0 and a[ri - 1][ci] > 0 and not visited[ri - 1][ci]:
                    visited[ri - 1][ci] = True
                    queue.append((ri - 1, ci))
                # down
                if ri < n - 1 and a[ri + 1][ci] > 0 and not visited[ri + 1][ci]:
                    visited[ri + 1][ci] = True
                    queue.append((ri + 1, ci))
                # right
                if ci < m - 1 and a[ri][ci + 1] > 0 and not visited[ri][ci + 1]:
                    visited[ri][ci + 1] = True
                    queue.append((ri, ci + 1))
                # left
                if ci > 0 and a[ri][ci - 1] > 0 and not visited[ri][ci - 1]:
                    visited[ri][ci - 1] = True
                    queue.append((ri, ci - 1))

            max_volume = max(lake_volume, max_volume)

    print(max_volume)

Protein bar 150 Sandwich 150 Yogurt 300 Pancake & sausage 300 Yakisoba 500 Rice 400

Total 1800 kcal


MUST:

TODO:


index 20241106 20241108