I don't know why, but today's meeting was exhausting.
There are several options for working with PDFs, such as jsvine/pdfplumber, and our team uses PyMuPDF for that purpose.
Redaction is the removal of visible text or graphics from a PDF document. It's used to hide sensitive information or to reduce the entropy of PDF pages.
To do redactions with PyMuPDF, we first call `Page.add_redact_annot()` to add redaction annotations, and then call `Page.apply_redactions()`.
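```python
import fitz  # PyMuPDF

# document: fitz.Document
# rectangles: list[tuple[int, int, int, int]]
page: fitz.Page = document[0]  # first page
for rect in rectangles:
    page.add_redact_annot(rect)  # mark each area to be redacted
page.apply_redactions()  # remove the content under all redaction annotations on the page
```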
What I stumbled upon today is the `text` option. The documentation says its default value is `text=PDF_REDACT_TEXT_REMOVE | 0`, but the constant `PDF_REDACT_TEXT_REMOVE` does not exist in the code base. So, in reality, I'm expected to pass an int to the argument.
It can be confirmed in the source code:

```python
def apply_redactions(
    page: pymupdf.Page, images: int = 2, graphics: int = 1, text: int = 0
) -> bool:
    ...
```
I may be totally wrong, but I prefer not to use a bare int as a kind of flag.

PyMongo has a similar design, but it also provides named constants for its integers:
PyMongo documentation, `collection` (Collection level operations):

- `pymongo.ASCENDING = 1`: Ascending sort order.
- `pymongo.DESCENDING = -1`: Descending sort order.
I remember once asking a colleague to use the constants `pymongo.ASCENDING` and `pymongo.DESCENDING` instead of the integers `1` and `-1`, because when you see `1` and `-1` in the source code, it's not obvious what they mean. The constants, on the other hand, convey a good amount of information.
If I were to implement a similar method, I would define constants the same way PyMongo does, because bare integers carry little information and hurt readability.
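Here is a minimal sketch of what I mean, using only the standard library. `ASCENDING`, `DESCENDING`, and `sort_records()` are hypothetical names made up for illustration; they mimic PyMongo's constants but are not part of PyMongo. The values are still plain ints, so nothing changes at runtime; only the call site becomes easier to read.

```python
from typing import Any

# Module-level constants in the PyMongo style (hypothetical, for illustration only).
ASCENDING = 1
DESCENDING = -1


def sort_records(
    records: list[dict[str, Any]], key: str, direction: int = ASCENDING
) -> list[dict[str, Any]]:
    """Return a new list of records sorted by `key` in the given direction."""
    if direction not in (ASCENDING, DESCENDING):
        raise ValueError(f"direction must be ASCENDING or DESCENDING, got {direction!r}")
    # reverse=True only when the caller explicitly asked for DESCENDING.
    return sorted(records, key=lambda r: r[key], reverse=(direction == DESCENDING))


users = [{"name": "a", "age": 30}, {"name": "b", "age": 25}]
print(sort_records(users, "age", DESCENDING))  # [{'name': 'a', 'age': 30}, {'name': 'b', 'age': 25}]
```

`sort_records(users, "age", -1)` behaves exactly the same; the `DESCENDING` version just tells the reader what `-1` means.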
But I'm not sure. There may be some advantages to simply using integers. Please correct me if my understanding is off.
PyMuPDF/src_classic/helper-python.i

```python
TEXT_PRESERVE_LIGATURES = 1
TEXT_PRESERVE_WHITESPACE = 2
TEXT_PRESERVE_IMAGES = 4
TEXT_INHIBIT_SPACES = 8
TEXT_DEHYPHENATE = 16
TEXT_PRESERVE_SPANS = 32
TEXT_MEDIABOX_CLIP = 64
TEXT_CID_FOR_UNKNOWN_UNICODE = 128

TEXTFLAGS_BLOCKS = (0
    | TEXT_PRESERVE_LIGATURES
    | TEXT_PRESERVE_WHITESPACE
    | TEXT_MEDIABOX_CLIP
    | TEXT_CID_FOR_UNKNOWN_UNICODE
)
```
```python
def get_text(
    page: Page,
    option: str = "text",
    clip: rect_like = None,
    flags: OptInt = None,
    textpage: TextPage = None,
    sort: bool = False,
    delimiters=None,
):
    formats = {
        "text": fitz.TEXTFLAGS_TEXT,
        "html": fitz.TEXTFLAGS_HTML,
        "json": fitz.TEXTFLAGS_DICT,
        "rawjson": fitz.TEXTFLAGS_RAWDICT,
        "xml": fitz.TEXTFLAGS_XML,
        "xhtml": fitz.TEXTFLAGS_XHTML,
        "dict": fitz.TEXTFLAGS_DICT,
        "rawdict": fitz.TEXTFLAGS_RAWDICT,
        "words": fitz.TEXTFLAGS_WORDS,
        "blocks": fitz.TEXTFLAGS_BLOCKS,
    }
    # ...
    if option == "blocks":
        return get_text_blocks(
            page, clip=clip, flags=flags, textpage=textpage, sort=sort
        )
```
My brain cells are dying from fatigue, but I would like to note that the flags are managed as powers of 2: each flag gets its own bit, so several flags can be packed into a single integer.
For example, 3 = 0b11 means that `TEXT_PRESERVE_LIGATURES` (= 1) and `TEXT_PRESERVE_WHITESPACE` (= 2) are both turned on.
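To make the bit arithmetic concrete, here is a small sketch of how such power-of-two flags are typically combined with `|`, tested with `&`, and cleared with `& ~`. The constants are local copies of the PyMuPDF values mentioned in this post (`TEXT_PRESERVE_LIGATURES = 1`, `TEXT_PRESERVE_WHITESPACE = 2`, and so on), redefined so the snippet runs without PyMuPDF installed; `is_set()` is a hypothetical helper, not a PyMuPDF function.

```python
# Local copies of PyMuPDF's text-extraction flag values (redefined here so the
# snippet runs without PyMuPDF). Each flag owns exactly one bit.
TEXT_PRESERVE_LIGATURES = 1         # 0b0000_0001
TEXT_PRESERVE_WHITESPACE = 2        # 0b0000_0010
TEXT_PRESERVE_IMAGES = 4            # 0b0000_0100
TEXT_INHIBIT_SPACES = 8             # 0b0000_1000
TEXT_DEHYPHENATE = 16               # 0b0001_0000
TEXT_PRESERVE_SPANS = 32            # 0b0010_0000
TEXT_MEDIABOX_CLIP = 64             # 0b0100_0000
TEXT_CID_FOR_UNKNOWN_UNICODE = 128  # 0b1000_0000


def is_set(flags: int, flag: int) -> bool:
    """Hypothetical helper: True if every bit of `flag` is turned on in `flags`."""
    return (flags & flag) == flag


# Turn flags on with bitwise OR.
flags = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE
print(flags, bin(flags))  # 3 0b11

# Test individual flags with bitwise AND.
print(is_set(flags, TEXT_PRESERVE_WHITESPACE))  # True
print(is_set(flags, TEXT_PRESERVE_IMAGES))      # False

# Turn one flag on, then turn another off by AND-ing with its inverted bit.
flags |= TEXT_DEHYPHENATE
flags &= ~TEXT_PRESERVE_LIGATURES
print(flags, bin(flags))  # 18 0b10010
```

Because no two flags share a bit, OR-ing them never loses information, and any combination can be decoded back into its individual flags.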
It's embarrassing, but I had never seen this implementation before. I find it a great way of managing multiple flags: it's easy to understand and easy to maintain.
I hope I get a chance to use it someday in one of my projects.
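If I do get that chance, one option I might consider is Python's standard `enum.IntFlag`, which seems to combine this bit-flag approach with the named constants I prefer: members are still ints, so they can be passed wherever an int is expected, but they carry their names along. The `TextFlag` class below is a hypothetical sketch reusing the same values; it is not part of PyMuPDF.

```python
from enum import IntFlag


class TextFlag(IntFlag):
    """Hypothetical named wrapper around PyMuPDF-style text flag values."""

    PRESERVE_LIGATURES = 1
    PRESERVE_WHITESPACE = 2
    PRESERVE_IMAGES = 4
    INHIBIT_SPACES = 8
    DEHYPHENATE = 16
    PRESERVE_SPANS = 32
    MEDIABOX_CLIP = 64
    CID_FOR_UNKNOWN_UNICODE = 128


flags = TextFlag.PRESERVE_LIGATURES | TextFlag.PRESERVE_WHITESPACE
print(repr(flags))                            # shows the member names, not just 3
print(int(flags))                             # 3
print(isinstance(flags, int))                 # True: still usable as a plain int
print(TextFlag.PRESERVE_WHITESPACE in flags)  # True
print(TextFlag.PRESERVE_IMAGES in flags)      # False
```

The exact `repr()` text of a combined flag differs between Python versions, so I wouldn't rely on its string form, only on the values and membership checks.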
- Rice: 800 kcal
- Bagels: 500 kcal
- Yogurt: 300 kcal
- Protein shake: 200 kcal

Total: 1800 kcal
10k run
I watched a YouTube video saying that 60 to 70% of your maximum heart rate is best for jogging.
My max heart rate is probably around 200, so I tried to keep my heart rate at around 120 to 140 bpm.
It turned out that my heart rate jumped to 150 and ended up at 170, even though I tried to run as slowly as possible.
How is it possible to keep a heart rate below 60% of max? Interesting.