I know PDF providers who visibly print the customer’s name or number in the header of every page, along with a short copyright text. I use qpdf --stream-data=uncompress to turn the PDF’s content streams into human-readable, PostScript-like text, and then Python+regex to remove each header text, which stands out a bit from other PDF elements. The script throws an error if the number of removed elements doesn’t match the number of pages, but that hasn’t happened yet. Processed documents sometimes have screwed-up non-ASCII characters in the Table of Contents for some reason, but I don’t have the originals anymore, so IDK if it’s my fault. Still, I wouldn’t share the PDFs except in text-only or printed form, because of any other steganographic shenanigans in the file. I would absolutely torrent them if I could repurchase them under a new identity and verify that the files are identical.
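For anyone curious, here is a rough sketch of the regex step, not the actual script. It assumes the PDF has already been uncompressed, that the header is drawn with a plain `(...) Tj` text operator containing a known string, and that pages can be counted by their `/Type /Page` entries. The names `count_pages` and `strip_headers` are made up for illustration; real files may use `TJ` arrays or hex strings instead, and this does nothing about other hidden marks.

```python
import re


def count_pages(pdf: bytes) -> int:
    # Page objects carry "/Type /Page"; the lookahead skips "/Type /Pages" tree nodes.
    return len(re.findall(rb"/Type\s*/Page(?![A-Za-z])", pdf))


def strip_headers(pdf: bytes, header: bytes) -> bytes:
    # Match a literal-string text operator whose string contains the header text.
    # Assumption: the header string has no nested or escaped parentheses.
    pattern = re.compile(rb"\([^()]*" + re.escape(header) + rb"[^()]*\)\s*Tj")

    # Replace with an empty string operand so the operator sequence stays valid.
    cleaned, removed = pattern.subn(b"() Tj", pdf)

    # Sanity check: exactly one header per page, otherwise bail out.
    pages = count_pages(pdf)
    if removed != pages:
        raise ValueError(f"removed {removed} headers but found {pages} pages")
    return cleaned
```

Since the edit changes stream sizes, the recorded stream lengths no longer match, so the output probably needs another pass through qpdf before other readers will accept it.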

BTW, has anyone figured out how to embed Python code in a PDF? The whitespace always gets re-encoded as x-coordinates, so copy&pasting it never preserves indentation. No, you can’t use the Ogham Space Mark (Unicode’s only non-blank character classified as a space) for indentation in Python; I tried.
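If you want to reproduce the Ogham experiment, a minimal check could look like this. The helper names `is_unicode_space` and `compiles` are just for illustration; the point is that U+1680 is in Unicode category Zs, yet the Python compiler refuses it as indentation.

```python
import unicodedata

OGHAM_SPACE = "\u1680"  # OGHAM SPACE MARK


def is_unicode_space(ch: str) -> bool:
    # "Zs" is the Unicode general category for space separators.
    return unicodedata.category(ch) == "Zs"


def compiles(source: str) -> bool:
    # True if Python accepts the source, False on any syntax/indentation error.
    try:
        compile(source, "<pasted>", "exec")
    except SyntaxError:
        return False
    return True


indented_with_spaces = "if True:\n    x = 1\n"
indented_with_ogham = "if True:\n" + OGHAM_SPACE * 4 + "x = 1\n"

print(is_unicode_space(OGHAM_SPACE))
print(compiles(indented_with_spaces))
print(compiles(indented_with_ogham))
```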
