
I hope this is the right site for this question...

I tutor in my spare time, and recently I downloaded a past GCSE exam paper and tried to edit the PDF to select just a few pages to send to my student. Unfortunately, I could not do so because the PDF is password-protected against editing.

Perhaps this is a silly question, but I am wondering how this works. Where is the security status of the PDF kept when I download it? What exactly is it that I am downloading when I click download for the PDF? I presume there is some code that describes the PDF, its security, etc., as well as its contents. Is there a way for me to view where this is? Where is the 'password' stored? Presumably it is somewhere among the code describing the PDF, so that my computer can check against it when I try to gain access? But then how can the password be stored in code without me being able to decode what it is?

Sorry if some of these questions are basic. I don't have much of a formal background in computer science.

EDIT: Could the down-voters please tell me what is wrong with the question? Is it not suitable for the site? Is there a way I can improve it?

Meep

2 Answers


A PDF file consists of a series of objects, and the dictionary is one of the most important object types.

Every document has a "trailer dictionary" which holds references to a few important things, and optionally to an encryption dictionary. The encryption dictionary contains the information needed to decrypt the document. An example from here:

% Trailer dictionary
trailer
<<
    /Size 95         % number of objects in the file
    /Root 93 0 R     % the document catalog is object ID (93,0)
    /Encrypt 94 0 R  % the encryption dict is object ID (94,0)
    /ID [<1cf5...>]  % an arbitrary file identifier
>>

% Encryption dictionary
94 0 obj
<<
    /Filter /Standard   % use the standard security handler
    /V 1                % algorithm 1
    /R 2                % revision 2
    /U (xxx...xxx)      % hashed user password (32 bytes)
    /O (xxx...xxx)      % hashed owner password (32 bytes)
    /P 65472            % flags specifying the allowed operations
>>
endobj

As you can see, the user password is needed to decrypt the file, and only then do the allowed operations come into play. Those operations are specified by /P: each bit position in its value represents a different access permission.

In the above example, the permission flags are 65472 (decimal), or 1111111111000000 (binary). Counting from bit 0 as the least significant bit, bits 0 and 1 are reserved (always 0), bit 2 is the print permission, and bits 3, 4, and 5 are the "modify", "copy text", and "add/edit annotations" permissions. All four are 0 here, so none of these operations is allowed. (The PDF reference numbers bits from 1, so it lists the same flags as bits 3 to 6.)
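To make the bit arithmetic concrete, here is a minimal Python sketch that pulls /P out of the example encryption dictionary and tests the four bits discussed above. The function names and the regular expression are my own illustration, not part of any PDF library, and a real parser would have to handle far more than this trimmed text.

```python
import re

# Trimmed copy of the example encryption dictionary (only the entries needed here).
ENCRYPT_DICT = """
94 0 obj
<<
    /Filter /Standard
    /V 1
    /R 2
    /P 65472
>>
endobj
"""

# Bit positions counted from 0 (least significant bit), as in the text.
PERMISSION_BITS = {
    "print": 2,
    "modify": 3,
    "copy text": 4,
    "add/edit annotations": 5,
}


def read_permission_flags(dict_text: str) -> int:
    """Extract the integer that follows /P (hypothetical helper, regex-based)."""
    match = re.search(r"/P\s+(-?\d+)", dict_text)
    if match is None:
        raise ValueError("no /P entry found")
    return int(match.group(1))


def decode_permissions(flags: int) -> dict:
    """Map each permission name to True (bit set, allowed) or False."""
    return {name: bool((flags >> bit) & 1) for name, bit in PERMISSION_BITS.items()}


flags = read_permission_flags(ENCRYPT_DICT)
print(format(flags & 0xFFFF, "016b"))  # low 16 bits in binary
for name, allowed in decode_permissions(flags).items():
    print(f"{name}: {'allowed' if allowed else 'not allowed'}")
```

For 65472 every one of the four permissions comes out as "not allowed", which matches the reading of the binary value given above.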

For the full list, see Table 3.20, "User access permissions", on page 123 of the PDF reference.
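As for how a password can be "stored" without being readable: the file holds a value derived from the password, not the password itself. The Python sketch below shows the revision 2 scheme of the standard security handler as I understand it from the PDF reference: the candidate password is padded, hashed with MD5 together with /O, /P and the file ID, and the resulting 40-bit key is used to RC4-encrypt a fixed padding string. The result is compared with /U. The function names and the sample /O and /ID values are hypothetical stand-ins, not data from a real file.

```python
import hashlib
import struct

# Fixed 32-byte padding string used by the standard security handler.
PAD = bytes.fromhex(
    "28bf4e5e4e758a4164004e56fffa0108"
    "2e2e00b6d0683e802f0ca9fe6453697a"
)


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling, then XOR the data with the keystream."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def encryption_key(password: bytes, o_entry: bytes, p_flags: int, file_id: bytes) -> bytes:
    """Derive the 40-bit (5-byte) revision 2 key from password, /O, /P and /ID."""
    padded = (password + PAD)[:32]              # pad or truncate to 32 bytes
    p_bytes = struct.pack("<I", p_flags & 0xFFFFFFFF)  # /P as 4 bytes, low byte first
    return hashlib.md5(padded + o_entry + p_bytes + file_id).digest()[:5]


def compute_u(password: bytes, o_entry: bytes, p_flags: int, file_id: bytes) -> bytes:
    """Value a writer would store in /U for this password (revision 2)."""
    return rc4(encryption_key(password, o_entry, p_flags, file_id), PAD)


def check_user_password(candidate: bytes, u_entry: bytes, o_entry: bytes,
                        p_flags: int, file_id: bytes) -> bool:
    """A viewer recomputes /U from the candidate and compares it to the stored value."""
    return compute_u(candidate, o_entry, p_flags, file_id) == u_entry


# Hypothetical values standing in for a real file's /O and /ID entries.
O_ENTRY = bytes(32)
FILE_ID = b"hypothetical-file-id"
P_FLAGS = 65472

stored_u = compute_u(b"secret", O_ENTRY, P_FLAGS, FILE_ID)
print(check_user_password(b"secret", stored_u, O_ENTRY, P_FLAGS, FILE_ID))
print(check_user_password(b"guess", stored_u, O_ENTRY, P_FLAGS, FILE_ID))
```

The stored /U lets a viewer confirm a guess but does not hand over the password directly. Viewers typically try the empty password first. If that check succeeds, anyone can open the document, and the only remaining restrictions are the /P flags.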

lennon310

But then how can the password be stored in code without me being able to decode what it is?

In addition to the answer by lennon310, it is worth noting that this sort of protection relies on trust between Adobe and the makers of PDF editors. In other words, all PDF editors are expected to behave responsibly and to honor the protection flags set in the document.

If an app doesn't observe those flags, and there are a lot of apps like that on shady websites, the PDF can be edited without needing a password. However, by using those apps, users risk breaching their agreement with the owner of the PDF file. They also run untrusted software on their machine, which may not limit itself to making the PDF file editable but could also, say, install a keylogger to collect passwords and credit card numbers, or encrypt a bunch of files and demand a ransom.

Arseni Mourzenko