The phenomenon is best known in Microsoft Word. It's well known that Track Changes can hold deleted information, but so can many other features of the software. For example, the little-known Fast Save feature, developed in the days when hard drives were very slow, retains deleted blocks of data to accelerate synchronization between memory and the disk.
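As a rough illustration of how Track Changes keeps deleted text, the sketch below lists tracked deletions that are still stored in a modern .docx file. It assumes the Office Open XML layout, in which the main body lives in `word/document.xml` and tracked deletions appear as `w:del` elements containing `w:delText` runs. The function name `tracked_deletions` is our own. The sketch looks only at the main body, not at headers, footnotes, or comments, and it does not cover the older binary .doc format where Fast Save applied.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_deletions(docx_bytes):
    """Return text that is still stored inside tracked deletions."""
    # A .docx file is a ZIP archive; the main body is word/document.xml.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    deleted = []
    # Each w:del element wraps runs whose text is held in w:delText.
    for del_el in root.iter("{%s}del" % W_NS):
        text = "".join(t.text or "" for t in del_el.iter("{%s}delText" % W_NS))
        if text:
            deleted.append(text)
    return deleted
```

Calling `tracked_deletions(open(path, "rb").read())` on a document with unaccepted deletions would return the deleted passages, which is exactly the text an author believed was gone.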
PDF, too, can carry hidden information. PDF presents text and graphics cleanly, but inside it's a mess of elements layered, hidden, and arranged in no obvious relation to the external appearance.
Even TIFF, a multipage graphical format, is a complex wrapper for multiple images and multiple text tags (TIFF stands for Tagged Image File Format). You might think that you are safe with PNG or JPEG, simple image formats, but these two also allow textual tags. The tags are mostly intended for simple metadata like author, date of creation, location, etc., but even these can be incriminating, and any other private text could in principle be hiding in the tags.
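To make the point about PNG concrete, here is a minimal sketch that lists the textual tags in a PNG file using only the Python standard library. It walks the file's chunk structure and reads the `tEXt` (plain) and `zTXt` (compressed) chunk types. The function name `png_text_chunks` is our own, and the sketch makes some simplifying assumptions: it skips the international `iTXt` chunk type, does not verify checksums, and does not look at EXIF data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(data):
    """Return (keyword, text) pairs from tEXt and zTXt chunks in PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")

    found = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length

        if ctype == b"tEXt":
            # keyword, NUL separator, then Latin-1 text
            keyword, _, text = body.partition(b"\x00")
            found.append((keyword.decode("latin-1"), text.decode("latin-1")))
        elif ctype == b"zTXt":
            # keyword, NUL separator, 1-byte compression method, zlib data
            keyword, _, rest = body.partition(b"\x00")
            text = zlib.decompress(rest[1:])
            found.append((keyword.decode("latin-1"), text.decode("latin-1")))
        elif ctype == b"IEND":
            break
    return found
```

Running this over the images embedded in a document gives a quick inventory of tags such as author or comment fields, which can then go through the same review as visible text.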
In principle, while redacting visible text, we could also extract the invisible text or graphics and redact sensitive entities with our combined automated/human process, as we do for the visible part of the document. But the hidden data is, internally, a jumble of unordered fields; it is not meant to be read. In some cases, it is nearly impossible to reconstruct how to present the text to a machine or human reader, as when an internal script builds up some text. For example, if a macro calculates a person's age from her birthdate, the computed age is unlikely to be found by an automated system or even a human, yet the macro might be using birthdate data in fields which are also hidden in the document. In a scenario where ages are considered sensitive, for example where discrimination lawsuits are a risk, such information needs to be found and deleted.