

The paragraph marks are showing up in Word because they are there in the PDF. They are valid character codes and they are going to go with the text.

Adobe Acrobat (and other programs) can create PDFs that wrap paragraphs the way Word does. However, many PDFs aren't created like that directly. The text has been added by running the PDF through OCR (Optical Character Recognition), and OCR has no way of telling the difference between the end of a line and the end of a paragraph.

I have cleaned up hundreds of these over the years, and here is my process:

1. Do a Find & Replace, replacing ^p^p (2 paragraph codes) with some characters that will never naturally occur in the document, like "%%%".
2. Do a Find & Replace, replacing ^p with " " (a space).
3. Do a Find & Replace, replacing "%%%" with ^p.

When you do this, everything that had 2 paragraph codes is assumed to be the end of a paragraph, and all the extra paragraph marks have been removed. Once you get the hang of it, it takes about 60 seconds.

I was pretty sure I had some VBA code to do this somewhere. It turns out it was a slightly modified recorded macro I'd put together for a specific job a few years ago.

Essentially, it pastes whatever is on the clipboard between two bookmarks, then does what u/EddieRyanDC recommended: it changes ends-of-lines to spaces and double Enters to a single paragraph mark. It also changes any double spaces that some PDFs include before the ends of lines. If I were redoing it, I would probably use ranges rather than the selection methods that come in from recording, but it works.

I had the code associated with a button in a custom group, so I could copy from the PDF, click in the Word document where I wanted it, and click the button to insert it cleaned up.

Here's the code (Edit: I keep forgetting how Reddit needs code blocks formatted!):

```vba
'- Q&D recorded code to paste content of copied PDF text with line endings with ¶
'  and double ¶s to denote a wanted end-of-paragraph.

'- Add a starting bookmark, paste in the clipboard content, add an ending bookmark
ActiveDocument.Bookmarks.Add Range:=Selection.Range, Name:="pStart"
Selection.Paste
ActiveDocument.Bookmarks.Add Range:=Selection.Range, Name:="pEnd"

'- Select the pasted content between the bookmarks
Selection.GoTo What:=wdGoToBookmark, Name:="pStart"

'- Perform the F&Rs to convert end-of-line ¶s to a unique marker,
'  change any double spaces to single space (in case any copied lines
'  ended with a space before the end-of-line),
'  then change any double markers to an end-of-paragraph.

'- Remove the containing bookmarks and end
ActiveDocument.Bookmarks("pStart").Delete
ActiveDocument.Bookmarks("pEnd").Delete
```
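For anyone who wants the Range-based version the macro's author mentions, here is a minimal sketch of the same three Find & Replace passes run on a `Range` instead of the recorded `Selection` calls. It is not the poster's macro: the procedure names `PasteUnwrappedPdfText`, `UnwrapPdfLinesInRange` and `ReplaceAllIn` are hypothetical, the `%%%` placeholder assumes that string never appears in the PDF text, and it assumes the copied text marks real paragraph ends with a blank line (two paragraph marks). It is meant for a standard module in Word's VBA editor.

```vba
Option Explicit

' Sketch only: procedure names and the "%%%" placeholder are illustrative assumptions.

' Replace every occurrence of findText with replText, but only inside rng.
Private Sub ReplaceAllIn(ByVal rng As Range, ByVal findText As String, ByVal replText As String)
    Dim work As Range
    Set work = rng.Duplicate            ' Find can redefine its range; keep the caller's range intact
    With work.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = replText
        .Forward = True
        .Wrap = wdFindStop              ' do not continue past the end of the range
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False         ' ^p is only valid in a non-wildcard search
        .Execute Replace:=wdReplaceAll
    End With
End Sub

' The three Find & Replace passes, plus a double-space tidy-up, on one range.
Public Sub UnwrapPdfLinesInRange(ByVal rng As Range)
    ReplaceAllIn rng, "^p^p", "%%%"     ' 1. double paragraph mark = real paragraph end -> placeholder
    ReplaceAllIn rng, "^p", " "         ' 2. every remaining mark is just an end of line -> space
    ReplaceAllIn rng, "%%%", "^p"       ' 3. placeholder -> one paragraph mark
    ReplaceAllIn rng, "  ", " "         ' 4. tidy the double space left where a line already ended in a space
End Sub

' Paste the clipboard at the cursor and clean up only the pasted text.
Public Sub PasteUnwrappedPdfText()
    Dim startPos As Long
    Dim rng As Range

    startPos = Selection.Start
    Selection.Paste                     ' afterwards the insertion point sits just after the pasted text
    Set rng = Selection.Range
    rng.Start = startPos                ' stretch the range back over what was just pasted
    If rng.Start = rng.End Then Exit Sub    ' nothing was pasted

    ' Leave a trailing paragraph mark alone so the pasted block does not merge
    ' with whatever paragraph followed the cursor.
    If rng.Characters.Last.Text = vbCr Then rng.MoveEnd wdCharacter, -1

    UnwrapPdfLinesInRange rng
End Sub
```

Because the range records where the paste started and ended, no bookmarks are needed, and the cursor is left after the pasted text just as with a normal paste. One limitation of this sketch: a single "two spaces to one" pass fixes the double space left at a line break, but does not fully collapse longer runs of spaces.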
