Merge text from one PDF into another while preserving style and layout? [closed]


Feb 25, 2025 - 10:07

I have two PDF CV files:

First CV (source) – Contains the text/content I want to reuse.
Second CV (target) – Has the design (fonts, colors, layout) that I want to preserve.

Both PDFs are text-based and share the same high-level structure: they have the same sections in the same order (e.g., Education, Experience, Skills). However, the type of content can differ—for example, a table in the source CV could correspond to a paragraph of text in the target CV.
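Since both files share the same section headings, the extracted text can at least be split into comparable chunks per section. A minimal sketch, assuming the plain text has already been extracted from each PDF and that every heading sits on its own line (the function name `split_sections` and the sample text are made up for illustration):

```python
def split_sections(text, headings):
    """Group non-empty lines under the most recent known heading."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in headings:
            # A known heading starts a new section.
            current = stripped
            sections[current] = []
        elif current is not None and stripped:
            # Any other non-empty line belongs to the current section.
            sections[current].append(stripped)
    return sections


# Hypothetical extracted text; real extraction output will be messier.
sample = (
    "Education\n"
    "BSc, Example University\n"
    "\n"
    "Experience\n"
    "Engineer at Example Corp\n"
    "Intern at Sample Ltd\n"
    "\n"
    "Skills\n"
    "Python, SQL\n"
)
source_sections = split_sections(sample, {"Education", "Experience", "Skills"})
```

Running the same function on both CVs would give two dictionaries keyed by the same headings, which is the mapping I would like to fill the target from.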

My goal is to fill in the target CV’s layout with the text from the source CV. I want to maintain the exact style, font, color, and layout of the target CV—essentially only replacing the textual content while leaving all design elements intact.

What I've tried so far

1- Converting each PDF page to an image and using GPT/OCR to extract the content and rebuild it as HTML/CSS:
This approach was not very accurate. The reconstructed HTML/CSS often failed to match the original layout, especially for complex sections or tables.

2- Using libraries like PyPDF or PyMuPDF to replace text:
While these libraries can manipulate PDFs at the text level, the biggest challenge is handling content size differences. If the source text is longer (or shorter) than the target text space, it breaks the intended layout. Moreover, PDFs often don’t have a straightforward “flow” of text, making it difficult to do a simple one-to-one text replacement without layout shifts.
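To make the size problem concrete, here is a rough sketch of the kind of fitting check I mean: it looks for the largest font size at which the new text still fits inside the box the old text occupied. It uses only the standard library and does not touch the PDF itself. The average glyph width (`char_width_ratio`) and line height (`line_height_ratio`) are crude assumptions; real code would need the target font's actual metrics.

```python
import textwrap


def fit_font_size(text, box_width, box_height, max_size=11.0, min_size=6.0,
                  step=0.5, char_width_ratio=0.5, line_height_ratio=1.2):
    """Return (font_size, lines) for the largest size that fits, or None."""
    size = max_size
    while size >= min_size:
        # Assumption: an average glyph is char_width_ratio * font size wide.
        chars_per_line = max(1, int(box_width / (size * char_width_ratio)))
        lines = textwrap.wrap(text, width=chars_per_line)
        # Assumption: each line is line_height_ratio * font size tall.
        needed_height = len(lines) * size * line_height_ratio
        if needed_height <= box_height:
            return size, lines
        size -= step  # Too tall at this size: try a smaller one.
    return None  # Does not fit even at min_size.


# Hypothetical box of 100 x 30 points taken from the target CV.
result = fit_font_size("Senior engineer with ten years of experience", 100, 30)
```

A `None` result is the case I don't know how to handle: the text cannot fit without truncating it or moving the elements below it.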

The challenge

PDFs are not primarily designed for reflowable text:

  • If the new content doesn’t fit the space of the old text, the layout breaks.
  • Sections can differ structurally (e.g., a table in one vs. a paragraph in the other), so simple text replacement often fails to adapt the layout.
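For the structural mismatch, the simplest case I can think of is flattening table rows from the source into a single paragraph for the target. A minimal sketch, assuming the rows are already available as dictionaries (the function name, the field names, and the sentence template are all hypothetical):

```python
def rows_to_paragraph(rows, template, separator="; "):
    """Render each table row through a template and join the results."""
    parts = [template.format(**row) for row in rows]
    return separator.join(parts) + "."


# Hypothetical "Education" table extracted from the source CV.
education_rows = [
    {"degree": "BSc", "school": "Example University", "years": "2015-2018"},
    {"degree": "MSc", "school": "Sample Institute", "years": "2018-2020"},
]
paragraph = rows_to_paragraph(education_rows, "{degree}, {school} ({years})")
print(paragraph)
# BSc, Example University (2015-2018); MSc, Sample Institute (2018-2020).
```

This only covers the table-to-paragraph direction and still leaves the question of whether the resulting paragraph fits the target's space.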

Question

Is there a reliable, programmatic way to merge the content from the source CV into the target CV while preserving the target’s layout, style, and formatting? I’m looking for any approaches, libraries, or workflows that can handle:

  • Complex layout differences (tables vs. paragraphs).
  • Potential mismatches in text length.
  • Preserving fonts, colors, and style from the target PDF.

I’d appreciate any insights on strategies—maybe using advanced PDF manipulation libraries, template-based approaches, or a multi-step conversion to a more editable format (e.g., .docx or .odt) then re-exporting to PDF—to achieve this goal.
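As an illustration of what I mean by a template-based approach, here is a minimal sketch that assumes the target design has been rebuilt once by hand as an HTML template, which would then be filled in and converted back to PDF by some separate tool. The markup, the class name, and `render_section` are invented for the example and not taken from either CV:

```python
import html
from string import Template

# Hypothetical fragment of a hand-built template mirroring the target design.
SECTION_TEMPLATE = Template(
    '<section class="cv-section">\n'
    '  <h2>$heading</h2>\n'
    '  <p>$body</p>\n'
    '</section>'
)


def render_section(heading, body):
    """Fill one section of the template, escaping the inserted text."""
    return SECTION_TEMPLATE.substitute(
        heading=html.escape(heading),
        body=html.escape(body),
    )


skills_html = render_section("Skills", "C++ & Python")
```

The styling would live in the template's CSS, so text of any length would reflow, but the result is only as faithful as the hand-built template.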

Thank you!