How to Edit DOCX files in PHP

Tara Prasad Swain PHP

There are many free and paid PHP libraries for manipulating docx files. They provide features for adding text, HTML, headings, tables, images and other elements.

A docx file is a ZIP archive of XML files, not a gzip stream. A single docx file is made up of many XML parts (plus media such as images) zipped together in a defined package layout.
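To see the individual XML parts inside a docx file, the package can be opened with PHP's ZipArchive class (this needs the zip extension). The following is a minimal sketch; the function name listDocxParts is made up for illustration and is not part of any library.

```php
<?php
// Minimal sketch: list the parts stored inside a .docx package.
// Requires PHP's zip extension (ext-zip).
// listDocxParts() is a hypothetical helper name used for illustration.

/**
 * Return the names of all entries in a docx archive,
 * or an empty array if the file cannot be opened as a ZIP archive.
 */
function listDocxParts(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return [];
    }

    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $names[] = $zip->getNameIndex($i);
    }

    $zip->close();
    return $names;
}
```

For a typical Word document the returned list includes entries such as [Content_Types].xml and word/document.xml, the latter holding the main body text.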

Since HTML can be edited easily in any rich text editor, we can use a mechanism in PHP that converts the docx into HTML and then converts the edited HTML back into docx.

If we dig into that, there are several command-line tools available on Linux for this. One well-known tool I came across is Pandoc.

I found that Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, OPML, Emacs Org-Mode, Txt2Tags, Microsoft Word docx, LibreOffice ODT, EPUB, or Haddock markup. That is a remarkably wide range, and I found it reliable.

Installation is straightforward if you are familiar with Linux systems. You can compile from source if no prebuilt package is available for your Linux distribution.
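Once Pandoc is installed, it can be useful to confirm from PHP that the binary is actually reachable before attempting a conversion. The sketch below is one way to do that; the function name isPandocAvailable and the default path 'pandoc' are assumptions for illustration, and it relies on exec() being enabled on the server.

```php
<?php
// Minimal sketch: check whether the pandoc binary can be executed.
// isPandocAvailable() is a hypothetical helper name used for illustration.

/**
 * Run "pandoc --version" and report whether it exited successfully.
 */
function isPandocAvailable(string $pandocPath = 'pandoc'): bool
{
    $output = [];
    $status = 1;

    // 2>&1 keeps any "command not found" message out of the server's error stream
    exec(escapeshellarg($pandocPath) . ' --version 2>&1', $output, $status);

    // An exit status of 0 means the command ran successfully
    return $status === 0;
}
```

Passing the full path to the binary is the safer choice on servers where the web user has a restricted PATH.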

So the logic is to call a system command from PHP using shell_exec or exec. The command converts the docx document into HTML, which can then be displayed in the browser as editable HTML in any rich text HTML editor. The docx file can then be re-created from the updated HTML using the same system command.
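Because file paths end up inside a shell command, it helps to build the command string with escapeshellarg() so that spaces or quotes in a file name cannot break or alter the command. The following is a minimal sketch of that step only; buildPandocCommand is a hypothetical helper name, and the paths in the usage note are placeholders.

```php
<?php
// Minimal sketch: build a pandoc command line with safely quoted arguments.
// buildPandocCommand() is a hypothetical helper name used for illustration.

/**
 * Build the command that converts $input into $output.
 * Pandoc infers both formats from the file extensions;
 * -s asks for a standalone document, and -o names the output file.
 */
function buildPandocCommand(string $pandocPath, string $input, string $output): string
{
    return escapeshellarg($pandocPath)
        . ' -s ' . escapeshellarg($input)
        . ' -o ' . escapeshellarg($output);
}
```

For example, buildPandocCommand('/usr/bin/pandoc', '/tmp/my report.docx', '/tmp/out.html') yields a string that can be passed to shell_exec or exec, with the space in the file name handled by the quoting.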

You can refer to the functions below.

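    public static function getDocxHTML($filename = '') {
        if ($filename != '' and file_exists($filename)) {
            // Unique temp name so concurrent requests do not overwrite each other;
            // the .html extension tells Pandoc which output format to write
            $tempfile = ROOT . '/tmp/' . uniqid('docx_', true) . '.html';

            // escapeshellarg() quotes the paths safely, even if they contain quotes or spaces
            $command = PANDOC_PATH . ' -s ' . escapeshellarg($filename) . ' -o ' . escapeshellarg($tempfile);

            shell_exec($command);

            // The temp file only exists if Pandoc produced output
            if (file_exists($tempfile)) {
                $html = file_get_contents($tempfile);
                @unlink($tempfile);
                return $html;
            }
        }
        return '';
    }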

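    public static function updateDocx($filename = '', $data = '') {
        if ($filename != '' and file_exists($filename)) {
            // Unique temp name so concurrent requests do not overwrite each other
            $tempfile = ROOT . '/tmp/' . uniqid('docx_', true) . '.html';

            file_put_contents($tempfile, $data);

            // escapeshellarg() quotes the paths safely; Pandoc infers the
            // output format from the extension of $filename
            $command = PANDOC_PATH . ' -s ' . escapeshellarg($tempfile) . ' -o ' . escapeshellarg($filename);

            // exec() exposes the exit status, so the return value reflects
            // whether the conversion itself succeeded
            exec($command, $output, $status);

            @unlink($tempfile);
            return $status === 0;
        }
        return false;
    }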

Hope this helps you.
