Channel: Developer Express Inc.

Word Processing Document API, Rich Text Editors (WinForms and WPF) – Reducing Document File Size (Best Practices)

Reducing document file size can speed up document import and processing operations. It can also help minimize file storage requirements in databases and on cloud servers. In this blog post, I will describe different strategies to reduce Microsoft Word document file size using our Word Processing APIs.

IMPORTANT NOTE: The strategies outlined below involve the removal of document content. Deleted content cannot be restored. 

Simplify Your Documents

While obvious, document simplification is the best way to reduce/optimize file size. Simplification strategies include: 

  • Where possible, use a limited set of styles to format document content.
  • Convert the document from DOCM to DOCX format to eliminate macros. You can also use our RichEditDocumentServer.Options.DocumentCapabilities.Macros option to disable macros.
  • Disable Track Changes before saving the document. RichEditDocumentServer includes a RichEditDocumentServer.Options.DocumentCapabilities.TrackChanges property designed to disable tracking.
  • Reduce graphics content.
  • Use linked OLE objects instead of embedded OLE objects. If you are unable to use linked OLE objects, you can reduce the size of embedded OLE objects or remove them before saving. Refer to the following article for additional information on OLE object support: OLE Objects in Word Documents
  • Reduce the use of fields and content controls. Unlink or remove fields before saving (see Replace Fields with Field Values to learn more).
  • Replace charts with compressed images.
  • Remove additional metadata (XML data, document properties, comments, RTF theme data).
  • Divide long tables into multiple short tables. In most instances, long tables do not affect file size but slow document rendering and layout calculation.
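
Several of the items above can be applied in code before a document is saved. The following is a minimal sketch, not a definitive implementation. It assumes that the capability options are set before the document is loaded and that the Document.UnlinkAllFields method is available in your version of the library. The file names are placeholders.

```csharp
using DevExpress.XtraRichEdit;
using DevExpress.XtraRichEdit.API.Native;

using (RichEditDocumentServer wordProcessor = new RichEditDocumentServer()) {
    // Assumption: capabilities are configured before the document is loaded
    wordProcessor.Options.DocumentCapabilities.Macros = DocumentCapability.Disabled;
    wordProcessor.Options.DocumentCapabilities.TrackChanges = DocumentCapability.Disabled;

    // "report.docm" is a placeholder path
    wordProcessor.LoadDocument("report.docm");

    // Replace fields with their current values
    // (assumes UnlinkAllFields exists in your library version)
    wordProcessor.Document.UnlinkAllFields();

    // Save in DOCX format, which does not store macros
    wordProcessor.SaveDocument("report.docx", DocumentFormat.OpenXml);
}
```

Unlinking fields is irreversible, so run a sketch like this on a copy of the document.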

Use the OpenXML Format Instead of Legacy Formats

The OpenXML format (DOCX) is modern, open, and compatible across multiple platforms. Legacy formats (such as DOC and RTF) can be more efficient in certain scenarios, but they are proprietary and less flexible. OpenXML files are essentially ZIP archives that contain XML files and additional resources (like images and styles), so their content is stored in compressed form. As such, DOCX files are easier to store in a database. You can use our RichEditDocumentServer.SaveDocument method to convert documents into the desired file format.
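
To check whether the conversion pays off for a given document, you can convert the file and compare sizes on disk. The sketch below assumes an RTF source file; both paths are placeholders, and the actual size difference depends on document content.

```csharp
using System;
using System.IO;
using DevExpress.XtraRichEdit;

string sourcePath = "legacy.rtf";   // placeholder input path
string targetPath = "legacy.docx";  // placeholder output path

using (RichEditDocumentServer wordProcessor = new RichEditDocumentServer()) {
    // Load the legacy file and save it in OpenXML (DOCX) format
    wordProcessor.LoadDocument(sourcePath, DocumentFormat.Rtf);
    wordProcessor.SaveDocument(targetPath, DocumentFormat.OpenXml);
}

// Compare the file sizes before and after conversion
long sizeBefore = new FileInfo(sourcePath).Length;
long sizeAfter = new FileInfo(targetPath).Length;
Console.WriteLine($"RTF: {sizeBefore} bytes, DOCX: {sizeAfter} bytes");
```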

Do Not Embed Fonts

The DevExpress Word Processing Document API allows you to embed fonts into your document. While documents with embedded fonts maintain appearance characteristics across different computing devices, these documents are much larger in size. If your solution displays documents in a controlled/managed environment, we recommend the use of the DevExpress DXFontRepository class. Please refer to the following help topic for additional information: Load and Use Custom Fonts Without Installation on the System
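
As a rough illustration, the sketch below registers a font file with DXFontRepository at startup so that the document itself does not need to carry the font. The font and document paths are placeholders. The DevExpress.Drawing namespace is an assumption that matches recent library versions (the class may reside in a different namespace in older ones).

```csharp
using DevExpress.Drawing;
using DevExpress.XtraRichEdit;

// Register the font once at application startup.
// "Fonts/CustomFont.ttf" is a placeholder path.
DXFontRepository.Instance.AddFont("Fonts/CustomFont.ttf");

using (RichEditDocumentServer wordProcessor = new RichEditDocumentServer()) {
    // The document references the font by name but does not embed it
    wordProcessor.LoadDocument("document.docx");

    // Layout-dependent output (here, PDF export) can resolve
    // the font from the repository
    wordProcessor.ExportToPdf("document.pdf");
}
```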

Reduce Image Size

You can use a third-party app or library to compress document images. Once an image is compressed, call the PictureFormat.SetPicture method to replace the original image with its compressed equivalent.

The following code snippet replaces the original image with its compressed equivalent:

using (RichEditDocumentServer wordProcessor = new RichEditDocumentServer()) {
    wordProcessor.LoadDocument("doc_with_images.docx");
    Document document = wordProcessor.Document;
    Shape shape = document.Shapes[0];
    DXImage sourceImage = shape.PictureFormat.Picture.DXImage;
    MemoryStream imageStream = new MemoryStream();
    sourceImage.Save(imageStream);
    //Compress the image saved in the stream
    //...
    //updatedImageStream is the stream that holds the compressed image
    DXImage compressedImage = DXImage.FromStream(updatedImageStream);
    shape.PictureFormat.SetPicture(compressedImage);
}
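
The compression step itself is left open in the snippet above. One possible way to fill it in is to re-encode the image as a JPEG at a lower quality. The sketch below uses System.Drawing as a stand-in for a third-party tool. ImageCompressor and CompressToJpeg are hypothetical names, System.Drawing.Common is supported only on Windows in recent .NET versions, and JPEG re-encoding is lossy and discards transparency.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the DevExpress API
public static class ImageCompressor {
    // Re-encodes the image in 'source' as a JPEG at the given quality (0-100)
    // and returns a new stream positioned at the start.
    public static MemoryStream CompressToJpeg(Stream source, long quality) {
        // Find the built-in JPEG encoder
        ImageCodecInfo jpegCodec = ImageCodecInfo.GetImageEncoders()
            .First(codec => codec.FormatID == ImageFormat.Jpeg.Guid);

        using (Image image = Image.FromStream(source))
        using (EncoderParameters parameters = new EncoderParameters(1)) {
            // Lower quality values produce smaller files
            parameters.Param[0] = new EncoderParameter(
                System.Drawing.Imaging.Encoder.Quality, quality);

            MemoryStream result = new MemoryStream();
            image.Save(result, jpegCodec, parameters);
            result.Position = 0;
            return result;
        }
    }
}
```

With this helper, the elided step could become `imageStream.Position = 0;` followed by `MemoryStream updatedImageStream = ImageCompressor.CompressToJpeg(imageStream, 60L);`. The quality value of 60 is only an example and should be tuned against the visual result.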

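Another tip is to avoid cropping images inside the document. A cropped picture typically still stores the full original image, so cropping alone does not reduce file size. Use a saved pre-cropped version instead. You can read the PictureFormat.SourceRect property to obtain the crop offsets, crop the image in code, and then save the output. The PictureFormat.SetPicture method allows you to replace the image with its cropped version.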

The following code snippet crops an image, saves it, and then replaces the original image with its cropped equivalent:

using (RichEditDocumentServer wordProcessor = new RichEditDocumentServer()) {
    wordProcessor.LoadDocument("CroppedImages.docx");
    Document document = wordProcessor.Document;
    Shape shape = document.Shapes[0];
    if (shape.PictureFormat != null) {
        DXBitmap image = shape.PictureFormat.Picture.DXImage as DXBitmap;
        // SourceRect stores crop offsets as fractions of the image size
        var rectOffset = shape.PictureFormat.SourceRect;
        RectangleF imageRect = new RectangleF(
            image.Width * rectOffset.LeftOffset,
            image.Height * rectOffset.TopOffset,
            image.Width - image.Width * rectOffset.LeftOffset - image.Width * rectOffset.RightOffset,
            image.Height - image.Height * rectOffset.TopOffset - image.Height * rectOffset.BottomOffset);
        MemoryStream imageStream = new MemoryStream();
        image.Crop(imageRect).Save(imageStream, image.ImageFormat);
        // Rewind the stream so the cropped image is read from the start
        imageStream.Position = 0;
        DocumentImageSource source = DocumentImageSource.FromStream(imageStream);
        shape.PictureFormat.SetPicture(source);
        // Reset the crop offsets because the stored image is already cropped
        shape.PictureFormat.SourceRect = new RectangleOffset();
    }
}
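
The rectangle arithmetic in the snippet above can be checked in isolation. The sketch below repeats the same calculation with plain numbers. GetVisibleRect is a hypothetical helper, and the offsets are assumed to be fractions of the image size, as the multiplication in the snippet implies.

```csharp
using System;
using System.Drawing;

// Hypothetical helper that mirrors the crop rectangle calculation
static RectangleF GetVisibleRect(float width, float height,
    float left, float top, float right, float bottom) {
    return new RectangleF(
        width * left,                           // X of the visible area
        height * top,                           // Y of the visible area
        width - width * left - width * right,   // visible width
        height - height * top - height * bottom // visible height
    );
}

// A 1000x800 image cropped by 12.5% (left), 25% (top), 25% (right), 12.5% (bottom)
RectangleF rect = GetVisibleRect(1000, 800, 0.125f, 0.25f, 0.25f, 0.125f);
Console.WriteLine($"{rect.X}, {rect.Y}, {rect.Width}, {rect.Height}");
// prints "125, 200, 625, 500"
```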

If large images are necessary and your app architecture allows you to store images separately, you can employ the following workaround. Iterate through the document's shape collection and save all images to a database with unique identifiers. Once complete, do one of the following: replace original document images with empty images, replace them with DOCVARIABLE fields (for dynamic image replacement), or remove the images and mark their positions in the document with bookmarks. With this strategy, you can save a lightweight version of the document and restore original document images when necessary:

Document document = wordProcessor.Document;
// iterate through document images, save them to the database 
// and replace original images with an empty image
int imageID = 1; // generate an image ID as you require
DocumentImageSource emptyImageSource = DocumentImageSource.FromImage(new DXBitmap(1, 1));
for (int i = document.Shapes.Count - 1; i >= 0; i--)
{
    Shape shape = document.Shapes[i];
    if (shape.PictureFormat != null)
    {
        DXBitmap image = shape.PictureFormat.Picture.DXImage as DXBitmap;
        using (MemoryStream imageStream = new MemoryStream()) {
            image.Save(imageStream, image.ImageFormat);
            byte[] imageBytes = imageStream.ToArray();
            // save image bytes to the database with the specified image ID
            // ...
            // change the image name (if required) to identify it later
            shape.Name = "Image " + imageID.ToString();
            // replace the current image with the empty image
            shape.PictureFormat.SetPicture(emptyImageSource);
        }
        imageID++;
    }
}
// save the document with dummy images
// (the stream is not disposed here because it is read again below)
MemoryStream documentStream = new MemoryStream();
document.SaveDocument(documentStream, DocumentFormat.OpenXml);
 
//...
// restore document images
documentStream.Position = 0; // rewind the stream before loading
richEditControl.LoadDocument(documentStream, DocumentFormat.OpenXml);
Document restoredDocument = richEditControl.Document;
for (int i = restoredDocument.Shapes.Count - 1; i >= 0; i--)
{
    Shape shape = restoredDocument.Shapes[i];
    if (shape.PictureFormat != null)
    {
        string imageName = shape.Name;
        // extract the required image from the database by name
        byte[] imageBytes = ...;
        using(MemoryStream imageStream = new MemoryStream(imageBytes))
        {
            // replace the empty image with the original image
            DocumentImageSource imageSource = DocumentImageSource.FromStream(imageStream);
            shape.PictureFormat.SetPicture(imageSource);
        }
    }
}
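
The database access in the snippets above is intentionally left open. For experiments, a small in-memory store keyed by shape name can stand in for it. The sketch below is hypothetical (InMemoryImageStore is not part of any library) and only illustrates the lookup contract: images are saved under the shape name and looked up by the same name later.

```csharp
using System;
using System.Collections.Generic;

InMemoryImageStore store = new InMemoryImageStore();

// Save step: store the image bytes under the shape name
store.Save("Image 1", new byte[] { 1, 2, 3 });

// Restore step: look the image up by shape name
if (store.TryLoad("Image 1", out byte[] imageBytes))
    Console.WriteLine(imageBytes.Length); // prints 3

// Unknown names are reported instead of throwing
Console.WriteLine(store.TryLoad("Image 2", out _)); // prints False

// Hypothetical stand-in for the database
public sealed class InMemoryImageStore {
    private readonly Dictionary<string, byte[]> images =
        new Dictionary<string, byte[]>();

    public void Save(string name, byte[] data) {
        images[name] = data;
    }

    public bool TryLoad(string name, out byte[] data) {
        return images.TryGetValue(name, out data);
    }
}
```

A TryLoad-style lookup lets the restore loop skip shapes that have no stored image (for example, shapes added after the lightweight copy was saved) and leave their placeholders in place.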
