In 13.1 we released a beta of our new PDF Viewer. This excellent addition to our control suite lets you view your PDF documents inside your own application. Because documents can be loaded either from a file name or from a stream, you have great flexibility in how you load and display them.
Now consider the scenario where you have hundreds (or even thousands) of PDF documents. Inevitably, the boss will one day ask for the one PDF that contains some specific text (any lawyers out there?). How do you search these documents efficiently without opening each and every one? Starting in 13.2, the new PdfDocumentProcessor greatly increases your ability to manage and work with these documents in code, including searching their text:
```csharp
using System;
using System.Linq;
using DevExpress.Pdf;

class Program
{
    static void Main()
    {
        var search = "parameters";
        var processor = new PdfDocumentProcessor();
        processor.LoadDocument("CSharpSpec.pdf");

        // Case-insensitive, whole-word matching
        var searchParams = new PdfTextSearchParameters { CaseSensitive = false, WholeWords = true };

        // Each call to FindText continues from the previous match
        var results = processor.FindText(search, searchParams);
        while (results.Status == PdfTextSearchStatus.Found)
        {
            var text = string.Join(", ", results.Words.Select(p => p.Text));
            Console.WriteLine("Found \"{0}\" on page {1}", text, results.PageIndex);
            results = processor.FindText(search, searchParams);
        }

        Console.WriteLine("That's all folks!");
        Console.ReadKey();
    }
}
```
Notice how little code it takes to load a document into a PdfDocumentProcessor and search it for specific text. Now imagine doing this across your entire library of PDF documents!
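As a minimal sketch of that idea, assuming the DevExpress.Pdf assembly is referenced and the same PdfDocumentProcessor calls shown in the snippet above, you could walk a folder with the standard Directory.EnumerateFiles and run the search loop on every *.pdf file. The PdfLibrarySearch class and its Search method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using DevExpress.Pdf;

// Hypothetical helper: searches every PDF under a folder (including subfolders).
static class PdfLibrarySearch
{
    // Lazily yields (file path, page index) for each match of `search`.
    public static IEnumerable<Tuple<string, int>> Search(string folder, string search)
    {
        var searchParams = new PdfTextSearchParameters { CaseSensitive = false, WholeWords = true };

        foreach (var file in Directory.EnumerateFiles(folder, "*.pdf", SearchOption.AllDirectories))
        {
            // A fresh processor per file keeps each document's search position separate.
            var processor = new PdfDocumentProcessor();
            processor.LoadDocument(file);

            // Same loop as above: keep calling FindText until no further match is found.
            var results = processor.FindText(search, searchParams);
            while (results.Status == PdfTextSearchStatus.Found)
            {
                yield return Tuple.Create(file, results.PageIndex);
                results = processor.FindText(search, searchParams);
            }
        }
    }
}
```

For example, `foreach (var hit in PdfLibrarySearch.Search(@"C:\Contracts", "indemnify")) Console.WriteLine("{0}: page {1}", hit.Item1, hit.Item2);` would print every match, where the folder path and search term are hypothetical placeholders. Because the method uses `yield return`, matches stream out as each file is searched rather than after the whole library has been scanned.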
In the age of “big data” it is imperative that we, as developers, have the ability to work with any type of data, whether structured or unstructured. Indeed, we get the most value from our data when we can handle all of it together. I think this tool will help a great deal with PDFs!
As always, if there are any comments and/or questions, feel free to get a hold of me!
Seth Juarez
Email: sethj@devexpress.com
Twitter: @SethJuarez