C# tutorial: extract images from a PDF file


Extracting images from a PDF file

In the previous tutorial, you learned how to extract all the text from a PDF file. Besides text, a PDF file often contains images that you may want to save. In this tutorial, I will show you how to extract those images with iTextSharp.

Each image you see in a PDF is stored as a stream object (an image XObject). You can retrieve any object in the file by its number with the GetPdfObject method of PdfReader, and the XrefSize property tells you how many object numbers there are. Because a PDF stores many kinds of objects (fonts, page content, metadata and so on), you need to check whether each retrieved object is an image. To do this, skip the objects that are not streams, cast the rest to PRStream, and use its Get method to read the /Subtype entry. If the subtype equals PdfName.IMAGE, the stream is an image and you can create a PdfImageObject from it. The GetImageAsBytes method of PdfImageObject returns the image data as an array of bytes in a standard image format such as JPEG or PNG, depending on how the image is encoded in the PDF, and the GetFileType method returns the matching file extension. Finally, write the bytes to disk with the Write method of a FileStream.

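The example code below extracts the images from the file jmf_tutorial.pdf and saves them in the D:/imagesextracted folder. Each file is named after the number of the PDF object it came from. Because the loop visits every object in the file instead of walking the pages, it can also save image streams that are not painted directly on a page, such as transparency masks. The picture after the code shows the images extracted from the file into the D:/imagesextracted folder.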

using System;
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser; // PdfImageObject is in this namespace

string path = "D:/imagesextracted/";
Directory.CreateDirectory(path); // FileStream does not create a missing folder

PdfReader reader = new PdfReader("D:/jmf_tutorial.pdf");
try
{
    int n = reader.XrefSize; // number of objects in the PDF document
    for (int i = 0; i < n; i++)
    {
        PdfObject po = reader.GetPdfObject(i); // get the object with number i
        if (po == null || !po.IsStream()) // missing or not a stream, so not an image
            continue;

        PRStream pst = (PRStream)po; // cast the object to a stream
        PdfObject type = pst.Get(PdfName.SUBTYPE); // read the /Subtype entry
        if (!PdfName.IMAGE.Equals(type)) // skip fonts, page content and other streams
            continue;

        try
        {
            PdfImageObject pio = new PdfImageObject(pst); // wrap the image stream
            byte[] imgdata = pio.GetImageAsBytes(); // image data as an array of bytes
            // use the real image format (jpg, png, ...) instead of always ".jpg"
            string file = path + "image" + i + "." + pio.GetFileType();
            using (FileStream fs = new FileStream(file, FileMode.Create))
            {
                fs.Write(imgdata, 0, imgdata.Length); // write the bytes to the file
            }
        }
        catch (Exception e)
        {
            // e.g. an image encoding iTextSharp cannot decode: report it and move on
            Console.WriteLine("Object " + i + ": " + e.Message);
        }
    }
}
finally
{
    reader.Close(); // release the PDF file
}

[Figure: images extracted from jmf_tutorial.pdf into the D:/imagesextracted folder]




