drupal hit counter
Jerry Huang | All posts tagged 'pdf2tiff'

Jerry Huang apps and developing apps

PDF to Tiff

18. July 2010 22:33 by Jerry in Old blog posts

I don’t know why, it seems to be hard to find something to convert PDF to Tiff over the web. Maybe TIFF is not that popular than other formats. Few months ago, I had a task at work to find a new way to generate tiff files out of crystal report. Previously we have a batch job that print the crystal report to a virtual printer to produce a tiff file. It was working fine, but from now and then the job hit an IO error couple times in a month. I was pissed off by being called at night (the job was running overnight). So I decided to revamp this piece of sxxx process.

After some searching, I realized that I need to solve 2 problems:

  • PDF to TIFF by using 3rd party component. I chose Ghostscript API, because it’s free and simple (one dll).
    The only issue of Ghostscript API (or other components) in this case is that, if the PDF has multiple pages, it produces multiple tiff files which that each tiff representing one page in the PDF. So it brings us to the next step;
  • Group all single page tiff files into one multiple pages TIFF

The source code provide below is running quite stable as batch job for a few months.

References

Ghostscript: http://www.ghostscript.com/ you need to put the gsdll32.dll in the same folder of your dll/exe

for step 2, the related source code was convert from C# in this article:
http://www.codeproject.com/KB/GDI-plus/SaveMultipageTiff.aspx
However, the ConvertToBitonal function has a bug. You need to dispose the original bitmap object (original.Dispose();) before returning destination.

Usage

ConvertFile("c:\2.tiff", "C:\temp\pdf\PWBSample.pdf", "c:\temp\tiff", "-IC:\Program Files\gs\gs8.70\lib;C:\Program Files\gs\fonts;C:/Windows/fonts")
 

Source Code

 

' Copyright (c) 2002 Dan Mount and Ghostgum Software Pty Ltd
'
' Permission is hereby granted, free of charge, to any person obtaining
' a copy of this software and associated documentation files (the
' "Software"), to deal in the Software without restriction, including
' without limitation the rights to use, copy, modify, merge, publish,
' distribute, sublicense, and/or sell copies of the Software, and to
' permit persons to whom the Software is furnished to do so, subject to
' the following conditions:
'
' The above copyright notice and this permission notice shall be
' included in all copies or substantial portions of the Software.
'
' THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
' EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
' MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
' NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
' BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
' ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
' CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
' SOFTWARE.


' This is an example of how to call the Ghostscript DLL from
' Visual Basic.NET. There are two examples, one converts
' colorcir.ps to PDF, the other is like command line Ghostscript.
' The display device is not supported.
'
' This code is not compatible with VB6. There is another
' example which does work with VB6. Differences include:
' 1. VB.NET uses GCHandle to get pointer
' VB6 uses StrPtr/VarPtr
' 2. VB.NET Integer is 32bits, Long is 64bits
' VB6 Integer is 16bits, Long is 32bits
' 3. VB.NET uses IntPtr for pointers
' VB6 uses Long for pointers
' 4. VB.NET strings are always Unicode
' VB6 can create an ANSI string
' See the following URL for some VB6 / VB.NET details
' http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnvb600/html/vb6tovbdotnet.asp

Option Explicit On

Imports System.Runtime.InteropServices
Imports System.Drawing.Imaging
Imports System.IO
Imports System.Drawing

Module gsapi

Public Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (ByVal dest As IntPtr, ByVal source As IntPtr, ByVal bytes As Long)

'------------------------------------------------
'UDTs Start
'------------------------------------------------
<StructLayout(LayoutKind.Sequential)> Public Structure GS_Revision
Public strProduct As IntPtr
Public strCopyright As IntPtr
Public intRevision As Integer
Public intRevisionDate As Integer
End Structure
'------------------------------------------------
'UDTs End
'------------------------------------------------

'------------------------------------------------
'Callback Functions Start
'------------------------------------------------
'These are only required if you use gsapi_set_stdio
Public Delegate Function StdioCallBack(ByVal handle As IntPtr, ByVal strptr As IntPtr, ByVal count As Integer) As Integer

Public Function gsdll_stdin(ByVal intGSInstanceHandle As IntPtr, ByVal strz As IntPtr, ByVal intBytes As Integer) As Integer
' This is dumb code that reads one byte at a time
' Ghostscript doesn't mind this, it is just very slow
If intBytes = 0 Then
gsdll_stdin = 0
Else
Dim ich As Integer = Console.Read()
If ich = -1 Then
gsdll_stdin = 0 ' EOF
Else
Dim bch As Byte = ich
Dim gcByte As GCHandle = GCHandle.Alloc(bch, GCHandleType.Pinned)
Dim ptrByte As IntPtr = gcByte.AddrOfPinnedObject()
CopyMemory(strz, ptrByte, 1)
ptrByte = IntPtr.Zero
gcByte.Free()
gsdll_stdin = 1
End If
End If
End Function

Public Function gsdll_stdout(ByVal intGSInstanceHandle As IntPtr, ByVal strz As IntPtr, ByVal intBytes As Integer) As Integer
' If you can think of a more efficient method, please tell me!
' We need to convert from a byte buffer to a string
' First we create a byte array of the appropriate size
Dim aByte(intBytes) As Byte
' Then we get the address of the byte array
Dim gcByte As GCHandle = GCHandle.Alloc(aByte, GCHandleType.Pinned)
Dim ptrByte As IntPtr = gcByte.AddrOfPinnedObject()
' Then we copy the buffer to the byte array
CopyMemory(ptrByte, strz, intBytes)
' Release the address locking
ptrByte = IntPtr.Zero
gcByte.Free()
' Then we copy the byte array to a string, character by character
Dim str As String = ""

For i As Integer = 0 To intBytes - 1
str = str + Chr(aByte(i))
Next
' Finally we output the message
Console.Write(str)
gsdll_stdout = intBytes
End Function

Public Function gsdll_stderr(ByVal intGSInstanceHandle As IntPtr, ByVal strz As IntPtr, ByVal intBytes As Integer) As Integer
gsdll_stderr = gsdll_stdout(intGSInstanceHandle, strz, intBytes)
End Function
'------------------------------------------------
'Callback Functions End
'------------------------------------------------

'------------------------------------------------
'API Calls Start
'------------------------------------------------
'Win32 API
'GhostScript API
' Public Declare Function gsapi_revision Lib "gsdll32.dll" (ByVal pGSRevisionInfo As IntPtr, ByVal intLen As Integer) As Integer
Public Declare Function gsapi_revision Lib "gsdll32.dll" (ByRef pGSRevisionInfo As GS_Revision, ByVal intLen As Integer) As Integer
Public Declare Function gsapi_new_instance Lib "gsdll32.dll" (ByRef lngGSInstance As IntPtr, ByVal lngCallerHandle As IntPtr) As Integer
Public Declare Function gsapi_set_stdio Lib "gsdll32.dll" (ByVal lngGSInstance As IntPtr, ByVal gsdll_stdin As StdioCallBack, ByVal gsdll_stdout As StdioCallBack, ByVal gsdll_stderr As StdioCallBack) As Integer
Public Declare Sub gsapi_delete_instance Lib "gsdll32.dll" (ByVal lngGSInstance As IntPtr)
Public Declare Function gsapi_init_with_args Lib "gsdll32.dll" (ByVal lngGSInstance As IntPtr, ByVal lngArgumentCount As Integer, ByVal lngArguments As IntPtr) As Integer
Public Declare Function gsapi_run_file Lib "gsdll32.dll" (ByVal lngGSInstance As IntPtr, ByVal strFileName As String, ByVal intErrors As Integer, ByVal intExitCode As Integer) As Integer
Public Declare Function gsapi_exit Lib "gsdll32.dll" (ByVal lngGSInstance As IntPtr) As Integer
'------------------------------------------------
'API Calls End
'------------------------------------------------

'------------------------------------------------
'User Defined Functions Start
'------------------------------------------------
Public Function StringToAnsiZ(ByVal str As String) As Byte()
' Convert a Unicode string to a null terminated Ansi string for Ghostscript.
' The result is stored in a byte array. Later you will need to convert
' this byte array to a pointer with GCHandle.Alloc(XXXX, GCHandleType.Pinned)
' and GSHandle.AddrOfPinnedObject()
Dim intElementCount As Integer
Dim intCounter As Integer
Dim aAnsi() As Byte
Dim bChar As Byte
intElementCount = Len(str)
ReDim aAnsi(intElementCount + 1)
For intCounter = 0 To intElementCount - 1
bChar = Asc(Mid(str, intCounter + 1, 1))
aAnsi(intCounter) = bChar
Next intCounter
aAnsi(intElementCount) = 0
StringToAnsiZ = aAnsi
End Function

Public Function AnsiZtoString(ByVal strz As IntPtr) As String
' We need to convert from a byte buffer to a string
Dim byteCh(1) As Byte
Dim bOK As Boolean = True
Dim gcbyteCh As GCHandle = GCHandle.Alloc(byteCh, GCHandleType.Pinned)
Dim ptrByte As IntPtr = gcbyteCh.AddrOfPinnedObject()
Dim j As Integer = 0
Dim str As String = ""
While bOK
' This is how to do pointer arithmetic!
Dim sPtr As New IntPtr(strz.ToInt64() + j)
CopyMemory(ptrByte, sPtr, 1)
If byteCh(0) = 0 Then
bOK = False
Else
str = str + Chr(byteCh(0))
End If
j = j + 1
End While
AnsiZtoString = str
End Function

Public Function CheckRevision(ByVal intRevision As Integer) As Boolean
' Check revision number of Ghostscript
Dim intReturn As Integer
Dim udtGSRevInfo As GS_Revision
Dim gcRevision As GCHandle
gcRevision = GCHandle.Alloc(udtGSRevInfo, GCHandleType.Pinned)
intReturn = gsapi_revision(udtGSRevInfo, 16)
Console.WriteLine("Revision = " & udtGSRevInfo.intRevision)
Console.WriteLine("RevisionDate = " & udtGSRevInfo.intRevisionDate)
Console.WriteLine("Product = " & AnsiZtoString(udtGSRevInfo.strProduct))
Console.WriteLine("Copyright = " & AnsiZtoString(udtGSRevInfo.strCopyright))

If udtGSRevInfo.intRevision = intRevision Then
CheckRevision = True
Else
CheckRevision = False
End If
gcRevision.Free()
End Function

Public Function CallGS(ByVal astrGSArgs() As String) As Boolean
Dim intReturn As Integer
Dim intGSInstanceHandle As IntPtr
Dim aAnsiArgs() As Object
Dim aPtrArgs() As IntPtr
Dim aGCHandle() As GCHandle
Dim intCounter As Integer
Dim intElementCount As Integer
'Dim iTemp As Integer
Dim callerHandle As IntPtr
Dim gchandleArgs As GCHandle
Dim intptrArgs As IntPtr

' Print out the revision details.
' If we want to insist on a particular version of Ghostscript
' we should check the return value of CheckRevision().
CheckRevision(704)

' Load Ghostscript and get the instance handle
intReturn = gsapi_new_instance(intGSInstanceHandle, callerHandle)
If (intReturn < 0) Then
Return (False)
End If

' Capture stdio
Dim stdinCallback As StdioCallBack
stdinCallback = AddressOf gsdll_stdin
Dim stdoutCallback As StdioCallBack
stdoutCallback = AddressOf gsdll_stdout
Dim stderrCallback As StdioCallBack
stderrCallback = AddressOf gsdll_stderr
intReturn = gsapi_set_stdio(intGSInstanceHandle, stdinCallback, stdoutCallback, stderrCallback)

If (intReturn >= 0) Then
' Convert the Unicode strings to null terminated ANSI byte arrays
' then get pointers to the byte arrays.
intElementCount = UBound(astrGSArgs)
ReDim aAnsiArgs(intElementCount)
ReDim aPtrArgs(intElementCount)
ReDim aGCHandle(intElementCount)
For intCounter = 0 To intElementCount
aAnsiArgs(intCounter) = StringToAnsiZ(astrGSArgs(intCounter))
aGCHandle(intCounter) = GCHandle.Alloc(aAnsiArgs(intCounter), GCHandleType.Pinned)
aPtrArgs(intCounter) = aGCHandle(intCounter).AddrOfPinnedObject()
Next
gchandleArgs = GCHandle.Alloc(aPtrArgs, GCHandleType.Pinned)
intptrArgs = gchandleArgs.AddrOfPinnedObject()
callerHandle = IntPtr.Zero

intReturn = gsapi_init_with_args(intGSInstanceHandle, intElementCount + 1, intptrArgs)

' Release the pinned memory
For intCounter = 0 To intElementCount
aGCHandle(intCounter).Free()
Next
gchandleArgs.Free()

' Stop the Ghostscript interpreter
gsapi_exit(intGSInstanceHandle)
End If

' release the Ghostscript instance handle
gsapi_delete_instance(intGSInstanceHandle)

If (intReturn >= 0) Then
CallGS = True
Else
CallGS = False
End If

End Function

Private Sub ClearTempFolder(ByVal path As String)
Dim folder As New DirectoryInfo(path)
Dim files() As FileInfo = folder.GetFiles()
For Each fi As FileInfo In files
fi.Delete()
Next
End Sub

Public Sub OutputToTiff(ByVal tifPath As String)
Dim folder As New DirectoryInfo(tmpFolder)
Dim files() As FileInfo = folder.GetFiles("*.tiff")
If files.Length > 0 Then
Dim imgs(files.Length - 1) As Image
SortByName(files)

For i As Integer = 0 To files.Length - 1
imgs(i) = Image.FromFile(files(i).FullName)
Next
saveMultipage(imgs, tifPath, "TIFF")
For Each img As Image In imgs
img.Dispose()
Next

'ClearTempFolder(tmpFolder)
End If
Directory.Delete(tmpFolder, True)
End Sub

Private tmpFolder As String

Private Function GetTempFolder() As String
Dim g As Guid = Guid.NewGuid()
While Directory.Exists(tmpFolder & "\" & g.ToString())
g = New Guid()
End While
Return tmpFolder & "\" & g.ToString()
End Function


Public Function ConvertFile(ByVal tifPath As String, ByVal pdfSource As String, ByVal tempFolder As String, ByVal startUpStr As String) As Boolean
tmpFolder = tempFolder

tmpFolder = GetTempFolder()
Directory.CreateDirectory(tmpFolder)
'ClearTempFolder("c:\temp\tiff")
Dim astrArgs(16) As String
astrArgs(0) = "accfax" 'The First Parameter is Ignored
astrArgs(1) = startUpStr
astrArgs(2) = "-dNOPAUSE"
astrArgs(3) = "-dBATCH"
astrArgs(4) = "-dSAFER"
astrArgs(5) = "-sDEVICE=tiffg4"
astrArgs(6) = "-r300"
astrArgs(7) = "-sOutputFile=" & tmpFolder & "\%03d.tiff"
astrArgs(8) = "-dDEVICEXRESOLUTION=204"
astrArgs(9) = "-dDEVICEYRESOLUTION=196"
astrArgs(10) = "-dDEVICEWIDTH=1686"
astrArgs(11) = "-dDEVICEHEIGHT=2292"
astrArgs(12) = "-dNOPLATFONTS"
astrArgs(13) = "-sFONTPATH=""c:\psfonts\"""
astrArgs(14) = "-c ""<< /Policies << /PageSize 5 >> /PageSize [595 842]/InputAttributes currentpagedevice /InputAttributes get mark exch {1index /Priority eq not {pop << /PageSize [595 842] >>} if } forall >>setpagedevice"" -f"""
astrArgs(15) = "-f"
astrArgs(16) = pdfSource
If CallGS(astrArgs) Then
OutputToTiff(tifPath)
End If

End Function

Private Function InteractiveGS() As Boolean
Dim astrArgs(2) As String
astrArgs(0) = "gs" 'The First Parameter is Ignored
astrArgs(1) = "-c"
astrArgs(2) = "systemdict /start get exec"
Return CallGS(astrArgs)
End Function


'------------------------------------------------
'User Defined Functions End
'------------------------------------------------

'Sub Main()
' ConvertFile()
' 'InteractiveGS()
' MsgBox("Done")
'End Sub

#Region "multi-pages tiff generation"
Private Function saveMultipage(ByVal bmp() As Image, ByVal location As String, ByVal type As String) As Boolean
If Not bmp Is Nothing Then
Try
Dim codecInfo As ImageCodecInfo = getCodecForstring(type)


For i As Integer = 0 To bmp.Length - 1
If bmp(i) Is Nothing Then
Exit For
End If
bmp(i) = CType(ConvertToBitonal(CType(bmp(i), Bitmap)), Image)
Next

If bmp.Length = 1 Then

Dim iparams As EncoderParameters = New EncoderParameters(1)
Dim iparam As Encoder = Encoder.Compression
Dim iparamPara As EncoderParameter = New EncoderParameter(iparam, CType((EncoderValue.CompressionCCITT4), Long))
iparams.Param(0) = iparamPara
bmp(0).Save(location, codecInfo, iparams)


ElseIf bmp.Length > 1 Then

Dim saveEncoder As Encoder
Dim compressionEncoder As Encoder
Dim SaveEncodeParam As EncoderParameter
Dim CompressionEncodeParam As EncoderParameter
Dim EncoderParams As EncoderParameters = New EncoderParameters(2)

saveEncoder = Encoder.SaveFlag
compressionEncoder = Encoder.Compression

' Save the first page (frame).
SaveEncodeParam = New EncoderParameter(saveEncoder, CType(EncoderValue.MultiFrame, Long))
CompressionEncodeParam = New EncoderParameter(compressionEncoder, CType(EncoderValue.CompressionCCITT4, Long))
EncoderParams.Param(0) = CompressionEncodeParam
EncoderParams.Param(1) = SaveEncodeParam

File.Delete(location)
bmp(0).Save(location, codecInfo, EncoderParams)



For i As Integer = 1 To bmp.Length - 1
If bmp(i) Is Nothing Then
Exit For
End If

SaveEncodeParam = New EncoderParameter(saveEncoder, CType(EncoderValue.FrameDimensionPage, Long))
CompressionEncodeParam = New EncoderParameter(compressionEncoder, CType(EncoderValue.CompressionCCITT4, Long))
EncoderParams.Param(0) = CompressionEncodeParam
EncoderParams.Param(1) = SaveEncodeParam
bmp(0).SaveAdd(bmp(i), EncoderParams)
bmp(i).Dispose()
Next

SaveEncodeParam = New EncoderParameter(saveEncoder, CType(EncoderValue.Flush, Long))
EncoderParams.Param(0) = SaveEncodeParam
bmp(0).SaveAdd(EncoderParams)
bmp(0).Dispose()
End If
Return True


Catch ee As System.Exception
Throw New Exception(ee.Message + " Error in saving as multipage ")
End Try
Else
Return False
End If

End Function

Private Function getCodecForstring(ByVal type As String) As ImageCodecInfo
Dim info() As ImageCodecInfo = ImageCodecInfo.GetImageEncoders()

Dim i As Integer
For i = 0 To info.Length - 1 Step i + 1
Dim EnumName As String = type.ToString()
If info(i).FormatDescription.Equals(EnumName) Then
Return info(i)
End If
Next

Return Nothing

End Function

Private Function ConvertToBitonal(ByVal original As Bitmap) As Bitmap
Dim source As Bitmap = Nothing

' If original bitmap is not already in 32 BPP, ARGB format, then convert
If original.PixelFormat <> PixelFormat.Format32bppArgb Then
source = New Bitmap(original.Width, original.Height, PixelFormat.Format32bppArgb)
source.SetResolution(original.HorizontalResolution, original.VerticalResolution)
Using g As Graphics = Graphics.FromImage(source)
g.DrawImageUnscaled(original, 0, 0)
End Using
Else
source = original
End If

' Lock source bitmap in memory
Dim sourceData As Imaging.BitmapData = source.LockBits(New Rectangle(0, 0, source.Width, source.Height), ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb)

' Copy image data to binary array
Dim imageSize As Integer = sourceData.Stride * sourceData.Height
Dim sourceBuffer(imageSize) As Byte
Marshal.Copy(sourceData.Scan0, sourceBuffer, 0, imageSize)

' Unlock source bitmap
source.UnlockBits(sourceData)

' Create destination bitmap
Dim destination As Bitmap = New Bitmap(source.Width, source.Height, PixelFormat.Format1bppIndexed)

' Lock destination bitmap in memory
Dim destinationData As BitmapData = destination.LockBits(New Rectangle(0, 0, destination.Width, destination.Height), ImageLockMode.WriteOnly, PixelFormat.Format1bppIndexed)

' Create destination buffer
imageSize = destinationData.Stride * destinationData.Height
Dim destinationBuffer(imageSize) As Byte

Dim sourceIndex As Integer = 0
Dim destinationIndex As Integer = 0
Dim pixelTotal As Integer = 0
Dim destinationValue As Byte = 0
Dim pixelValue As Integer = 128
Dim height As Integer = source.Height
Dim width As Integer = source.Width
Dim threshold As Integer = 500

' Iterate lines

For y As Integer = 0 To height - 1
sourceIndex = y * sourceData.Stride
destinationIndex = y * destinationData.Stride
destinationValue = 0
pixelValue = 128

' Iterate pixels
For x As Integer = 0 To width - 1
' Compute pixel brightness (i.e. total of Red, Green, and Blue values)
pixelTotal = CType(sourceBuffer(sourceIndex + 1), Integer) + CType(sourceBuffer(sourceIndex + 2), Integer) + CType(sourceBuffer(sourceIndex + 3), Integer)
If pixelTotal > threshold Then
destinationValue += CType(pixelValue, Byte)
End If
If pixelValue = 1 Then
destinationBuffer(destinationIndex) = destinationValue
destinationIndex = destinationIndex + 1
destinationValue = 0
pixelValue = 128
Else
pixelValue = pixelValue >> 1
End If
sourceIndex += 4
Next
If pixelValue <> 128 Then
destinationBuffer(destinationIndex) = destinationValue
End If
Next

' Copy binary image data to destination bitmap
Marshal.Copy(destinationBuffer, 0, destinationData.Scan0, imageSize)

' Unlock destination bitmap
destination.UnlockBits(destinationData)
original.Dispose()
' Return
Return destination
End Function

Private Function Compare(ByVal d1 As FileInfo, ByVal d2 As FileInfo) As Integer
Return d1.Name.CompareTo(d2.Name)
End Function

Private Sub SortByName(ByVal files As FileInfo())
Array.Sort(Of FileInfo)(files, New Comparison(Of FileInfo)(AddressOf Compare))

End Sub

#End Region

End Module