Macro SMILE_PDF_MERGE

Merge multiple PDF files into one and create one bookmark entry per input file, using PROC GROOVY and the open-source tool PDFBox

  • Author: Katja Glass
  • Date: 2021-01-29
  • SAS Version: SAS 9.4
  • License: MIT
  • Comment: Download PDFBox first, e.g. from https://pdfbox.apache.org/download.html, and use the full "app" version
  • Issues: "unable to resolve class" messages mean that the PDFBox jar is not provided correctly. "ERROR: PROCEDURE GROOVY cannot be used when SAS is in the lock down state." means that your SAS environment does not support PROC GROOVY, so the macro cannot run the Groovy code. "WARNING: Removed /IDTree from /Names dictionary, doesn't belong there" is a message issued by PDFBox itself.
  • Reference: A paper explaining how to use PDFBox with PROC GROOVY, also for a table of contents (TOC), is available at https://www.lexjansen.com/phuse/2019/ct/CT05.pdf
  • Example Program: test_smile_pdf_merge

Parameters

Parameter        Description
DATA             Input dataset containing the variables INFILE and BOOKMARK: INFILE holds the single PDF files (one file per observation), BOOKMARK holds the corresponding bookmark label for each file
OUTFILE          Output PDF file (not in quotes)
PDFBOX_JAR_PATH  Path and jar file name for the PDFBox open-source tool, e.g. &path/pdfbox-app-2.0.22.jar
SOURCEFILE       Optional SAS program file where the PROC GROOVY code is stored, default is TEMP (only temporary)
RUN_GROOVY       NO/YES indicator whether to run the final Groovy code (default YES)
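
The SOURCEFILE and RUN_GROOVY parameters can be combined to generate the PROC GROOVY program without running it, for example to review the generated code first. The following is a minimal sketch, assuming the dataset content and the macro variables &outPath, &libPath and &progPath are defined as in the example in this documentation; the file name groovy_call.sas is only illustrative.

```sas
/* Sketch: only write the PROC GROOVY program to a permanent file */
%smile_pdf_merge(
   data            = content
 , outfile         = &outPath/merged_output.pdf
 , pdfbox_jar_path = &libPath/pdfbox-app-2.0.22.jar
 , sourcefile      = &progPath/groovy_call.sas
 , run_groovy      = NO
);

/* Later, after reviewing the file, run the stored program */
%INCLUDE "&progPath/groovy_call.sas";
```

With SOURCEFILE left at TEMP, the generated program is not kept, so RUN_GROOVY = NO is mainly useful together with a permanent SOURCEFILE.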


Examples

DATA content;
   ATTRIB inFile     FORMAT=$255.;
   ATTRIB bookmark FORMAT=$255.;
   inFile = "&inPath/output_1.pdf";  bookmark = "Table 1";    OUTPUT;
   inFile = "&inPath/output_2.pdf";  bookmark = "Table 2";    OUTPUT;
   inFile = "&inPath/output_3.pdf";  bookmark = "Table 3";    OUTPUT;
RUN;
%smile_pdf_merge(
   data            = content
 , outfile         = &outPath/merged_output.pdf
 , pdfbox_jar_path = &libPath/pdfbox-app-2.0.22.jar
 , sourcefile      = &progPath/groovy_call.sas
 , run_groovy      = YES
);
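
When many files have to be merged, the input dataset could also be built from a directory listing instead of being typed by hand. This is only a sketch under assumptions: the fileref pdfdir and the helper variables starting with an underscore are hypothetical names, &inPath is assumed to point to a folder of PDF files, and the bookmark label is simply taken from the file name without its extension.

```sas
/* Sketch: build the input dataset from all PDF files found in &inPath */
FILENAME pdfdir "&inPath";

DATA content;
   ATTRIB inFile   FORMAT=$255.;
   ATTRIB bookmark FORMAT=$255.;
   LENGTH _name $255;
   _did = DOPEN('pdfdir');                       /* open the directory */
   IF _did > 0 THEN DO;
      DO _i = 1 TO DNUM(_did);                   /* loop over all members */
         _name = DREAD(_did, _i);
         IF UPCASE(SCAN(_name, -1, '.')) = 'PDF' AND LENGTH(_name) > 4 THEN DO;
            inFile   = CATS("&inPath/", _name);
            bookmark = SUBSTR(_name, 1, LENGTH(_name) - 4);  /* strip ".pdf" */
            OUTPUT;
         END;
      END;
      _rc = DCLOSE(_did);
   END;
   DROP _:;
RUN;

FILENAME pdfdir CLEAR;

/* The directory order depends on the operating system, so sort explicitly */
PROC SORT DATA=content;
   BY inFile;
RUN;
```

The sort is alphabetical, so output_10.pdf would come before output_2.pdf; the order of observations in the dataset determines the order of the merged files and bookmarks.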

Checks

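  • Required parameters (DATA, OUTFILE, PDFBOX_JAR_PATH) must be specified; otherwise the macro aborts.
  • If SOURCEFILE is empty, a warning is issued and TEMP is used.
  • RUN_GROOVY must be NO or YES; otherwise the macro aborts.
  • PDFBOX_JAR_PATH must exist and must be a ".jar" file; otherwise the macro aborts.
  • The DATA dataset must exist, contain observations, and contain the variables infile and bookmark; otherwise the macro aborts.
  • All files listed in the variable "infile" must exist; otherwise the Groovy program is neither created nor run.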
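
The macro reports problems in the log but, as far as the source shows, does not set a return code. A calling program could therefore check for the output file itself. The following is a minimal sketch, assuming &outPath is defined by the caller and the output file name matches the one passed to OUTFILE; note that a file left over from an earlier run would also pass this check.

```sas
/* Sketch: check after the macro call whether the merged PDF exists */
DATA _NULL_;
   IF FILEEXIST("&outPath/merged_output.pdf") THEN
      PUT "NOTE: Merged PDF is available.";
   ELSE
      PUT "WARNING: Merged PDF was not found, see the log for messages.";
RUN;
```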

Macro

%************************************************************************************************************************;
%* Project    : SMILE - SAS Macros, Intuitive Library Extension
%* Macro      : smile_pdf_merge
%* Parameters : DATA            - Input dataset containing INFILE and BOOKMARK variable,
%*                                INFILE containing single pdf files (one file per observation),
%*                                BOOKMARK containing the corresponding bookmark label for this file
%*              OUTFILE         - Output PDF file (not in quotes)
%*              PDFBOX_JAR_PATH - Path and jar file name for PDFBOX open source tool, e.g. &path/pdfbox-app-2.0.22.jar
%*              SOURCEFILE      - Optional SAS program file where PROC GROOVY code is stored, default is TEMP (only temporary)
%*              RUN_GROOVY      - NO/YES indicator whether to run the final GROOVY code (default YES)
%*
%* Purpose    : Merge multiple PDF files and create one bookmark entry per PDF file with PROC GROOVY and open-source Tool PDFBox
%* Comment    : Make sure to download PDFBOX, e.g. from here https://pdfbox.apache.org/download.html - the full "app" version
%* Issues     : "unable to resolve class" messages mean the PDFBOX is not provided correctly.
%*              "ERROR: PROCEDURE GROOVY cannot be used when SAS is in the lock down state." means that your SAS environment
%*              does not support PROC GROOVY, so the macro cannot run the Groovy code.
%*              "WARNING: Removed /IDTree from /Names dictionary, doesn't belong there" - this message is coming from PDFBox.
%*
%* ExampleProg: ../programs/test_smile_pdf_merge.sas
%*
%* Author     : Katja Glass
%* Creation   : 2021-01-29
%* License    : MIT
%*
%* Reference  : A paper explaining how to use PDFBOX with PROC GROOVY also for TOC is available in the following paper
%*              (https://www.lexjansen.com/phuse/2019/ct/CT05.pdf)
%*
%* SAS Version: SAS 9.4
%*
%************************************************************************************************************************;
/*
Examples:
DATA content;
   ATTRIB inFile     FORMAT=$255.;
   ATTRIB bookmark FORMAT=$255.;
   inFile = "&inPath/output_1.pdf";  bookmark = "Table 1";    OUTPUT;
   inFile = "&inPath/output_2.pdf";  bookmark = "Table 2";    OUTPUT;
   inFile = "&inPath/output_3.pdf";  bookmark = "Table 3";    OUTPUT;
RUN;
%smile_pdf_merge(
   data            = content
 , outfile         = &outPath/merged_output.pdf
 , pdfbox_jar_path = &libPath/pdfbox-app-2.0.22.jar
 , sourcefile      = &progPath/groovy_call.sas
 , run_groovy      = YES
);
*/
%************************************************************************************************************************;

%MACRO smile_pdf_merge(data = , outfile = , pdfbox_jar_path = , sourcefile = TEMP, run_groovy = YES);

   %LOCAL macro;
   %LET macro = &sysmacroname;

%*;
%* Error handling I - parameter checks;
%*;

   %* check: existence of required parameters (DATA, OUTFILE, PDFBOX_JAR_PATH), abort;
   %* check: existence of parameter SOURCEFILE, if not use TEMP;
   %* check: RUN_GROOVY must be NO or YES, abort;

   %IF %LENGTH(&data) = 0
   %THEN %DO;
       %PUT %STR(ERR)OR: &macro - DATA parameter is required. Macro will abort.;
       %RETURN;
   %END;

   %IF %LENGTH(&outfile) = 0
   %THEN %DO;
       %PUT %STR(ERR)OR: &macro - OUTFILE parameter is required. Macro will abort.;
       %RETURN;
   %END;

   %IF %LENGTH(&pdfbox_jar_path) = 0
   %THEN %DO;
       %PUT %STR(ERR)OR: &macro - PDFBOX_JAR_PATH parameter is required. Macro will abort.;
       %RETURN;
   %END;

   %IF %LENGTH(&sourcefile) = 0
   %THEN %DO;
       %PUT %STR(WAR)NING: &macro - SOURCEFILE parameter is empty - TEMP will be used.;
       %LET sourcefile = TEMP;
   %END;

   %IF %UPCASE(&run_groovy) NE YES AND %UPCASE(&run_groovy) NE NO
   %THEN %DO;
       %PUT %STR(ERR)OR: &macro - RUN_GROOVY parameter must be NO or YES. Macro will abort.;
       %RETURN;
   %END;

   %* check: PDFBOX_JAR_PATH must exist and must be a ".jar" file;
   %IF %SYSFUNC(FILEEXIST(&pdfbox_jar_path)) = 0
   %THEN %DO;
       %PUT %STR(ERR)OR: &macro - PDFBOX_JAR_PATH file does not exist. Macro will abort.;
       %RETURN;
   %END;
   %IF %UPCASE(%SCAN(&pdfbox_jar_path,-1,.)) NE JAR
   %THEN %DO;
       %PUT %STR(ERR)OR: &macro - PDFBOX_JAR_PATH must be a ".jar" file. Macro will abort.;
       %RETURN;
   %END;

%*;
%* Preparations;
%*;

   %* include quotes around sourcefile if not available;
   DATA _NULL_;
       ATTRIB path FORMAT=$500.;
       path = SYMGET('sourcefile');
       IF UPCASE(STRIP(path)) NE "TEMP"
       THEN DO;
           IF SUBSTR(STRIP(path),1,1) NE '"' AND SUBSTR(STRIP(path),1,1) NE "'"
           THEN DO;
               CALL SYMPUT('sourcefile','"' || STRIP(path) || '"');
           END;
       END;
   RUN;

%*;
%* Error handling II - data checks;
%*;

   %LOCAL dsid rc error;
   %LET error = 0;

   %* check: existence of data, contains observations, contains variables infile and bookmark;
   %LET dsid=%SYSFUNC(OPEN(&data,is));
   %IF &dsid EQ 0
   %THEN %DO;
       %PUT %STR(ERR)OR: &macro - DATA (&data) does not exist. Macro will abort.;
       %RETURN;
   %END;
   %ELSE %IF %SYSFUNC(ATTRN(&dsid,NLOBS)) = 0
   %THEN %DO;
       %PUT %STR(ERR)OR: &macro - DATA (&data) does not contain any observations. Macro will abort.;
       %LET rc=%SYSFUNC(CLOSE(&dsid));
       %RETURN;
   %END;
   %ELSE %IF %SYSFUNC(VARNUM(&dsid,infile)) = 0 OR %SYSFUNC(VARNUM(&dsid,bookmark)) = 0
   %THEN %DO;
       %PUT %STR(ERR)OR: &macro - DATA (&data) does not contain required variables (infile and bookmark). Macro will abort.;
       %LET rc=%SYSFUNC(CLOSE(&dsid));
       %RETURN;
   %END;
   %LET rc=%SYSFUNC(CLOSE(&dsid));

   %* check: files in variable "infile" must exist;
   %* update BOOKMARK labels, replace double quotes;
   DATA _smile_indat;
       SET &data;
       RETAIN _smile_msg 0;
       IF FILEEXIST(infile) = 0
       THEN DO;
           PUT "%STR(ERR)OR: INFILE does not exist: " infile " - Macro will abort.";
           CALL SYMPUT('error','1');
       END;
       IF INDEX(bookmark,'"') > 0 AND _smile_msg = 0
       THEN DO;
           PUT "%STR(WAR)NING: Double quotes are not supported for BOOKMARK texts and are removed.";
           _smile_msg = 1;
       END;
       bookmark = TRANWRD(bookmark,'"','');
   RUN;

   %IF &error NE 0
   %THEN %DO;
       %GOTO end_macro;
   %END;

%*;
%* Create PROC GROOVY program file;
%*;

   FILENAME cmd &sourcefile;

   DATA _NULL_;
       FILE cmd LRECL=5000;
       SET _smile_indat END=_eof;
       IF _N_ = 1
       THEN DO;
           PUT "PROC GROOVY;";
           PUT "    ADD CLASSPATH = ""&pdfbox_jar_path"";";
           PUT "    SUBMIT;";
           PUT ;
           PUT "import org.apache.pdfbox.multipdf.PDFMergerUtility;";
           PUT "import org.apache.pdfbox.pdmodel.PDDocument;";
           PUT "import org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDPageDestination;";
           PUT "import org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDPageFitWidthDestination;";
           PUT "import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDDocumentOutline;";
           PUT "import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;";
           PUT "import java.io.File;";
           PUT ;
           PUT "public class PDFMerge2 {";
           PUT "    public static void main(String[] args) {";
           PUT;
           PUT @8 "//Instantiating PDFMergerUtility class";
           PUT @8 "PDFMergerUtility PDFmerger = new PDFMergerUtility();";
           PUT ;
           PUT @8 "//Setting the destination file";
           PUT @8 "PDFmerger.setDestinationFileName(""&outfile"");";
           PUT ;
           PUT @8 "//adding the source files";
       END;
       PUT @8 "PDFmerger.addSource(new File(""" inFile +(-1) """));";
       IF _eof
       THEN DO;
           PUT @8 "PDFmerger.mergeDocuments(null);";
       END;
   RUN;

   DATA _NULL_;
       FILE cmd LRECL=5000 MOD;
       ATTRIB _temp FORMAT=$200.;
       SET _smile_indat END=_eof;
       IF _N_ = 1
       THEN DO;
           PUT @8 "//Open created document";
           PUT @8 "PDDocument document;";
           PUT @8 "PDPageDestination pageDestination;";
           PUT @8 "PDOutlineItem bookmark;";
           PUT @8 "document = PDDocument.load(new File(""&outfile""));";
           PUT ;
           PUT @8 "//Create a bookmark outline";
           PUT @8 "PDDocumentOutline documentOutline = new PDDocumentOutline();";
           PUT @8 "document.getDocumentCatalog().setDocumentOutline(documentOutline);";
           PUT @8 ;
           PUT @8 "int currentPage = 0;";
       END;

       _temp = SCAN(inFile,-1,"/\");
       PUT @8 "//Include file " _temp;
       PUT @8 "pageDestination = new PDPageFitWidthDestination();";
       PUT @8 "pageDestination.setPage(document.getPage(currentPage));";
       PUT @8 "bookmark = new PDOutlineItem();";
       PUT @8 "bookmark.setDestination(pageDestination);";
       PUT @8 "bookmark.setTitle(""" bookmark +(-1) """);";
       PUT @8 "documentOutline.addLast(bookmark);";
       PUT ;
       PUT @8 "//Change currentPage number";
       PUT @8 "currentPage += PDDocument.load(new File(""" inFile +(-1) """)).getNumberOfPages();";
       PUT ;
       IF _eof
       THEN DO;
           PUT @8 "//save and close document";
           PUT @8 "document.save(""&outfile"");";
           PUT @8 "document.close();";
           PUT ;
           PUT "}}";
           PUT "endsubmit;";
           PUT "quit;";
       END;
   RUN;

%*;
%* Optionally execute PROC GROOVY code;
%*;

   %IF %UPCASE(&run_groovy) = YES
   %THEN %DO;
       %PUT &macro: Run Groovy Program;
       %PUT &macro: The following warning might come from PDFBox: %STR(WAR)NING: Removed /IDTree from /Names dictionary ...;
       %INCLUDE cmd;
   %END;

%end_macro:

%*;
%* cleanup;
%*;

   FILENAME cmd;

   PROC DATASETS LIB=WORK NOWARN NOLIST;
       DELETE _smile_indat;
   RUN;
   QUIT;


%MEND;