Problem: You want to download all the PDFs of an author’s articles from their web-page. But there are 35 of them. You don’t want to repeat the click > wait > save to disk cycle 35 times. You don’t want to miss any of the PDFs by mistake.
Solution: A very nice Firefox add-on called DownThemAll.
- Place all JPEG files to be converted into a single directory.
- In that directory, run ImageMagick’s convert command:
convert *.jpg my_pdf_file.pdf
The pages appear in the order in which the shell expands *.jpg, that is, sorted by file name. Conversion can be slow and resource-intensive when the number of JPEG files is large (ca. 100). The resulting PDF file is typically about as large as the original files combined.
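One caveat about page order: the shell sorts file names character by character, so page10.jpg comes before page2.jpg unless the numbers are zero-padded. The following is a minimal sketch of one way around that, not part of the original recipe. It assumes GNU sort (for the -V "version sort" option) and file names without newlines; the function name list_pages_in_order is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the .jpg files of a directory in "natural"
# order (page2.jpg before page10.jpg), one name per line.
# Assumes GNU sort (-V) and file names that contain no newlines.
list_pages_in_order() {
    dir=${1:-.}
    # Use a subshell so the caller's working directory is not changed.
    (
        cd "$dir" || exit 1
        for f in *.jpg; do
            # Skip the literal pattern when the directory has no .jpg files.
            [ -e "$f" ] && printf '%s\n' "$f"
        done | sort -V
    )
}

# Possible use from inside the image directory (assumes ImageMagick's
# convert and GNU xargs are installed):
#   list_pages_in_order . | xargs -d '\n' sh -c 'convert "$@" my_pdf_file.pdf' sh
```

Renaming the files with zero-padded numbers (page002.jpg, page010.jpg) gives the same result with the plain convert *.jpg command.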
The text narrates an encounter and dialog between Yi Yin 伊尹 and the Shang king Tang 商湯. Yi Yin abandons the Xia 夏 king, and reports to Shang Tang the evils of Xia government, the suffering of the people, and celestial omens for the overthrow of Xia. The same narrative appears in the “Shen Da 慎大” chapter of the Lüshi Chunqiu 呂氏春秋, though with only occasional precise textual parallels.
The following loose transcription and translation aim at capturing the emerging consensus about what the text means. References to web publications on the text follow.
It was when Yin went from Xia to Bo [Shang Tang’s capital],
Carefully he arrived, and was in Tang’s presence.
Tang said: “Come! You have perhaps some fortunate intent [志 “intent” an error for 言 “news”]?”
Yin said: “My Lord, I come, having been on the road ten days now.
I have scrutinized the common people of Xia, they […] lucky and good,
but as for their Lord, he has lost all [good?] intention, is excessively fond of the two Jade Ladies, and has no sympathy for his common people.
The people indeed said: ‘We will perish together with you.’
It is a disaster: he abuses virtue, does violence to [?], and abandons the written codes.
The Xia have omens, in the West and in the East, seeing manifestations in the sky. Their people all say, ‘It is [the sign of] our swift calamity [or “It is we who have invited this calamity”].’
They all say, ‘Why now does the eastern omen not manifest itself? What shall we do now?’”
Tang said, “Is it all so, what you have told me of Xia’s secrets [or “eclipse”, or “agonies”]?” Yin said, “That’s how it is.”
Tang’s covenant was extended to Yin, and he thence busied himself with the great Ying ritual [for warding off meteorological calamity].
Tang went to campaign against those who would not ally with him.
Zhi [i.e. Yin] planned. Zhi’s virtue was not faulty.
From the west they destroyed the western settlement [Xia], and defeated the state of Xia.
Xia counted [i.e. took a census of] its people, entered into Shui, and talked of “battle”.
Di [i.e. Tang] said: “Spare not a single one.”
Convert scanned images of Chinese documents to real, searchable, editable text.
There is some information available on OCR options for Ubuntu/Linux, but it doesn’t explain the setup for Chinese text very well. OCRFeeder can be installed from the Ubuntu Software Center (Applications > Ubuntu Software Center, then click on Office). OCRFeeder is a graphical front end for OCR engines such as Tesseract, which do the actual optical character recognition. Tesseract provides language-specific data files on its downloads page. For Chinese, these are
chi_tra.traineddata.gz and chi_sim.traineddata.gz for traditional and simplified Chinese respectively.
- Download the files and gunzip them.
- Move them to the tessdata directory. For me the path is
- Start OCRFeeder.
- Open the OCR Engines dialog (Tools > OCR Engines).
- Click “Add”, and fill in the fields as follows:
- Name: Tesseract – Traditional Chinese
- Image format: TIFF
- Failure string: (leave blank)
- Engine path: /usr/local/bin/tesseract (or whatever the path is for your tesseract installation)
- Engine arguments: $IMAGE $FILE -l chi_tra; cat $FILE.txt; rm $FILE
- That was for traditional Chinese. For simplified Chinese, add another engine. The following fields will be different:
- Name: Tesseract – Simplified Chinese
- Engine arguments: $IMAGE $FILE -l chi_sim; cat $FILE.txt; rm $FILE
It should now be possible to select either form of Chinese when performing OCR.
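If one of the two engines returns nothing, a likely cause is that its language file did not end up in the tessdata directory. The sketch below is one hedged way to check this from a terminal. The function name check_tessdata is hypothetical, and the tessdata path is passed as an argument because it depends on your installation.

```shell
#!/bin/sh
# Hypothetical helper: report whether the two Chinese language files are
# present (already gunzipped) in the given tessdata directory.
# Returns 0 if both are found, 1 if either is missing.
check_tessdata() {
    dir=$1
    missing=0
    for lang in chi_tra chi_sim; do
        if [ -f "$dir/$lang.traineddata" ]; then
            echo "found: $lang"
        else
            echo "missing: $lang"
            missing=1
        fi
    done
    return $missing
}

# Example call, with a placeholder for your own tessdata directory:
#   check_tessdata /path/to/tessdata
#
# Tesseract can also be run directly, mirroring the engine arguments above
# (scan.tif is a placeholder image; the text is written to scan.txt):
#   tesseract scan.tif scan -l chi_tra
```

Running Tesseract by hand like this separates problems with the language files from problems with the OCRFeeder engine settings.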