
Cannot Import User File PDF


mgvh

In view of another post about Moulton & Milligan's Vocabulary of the Greek NT, I went to archive.org and downloaded the PDF of this public domain work.

The file is ~56MB.

I go to User Files > Import User Tool, locate the file, and start the import.

It runs for a minute or so, gets about a third of the way through the import, then just quits, and the import dialogue simply disappears.

I tried this twice, and the same problem happened both times.


Accordance should produce something, but even if it worked perfectly the resulting file would not be very satisfying as a lexicon. The problem is that the PDF was made with an OCR process that doesn't recognize Greek. Most of the 56MB is images of the pages, which Accordance will discard. Then you should get reasonably good English, but just jumbles of Latin letters where the Greek was. You could get better results if you do your own OCR on the file with a program that can at least recognize Greek letters, but I don't know of a commercial program that will recognize polytonic Greek.


Try ABBYY FineReader 15 for PC. You can also set it to highlight questionable characters, so you only have to check those characters and correct them where needed. (Unfortunately this is not available on Mac, so you are lucky to be using Windows here. :-))

The Mac version can handle polytonic Greek, but the Windows version is far better.

The OCR on archive.org was done with ABBYY, but with a version that was new when Jesus walked on this earth.

With Tesseract you can train the OCR yourself.

Or try Transkribus (https://transkribus.eu/Transkribus/), which can also read handwriting. If there is no existing template for the font, it needs about 30 pages to train one.

I also contacted archive.org, and they ran a new OCR. But your own would be better.

Greetings

Fabian

I can try it with my ABBYY.

Edited by Fabian
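The highlighting of questionable characters described above is built into FineReader's interface. For anyone taking the Tesseract route instead, a rough equivalent is possible because Tesseract's TSV output (for example `tesseract page.png out -l grc tsv`, assuming the Ancient Greek `grc` model is installed) includes a confidence score for each recognized word. Below is a minimal Python sketch; the function name and the cutoff of 60 are illustrative assumptions, not Tesseract defaults.

```python
import csv
import io


def low_confidence_words(tsv_text: str, threshold: float = 60.0) -> list[tuple[str, float]]:
    """Return (word, confidence) pairs whose confidence is below threshold.

    Expects the tab-separated output of Tesseract's `tsv` config, whose header
    includes the columns `conf` and `text`. The threshold is an arbitrary choice.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    flagged: list[tuple[str, float]] = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Page, block, paragraph and line rows carry conf == -1 and no text; skip them.
        if not text or conf < 0:
            continue
        if conf < threshold:
            flagged.append((text, conf))
    return flagged
```

Feeding it the contents of `out.tsv` returns the low-confidence words with their scores, which is roughly the list you would otherwise have to spot by eye in the scanned page.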

When I use the latest release of FineReader for Mac (January 2020) on polytonic Greek, I find that it does not recognize the polytonic accents, giving me words like Ανεχθήναι with an acute accent or a dieresis instead of a circumflex accent.
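One quick way to tell whether an OCR engine is producing polytonic output at all is to decompose its text with Unicode NFD and look for marks that only polytonic Greek uses: the combining circumflex (perispomeni, U+0342), the smooth and rough breathings (U+0313, U+0314) and the iota subscript (U+0345). An acute accent (U+0301) or a dieresis (U+0308) does not count, since monotonic Greek uses those too. This is a minimal Python sketch; the function name is hypothetical.

```python
import unicodedata

# Combining marks that appear only in polytonic Greek once text is NFD-decomposed.
POLYTONIC_MARKS = {
    "\u0342": "perispomeni (circumflex)",
    "\u0313": "psili (smooth breathing)",
    "\u0314": "dasia (rough breathing)",
    "\u0345": "ypogegrammeni (iota subscript)",
}


def polytonic_marks_found(text: str) -> dict[str, int]:
    """Count polytonic-only combining marks in text after NFD decomposition."""
    counts: dict[str, int] = {}
    # NFD splits precomposed letters (e.g. U+1FC6) into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if ch in POLYTONIC_MARKS:
            name = POLYTONIC_MARKS[ch]
            counts[name] = counts.get(name, 0) + 1
    return counts
```

On the output quoted above, `polytonic_marks_found("Ανεχθήναι")` returns an empty dict, while the same letters typed with a smooth breathing and a circumflex (ἀνεχθῆναι) report one of each. An empty result over a whole page is a strong hint that the engine is treating the text as monotonic Greek.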


Thanks. I had only seen that it supports polytonic Greek, not whether it recognizes it correctly. :-)

BTW, have you reported it?

Greetings

Fabian


I looked into OCR for polytonic Greek a few years back. I couldn't find anything off the shelf that was reliable. I sent some examples to one company, but in the end, dead languages were not a focus for them. Something built on Tesseract, or on gImageReader (a front end for Tesseract), was possible but would require a lot of work on the training side. I haven't been back to it. I OCR'd an edition of North and Hillard, and I would still be making corrections to turn it into a usable tool, but I've since started using Dickey's composition book. I may finish it, but as an example it showed how much remained unsolved in polytonic OCR.

Thx

D


I didn't report it, because it looked to me like "Greek" meant Modern Greek. If it occasionally recognized a polytonic accent or breathing mark, I would believe the software was meant to work with polytonic Greek. Probably there isn't enough money in adding support for polytonic Greek.

When I wrote above, "I don't know of a commercial program that will recognize polytonic Greek," I used the adjective "commercial" because I know of an open-source program that recognizes polytonic Greek. It's called Lace. I believe they started with Tesseract and trained it for polytonic Greek. Then they let it loose on the Patrologia Graeca and put the results online in 2014. There were still lots of errors, but they had some ideas for cleaning some of them up automatically, such as OCRing a volume three times and keeping the results where 2 of the 3 runs matched. Then came Lace2, and I see there's now a 2020 version of it. I haven't used it, but from the demo site it looks like it can handle texts with mixed Latin and Greek characters.

In the meantime, the First1KGreek project apparently found funding to clean up texts, so there are now 23 million words of Greek online in TEI XML, much of it grammatically tagged. But that stops at 250 AD, so the next step is First2KGreek, which would include most of the Patrologia Graeca and later critical editions that are out of copyright. http://www.opengreekandlatin.org/

Edited by jlm
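The 2-of-3 idea mentioned for Lace can be sketched as a word-level majority vote. This is a minimal Python illustration, not Lace's actual code: it assumes the three OCR runs have already been aligned word for word (the hard part in practice), and the function name is hypothetical. Positions where no two runs agree come back as None so they can be flagged for manual correction.

```python
from collections import Counter
from typing import Optional


def vote_two_of_three(a: list[str], b: list[str], c: list[str]) -> list[Optional[str]]:
    """Word-level 2-of-3 majority vote over three aligned OCR runs.

    Assumes the runs are already aligned (same number of words, same order).
    Returns the agreed word at each position, or None where all three differ.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("runs must be aligned to the same number of words")
    result: list[Optional[str]] = []
    for words in zip(a, b, c):
        word, count = Counter(words).most_common(1)[0]
        # Keep the word only if at least two of the three runs produced it.
        result.append(word if count >= 2 else None)
    return result
```

For example, if three runs read a line as `ἐν ἀρχῇ ἦν`, `ἐν ἀρχῆ ἦν` and `εν ἀρχῇ ἤν`, each position has a two-run majority, so the vote returns `ἐν ἀρχῇ ἦν`. A position where all three disagree stays None, which keeps the automatic cleanup from guessing.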

I once talked with the owner of a Jewish publishing house. He told me that when he has something to be typed, he hires workers in India through a website (sorry, I forgot the name); they type it and proofread it twice for little money.
