--- Forwarded Message from "Richard Kunst" ---

>From: "Richard Kunst"
>To: "'Language Learning and Technology International Information Forum'"
>Subject: RE: #8932 Feedback on IRIS Asian OCR sw?
>Date: Sat, 13 Sep 2008 13:25:25 -0400
>Organization: Humanities Computing Lab

On Thu, 11 Sep 2008 12:16:28 -0400 Jose Rodriguez wrote

> Does anyone have any feedback on ReadIRIS Asian edition of their OCR
> software?

Dear Jose and list,

A few years ago I OCRed a few thousand pages of Chinese text using the Asian Add-on to ReadIRIS ver. 10. It was excellent, as is ReadIRIS in general, but as with other OCR software, it had its frustrating quirks. It was sometimes inferior to, sometimes better than the core ReadIRIS. (I assume the engine was developed independently.)

There were some characters which were so regularly misrecognized that I gradually built up a list of "the usual suspects" to watch for during post-editing. I have appended it below. And if that doesn't pass through the listserv, it is also on the following web page:

http://www.humancomp.org/misc/problem_characters_in_readiris_chinese_ocr.html

Almost all of the text I OCRed was horizontal L-to-R, but as I recall, it worked OK for vertical text too.

I tried out the Japanese and Korean OCR as well with good success, but didn't do more than a few small tests.

Best wishes,

Rick Kunst

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
The Humanities Computing Laboratory
A Nonprofit Education and Research Corporation
109 Lariat Lane, Suite B
Chapel Hill, NC 27517 USA
Tel. +1 919 656-5915
http://www.humancomp.org
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

************************************************
(Some characters used in editing ReadIRIS OCR output.)
$E7o$BD$E8aTM$E7oAE$E4PIA$E7(R)$BB$E4$BD$A4$E8aC$E5a$BA$E5ea$E6aa$E5O*$E5+/-+/-$E7$BBU$E8BA$E7u=>$E5ea$E8Aa$E5O'$E6oB$E3AC$E5oa$E6u'$E5ea$E5cu$E6uTM$E6u'$E5ae$E5#'$E5uu$E7i$B0$E5<>o$E5$A4(c)$E6ao

Correct    Incorrect

$E8$A6A (replace $E8Ac)
$E5e$AC (replace $E6oi)
$E5e=> (replace $E6oB)
$E6o$B0 (replace $E6o*)
$E5Nnil (replace JL)
$E4$BAU (replace $E4PIA)
$E8aC (replace $E6o*)
$E8AA (replace $E8AO)
$E5cu (replace $E5cdeg.)
$E5<>o (replace $E5AEa)
$E9oAE (replace $E9o$B4)
$E6o$A0 (replace $E5OE)
$E5<>e (replace $E4$BAe)
$E5ae (replace $E5aa)
$E7i$B0 (replace $E5ou)
$E5<>o (replace $E5AEa)
$E8$A6A (replace $E5$A6*)
$E7oN (replace $E5ua)

Incorrect

$E8aTM (replace *selectively* with $E7o$BD)
$E5aa (replace *almost* all with $E5a$BA)
$E7$BBN (replace *selectively* with $E7$BBU)
$E5a$BA (replace *selectively* with $E5+/-+/-)
$E4$BD$A4 (replace $E4$BDe$E5Aa$E5ATM$E4aeE$E7oPI$E7i$A6$E5Ao$E4oc etc.)
$E6aa$E7*u (replace $E6aa$E6deg.i...)
$E9pia$E9pie (replace $E9pia$E8aee,$E9pia$E8oi,$E9pi$A4$E8oi,$E9pia$E5o+/-)

***********************************************
LLTI is a service of IALLT, the International Association for
Language Learning (http://iallt.org/), and The Consortium for
Language Teaching and Learning (http://www.languageconsortium.org/).
Join IALLT at http://iallt.org.
Otmar Foelsche, LLTI-Editor
***********************************************
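
For anyone who wants to automate part of the post-editing described in the message, here is a minimal Python sketch of one way a "usual suspects" list might be applied. It is not the method used for the original project. The names `ALWAYS_REPLACE`, `REVIEW_ONLY`, and `post_edit` are hypothetical, and the character pairs are illustrative placeholders, not entries from the list above. The sketch assumes two kinds of entries, matching the two sections of the list: misreadings that can be replaced everywhere, and characters that are only sometimes wrong and so are flagged for a human to check.

```python
# Hypothetical correction tables. The pairs below are illustrative
# placeholders only; they are NOT taken from the list in this message.
ALWAYS_REPLACE = {"巳": "已"}  # misread -> intended, replaced everywhere
REVIEW_ONLY = {"未": "末"}     # suspect -> possible intended character


def post_edit(text):
    """Apply unconditional fixes, then list suspects for manual review.

    Returns the corrected text and a sorted list of
    (position, suspect, suggestion) tuples.
    """
    # Unconditional replacements (keys may be longer than one character).
    for wrong, right in ALWAYS_REPLACE.items():
        text = text.replace(wrong, right)

    # Selective replacements are only reported, never applied automatically.
    flagged = []
    for suspect, suggestion in REVIEW_ONLY.items():
        start = text.find(suspect)
        while start != -1:
            flagged.append((start, suspect, suggestion))
            start = text.find(suspect, start + len(suspect))
    return text, sorted(flagged)


if __name__ == "__main__":
    fixed, flagged = post_edit("巳未巳")
    print(fixed)
    for position, suspect, suggestion in flagged:
        print(position, suspect, "check against", suggestion)
```

Keeping the "replace selectively" entries as flags rather than automatic substitutions leaves the final decision with the person doing the post-editing.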