LLTI Archives

March 1999, Week 1

LLTI@LISTSERV.DARTMOUTH.EDU

Subject:
From:
LLTI-Editor <[log in to unmask]>
Reply To:
Language Learning and Technology International Information Forum <[log in to unmask]>
Date:
Mon, 1 Mar 1999 17:15:00 EST
--- Forwarded Message from Richard Kunst <[log in to unmask]> ---

>From: Richard Kunst <[log in to unmask]>
>Reply-To: [log in to unmask]
>To: Language Learning and Technology International Information Forum    <[log in to unmask]>
>Cc: "C. Ratcliff" <[log in to unmask]>
>Subject: Re: #4873 Asian lang. online dictionaries
>In-Reply-To: <[log in to unmask]>
>Date: Sat, 27 Feb 1999 13:35:59 -0500 (Eastern Standard Time)
>Priority: NORMAL

------------------
On Tue, 23 Feb 1999 17:08:08 EST LLTI-Editor 
<[log in to unmask]> wrote:
> 
> Is there anyone who knows of online dictionaries for Chinese, Japanese,
> Korean, Indonesian, Thai, or Hindic languages which allows one to perform
> so-called "wild card" searches? 

Christian-

The on-line dictionary engine which is part of the WinCALIS/UniEdit 
system does what you are seeking, although it is not quite as 
flexible as a fully wild card-based system. You can search for words 
which contain strings of 1 or more Han characters (kanji); 2 or more 
Japanese kana characters; or 3 or more alphabetic characters (*any* 
alphabet). The characters can be in any position in a word. The 
search can be limited to whole words (bounded by spaces or predefined 
delimiters, such as zero-width spaces or "joiners") or the beginning 
of words (to ignore inflection, since there is no automatic 
lemmatization), or the search can be unlimited.
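
The search rules above can be sketched roughly as follows. This is only an illustration in Python, not the WinCALIS/UniEdit code: the names min_query_length and find, the mode names, and the exact delimiter set (whitespace, U+200B, U+200D) are assumptions, as is the choice to treat a mixed-script query as alphabetic.

```python
import re
import unicodedata

# Assumed delimiter set: whitespace, zero-width space, zero-width joiner.
DELIMITERS = re.compile(r"[\s\u200b\u200d]+")


def min_query_length(query: str) -> int:
    """Minimum query length: 1 for Han, 2 for kana, 3 for anything else."""
    names = [unicodedata.name(ch, "") for ch in query]
    if names and all(n.startswith("CJK UNIFIED IDEOGRAPH") for n in names):
        return 1
    if names and all(n.startswith(("HIRAGANA", "KATAKANA")) for n in names):
        return 2
    # Assumption: mixed-script queries fall under the alphabetic rule.
    return 3


def find(text: str, query: str, mode: str = "anywhere") -> bool:
    """Search text for query as a whole word, a word beginning, or anywhere."""
    if len(query) < min_query_length(query):
        raise ValueError("query is too short for its script")
    words = [w for w in DELIMITERS.split(text) if w]
    if mode == "whole":
        return any(w == query for w in words)
    if mode == "prefix":
        # Matching only word beginnings lets inflected forms through.
        return any(w.startswith(query) for w in words)
    if mode == "anywhere":
        return query in text
    raise ValueError("unknown mode: " + mode)
```

With this sketch, find("walking home", "walk", "prefix") matches the inflected form, while find("walking home", "alk", "prefix") does not.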

As for dictionaries using this system, at the moment I am only aware 
of dictionaries of any useful size for Japanese-and-English (the Jim 
Breen dictionary database of 105,000 entries--mostly proper nouns) 
and Korean-and-English (25,000+ entries), which we at Duke distribute 
-- there may be more out there. But the WinCALIS/UniEdit dictionary 
engine can be applied to *any* Unicode-encoded text database which 
has been formatted in a specific way (newline = record separator, '/' 
= field separator). Thus if you have a database of lexical 
information for, say, Indonesian or Bengali, you could format it as 
above, run the automatic indexer, and then use the dictionary engine 
to search it the way you desire.
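
As a rough picture of that record layout (newline = record separator, '/' = field separator), here is a minimal Python sketch. The function names, the two-field sample entries, and the meaning of each field are hypothetical. The real field layout and the indexer itself are not described here, so neither is modeled.

```python
def parse_records(text: str) -> list[list[str]]:
    """Split text into records (one per line) and fields (separated by '/')."""
    records = []
    for line in text.split("\n"):
        line = line.rstrip("\r")  # tolerate Windows line endings
        if line:
            records.append(line.split("/"))
    return records


def format_records(records: list[list[str]]) -> str:
    """Write records back out, one per line, with fields joined by '/'."""
    for fields in records:
        for field in fields:
            if "/" in field or "\n" in field:
                raise ValueError("field contains a separator: " + repr(field))
    return "\n".join("/".join(fields) for fields in records) + "\n"


# Hypothetical Indonesian-English sample: headword/gloss.
sample = "buku/book\nrumah/house\n"
records = parse_records(sample)
```

The text would be saved in a Unicode encoding before indexing; which encoding form the indexer expects is not stated above.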

I have perhaps jumped to a conclusion about your use of "online." The 
above dictionary engine is for use on-line in Windows, but not on the 
Web, at least at the moment (although we could certainly adapt it so 
that a Java client like Java WebCALIS or Java UniEdit could make use 
of it; we just haven't gotten around to that yet).

One good way to get the dictionary engine, along with documentation 
and the Japanese and Korean dictionaries, is to download the UniEdit 
or WinCALIS Author Tryout Editions from our Website, 
http://www.lang.duke.edu. Contact me directly for more information. I 
would welcome an incentive to make the newest 32-bit Dictionary 
Module available for download, since it can exchange Unicode text 
data on the clipboard nicely with other Unicode apps such as MS 
Office, MSIE, and Netscape. Thus you can highlight text in a Web 
browser and look it up using the WinCALIS/UniEdit dictionary engine.

Best,
Rick Kunst

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
OIT Humanities Computing Facility  Tel. (919) 660-3194
319 North Building - Box 90269     Fax: (919) 660-3191 
Duke University                    E-mail [log in to unmask]
Durham, NC 27708 USA               http://www.lang.duke.edu
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
