LISTSERV - NISUS Archives - LISTSERV.DARTMOUTH.EDU

October 2015

NISUS@LISTSERV.DARTMOUTH.EDU

Subject:	Re: Indexing
From:	Nobumi Iyanaga <[log in to unmask]>
Date:	Tue, 27 Oct 2015 11:53:52 +0900
Hello Philip,

Thank you for your detailed answer. Although all this seems really complicated, what you describe explains very well what happens in practice. As you say, these details don't matter much in real-life work, since the generated index has the right page numbers anyway. But all this is rather surprising and confusing...

I found another problem: when we run Index Using Word List, some occurrences of a word listed in the word list file do not get indexed. One case in which the word is not indexed is when it comes just after an apostrophe. For example, if you have "antiquité" in your word list, "the antiquité" is indexed, but "l'antiquité" is not. There may be other cases as well...

So we have to check each occurrence using the Find box, which is very time-consuming.
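If the document is exported to plain text, a short script can list every place a word-list entry occurs, including after an apostrophe. You can then compare that list against what Index Using Word List actually marked. This is a hypothetical Python helper (`find_occurrences` is an illustrative name), not a Nisus feature. It assumes a "word" is bounded by any character that is not a Unicode letter, digit or underscore, so an apostrophe or a hyphen counts as a boundary.

```python
import re

def find_occurrences(text: str, word: str) -> list[int]:
    """Return start offsets of `word` where it is not glued to other word characters.

    An apostrophe is not a word character, so "l'antiquité" still matches
    "antiquité"; "antiquités" does not, because "s" follows directly.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    return [m.start() for m in pattern.finditer(text)]

sample = "the antiquité and l'antiquité, but not antiquités"
print(find_occurrences(sample, "antiquité"))  # [4, 20]
```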

Anyway, although the current implementation of indexing is well thought out, it seems that there are many points that could (or should) be improved...

Best regards,

Nobumi Iyanaga

P.S. I am sorry for these belated replies. When I reply, I always forget to change the address field to the listserv address. It is a real annoyance!

> On Oct 26, 2015, at 1:08 PM, spaelti <[log in to unmask]> wrote:
> 
> Hello Nobumi,
> 
> Let me try to explain again how this works, although this is a bit presumptuous on my part, since I’m not really sure I know myself.
> 
> Basically Nisus treats (so it seems) indexing as a kind of attribute, except that the “attribute” is an __array of arrays of topics__. If this sounds complicated, it’s because it is ;-)
> 
> If you just straight index something, the array of arrays will be (()), that is an array containing an empty array.
> If you use ‘Index As’ on the text “Nisus Writer” and use “Nisus Writer” as the topic, the array of arrays will be ((“Nisus Writer”))
> If you index the text “Nisus Writer” once using ‘Index’ and once using ‘Index As’ with the topic “Document Processor”, the array of arrays will be ((), (“Document Processor”))
> You can add further topic arrays using “Additional Index As”.
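The "array of arrays" Philip describes can be mimicked with nested Python tuples. This is a hypothetical model for illustration only. The function names `index` and `index_as` are made up, and it assumes each indexing command simply appends one topic array, as the examples above suggest. Python prints `(())` as `((),)`.

```python
# Hypothetical model: each indexing command appends one topic array
# (a tuple of topic strings) to the text's "array of arrays".
IndexArrays = tuple[tuple[str, ...], ...]

def index(arrays: IndexArrays) -> IndexArrays:
    """Plain 'Index': adds an empty topic array."""
    return arrays + ((),)

def index_as(arrays: IndexArrays, topic: str) -> IndexArrays:
    """'Index As' / 'Additional Index As': adds a topic array holding `topic`."""
    return arrays + ((topic,),)

empty: IndexArrays = ()
print(index(empty))                                  # ((),)
print(index_as(empty, "Nisus Writer"))               # (('Nisus Writer',),)
print(index_as(index(empty), "Document Processor"))  # ((), ('Document Processor',))
```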
> 
> Now the point is that these arrays are ‘applied’ to stretches of text. So if you apply one or more topics to the text “Nisus Writer” and then select just the text “Nisus” and ask what the indexAsTopics are, you will get the same answer as if you select the whole thing, or just the “N” or just the space between “Nisus” and “Writer”, etc.
> 
> One problem is that you can apply different topics to overlapping bits of text. If we apply “Nisus” to “Nisus” and “Nisus Writer” to “Nisus Writer”, then “Nisus” will have the array of arrays ((“Nisus”), (“Nisus Writer”)), while the space and the text “Writer” will have the array of arrays ((“Nisus Writer”)) only.
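To see why overlapping topics give different answers for different characters, here is a per-character sketch in Python. It is an assumed model, not Nisus internals, and `apply_topic` is a hypothetical name. Every character carries its own tuple of topic arrays, and applying a topic to a range appends to each character in that range.

```python
# Hypothetical model of index "attributes" applied to stretches of text:
# every character carries its own tuple of topic arrays.
def apply_topic(attrs: list[tuple], start: int, end: int, topic: str) -> None:
    """Append the topic array (topic,) to each character in [start, end)."""
    for i in range(start, end):
        attrs[i] = attrs[i] + ((topic,),)

text = "Nisus Writer"
attrs: list[tuple] = [() for _ in text]
apply_topic(attrs, 0, 5, "Nisus")                  # "Nisus"
apply_topic(attrs, 0, len(text), "Nisus Writer")   # "Nisus Writer"

print(attrs[0])  # (('Nisus',), ('Nisus Writer',))  the "N"
print(attrs[5])  # (('Nisus Writer',),)             the space
print(attrs[6])  # (('Nisus Writer',),)             the "W"
```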
> 
> Now if you want to find out what topics have been applied to a bit of text, and you select it and then use “Index As…” to bring up the dialog, you will notice that Nisus ‘corrects’ the selection to cover the whole stretch of text where the relevant topic has been applied. This can make it difficult to figure out what has been applied to a bit of text if more than one topic has been applied. If a bit of text has been indexed using both “Index” and “Index As”, then the “Index As” dialog will show empty, presumably because the ‘first’ index topic is the empty array (for the “Index”).
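One way to picture the selection ‘correction’ is that the selection is widened to the maximal run of characters whose index attributes are identical. The Python sketch below assumes that rule. It only matches the behaviour described above and is a guess, not documented Nisus behaviour. `expand_to_stretch` is a hypothetical name.

```python
# Hypothetical model: per-character tuples of topic arrays for "Nisus Writer"
# after applying "Nisus" to "Nisus" and "Nisus Writer" to the whole string.
N = (("Nisus",), ("Nisus Writer",))
W = (("Nisus Writer",),)
attrs = [N] * 5 + [W] * 7  # "Nisus" + " Writer"

def expand_to_stretch(attrs: list, index: int) -> tuple[int, int]:
    """Widen a position to the maximal run of characters whose index
    attributes equal those at `index`; returns (start, end), end exclusive."""
    value = attrs[index]
    start = index
    while start > 0 and attrs[start - 1] == value:
        start -= 1
    end = index + 1
    while end < len(attrs) and attrs[end] == value:
        end += 1
    return start, end

print(expand_to_stretch(attrs, 2))  # (0, 5): selecting "s" widens to "Nisus"
print(expand_to_stretch(attrs, 8))  # (5, 12): " Writer"

# If the dialog shows only the first topic array, text indexed with plain
# "Index" before "Index As" shows up empty:
both = ((), ("Document Processor",))
print(both[0])  # ()
```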
> 
> Now the macro retrieves the index arrays by using the attributes. This is also tricky. Suppose you have indexed “Nisus Writer Pro” with the topic “Nisus Writer” and then you apply italic to “Pro”: the macro will see two stretches of attributes, “Nisus Writer ” and “Pro”. Both stretches will have the same index topic array ((“Nisus Writer”)), but the macro will see them separately because of the formatting, even though the formatting is irrelevant for the indexing. It would actually be quite difficult to write a general macro that correctly identifies the whole stretch where a particular indexing topic has been applied.
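The split Philip describes can be reproduced in a small Python model. Grouping characters by all attributes (topics plus italic) gives two stretches. Grouping by the index topics alone rejoins them. This is only a sketch: it does not use the Nisus macro API, and `runs` is a hypothetical helper. It also cannot tell apart two separate applications of the same topic that happen to be adjacent, which is part of why a general macro is hard to write.

```python
from itertools import groupby

def runs(values: list) -> list[tuple[int, int]]:
    """Split a per-character list into maximal runs of equal values."""
    out, pos = [], 0
    for _, group in groupby(values):
        length = len(list(group))
        out.append((pos, pos + length))
        pos += length
    return out

text = "Nisus Writer Pro"
topics = [(("Nisus Writer",),)] * len(text)   # whole string indexed once
italic = [False] * 13 + [True] * 3            # "Pro" italicised
combined = list(zip(topics, italic))

print(runs(combined))  # [(0, 13), (13, 16)]: split by the formatting
print(runs(topics))    # [(0, 16)]: one stretch once formatting is ignored
```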
> 
> In the end a lot of these complications are really not relevant. Since in general you just want a particular page to be associated with a particular topic in your index, it often doesn’t matter if the topic has been applied to extra bits of text on the same page.
> 
> Hope this clarifies things.
> best
> Philip
> 
> 
>> On 2015 Oct 25, at 13:30, Nobumi Iyanaga <[log in to unmask]> wrote:
>> 
>> Hello Philip,
>> 
>> Thank you very much for your reply.
>> 
>>> On Oct 24, 2015, at 3:16 PM, spaelti <[log in to unmask]> wrote:
>>> 
>>>> ...
>>>> 
>>>> There are cases which can be confusing, for example, I have the name "Yamato" (old name for Japan), and "Yamato-takeru" (the name of a hero in Japanese mythology). For these names, I may have this table:
>>>> 
>>>> Yamato	Yamato (old name for Japan)
>>>> Yamato-takeru	Yamato-takeru (a hero in Japanese mythology)
>>>> 
>>>> Then, "Yamato" in "Yamato-takeru" would be indexed twice, both as "Yamato" and "Yamato-takeru". If in the generated index, the page for "Yamato" appears in the entry for "Yamato-takeru", that is not good. Is there any way to avoid this kind of situation…??
>>> 
>>> Yes, I just tried this, and this does seem to work this way. Apparently, since hyphen counts as a word boundary, the indexing finds “Yamato” inside “Yamato-takeru” and indexes it twice. I was hoping that by having it index “Yamato-takeru” first this might be avoided, but apparently this doesn’t work. It also doesn’t work to replace the hyphen with a non-breaking hyphen.
>>> 
>>> The simplest thing here is to avoid putting such hyphenated items in the word list. Then, when you have indexed using the word list, use Find to find all instances of “Yamato-takeru” and remove the indexing (which will remove their index as “Yamato”). Then you can index them again correctly.
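For comparison, here is a hypothetical Python sketch of a word-list pass that avoids the double match. It tries longer entries first and refuses any match that overlaps text already claimed by a longer entry. This is not how Nisus works, and nothing like it can be switched on in Nisus. It only illustrates the behaviour one might want. `index_with_word_list` is an illustrative name, and the input is assumed to be plain text.

```python
import re

def index_with_word_list(text: str, entries: dict[str, str]) -> list[tuple[int, int, str]]:
    """Return (start, end, topic) marks, trying longer entries first and
    skipping any match that overlaps text already claimed."""
    claimed = [False] * len(text)
    marks = []
    for word in sorted(entries, key=len, reverse=True):
        pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
        for m in pattern.finditer(text):
            if any(claimed[m.start():m.end()]):
                continue  # e.g. "Yamato" inside an already indexed "Yamato-takeru"
            for i in range(m.start(), m.end()):
                claimed[i] = True
            marks.append((m.start(), m.end(), entries[word]))
    return sorted(marks)

entries = {
    "Yamato": "Yamato (old name for Japan)",
    "Yamato-takeru": "Yamato-takeru (a hero in Japanese mythology)",
}
result = index_with_word_list("Yamato-takeru left Yamato.", entries)
print(result)
```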
>> 
>> Thank you for trying this. Yes, I will do as you indicated.
>> 
>>> ...
>>>> 
>>>> In order to remove index marks from words, we have to find the words to which index marks were added, but this does not seem easy...? Indexed words can be displayed in a color, but it is not very visible. And there is no command to find "Next Indexed Word" or anything like it... Is it possible to find these words with regex?
>>> 
>>> I’m not sure if you can use PowerFind to find indexed items. My quick test didn’t work.
>>> It seems that indexing is handled a bit differently from other styles, probably because it needs to deal with the “Index As” topics. Since you can index any bit of text for multiple topics, this can get quite complicated. Even writing macros to handle this is complicated. Below I’m attaching a macro that will select the next indexed bit of text and show you what it is indexed for.
>>> 
>> 
>> I tried your macro. It works very well. Thank you very much for it! I will use this macro to work on the indices.
>> 
>> One thing that I found using this macro is that there are cases in which a string of words is indexed several times, and even the spaces between the words are indexed... This is perhaps because of a special setting in my table of words to be indexed... Example:
>> 
>> I have a title of an old book:
>> 
>> Kogo shūi 古語拾遺
>> 
>> I index it as "Kogo shūi 古語拾遺"; but as there may be cases in which the title appears only in Roman characters (without the kanji), I added another entry:
>> 
>> Kogo shūi
>> 
>> to be indexed as "Kogo shūi 古語拾遺". Now, when I find the indexed words with your macro, each of "Kogo", " ", "shūi" and "古語拾遺" is indexed as "Kogo shūi 古語拾遺" (what is strange is that the first space is indexed, while the second one, between "shūi" and "古語拾遺", is not...). But in another similar case, it is not the first space but another space that is indexed, etc. Actually, this does not change anything in the generated indexes, so I can leave things as they are, but it is very strange...
>> 
>> Another thing that I find very annoying is that when I select an indexed word or string of words and choose "Index As...", the "indexed as" words don't appear, so it is impossible to know under which words it was indexed, or to edit the "indexed as" words. I think this is a major problem, and I will post a feature request to Nisus about it...
>> 
>> Best regards,
>> 
>> Nobumi Iyanaga
>> Tokyo,
>> Japan
>> 
>>> ...
>>> 
>>> # Macro Select Next Indexed Item
>>> 
>>> $doc = Document.active
>>> $indexStyles = $doc.textIndexStyleNames
>>> 
>>> # Get the name of the index style
>>> $indexName = Active Text Index Name
>>> 
>>> # Go through the document and look for indexed items
>>> $sel = TextSelection.active
>>> $loc = $sel.bound
>>> $txt = $sel.text
>>> 
>>> while $loc < $txt.length
>>> 	$attr = $txt.attributesAtIndex $loc
>>> 	$range = $txt.rangeOfAttributesAtIndex $loc
>>> 	$topic = $attr.textIndexTopicsForStyleName($indexName)
>>> 	# If an indexed item is found, select it and quit
>>> 	if Defined $topic
>>> 		$indexed = TextSelection.new $txt, $range
>>> 		$doc.setSelection $indexed
>>> 		Prompt $indexed.substring, 'Indexed as: ' & $topic
>>> 		Exit
>>> 	end
>>> 	# Otherwise move on to the next attribute range
>>> 	$loc = $range.bound
>>> end
>>> 
>>> # If nothing is found
>>> Prompt 'No indexed items found.'
>>> 
>>> # end of macro
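The macro's loop can be restated as a small Python model. It walks attribute ranges from the end of the current selection and stops at the first range that has index topics. This is a conceptual sketch only: the runs are hand-made, `next_indexed` is a hypothetical name, and no Nisus API is involved. The second call shows why running the macro again after "Nisus Writer " lands on "Pro" separately when "Pro" is formatted differently.

```python
from typing import Optional

# Each run is (start, end, topics); topics is None where nothing is indexed.
Run = tuple[int, int, Optional[tuple]]

def next_indexed(runs: list[Run], loc: int) -> Optional[Run]:
    """Like the macro's while loop: starting from the run that contains `loc`,
    return the first run that carries index topics, or None."""
    for start, end, topics in runs:
        if end <= loc:
            continue  # run lies entirely before the starting point
        if topics is not None:
            return (start, end, topics)
    return None

nw = (("Nisus Writer",),)
# "See Nisus Writer Pro here": "Pro" is italic, so it forms its own run.
runs = [(0, 4, None), (4, 17, nw), (17, 20, nw), (20, 25, None)]
print(next_indexed(runs, 0))   # (4, 17, (('Nisus Writer',),))
print(next_indexed(runs, 17))  # (17, 20, (('Nisus Writer',),))
print(next_indexed(runs, 20))  # None
```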
>>> 
>>> 
> 
> Philip Spaelti
> [log in to unmask]