The Mudcat Café TM
Thread #100514 Message #2017488
Posted By: JohnInKansas
05-Apr-07 - 02:09 PM
Thread Name: Tech: How can I get keywords for each lyric?
Subject: RE: Tech: How can I get keywords for each lyric?
The files generally are not plain text, and extracting just the text to get a complete listing of the keywords associated with each song is something of a challenge.
If you can download the DOS version and unzip it somewhere separate from the Windows version (which I assume you'd rather use), the file "KEYS" contains a plain-text list of all the keywords that are used, but it won't tell you which keywords are used for each song. You can copy "KEYS" somewhere and rename it KEYS.TXT, and it will open in Notepad, but the layout will be somewhat "unoriented." If you right-click on it (with or without the change to .txt) and "send to" Word, everything should line up legibly, and in Word you can do a few global replace operations to replace the multiple spaces with tabs (^t) or paragraph breaks (^p) if you want.
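If you'd rather not go through Word, the same "replace the multiple spaces" cleanup can be sketched in a few lines of Python. This is only a rough sketch: it assumes the entries in KEYS are separated by runs of two or more spaces or by line breaks, which I haven't verified against the actual file, and the function name split_keywords is made up for illustration.

```python
import re

def split_keywords(text):
    """Split space-padded text into one entry per keyword.

    Assumption: entries are separated by two or more spaces or by a
    line break. A single space is kept, in case a keyword contains one.
    """
    parts = re.split(r" {2,}|\r?\n", text)
    return [p.strip() for p in parts if p.strip()]

# Made-up sample text, not taken from the real KEYS file.
sample = "ANIMAL    BALLAD   CHILD\nDRINK  FARM"
for keyword in split_keywords(sample):
    print(keyword)
```

To try it on the real file you would read KEYS into a string first and pass that in; what encoding the file uses is something you'd have to check.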
The batch file, KEYWORD.BAT, is intended to let you print a list of the keywords, but it probably won't run in the "DOS windows" of recent Windows versions, since it uses a command that was dropped from the command set. The simplest way to use the DOS version probably would be to just boot to an older DOS, but if your hard drive is formatted NTFS, DOS can't read anything from it, so you'd have to make a separate partition (FAT or FAT32 format) to put the DOS DIGITRAD in, or run it from a separate drive (CD?) to do it that way.
In the DOS version, the file named "TITLES" can be opened in the same way as the KEYS file, and is legible in Word, giving a list of all the tune titles. A "vertical bar" is used as a separator. If it makes the list easier to read or use, you can replace it with a paragraph break by putting "^124" (without the quotes - the ANSI value of the character is 124) in the Word "Find" box and "^p" (without the quotes) in the "Replace With" box and clicking "Replace All". Again, there is no association of keywords with the individual titles.
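The vertical-bar replacement can also be done outside Word. Here is a minimal Python sketch, assuming the titles really are separated by character 124 and nothing else; split_titles is a hypothetical name, and the sample titles are invented.

```python
def split_titles(text):
    """Return one title per list entry, splitting on the vertical bar."""
    separator = chr(124)  # character 124 is the vertical bar "|"
    return [t.strip() for t in text.split(separator) if t.strip()]

# Invented sample, not taken from the real TITLES file.
sample = "First Title|Second Title | Third Title|"
print("\n".join(split_titles(sample)))
```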
If you open (from the DOS version) the file named "Z02.ASK" in Word and replace
"^012" with "^p" - Replace All
"^014" with "^p" - Replace All
"^027^002" with "^p^p" - Replace All (allow time, >8,500 replacements)
You'll get the full text with each song separated by double paragraph breaks. The keywords for each song will be at the end of the song, preceded by an "@" for each keyword. There will be extraneous "unprintable" characters, and the document will be around 3,418 pages long, so if you're short on memory the last "replace all" especially may take a very long time, or may crash (lock up) your Word. It worked fine with my Word 2002, with 1GB RAM.
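For anyone whose Word chokes on the last replace, the same three replacements can be sketched in Python, which works on the raw bytes and doesn't need to paginate anything. The assumptions here are mine, not confirmed against the file: that Word's ^012, ^014 and ^027^002 mean the characters with decimal values 12, 14, and 27 followed by 2; that the text can be decoded as code page 437 (a guess for a DOS-era file); and that a keyword is an "@" followed by letters, digits or underscores. The names split_songs and keywords_of are hypothetical.

```python
import re

RECORD_SEPARATOR = bytes([27, 2])  # assumed equivalent of Word's ^027^002

def split_songs(raw):
    """Split the raw file bytes into one text string per song."""
    # Same first two replacements as in Word: characters 12 and 14
    # become line breaks.
    raw = raw.replace(bytes([12]), b"\n").replace(bytes([14]), b"\n")
    # Split on the record separator instead of inserting double breaks.
    songs = [s for s in raw.split(RECORD_SEPARATOR) if s.strip()]
    # Decoding as cp437 is an assumption about the file's encoding.
    return [s.decode("cp437", errors="replace") for s in songs]

def keywords_of(song):
    """Return the words that follow an '@' in one song's text."""
    # Assumes keywords contain only letters, digits or underscores.
    return re.findall(r"@(\w+)", song)

# Invented bytes that imitate the layout described above.
sample = b"SONG ONE\x0cline a\x0eline b\n@sea @work\x1b\x02SONG TWO\x0cline\n@love"
for song in split_songs(sample):
    print(song.splitlines()[0], keywords_of(song))
```

An "@" that appears inside the lyrics themselves would be picked up as a keyword too, so the output would still want a manual check, and the leftover unprintable characters are not handled here.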
Some of the remaining "unprintables" are outside the ANSI character value range, and can't be easily "replaced out" in Word, but you can of course do a bit of manual cleanup.
There appears to be an "end section" in this file where the titles and keywords are associated. If you do a Find for "HARDTAC" and click "Find Next" about 5 times, you'll get to around page 3343 (of 3418, the way my Word repaginated) where this section appears, and it goes to the end of the document. I haven't cracked the unprintables here, which would be how you'd separate the entries to get a "readable" list, but it looks like it may be possible (?).