The Mudcat Café™
Thread #100514   Message #2018069
Posted By: JohnInKansas
06-Apr-07 - 03:03 AM
Thread Name: Tech: How can I get keywords for each lyric?
Subject: RE: Tech: How can I get keywords for each lyric?
You want a list of all the tunes, with the keywords for each tune listed with the tune name. Here's one way of doing it, using the DOS 2002 Digitrad.

This is long, so follow carefully. The principles for each step are simple, but the execution can be a bit tedious.

I'll repeat the key instructions from the previous post.

1. Open the file named "Z02.ASK" (from the DOS 2002 Digitrad version) in Word.

2. Save it from Word as a .doc file under a new filename (I used "Z02.doc").

3. Use Edit | Find and Replace (shortcut Alt-e, e).

In "Find What," type "^012" and in "Replace With," type "^p" [without the quotes, both places]

Note: ^012 means "the character whose ANSI value is 12" and ^p means a paragraph break.

Click "Replace All"
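If you'd rather script this part than click through Word, here is a minimal Python sketch of steps 1 to 3. It assumes Z02.ASK can be read as raw bytes and decoded one byte per character (latin-1 does that without ever failing), and that Word's "^012" is the form-feed character, chr(12). A plain "\n" line break stands in for Word's paragraph mark. The function names are made up for illustration; nothing here comes from Word or the Digitrad.

```python
from pathlib import Path

FORM_FEED = "\x0c"  # chr(12), the character Word's "^012" finds


def form_feeds_to_breaks(raw: bytes) -> str:
    """Steps 1-3: decode the raw file and turn each form feed into a line break."""
    # latin-1 maps every byte 0-255 to exactly one character, so decoding never
    # fails. Assumption: the real file may use a DOS code page, in which case
    # accented letters would need a different codec (e.g. "cp437").
    text = raw.decode("latin-1")
    return text.replace(FORM_FEED, "\n")


def load_ask_file(path: str) -> str:
    """Hypothetical helper: read a file such as "Z02.ASK" and apply step 3."""
    return form_feeds_to_breaks(Path(path).read_bytes())
```

Something like `text = load_ask_file("Z02.ASK")` should give you roughly what Word shows after step 3, give or take the character decoding.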

4. I recommend that you save the file at this point, as some computers may have difficulty with the rather large "Replace All" that follows.

5. Reopen Edit | Find and Replace (Alt-e, e)

In "Find What," type "^027^002" and in "Replace With," type "^p^p" [without quotes, both places]

Click "Replace All"

5a. If the last "Replace All" fails to complete, you can reopen the document you saved at step 4, select (highlight) half of the document, and repeat step 5. Then select the other half and repeat step 5 again. My Word 2002, with 1GB of RAM, had no trouble with it as a one-step replace, but if you have limited memory and/or temp space you might need to do it in nibbles instead of in gulps.
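Steps 5 and 6 come down to one more replace and a split. Here is a Python sketch, assuming (as Word's "^027^002" suggests) that songs are separated by chr(27) followed by chr(2). The function names are hypothetical.

```python
import re

ESC_STX = "\x1b\x02"  # chr(27) followed by chr(2): Word's "^027^002"


def mark_song_breaks(text: str) -> str:
    """Step 5: replace every ESC+STX pair with two line breaks (a blank line)."""
    return text.replace(ESC_STX, "\n\n")


def split_songs(text: str) -> list[str]:
    """Step 6 view: one block per song, split on runs of two or more breaks."""
    return [block for block in re.split(r"\n{2,}", text) if block.strip()]
```

Since `str.replace` works on the whole in-memory string in a single pass, the "nibbles" workaround in 5a usually isn't needed when scripting.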

6. When step 5 is successfully completed, you'll get the full text with each song separated by double paragraph breaks. The keywords for each song will be at the end of each song, preceded by an "@" for each keyword. There will be extraneous "unprintable" characters, and the document will be around 3,418 pages long.

Some of the remaining "unprintables" are outside the ANSI character-value range and can't easily be "replaced out" in Word, but you can of course do a bit of manual cleanup.
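Pulling the "@" keywords out of a song block takes one regular expression. This sketch assumes each keyword is written as "@word" with no space after the "@", which is a reading of step 6 rather than something checked against the file format. The control-character filter is only a rough stand-in for manual cleanup; see the caveats in its docstring. Both function names are hypothetical.

```python
import re

# Assumption: each keyword is written as "@word", with no space after the "@".
KEYWORD = re.compile(r"@(\S+)")


def song_keywords(song: str) -> list[str]:
    """Return the keywords at the end of one song block, in order."""
    return KEYWORD.findall(song)


def strip_control_chars(text: str) -> str:
    """Rough cleanup: drop non-printable characters except tab and newline.

    Use it only as a final, optional pass: steps 5, 13 and the table trick in
    steps 15-20 all rely on those characters still being there. With a latin-1
    decode it also drops bytes 0x80-0x9F, which may be real letters in a DOS
    code page, so check the output before relying on it.
    """
    return "".join(ch for ch in text if ch in "\t\n" or ch.isprintable())
```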

7. I suggest that you save the document again at this point. If you're nervous (which you should be), you can change the filename (Z02a.doc?).

8. Use Edit | Find (Ctl-F) with "HARDTAC" in the "Find What" box and click "Find Next." Keep clicking "Find Next" (about 5 times total) until you reach approximately page 3,343 (of 3,418, as my Word repaginated it).

9. After the last "Find Next" you should end with the word "HARDTAC" highlighted. Hit Esc, click "Close", or click the "X" to close the Find box. Hit the left arrow once to move the insertion point to just in front of the word "HARDTAC", and once more to move it one character to the left of the "H" (or just click one character to the left of the "H").

10. Hold down both the Shift and Control keys, and hit the "Home" key (Ctl-Shift-Home) to select everything back to the start of the document, and then hit "Delete."
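Steps 8 to 10 amount to "find the Nth HARDTAC and throw away everything before it, except one character." Here is a Python sketch. `keep_tail_from` is a hypothetical name, and the default of 5 occurrences is only the approximate click count from step 8, so check where the index actually starts in your copy.

```python
def keep_tail_from(text: str, marker: str = "HARDTAC", occurrence: int = 5) -> str:
    """Steps 8-10: drop everything before the chosen occurrence of `marker`,
    keeping one character in front of it (like stopping one left of the "H").

    `occurrence` is 1-based; 5 is only an approximate count, not a known value.
    """
    pos = -1
    for _ in range(occurrence):
        pos = text.find(marker, pos + 1)
        if pos == -1:
            raise ValueError(f"fewer than {occurrence} occurrences of {marker!r}")
    # Keep one character before the match, but never slice from a negative index.
    return text[max(pos - 1, 0):]
```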

11. You now have a document consisting only of the "tail end" of the original. I suggest you save now, with a new filename (Z02b.doc?).

12. The document you just saved has each "DOS Song Name" with the title and "keywords" for each song immediately adjacent, but it also contains a number of "unprintable" characters and has no line breaks to make it readable.

13. Reopen Edit | Find and Replace (Alt-e, e) and type "^031" in the "Find What" box, and "^p" in the "Replace With" box [without the quotes, both places]

Click "Replace All"

14. You should now have a document of about 160 pages, with the DOS Tune name followed by the title and then by any keywords associated with the tune, each in a separate paragraph. Each paragraph still begins with a few "unprintable" characters.

Since this version may already be usable, I suggest you save one more time, with or without changing the filename (I used Z02c.doc).
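Step 13 is another single replace. In this Python sketch (function name hypothetical), each chr(31) starts a new line and empty lines are dropped.

```python
UNIT_SEP = "\x1f"  # chr(31), the character Word's "^031" finds


def index_lines(tail: str) -> list[str]:
    """Step 13: start a new line at every chr(31), one line per tune.

    Each line still begins with a few unprintable characters (step 14).
    """
    return [line for line in tail.replace(UNIT_SEP, "\n").split("\n") if line]
```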

15. Reopen Edit | Find and Replace (Alt-e, e) and put a single space " " in the "Find What" box and "^t" in the "Replace With" box (^t means a tab).

Click "Replace All"

16. This last step will totally screw up your document, since it replaces every space with a tab. Don't worry about it.

17. Use Ctl-A to select the entire document and then key Alt-a, v, x (or select Table | Convert | Text to Table). Make sure that the "Separate at Tabs" button is selected, and then hit Enter.

NOTE: This is a massive conversion. My computer handled it okay, but if you have fewer "resources" you may need to do it in parts: select part of the document and do the step 17 conversion on it, then select the remainder and do step 17 again, or work through even smaller chunks, repeating the step for each one.

18. At this point you will have a very large table, or several smaller ones, probably with columns running off the right of the page. Click anywhere in the leftmost column and key Alt-a, c, c (or use Table | Select | Column). This should select (highlight) the first column of the table. Key Alt-a, d, c (or click Table | Delete | Column), and the first column, which holds all the unprintable characters, should disappear.

NOTE: If you had to break the conversion to table(s) into smaller chunks, repeat step 18 for each table. It is STRONGLY RECOMMENDED that you save the document once the left column has been deleted from all your tables (or even after each deletion).

NOTE: The next step (19), conversion from table to text, requires immense computer resources and may fail. As a practical matter, don't try to convert more than about 20 pages of table at a time (less if your machine struggles). To split the table into smaller ones, key F5, type a page number, click "Go To", and then key Alt-a, t (Table | Split Table). Save when you're done splitting it up, and do step 19 separately for each table.

19. With your cursor anywhere in a table, select the entire table (Table | Select | Table, or Alt-a, c, t). Click Table | Convert | Table to Text (Alt-a, v, b) and check again that "Separate at tabs" is still selected.

Hit enter.

Repeat this step for each table in your document.

Save your work. (Recommended: save after each table conversion is completed.)

20. Reopen Edit | Find and Replace (Alt-e, e). Put "^t" in the "Find What" box and a single space " " in the "Replace With" box. Click "Replace All"

21. If all went well, you now have a document with the DOS name of each tune, followed by the text tune name, followed by any keywords for the tune, with each tune in a separate paragraph.
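As far as I can tell, the net effect of steps 15 to 20 on each paragraph is to delete everything up to and including the first space. Turning spaces into tabs makes each space-separated piece a table cell, deleting column 1 drops the first piece, and the two conversions back join the remaining pieces with spaces again. Here is a sketch of that shortcut, assuming (as the table trick does) that the leading unprintables are followed by a space. Both function names are hypothetical.

```python
def drop_first_field(line: str) -> str:
    """Net effect of steps 15-20 on one paragraph: remove everything up to
    and including the first space, keeping the rest as it was."""
    _first, _sep, rest = line.partition(" ")
    # A line with no space at all comes back empty, like a one-cell table row
    # whose only column was deleted.
    return rest


def tidy_index(lines: list[str]) -> list[str]:
    """Apply drop_first_field to every paragraph of the index."""
    return [drop_first_field(line) for line in lines]
```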

22. I suggest you save with a new filename (Tunes and Keywords.doc or similar).

Your final document should have about 8,982 paragraphs, according to Word's Tools | Word Count. Because of a quirk in Word's global replace function, an occasional "strange" paragraph marker doesn't get found, so this isn't necessarily an accurate count of the number of tunes indexed. In 10-point type, my conversion is 159 pages, 1,295 KB.

You may want to do some additional cleanup and formatting, but that's optional, to suit your purposes. There are a couple of places where "strange unprintables" resulted in the DOS name being preceded by a tab. You can remove these "errors" by replacing "^p^t" with "^p" (Replace All).
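The final tidy-up can be sketched the same way in Python. Unlike replacing "^p^t" with "^p", which removes one tab per match and can't catch a tab on the very first line, this strips all leading tabs on every line. The paragraph count is only a rough stand-in for what Word's Word Count reports. Both function names are hypothetical.

```python
import re


def remove_leading_tabs(text: str) -> str:
    """Strip any run of tabs at the start of every line, including the first."""
    return re.sub(r"^\t+", "", text, flags=re.MULTILINE)


def count_paragraphs(text: str) -> int:
    """Count non-blank lines as a rough equivalent of a paragraph count."""
    return sum(1 for line in text.splitlines() if line.strip())
```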

John