The Mudcat Cafe



Tech: html from a word document








Subject: Tech: html from a word document
From: John Hardly
Date: 12 Mar 09 - 04:31 PM

Does anyone know if there's a simple way to take a document created with Microsoft Word -- a document that has already been formatted with italic, bold, underline -- basically all the formatting that code does...

...and make the text appear with the code so that it could be merely copied onto a web page?



Subject: RE: Tech: html from a word document
From: treewind
Date: 12 Mar 09 - 04:33 PM

MS Word will export HTML. The code it produces is ghastly but it usually displays pretty much like the original.

Select file->save as and look at the file type options...

Anahata



Subject: RE: Tech: html from a word document
From: John Hardly
Date: 12 Mar 09 - 04:39 PM

Thanks. The code doesn't appear to be in the document (saved as html) but I'll try to paste it into my blog and see.



Subject: RE: Tech: html from a word document
From: GUEST,.gargoyle
Date: 12 Mar 09 - 04:53 PM

It should work with MS Front Page which is part of the WORD package.

Sincerely,
Gargoyle



Subject: RE: Tech: html from a word document
From: John Hardly
Date: 12 Mar 09 - 04:58 PM

Thanks. It doesn't work.

I was trying to shortcut. I have a document that I wrote some time ago. Because it contains lots of italic, bold, and the like, I wanted to find a shortcut to copy and paste it into my blogspot page. But if I copy it into the page, it just ignores anything that's been preformatted. I have to go back in and edit the blog to make bold, italics, and even line breaks appear.

Bummer. How shall I survive?



Subject: RE: Tech: html from a word document
From: JohnInKansas
Date: 12 Mar 09 - 06:24 PM

If you want to use it for something like a post (but if it's long, preferably not at mudcat) you can just put a <pre> in front of it and close it with </pre> and the "preformatting" will be largely preserved without markups within the text.
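A minimal sketch of the <pre> idea in Python, for anyone who would rather script it. The function name wrap_pre is made up for illustration, and the escaping step is an added assumption (so that a stray < or & in the text is not read as markup); it is not something the advice above requires.

```python
import html


def wrap_pre(text: str) -> str:
    """Wrap plain text in <pre>...</pre> so spacing and line breaks survive.

    Escaping &, < and > is an extra precaution (an assumption of this sketch).
    """
    return "<pre>" + html.escape(text, quote=False) + "</pre>"


# Example: indentation and line breaks are kept exactly as typed.
print(wrap_pre("Verse one\n    an indented line\nA < B & C"))
```

Paste the printed result into a box that accepts HTML and the layout should come through much as typed.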

When you "save as" html in Word, it stays a word document, but should appear somewhere on your drive as a file with .htm file extension. If you just click to open it, it will default to reopen in Word.

If you right click on the .htm file, you can tell it to "open using" (or "send to") your browser. In your browser you can right click and "view source" (in Internet Explorer this typically opens the code in Notepad), where you'll see that it is all "properly" displayed in a humongous ugly mass of preposterously bloated html code. Copy the code from Notepad and paste into a text document (or just "save as .txt" directly from Notepad).

John



Subject: RE: Tech: html from a word document
From: Stilly River Sage
Date: 12 Mar 09 - 06:27 PM

Take your Word document and save as "web page html" in the options in the "save as" menu. I will tell you that Word is full of formatting crap and it will be a sloppy page. It might mess up a form. Are you simply wanting to make paragraphs, italics, and bold, and use a named text at a specific size?

Does your blog use the carets or the [ and ] symbols? Sometimes that makes a difference.

I'll PM my email. It probably won't take more than a couple of minutes to type in the code by hand, but I also use Front Page, etc. I can convert it, send it to you as a text file, and you can paste that in.

SRS



Subject: RE: Tech: html from a word document
From: John Hardly
Date: 12 Mar 09 - 06:36 PM

Maggie, it's blogspot. I'm trying to convert most of my website biz to using a blog (lots of reasons many artists are doing the same thing). For one thing, it's more flexible to use when most of my internet biz is being done on the etsy site.

One of my biggest hurdles in doing so is that anything I've composed in Word and have saved to my computer, I need to re-edit once I've put it into the blogspot post box. In fact, I even have to copy and paste one paragraph at a time, or I have to go back and forth to figure out where they were supposed to be. If I don't, no matter how it's saved in Word, it pastes into the blog box in one, uninterrupted block of text.

There've been other vexing problems getting used to blogspot -- paragraph breaks I put into the text magically expand or disappear when I publish the post.

It's not the end of the world. It's just more steps. I won't give up. Yet. But I wish I could figure out what others do. They can't ALL be composing in those post boxes (without good editing capabilities) and then publishing. Can they?

I will say, I'm more inclined to give up the blog for want of interest. I'm not sure what other artists are doing to generate interest in their blogs, but just the linkage I'm using is resulting in virtual nuthin'.



Subject: RE: Tech: html from a word document
From: treewind
Date: 12 Mar 09 - 06:48 PM

When you paste into the blogspot text box, you can only paste the text. There's nothing you can do about that - you have to type any formatting codes in yourself, using whatever convention blogspot uses. It's often not HTML, unlike Mudcat which allows (limited) HTML codes in the text box.

"They can't ALL be composing in those post boxes (without good editing capabilities) and then publishing"

You could compose in a decent text editor (NOT a word processor) but other than that, the answer is yes - they probably are!

I like Notepad2, a perfect replacement for Windows notepad.

Anahata



Subject: RE: Tech: html from a word document
From: Joe Offer
Date: 12 Mar 09 - 06:50 PM

Hi, John -
What you want to do isn't a particularly difficult process, but it takes a clear head to get the process down. If you type a Word document and choose "save as HTML," Word will produce a complete page file that can be uploaded to a Website, and that page will appear in a browser in almost the same format you saw in Word. BUT, as John in Kansas so colorfully says, Word produces a "humongous ugly mass of preposterously bloated html code."

Now, if you view that HTML document in Word or in a browser, and attempt to highlight and copy it, what you copy is the text WITHOUT the HTML tags. You need to view the document with Notepad or "view source" or something that displays all the HTML tags, like the <p> paragraph tag. Same goes for Word documents pasted into a Mudcat message box - if you want the HTML formatting, all those tags have to be visible.
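To make the difference concrete, here is a small Python sketch (the class and function names are hypothetical, and the sample string is invented). It prints the HTML source, tags and all, and then the text a browser-style copy would hand you, with the tags gone.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the text between tags, roughly what copy-and-paste keeps."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(source: str) -> str:
    """Return the source with every tag dropped."""
    parser = TextOnly()
    parser.feed(source)
    parser.close()
    return "".join(parser.parts)


source = "<p>This is <b>bold</b> and <i>italic</i>.</p>"
print(source)                # what "view source" or Notepad shows
print(visible_text(source))  # what you get when copying from the rendered page
```

The first printed line is what needs to go into an "edit html" box; the second is what a plain copy gives you.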

Does that help at all? Don't give up - I KNOW we can talk you through this.

-Joe-



Subject: RE: Tech: html from a word document
From: John Hardly
Date: 12 Mar 09 - 06:55 PM

Thanks. I'm starting to get the picture.

Curiously, I suppose I could compose in my Frontpage and then grab the code it creates in the split view (the code at the top screen of the split). Right? And if I pasted a word document into the lower, it would give me the code up above. Right?



Subject: RE: Tech: html from a word document
From: John Hardly
Date: 12 Mar 09 - 06:59 PM

I just tried that. If I copy and paste into a text box in Frontpage, the code will appear in the upper window of the split screen. I can take that and copy it and paste it into blogspot and see if that does it. At least blogspot does have an option to edit html. If I convert the post box to that option, my pasted code should work. I'll try.

Sorry for thinking out loud. And thanks for the help.



Subject: RE: Tech: html from a word document
From: JohnInKansas
Date: 12 Mar 09 - 07:11 PM

When pasting from Word, the "paragraph" mark is a special problem for most html sites. The paragraph end mark in Word is a rather complex special creature that includes both a carriage return and a line feed. A carriage return - in the olden days of Linotype - returns to the start of the same line you were typing in. A line feed drops down one line, but continues from the same place you were at in the line above. You have to have one of each to go back to the left side of the page and drop down a line.

Some browsers are fussy about whether you have a line feed (LF) followed by a carriage return (CR) or a (CR) followed by a (LF), and will "fail" to advance if they're in the wrong order. When working directly from Word, or from some other word processors, if you use two consecutive paragraph breaks, you're sure to get either (CR) (LF) (CR) (LF) or (LF) (CR) (LF) (CR) both of which contain at least one (CR) followed by (LF) and at least one (LF) followed by (CR) so it always works, although occasionally some browsers will give you a "double break" while others only show one. By design, html is supposed to ignore gracefully anything it doesn't understand, so usually just "Enter Enter" will give you a "workable" result.

For more precise control, you can (as we all once did at the 'cat) just type <br> instead of hitting enter.

Back when we all had to code the paragraph breaks my practice was to type in Word and then do an edit|replace, find paragraph ends (you type ^p in Word Find/Replace) and replace all with (<br>^p). Mudcat then ignored a single Word "paragraph" so the document remained readable in Word, but still got the breaks in the post.
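The same find-and-replace can be sketched outside Word. This is only an illustration in Python (add_breaks is a made-up name), and it assumes the text has already been saved or pasted as plain text with one paragraph per line.

```python
def add_breaks(text: str) -> str:
    """Put <br> at the end of every line, like replacing ^p with <br>^p in Word.

    splitlines() accepts CR LF, bare LF and bare CR endings alike, so the
    order of carriage return and line feed in the input does not matter here.
    """
    return "\n".join(line + "<br>" for line in text.splitlines())


print(add_breaks("First paragraph.\r\nSecond paragraph.\r\n"))
```

The output stays readable as plain text, one paragraph per line, while carrying an explicit break for the page.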

As an incidental comment, in Word if you click File|Open you get a little box where you can type the name of a "file" to open. If you type http:\\www.mudcat.org the web page will open in Word. It looks like crap, but Word can (in all versions where I've tried it) display an html page in a reasonably adequate manner. As noted above, when you "Save As .html" it may not look like it worked, because when you click on the saved file to reopen it, it opens in Word, so it looks like it hasn't changed anything. Open it in your browser and you can look at the "source code" with a right-click.

John



Subject: RE: Tech: html from a word document
From: Joe Offer
Date: 12 Mar 09 - 07:19 PM

OK, so this is a shameless commercial announcement, but what the heck?

You could try the Mudcat HTML Guide, which is a pretty comprehensive guide to basic HTML. I think you'll find that learning a few HTML tags is simpler than dealing with the garbage that Word produces.

You'll also find some of the guides in this Google search to be helpful.

It feels really wonderful when you can do your own HTML and not depend on Microsoft to do it for you - and remember this: You can do it better!!!

-Joe-



Subject: RE: Tech: html from a word document
From: Austin P
Date: 12 Mar 09 - 10:12 PM

I just tried saving a document as HTML using Openoffice thinking that might be better, but I have to say that while the generated HTML is cleaner than that output by Micro$oft Word, it is still pretty messy.

What they all seem to do is wrap each whole line of text in formatting tags, even if the whole page is the same!

Joe is right, for most text-only purposes you only need to learn just a few 'tags', e.g. bold, italic and underline. You can even do bullet points and lists with a single command.

It only takes 10 minutes to learn and is a lot neater (and easier to 'debug', i.e. edit).

AP



Subject: RE: Tech: html from a word document
From: GUEST,.gargoyle
Date: 12 Mar 09 - 10:47 PM

Mr. John In Kansas - thank you for the "Linotype" clarification. It fits together - we had a "hot lead" setter in 9th grade that the school bi-monthly newspaper was composed on.

Mudcat taught me all the HTML I need to know...and some consider me a savant.

When I interface with web-designers and tell them:

I only do HTML from "Notepad"

They hold their hands together and bow, and BOW and BOW

Frequently, I find myself typing E-Mails with tags...(which don't work. )

I also did not know that Search no longer required the old compulsory brackets [....] This was not in Mudcat Updates.

The PREVIEW option at Mudcat is also a dandy place to check things...even if you are not posting to Mudcat - if you use it be careful...a slip of the return button could reveal ALL.

Sincerely,
Gargoyle

Just remember to CLOSE your tags....or you look foolish...as I often do...until a benevolent, under-paid, clone cleans up the mess.


HTML tags fixed, to limit "shouting."
-Underpaid and overscorned and always benevolent Joe Offer-



Subject: RE: Tech: html from a word document
From: Austin P
Date: 12 Mar 09 - 11:20 PM

No need to shout!

;o)


Good man, I use notepad as well. It's the only way ...

AP



Subject: RE: Tech: html from a word document
From: Stilly River Sage
Date: 13 Mar 09 - 01:00 AM

Notepad is the one I used to convert John's page, and saved in a couple of forms for him to see.

There is another trick. :) Of course there is. . .

If you have a long document, lots of paragraphs, you can simply load it in notepad and type the first <p> tag (I used html to make this appear), then copy it and move your cursor to the beginning of each paragraph and add it in. Then do the same, typing </p>, copying, and pasting at the end of each paragraph.

Or, if you're really lazy, you can load a Word document in Notepad (to scrape out all of the formatting), keeping just the basic paragraph breaks, then select all and copy. Open a new Word page, paste it in. Select it all, then go to the menu above and find the one to convert text into a table, with the new row appearing at each ¶ mark. Select the table yet again and add a new row on each side of this long odd-looking table. Type <p> in the empty top left cell, select and copy, then select the entire column and paste the <p> in every cell at once. Repeat this on the other side--type </p> in the top cell on the right, select and copy, then select the entire column and paste it in.

It takes longer to describe this than it takes to do it.

Finally, select the whole table, and paste it into notepad again. Notepad will scrape out the table and just leave the text with the code in place at the beginning and end of each paragraph. You may have to go remove some blank lines, but not a big deal. If you're doing a long bit of text, this saves a lot of time. In that same front cell you can also add font information, etc.
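For those who would rather script it than use the table trick, a rough Python equivalent follows. The function name wrap_paragraphs is invented for this sketch, and it assumes the text is already plain (for example, after a pass through Notepad) with one paragraph per line. It does not escape anything, so tags you typed by hand, such as bold or italic, pass through untouched.

```python
def wrap_paragraphs(text: str) -> str:
    """Put <p> in front of and </p> behind every non-blank line."""
    paragraphs = [line.strip() for line in text.splitlines()]
    # Blank lines are skipped, which also takes care of the stray empty
    # lines that otherwise have to be removed by hand.
    return "\n".join(f"<p>{p}</p>" for p in paragraphs if p)


sample = "First paragraph.\n\nSecond, with <i>italics</i> typed in by hand.\n"
print(wrap_paragraphs(sample))
```

Font information or other attributes could be added to the opening tag in the f-string, in the same spirit as putting them in the front cell of the table.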

SRS



Subject: RE: Tech: html from a word document
From: JohnInKansas
Date: 13 Mar 09 - 02:45 AM

Stilly -

One of the MAJOR FAULTS with Office 2007 is that they split up lots of the menus (that used to be useful and now are just "cute") so that Table operations are now found on three separate "top menus" at opposite ends one-from-t'other. So you ignore the menus and use "quick keys."

In Word, select some text: Alt-A, V, X conVerts the teXt to a table.

Even in Word 2007 you'll get an intermediate menu that lets you choose whether to break by paragraphs, at tabs, or at other (e.g. commas). Earlier Word versions would show you the menu so you could choose the next key. (In Word 2007, since ONLY IDIOTS ARE EXPECTED TO USE IT, there is no menu.)

The inverse, with cursor in a table: Alt-A, V, B conVerts the taBle back to text, again with the option to use tabs, commas, etc for the "cell splits."

With your cursor in a column, Alt-A, I, L Inserts a column to the Left of the one you're in.

Again in a column, Alt-A, I, R Inserts a column to the Right.

If you select a column (Alt-A, S, C) or a row (Alt-A, S, R), you can use "global replace" to change only the stuff in the selection.

An Example:

If you list a bunch of files, one per paragraph, convert to a table of one column. Copy the column and paste it into a new column to the right. You can then select the column on the right and edit all the filenames (with a global replace if appropriate) to get "old name" in the left column and "new name" in the right. Add a column on the left in which each cell contains "REN" (without the quotes).

Alt-A, V, B gives you:

REN (tab) oldfilename (tab) newfilename

Global replace tabs (^t in the find box) with a single space in the "replace with" box, and you have:

REN oldfilename newfilename

with one file per line.

Save as a .txt file in the same folder with the files you want to rename.

In Windows Explorer, change the .txt to .bat and double click it and all the files now have the new names.

You've just created and run a DOS batch file that does something that Windows alone can't do without an excruciatingly traumatic and painful labor. And Word "table functions" - obliterated and hidden in Word 2007, 'cause (according to Microsoft) you're an idiot and aren't supposed to use them - are one key to doing it simply.
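The batch-file text itself can also be generated by a short script. The Python below is only a sketch: ren_lines and the sample file names are invented, and wrapping each name in double quotes is an addition not in the steps above, made so that names containing spaces still work.

```python
def ren_lines(pairs):
    """Build the body of a .bat file: one REN command per (old, new) pair."""
    # Batch files conventionally use CR LF line endings.
    return "\r\n".join(f'REN "{old}" "{new}"' for old, new in pairs) + "\r\n"


# Hypothetical file names, purely for illustration.
pairs = [
    ("song01.txt", "barbara_allen.txt"),
    ("song02.txt", "matty groves.txt"),
]
print(ren_lines(pairs), end="")
```

Saving the printed text as a .bat file in the folder that holds the files, then double clicking it, would have the same effect as the hand-built version.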

John



Subject: RE: Tech: html from a word document
From: Austin P
Date: 13 Mar 09 - 05:05 AM

JohninKansas: Neat.

Reinforces what I thought - avoid Word 2007 like the plague - I'm sticking with Office 2003! Though these days I also use the OpenOffice WP which is free and works just fine (and is faster than M$ too).

Microsoft keep 'improving' Office in order to sell you the new version when in reality it hasn't been broken for years, and they end up confusing and annoying everyone (well Ballmer's in charge now so no surprise there).

Back in 1991 I was producing official documents for the Navy using Wordstar and Harvard Graphics on a green-screen IBM XT. Printed on a Laser printer they are visually identical to the output of 'modern' WP's (except no colour of course, that came 2 years later). These were big documents too of 200-300 pages.

And the paragraph numbering worked first time, unlike in Word, which still doesn't after all these years ... infix operators ... NROFF ... tag directives ... blah ... blah ... (dragged away by men in white coats).



Subject: RE: Tech: html from a word document
From: JohnInKansas
Date: 13 Mar 09 - 05:51 AM

I've never had any notable trouble with paragraph numbering in Word, since "3.5 for DOS" although a couple of the common methods are nearly impossible to use in Word 2007. If you apply a "numbered list" you should get a new number each time you hit Enter. If you need a break without a new number you use Shift-Enter to put in a "soft break" instead of a paragraph. I don't often use numbered, or bulleted, lists, as I find I have more control with a {seq paras} field that I can insert where I want it, start or restart with a number I SELECT, and have it update itself whenever I CHOOSE to let it.

Instead of the previous versions' layout where you had a nice set of a half-dozen choices each of which leads logically to another half-dozen, etc., always taking you in no more than two or three steps to a complete task window where you could actually do something ...

Word 2007 has a "toolbar" with approximately 95 indistinguishable "icons." And there are 9 of those "toolless bars" each with an equal number of indistinguishable icons.

Most of the "icons" take you to a bunch of "preset choices" that tell you nothing about what will happen if you click one of them. As an example, if you click on "Table" you get 150 identical little squares as a "visual field" and if you click on one of them it will create an empty table with some number of rows and columns.

Only a total IDIOT creates a table and then tabs through it to fill in the squares. You type the data, tab separated for columns and paragraphs for rows, and then you convert it to a table.

But the command to convert text to a table is on a different "top menu" from the one that says "Table," three levels down through the maze of fuzzy icons, and the command to convert a table back to text is on a third different top menu, at the opposite end three levels down through apparently identical meaningless icons.

The problem is that almost any "professional level" task requires using at least three "top toolbars" and frequently "drilling down" two to five levels in each one to get to the "menus" that are actually useful if YOU want to choose what to do instead of accepting a bunch of kiddy babble defaults.

The single menu that in previous Word versions "popped up" when you clicked on an inserted image, with all the tools you needed, in Word 2007 requires three separate menus (on two different toolbars) to set half of the properties you really need if you want to control all the attributes of the image.

Instead of a tool for doing constructive work, they've made it into a toy for adolescents (and magazine editors) that resembles an ancient gaudy arcade pinball machine, where you click the buttons and "hope that something good happens." Frequently you just get a flashing light that says "TILT" if you're actually trying to accomplish some purpose you've elected in advance. You're supposed to just click things and believe that what it does must be good for you.

What you want to do doesn't matter.

Unfortunately my older Word "went obsolete" and when they stopped delivering updates to keep it compatible with the XP patches, it simply stopped working. It was Word 2002. Word 2003 comes next.

And full support for WinXP is scheduled to end in April of this year.

But they say that Windows 7 will be "better than Vista" ... "soon" ... or at least "someday." I haven't heard of any plans to return Office to something useful.

John



Subject: RE: Tech: html from a word document
From: treewind
Date: 13 Mar 09 - 06:03 AM

If you want to edit HTML in notepad, get notepad2. It does syntax highlighting, which is the most useful feature any text editor can have beyond plain editing - it saves you from making hundreds of silly mistakes before you put them on the web. I've used it for writing programs in C too. And it's free, of course. It even comes with instructions for replacing notepad with it.

BTW, Open Office is quite usable and free too, if you're fed up with throwing good money after bad with MS Office.

Anahata



Subject: RE: Tech: html from a word document
From: John Hardly
Date: 13 Mar 09 - 11:21 AM

Stilly's modification worked like a charm. I changed the blogspot box to "edit html" and pasted the document as she had prepared it. It came out perfectly.



Subject: RE: Tech: html from a word document
From: Artful Codger
Date: 13 Mar 09 - 11:32 AM

There are a couple dangers trying to copy text from generated HTML. First, you need to make sure that all the tags are balanced, i.e. that every start tag has a matching end tag, and vice versa. It can often happen that the text at the start of your extract will be at a deeper or shallower level than the end.
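A quick way to test an extract for balance is sketched below in Python. The names BalanceChecker and check_balance are invented for illustration, and the check is deliberately strict: it treats every non-void tag as needing an explicit end tag, even though HTML lets some end tags (such as </p>) be omitted.

```python
from html.parser import HTMLParser

# Elements that never take an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records end tags that do not match."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_balance(fragment: str) -> list:
    """Return a list of complaints; an empty list means the tags pair up."""
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.problems + [f"unclosed <{t}>" for t in checker.stack]


print(check_balance("<p><b>fine</b></p>"))
print(check_balance("copied from the middle</b> of a <i>page"))
```

The second call mimics an extract that starts at a deeper level than it ends, which is the situation described above.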

Second, it is increasingly common to place as much formatting as possible in separate style sheets, and simply cross-reference the formatting you use through named references within tags. When you copy the text, all these references will be broken, unless your destination document references the same style sheet, or an equivalent one using the same names the same way. Obviously, that won't be true when pasting into a Mudcat message entry box.



Subject: RE: Tech: html from a word document
From: Stilly River Sage
Date: 13 Mar 09 - 11:43 AM

John, I'll read your description of tables later, I'm passing through quickly to see if the formatting worked for John. Glad to see it did.

I haven't tried this table trick recently, and I recently installed MS Office 2007 (it is now updated on the computers at work, so I ended up having to upgrade here also). Have you tried any of these things in OpenOffice? It behaves more like Word 2003. I'll poke around with your batch file and also look at OpenOffice and see if there is a preferred way to do this. (I used to have some useful batch files I wrote in DOS decades ago. . . I haven't written any for Word. I'm glad to see it's still possible.)

SRS



Subject: RE: Tech: html from a word document
From: Austin P
Date: 13 Mar 09 - 12:35 PM

Stilly: "so I ended up having to upgrade here also" [my emphasis].

In a classic M$ move Office 2007 and older versions are incompatible out-of-the-box (2007 uses new XML-based formats such as .docx, not the .doc et al formats).

Many people like me are refusing to play their game.

But: "Users of pre-Office 2007 Windows versions of Word, Excel, and PowerPoint can use the Microsoft Office Compatibility Pack for Word, Excel, and PowerPoint 2007 File Formats, to make it possible to view and create files in the new Office 2007 file formats."

Typically, M$ don't encourage you to do this. They want your money.

You may call me a freetard, but it seems that the whole world has been seduced by (a) the triumph of form over substance (eye candy vs. content) and (b) interoperability (read: everyone has to use Microsoft products of all the same version).

Mutter ... grumble ... gripe ... .rtf ... ASCII ... (goes on for 94 pages)

;O)
AP



Subject: RE: Tech: html from a word document
From: Stilly River Sage
Date: 13 Mar 09 - 12:44 PM

I used the compatibility pack for a while, and it worked, but I was running more and more into things that I needed to see how it looked in Word 2007. The choices I make for work and the choices I make for home computing aren't always identical. Since I do a lot of work on my home computer (though now I'm using a work laptop for some of it) I made the change. I'll get over it.

SRS



Subject: RE: Tech: html from a word document
From: JohnInKansas
Date: 13 Mar 09 - 01:31 PM

Users of older Office versions can download the free compatibility pack to be able to read documents (including spreadsheets from Excel, garbage from Project, etc), but the simpler option is for the users of the new Office 2007 to set their default to save everything in "compatible mode." Word 2007 offers 15 separate file types you can use when saving, and one is "Word 97-2003" that should be readable in any of the older versions, without changing anything on the older computers.

When you save to the "compatible" .doc format, you may get annoying messages claiming that "certain features will be lost if you do this" but I have yet to find a "feature" in Word 2007 .docx, .docm, etc that is useful to me. The messages should go away if you set .doc as the default file type for Word 2007.

I can see some potential for using some of the "new features" but no actual need for them in my current tasks, as I don't correspond with too many illiterate adolescents (or magazine editors).

John



Subject: RE: Tech: html from a word document
From: JohnInKansas
Date: 14 Mar 09 - 01:17 PM

Before this thread drifts out of sight, and nearer to the original subject, some people might be interested in an article in the current issue of American Scientist magazine about the difficulty of displaying mathematical equations on the web.

Writing Math on the Web has the comment:

"The Web would make a dandy blackboard if only we could scribble an equation"

The author, Brian Hayes, is a regular writer for the magazine, and gives a good discussion of why, at present, there isn't really a good way of displaying "two dimensional" information in html (or variants of html currently available). Although he doesn't touch on problems other than mathematical equations, the problems associated with all the subscripts and superscripts, odd characters, fractions, matrices, etc. needed to display complex equations are very much the same as the "why can't we write music" one.

One approach, used by some, is to "print" the equations to a graphic "clip" and link it into the html as a "picture." This approach is, of course, one that we can't use at mudcat, and there are lots of other places where it's not appropriate.

Web pages as PDF files can be used, but only some browsers can display them directly, so a separate "reader" program is usually used, even when the browser might be able to display the .pdf without one.

The writer can, to some extent, "hand craft" reasonably complex equations using complex tables, but doing even a "decently adequate" job of it is very much "labor intensive" and browser capabilities or setup may not display what's intended.

He gives a brief discussion of TEX and LaTEX, pointing out that these both are designed for printing stuff, and while they work - sort of - for some web sites they require the reader to have appropriate fonts installed on their own computer, and common font substitutions, even for "nominally the same" character sets often give the reader/viewer a different result than intended by the writer. Both TEX and LaTEX use "defaults" for some parameters that are almost always correct for printers, but frequently aren't optimal for viewers.

A couple of other programs are described, with options to have the "code" interpreted by the writer/builder, by the server, or by a user; but all of these methods require additional special programs, plug-ins, clip-ins, or other "additional resources" not commonly available, and load additional "burden" on hardware and bandwidth resources.

Current versions of the html spec permit you to "embed" the fonts that you want the viewer to use, but this puts an additional burden on server bandwidth that is prohibitive for many sites, and - once again - not all browsers are set up to use the embedded characters the same.

The article won't solve anyone's problems, but those interested in "generalities" might want to take a look.

Note that many articles on the site are restricted to "registered users only." This one isn't tagged as being restricted, so I think anyone can get a look at it. There's a button at the top for a "printable view" or a more convenient link (IMO) to download a .pdf in the sidebar on the right.

John



Subject: RE: Tech: html from a word document
From: treewind
Date: 14 Mar 09 - 01:58 PM

Equations...
If you aren't allowed graphics, and you also can't rely on the user having the fonts required, that doesn't leave a lot of wriggle room.

TeX was invented specifically for typesetting mathematical equations because there wasn't a good way of doing it. It's true that it was designed for print - the web didn't exist then. It still hasn't been bettered for scientific document production - people who've learned to use it properly laugh at their colleagues struggling with WYSIWYG word processors.

Maybe SVG* is the way forward for that one. Sending a few symbol definitions as scalable outlines would be more efficient than bitmap graphics, and would display optimally on any screen resolution. Which is part of what PDF does...

Anahata
* Scalable Vector Graphics



Subject: RE: Tech: html from a word document
From: JohnInKansas
Date: 14 Mar 09 - 04:38 PM

Anahata -

Glad to see that you read the article.

TEX is used by several technical publishers, but not by anywhere near a majority of them, even for print. There are at least four or five competing systems used widely enough to be fairly well known. All of them have fairly steep "learning curves" and all of them require programs - drivers and interpreters, as a minimum - that are not widely available. Even when the necessary software is "free" there's still setup and learning "expense" to be considered.

As to WYSIWYG word processors, I've been able to write those equations in Word quite nicely since the olden days of DOS. The learning curve is possibly less steep than for TEX, if you choose to do it that way, although the instructions have been omitted in the newest version. (You DO NOT do it in Word by using the built-in "Equation Editor." Like most "features for idiots" that have proliferated in newer Office versions it's a piece of crap, IMO.)

As with TEX, the Word equations display properly in the program where you create them, and they print accurately, but they're marginally useful in html. Some of the "add on" programs described in the article let your browser "read TEX code" if I read the meaning right; but it's still something "extra beyond html" that works only for those few who have the add on.

John



Subject: RE: Tech: html from a word document
From: treewind
Date: 14 Mar 09 - 05:32 PM

"Glad to see that you read the article."
Er, yes, I did read it, but after I'd posted...

A.



Subject: RE: Tech: html from a word document
From: JohnInKansas
Date: 14 Mar 09 - 11:05 PM

Well at least "you and I and a guy named Brian" seem to be pretty much agreed on why we can't scribble (math or music?) easily on the web.

I think the limitations on what we can do easily are pretty well known. Most of the attempts at a "two dimensional" web language seem to have been done by academics, with original uses intended for "local network" kinds of applications. (I'd include sending to hardcopy printers as a local kind of thing, since you have to sign a contract and agree on a format to do it, and it's one-to-one communication.)

The real limitations on the methods available now are mostly "bandwidth" related:

... not enough speed to send the more complex stuff back and forth,
... not enough space to store the extra boilerplate,
... and not enough interested and skilled users (with money) to get to any sort of "critical demand mass."

John



Subject: Anti allergic
From: GUEST,Carmela
Date: 31 May 09 - 12:24 AM

Good afternoon. Enjoy your own life without comparing it with that of another.
I am from Liberia and , too, and now am writing in English, give please true I wrote the following sentence: "To address the ?anti allergic effects of antihistamines,? a workshop was convened in new orleans at the time of the th annual meeting of the american.Kaboodle fogarty anti allergy cot mattress protector review and product info."

Regards 8-) Carmela.
    Post undeleted, since I can't quite figure out what Carmela is talking about - but I'm grateful for the thread being resurrected.
    -Joe Offer-



Subject: RE: Tech: html from a word document
From: Joe Offer
Date: 31 May 09 - 03:52 AM

I can't really answer Carmela's question (and maybe it's just garbage I should delete), but I'm glad she refreshed the thread.

I found out something interesting. I typed and formatted a document in Word - no need to save it. Then I copied the text and pasted it into a new Hotmail message, with the settings in the normal "rich text format." Then I selected "edit in HTML," and the HTML formatting tags appeared, ready to paste into a Mudcat message or wherever, without all the extraneous HTML that Word adds if you save the document as HTML.

It sounds convoluted, but it's actually quite simple. You can also type your text in a Hotmail message and use Hotmail's formatting utilities, and then just select "edit in HTML" and you'll have your text with formatting.

-Joe-



Subject: RE: Tech: html from a word document
From: JohnInKansas
Date: 31 May 09 - 06:32 AM

I'm not sure I can tell whether GUEST,Carmela is asking a question or just making a comment.

I do find the LITTLE FOGARTY ANTI ALLERGY COT BED QUILTED MATTRESS PROTECTOR at Amazon.

In the post here, I see:

?anti allergic effects of antihistamines,?

with ? where I would expect something else.

This could be the result of the "Regionalization" of computers, and especially of Microsoft products. A person in Liberia most likely will have a computer, keyboard, and programs, specific to their locale, and some character glyphs (the "picture that is shown on the viewing screen" when a specific key is hit) may be associated with a different "character number" than is used elsewhere.

The intention might have been to enclose the text in "single-quotes" to be:

'anti allergic effects of antihistamines,' ...

In this case, in Word and other word processors, it usually is possible to use "curly quotes," rather than the simpler straight ones. (Microsoft Word calls them "smart quotes.")

HTML recognizes only the "straight quotes" in coding, so curly/smart quotes need to be turned off when posting, especially if you are composing in a word processor that can use them and then copying to a post. In Word, this setting is usually in the "Autocorrect as you type" preferences (Tools menu in older Word versions, I believe. In Word 2007 you click the cow-splat, then select "Word Options" down at the bottom, and at "Proofing" click the "Autocorrect Options" button.)

Some HTML "readers" - i.e. browsers and web sites - can recognize curly double-quotes and replace them with straight quotes to avoid "breaking up the code" in a post; but even those that handle the double-quotes may fail to make the same kind of conversion for curly single-quotes, in which case usually an "unknown character" glyph appears.
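For anyone who would rather fix the text than hunt for the Word setting, here is a minimal Python sketch of the curly-to-straight conversion described above. The function name straighten_quotes is just an illustration (it isn't from any program mentioned in this thread), and it only handles the four common quote characters.

```python
# Map the four "curly" quote characters to their straight ASCII equivalents.
STRAIGHT_QUOTES = str.maketrans({
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / typographic apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
})


def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ones (hypothetical helper)."""
    return text.translate(STRAIGHT_QUOTES)


# The input is written with \u escapes so the example itself stays plain ASCII.
print(straighten_quotes("\u201cIt\u2019s fine,\u201d she said."))
```

Run on text copied from a word processor, this should leave everything else untouched and swap only the quote marks.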

One may "code" the curly quotes, if they actually are needed in text:

&#145; typed into a post should give a Left single quote = ‘
&#146; typed in should give a Right single quote = ’
&#147; typed in should give a Left double quote = “
&#148; typed in should give a Right double quote = ”

&#034; typed in should give you the straight quote = "
This is the one you want to use in "html code" and it can also be entered as &quot; ( " ) or as &#X022; ( " ).

There is no "straight single-quote" mark in standard html, and the "apostrophe" is used, coded &#039; ( ' ). This is what Word uses with smart quotes turned off.

(The curly quotes above may not display on all browsers, depending on whether the default font selected contains them.)
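One hedged footnote on the numbers above: 145 through 148 are the Windows-1252 positions of the curly quotes, not their Unicode values. Browsers following HTML5 parsing rules remap them, but the Unicode references are &#8216; &#8217; &#8220; and &#8221;. A small Python sketch, using only the standard html module, shows both forms decoding to the same characters:

```python
import html

# Windows-1252 style references (as listed above) and their Unicode counterparts.
legacy_refs = ["&#145;", "&#146;", "&#147;", "&#148;"]
unicode_refs = ["&#8216;", "&#8217;", "&#8220;", "&#8221;"]

for old, new in zip(legacy_refs, unicode_refs):
    ch = html.unescape(new)
    # html.unescape applies the HTML5 rules, so the legacy form decodes to the same character.
    print(old, new, f"U+{ord(ch):04X}", html.unescape(old) == ch)

# The straight quote and apostrophe references decode to plain ASCII characters.
print(html.unescape("&#034;"), html.unescape("&#X022;"), html.unescape("&#039;"))
```

Whether an older browser performs the same remapping is not guaranteed, so the Unicode numbers are probably the safer choice in a post.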

It's also possible that a "regionalized" computer/program/keyboard uses a different character number for the character. When this is done, the character number often is one "unassigned" in the Unicode standard, and the same kind of "unknown character" symbol will be displayed by the readers' browsers.

To some extent, with Microsoft products at least, many of the "regionalized" differences can be overcome by selecting a different language in the programs and/or in the Operating System; but there are limitations on how completely the "regional artifacts" can be removed. Sometimes improved results are obtained by adding a language, but for some combinations the "new" one needs to be set as the default language.

John



Subject: RE: Tech: html from a word document
From: GUEST,Jon
Date: 31 May 09 - 06:53 AM

I think what you are commenting on above, John, is affected by the character encoding of the document.

I'm not sure what will happen with this.

&lsquo;hello&rsquo;
‘hello’

&ldquo;hello&rdquo;
“hello”



Subject: RE: Tech: html from a word document
From: Artful Codger
Date: 02 Jun 09 - 09:30 PM

Character encoding doesn't matter if you use the numeric or named escapes ("character references"). HTML uses Unicode values; any name or encoded value maps to one and only one Unicode character. Fonts typically only support a subset of all the Unicode characters, but virtually every modern font will support all of these quote characters, because they occur so frequently. A user's locale-specific quoting convention (angle bracket quotes, or bottom-aligned opening quotes) has no effect on the quotes as encoded.
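As a rough illustration of the "one name, one Unicode character" point, Python's standard library carries the HTML 4 entity table in html.entities.name2codepoint. This sketch just looks up the quote entities discussed here; it isn't part of any script mentioned in the thread.

```python
from html.entities import name2codepoint

# Each HTML 4 entity name maps to exactly one Unicode code point.
for name in ("lsquo", "rsquo", "ldquo", "rdquo", "quot"):
    cp = name2codepoint[name]
    # Show the named form, the equivalent numeric form, and the code point.
    print(f"&{name};  &#{cp};  U+{cp:04X}")
```

Either form, named or numeric, should reach the reader as the same character regardless of how the page itself is encoded.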

Unless the file encoding is specified at the start of an HTML file (using a special directive), you can only use the ASCII characters in the 7-bit range (0 to 127). The apostrophe and the straight double quote are the only quote characters which fall within this range. And only these quotes can be used to delimit string values within HTML tags (i.e. within angle-brackets). Whenever you want a literal quote, you're supposed to encode it (as "&#39;" or "&quot;"; the named form "&apos;" comes from XML and isn't defined in HTML 4), but HTML processors will seldom complain about bare quote characters outside of tags.

The other quote symbols don't fall within 7-bit ASCII range, so they must be escaped (like "&ldquo;") to ensure they will be properly handled. This also applies to any other character whose Unicode value is beyond 127: accented characters, most symbols, non-Roman characters, language-specific characters...

You don't have any control over the encoding of Mudcat messages, so here you always have to encode special characters, including word processor quotes. A lot of folks just copy-n-paste from their word processors without previewing, then they wonder why their messages end up with a bunch of question marks, or why their foreign language text is wonky when other people try to read it. Basically, it's because of these encoding issues. I've provided Java and Python programs which you can use to properly escape raw text on the clipboard before pasting it; search for "htmlesc".
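For readers who just want the general idea, here is a minimal Python sketch of that kind of escaping. It is NOT the "htmlesc" script itself; the function name escape_for_post and its keep_markup parameter are made up for illustration, and it works on a string rather than the clipboard.

```python
import html


def escape_for_post(text: str, keep_markup: bool = False) -> str:
    """Turn every non-ASCII character into a numeric reference (hypothetical helper).

    By default &, < and > are escaped as well, so raw text shows up literally.
    Pass keep_markup=True if the text already contains HTML tags you want kept.
    """
    if not keep_markup:
        # quote=False leaves straight quotes and apostrophes alone.
        text = html.escape(text, quote=False)
    # 'xmlcharrefreplace' swaps each character ASCII can't hold for &#NNNN;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Curly quotes and an accented letter, written as \u escapes to keep this ASCII.
print(escape_for_post("\u201cCaf\u00e9\u201d & <b>"))
# prints: &#8220;Caf&#233;&#8221; &amp; &lt;b&gt;
```

The result is pure 7-bit ASCII, so it should survive pasting into a message whatever encoding the site assumes.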



Subject: RE: Tech: html from a word document
From: GUEST,.gargoyle
Date: 02 Jun 09 - 09:47 PM

Mr. Codger

RE: I've provided Java and Python programs which you can use

Please provide the link.

An example page would also be helpful

What about Arabic?

Sincerely,
Gargoyle



Subject: RE: Tech: html from a word document
From: Stilly River Sage
Date: 02 Jun 09 - 11:30 PM

Who is Carmela?



Subject: RE: Tech: html from a word document
From: JohnInKansas
Date: 03 Jun 09 - 05:11 AM

SRS -

A post by GUEST,Carmela was noted by JoeO(?) as a possible spam that he considered deleting. The poster claimed to be from Ethiopia(?) and was just "learning to use English," citing(?) a paper purported to have been delivered at a US conference apparently sponsored by "Little Fogarty." The post appears to not appear now.

On the subject of "broken characters" ...

From old notes:

List of languages supported in Windows 2000, Windows XP, and Windows Server 2003

How to change your keyboard layout

Windows Keyboard Layouts

[Note that the last page indicates that you must use Internet Explorer, and you must turn off popup blockers or put the site in your trusted sites list in IE. When you choose one of the 118 keyboard layouts, a popup appears showing the keys on the keyboard selected. If you hover your mouse pointer over the Shift key, and on some keyboards the Alt-Gr key, the shifted keys are displayed. The popup window can't be resized, so you may have to use the zoom control at the bottom right to drop back to about 75% zoom to fit the whole keyboard in the popup window.]

"Regionalized" computer products (hardware and software) may be adjusted so that a specific key produces the same "keyscan" code but calls a different (Unicode) character from the font maps that the computer loads in order to know what glyph to print. The keyscan codes may also be altered in hardware. In some cases loading a different language will produce all of the character glyphs appropriate to that language; but in some cases either hardware changes (a different keyboard) and/or "synthetic accommodations" such as mapping some characters to unusual places on the existing keyboard may be required.

It is often not just a case of getting the right Unicode char number; but may be a matter of ferreting out what characters that are a single keystroke in common languages must be coded when using another "regionalization."

A Microsoft Press book titled Developing International Software purports to explain all this; but like most MPress books it's vastly overpriced and underinformative. (And it's ~1,060 pages of vagaries and vacuities.) There is some useful content, but the publisher couldn't make it work to print the book (personal knowledge) and had to "punt" all the non-US-English characters as individual "artwork."

Vista claims to have "better" accessibility to international conversions than the WinXP and Win2K that produced the book, but nothing I've found published explains "how it's better" - although we are assured Windows 7 will be "better than Vista" but aren't told how or why except in the "child-babble" used by a few tech editors or in the ad-speak that's currently the only language used by Microsoft.

John



Subject: RE: Tech: html from a word document
From: treewind
Date: 03 Jun 09 - 05:24 AM

If you do need to post the odd unusual character properly using its numeric code (whether to Mudcat or in any other HTML), the Unicode Charts (unicode.org) page is a good reference. THE reference, I suppose.

Anahata



Subject: RE: Tech: html from a word document
From: Artful Codger
Date: 04 Jun 09 - 01:25 AM

Threads for my scripts:
Python (Mac only)
Java (Should work cross-platform)

If you use and like them, please let the moderators know to capture them in some more easily findable place. My search for "htmlesc" only found one of the threads.

Gargoyle: Yes, the scripts also handle Arabic, Cyrillic, Devanagari... Should even handle the extended (4-byte) sequences for Japanese and Chinese characters, though I've never tested this. If you can get text onto the clipboard in Unicode (which happens pretty much by default these days), these scripts should be able to handle the HTML escaping.

Whether the encoded characters are viewable depends on the font the user selects for display.



All original material is copyright © 2022 by the Mudcat Café Music Foundation. All photos, music, images, etc. are copyright © by their rightful owners. Every effort is taken to attribute appropriate copyright to images, content, music, etc. We are not a copyright resource.