Making a User Tool Pt. 3. Importing


Timothy Jenney

The most sophisticated importing feature in Accordance is available for HTML files (.htm extension). I regularly use Word, so converting .doc or .docx file types to .htm is fairly easy. I just use Word 2008 and "Save as Web Page...." The only problem is that Word inserts carriage returns [CR] after each line in order to preserve the layout of the document. That's fine for a web page, but a problem in Accordance. That's because User Tools have re-sizeable windows, so we want the text to flow properly as the window resizes.

 

Currently Accordance (8.4.7) [correctly] removes carriage returns. Unfortunately, it does not insert a space in their place. [I've been assured the next update to the program will do so.] In the interim, users will have to manually insert a space between the last word on each line and the first word on the next line.

 

Another option is available though. That's to use a text editor like TextWrangler ( http://www.barebones.com/products/textwrangler/ ). Use the Find command (Cmd-F) and follow the instructions on "TextWrangler Setting.png" below. It will substitute spaces for all the carriage returns in your document. Make these changes, save your document, and then convert it in Accordance. It will work perfectly.

post-29215-128060972047_thumb.png
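
For anyone who would rather script the same substitution than do it in a text editor, here is a minimal Python sketch. It is not part of Accordance or TextWrangler. The function name `join_wrapped_lines` is made up for illustration, and the blanket replacement assumes the file has no content (such as a <pre> block) where line breaks matter.

```python
import re


def join_wrapped_lines(html: str) -> str:
    """Replace each run of line breaks (CR, LF, or CRLF) with one space."""
    # Spaces and tabs next to the break are swallowed too, so a line that
    # already ended in a space does not end up with a double space.
    return re.sub(r"[ \t]*[\r\n]+[ \t]*", " ", html)


if __name__ == "__main__":
    sample = "<p>Jesus\r\nloves\nyou</p>"
    print(join_wrapped_lines(sample))
```

To use it on a real file, read the .htm file's text, pass it through the function, and write the result to a new file before importing, so that the original export stays untouched.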


Thank you for addressing and explaining this problem of the dropped spaces when creating user tools. I have mentioned this on the forum before, and it seemed to me that I was the only person experiencing it. I resorted to searching for single spaces and replacing them with double spaces. While this added a lot of extra spaces, at least the words did not run together. Thanks for explaining very clearly what is taking place and offering a solution. As always, your work with Accordance is greatly appreciated.

 

Tom


Accordance does not currently offer direct import from .pdf documents. That's a shame, as more and more .pdf documents are becoming available. Happily, there are now a number of .pdf to .htm converters available for Mac users. I've found the following programs, but have not had a chance to evaluate them yet:

deskUNPDF: http://www.docudesk.com/deskunpdf_product_home.shtml [free trial download]

SOLID PDF to Word for Mac: http://www.mac-pdf-converter.com/ [free trial download]

Recosoft ( http://www.recosoft.com/ ) offers a suite of products that convert to both MS Office and Apple iWork formats.

 

If you just need a single document converted, there are a number of web sites currently offering the service for free. [I'll let you do your own search for those.]

 

If you've found a good converter I haven't mentioned, or had experience with any of these products, feel free to chime in with your thoughts.


Thank you, Tom.

 

I confess I am just passing this information along from David Lang and Rick Bennett, both of whom are far better at this sort of thing than I am. Reminds me of a good quote I heard during my doctoral work, "If I have seen further than others, it's because I have stood on the shoulders of giants!"


Here is one that converts from PDF to Word. It is free until August 8th (normally $39.95):

 

http://www.anypdftools.com/pdf-to-word-for-mac.html

 

I have played with it a little. Seems to do a nice job. PDF -> Word, then you can export it from there . . .

 

Tom


I agree, this seems to be working well, and should be a helpful tool for those who want to convert a PDF into a User Tool.

 

Note: the free offer is really nice, but read the letter with the registration code carefully to save yourselves the hassles I had from not doing so. You register with their email address, not yours, and you right-click to paste the text into the box.


This podcast is now posted. It covers importing plain text, HTML, and TLG directly into a User Tool, rounding out our three-part series on "Making a User Tool." I hope we will see many more of our users making their work available to others on the Accordance Exchange: http://www.accordancefiles1.com/exchange/


Thank you for your Podcasts. I've really enjoyed the ones I've been able to watch.

 

I just finished watching this episode, Making a User Tool Pt. 3. Importing. It looks like the text converted back to Yehudit got swapped directions one too many times. Is there an easy way to correct this or prevent it from happening?

 

Thanks,

John S Gilliom


Thanks so much, John!

 

It's all about the interaction between Accordance and the specific word processing software used for the original document. I'm one of the few people here at Accordance who use Word. One method I've been experimenting with is to keep the original document open in Word, then copy and replace particularly problematic words.

 

As you may know, Hebrew is a particularly difficult language for most word processors. Many of the folks here at Accordance recommend Mellel (http://www.redlers.com/ ) for that reason.


  • 2 weeks later...

Hello Dr. J (et al),

 

Two things:

 

1) TextWrangler is a good piece of software. However, I upgraded to BBEdit long ago for exactly this kind of thing. Not only can it remove carriage returns and the like, but it also cleans up plain text documents and turns them into decent HTML code.

 

2) It has been a year or so since I used Accordance to import from html. The problem I ran into is that there is no support for tables and indented text (either in indented block paragraphs in outline style or in nested lists). Have these issues been addressed? I love the user tool feature. But, until it's able to make use of basic tables and indentations, it remains for me a tool I can use with some documents, but not with most documents (because they almost need to have these two features to look right).

 

If you'd like to see what I'm talking about in detail, you can download the attached zipped file. The html file looks ok. But the imported user tool doesn't look right. The borders on the table don't exist and the spacing is off. Likewise the nested bullets are flattened out.

 

Like I said, I love the user tools. I'd just like to see the user tools mature into a more usable feature for Accordance.

 

thanks,

DS2-HYMNS.zip


We do plan to improve the importing, though we are busy with other features right now. I cannot promise that we'll include tables, though. HTML is such a messy and variable format that it is a major challenge to import it. Just look at all the attributes that it can give tables!


Good morning Helen,

 

Html is messy. And the tables are probably one of the messiest parts. I'll grant you that. However, it would be good to have the ability to make some basic tables without any frills or thrills. When your people get around to it, it would be good to start a discussion on some basic standards to adhere to, so that we know what to make the tables look like in html before we import them into Accordance.

 

Thanks.

 

 

 


I think that most of our users wouldn't know how to edit an HTML file to conform to basic standards. They either download them or export to HTML from another format.


Outis,

 

I do want to remind you that you can insert a link to a file on your hard drive in a user tool. That link can open a table or photo.


  • 2 weeks later...

For those of us who have large numbers of documents in Mellel format, is there a known way to convert a .mellel to a .htm for user tool import?


  • 3 years later...

 

Hello

 

I know this topic is a little bit older, but I have some problems.

 

Mac OS 10.9.3, Accordance 10.4.2.1, MS Word 2011 14.4.2

 

When I copy text from the Internet to MS Word, save it as .htm with UTF-8 and import it to a user tool in Accordance then I have:

 

1. Some spaces between words are missing, e.g. Jesuslovesyou (not always, but often).

2. Some umlauts, the ß (the German sharp s), and other special characters like ñ, ń, etc. are missing, e.g. Matthus instead of Matthäus (always).

 

 

Another thing:

I also often have problems copying .txt text with umlauts etc. into a user tool. Then I have to search and replace all the ä, ö, ü, Ä, Ö, Ü, ß, etc.

 

Does Accordance or another user have a solution for these problems?

 

Would it be possible to import .xml instead, or does that have other problems?

 

Greetings

 

Fabian


Fabian, in order to have Accordance recognize the file as UTF8, be sure to include this line in the html header:

 


In order to solve the space problem I open the htm file in TextWrangler (which is a free download) then from the 'Text' menu I select 'Remove Line Breaks'.

 

After you save the file, import it into Accordance and the spaces should be OK.


Dr J,

Can you provide a link to the latest podcast, Making a User Tool Pt. 3?


 

Hi Joel,

 

I see this in my header when I save as .htm and then view the HTML source in MS Word.

 

<meta http-equiv=Content-Type content="text/html; charset=utf-8">

 

It's not exactly the same as yours.

 

If I change to your tag, then it looks a little better :-)

 

see the original:

 

Test

 

 

 

lstuwvao

 

öäü

 

 

ßßßßßß

((((

))))

≪≪

≫≫

<

y

<>

ñ

 

and from the user tool:

Test

lstuwvao

öäu

ffffff

((((

))))

j"j"

k"k"

<

y

<>

Ò

äöü come through fine, but the rest do not.

Would it be possible for Accordance to handle the tag that MS Word or LibreOffice writes?

MS Word

<meta http-equiv=Content-Type content="text/html; charset=utf-8">

Then we would not always have to change the tag.

Thanks for your help.


 

No, sorry, the problem is still there.

 

I see now that the file was not saved with your solution. MS Word showed an alert and changed the encoding from UTF-8 to Unicode.

 

Is it possible that the different language in the header is a problem?
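
One way to check what Word actually wrote to disk, regardless of what the meta tag claims, is to look at the bytes themselves. The Python sketch below is only a rough check under stated assumptions: `sniff_encoding` is a made-up helper name, and it only distinguishes UTF-8, UTF-16 with a byte order mark, and "something else". It says nothing about how Accordance itself detects the encoding.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Make a rough guess at the encoding of raw file bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (with BOM)"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        # Strict decoding fails on byte sequences that are not valid UTF-8.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8"


if __name__ == "__main__":
    # Sample bytes stand in for a real file's contents.
    for codec in ("utf-8", "utf-16", "latin-1"):
        print(codec, "->", sniff_encoding("Matthäus".encode(codec)))
```

For a real file, the bytes would come from opening the saved .htm file in binary mode and reading its whole contents.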


Fabian, do you mind uploading your html file so we can take a look for ourselves? Also, if you are just copying and pasting, why not copy and paste into the user tool directly?


Hello Joel


1. At the moment I can copy only a few lines into a user tool, even if the text is longer. That is a bug too; see below.

2. Copying via .docx or .txt does not come out correctly either. Look at this ⬇︎: the umlauts are garbled (and the ß, the German sharp s, comes out as fi).

A [1]
[1] 1. A, a, der erste Buchstabe des lateinischen Alphabets. Als Abk¸rzung: 1) = der Vorname Aulus.

2) = Antiquo (ich verwerfe den neuen Vorschlag), auf den Stimmtafeln in rˆm. Volksversammlungen.

3) = Absolvo (ich spreche frei), auf den Stimmtafeln der Richter; dah. A gen. littera salutaris bei Cic. Mil. 15.

4) vor Zahlen, Jahresbezeichnung

 

More than this does not go in at once, and sometimes only a few characters do.

 

3. Copying directly loses the formatting (bold, italic, h1, links).

4. So it would be nice if it worked via MS Word as .htm.

It does not work via TextEdit or LibreOffice, because they save the file as .html and Accordance can't recognize it.
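
The garbling in the sample under item 2 (¸ where ü should be, ˆ where ö should be) is consistent with Latin-1 bytes being read as Mac OS Roman. That is only an inference from the characters shown, not a confirmed diagnosis of what Accordance does. The Python sketch below reproduces the pattern; the function names `garble` and `repair` are made up for illustration.

```python
def garble(text: str) -> str:
    """Encode as Latin-1, then misread those bytes as Mac OS Roman."""
    return text.encode("latin-1").decode("mac_roman")


def repair(text: str) -> str:
    """Reverse the misreading; raises if a character is not in Mac OS Roman."""
    return text.encode("mac_roman").decode("latin-1")


if __name__ == "__main__":
    for word in ("Abkürzung", "röm."):
        bad = garble(word)
        print(word, "->", bad, "->", repair(bad))
```

If the same mismatch is what happens during the paste, running the garbled text through `repair` would be one way to avoid a manual search and replace. Try it on a copy of the text first.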

 

Joel, I'm not permitted to upload a .htm file in the basic and advanced uploader.


  • 2 weeks later...

Fabian, we've had a chance to look at your HTML file. The header used there is fine: Accordance is recognizing it as UTF-8. We've found two bugs in the import. First, it was stripping line breaks that weren't marked with a <br> tag, rather than turning them into spaces. This caused your words to run together. Second, if it encountered one Unicode character it couldn't handle (ā), it would skip all of the other Unicode characters it can normally handle (like ß or Greek characters).

 

Both of these bugs have been fixed for the next release, so you should experience a much smoother job soon.
