Scan HTML

(Need another help file? Try Macrex Help Contents. MACREX help key <CTRL><ALT>F1)

Contents
Introduction
Operation
Explanation of Options settings

Introduction

This feature produces a skeleton index from a website by examining the links in the site. Each link has a piece of descriptive text, which is used as the “heading”. The HTML link itself is used as the locator (page reference).

The text chosen automatically as the heading will often be irrelevant or unhelpful, so the indexer has to edit the skeleton index after it has been generated. To make this easier, a special keystroke (<CTRL>W) displays the page of the website referenced by the link in the entry currently being edited. The indexer can thus easily see the material that the link refers to, and can modify the heading (or add new, more relevant headings) accordingly.

When editing an HTML index, you can choose to have the hypertext links hidden from view. These links are hard on the eye and do not normally need to be changed. If, however, you need to see a hypertext link in full, you can display it by pressing <ALT>W, or you can change the MACREX General options Menu 2 setting 6 - HTML Index? to HTML Index - show hyperlinks.

The output will be an HTML file in which each index entry has a link to the appropriate part of the website.

As distributed, the program scans HTML files. However, provision is made to modify it for other markup languages.

Since websites are frequently updated, a site can be re-scanned so that only data related to new links is added to the index. Deleted links are not currently detected, so they will have to be removed manually.
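The update behaviour can be pictured as a simple set difference. The following is a minimal Python sketch, not MACREX's own code: the function name new_links and the sample data are hypothetical, and it assumes that a link counts as "already indexed" when its link target matches one in the existing index.

```python
def new_links(existing_hrefs, scanned_links):
    """Return scanned (heading, href) pairs whose href is not already indexed."""
    seen = set(existing_hrefs)
    fresh = []
    for heading, href in scanned_links:
        if href not in seen:
            fresh.append((heading, href))
            seen.add(href)  # also suppress repeats within the same scan
    return fresh


# Hypothetical data: one link is already in the index, one is new.
existing = ["intro.htm#start"]
scanned = [
    ("Introduction", "intro.htm#start"),
    ("Adding entries", "add.htm#entries"),
]
print(new_links(existing, scanned))
```

Note that nothing in this sketch looks for links that exist in the index but no longer appear in the scan, which mirrors the limitation that deleted links have to be removed by hand.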

Operation

Make sure you have a local copy of the website on your hard disk and that you know where it is. MACREX is not designed to operate directly on remote websites.

Start a new index, or open the existing HTML index if you want to update it.

Go to the Utilities sub program (Utilities->Utilities Menu) and choose H – scan HTML file.

You will see the following menu:

SCAN HYPERTEXT SETTINGS MENU

A - Pattern for start of target <a: *name: *=: *"
B - Pattern for middle of target ">
C - Pattern for end of target </a>
D - Text starting link <a href="
E - Text to recognize link <a *href *=?*</a>
F - Help

 

Press <ESC> to save defaults, ^L to load, $ to save option files, ? for help
Change options if needed then press <ENTER> to continue ==>

 

If you are scanning HTML links only, you will not need to change any of this. For information on what the options mean, check options A-E below. For other markup languages these options can be changed and saved (like all the other MACREX options). In this case the changed options can be stored in a file with the extension .scan.

If you need to make any changes, do so now, then press <ENTER> to continue. You will then be asked to locate the drive containing the website. As an example, we will generate a skeleton index of the MACREX help files. These can be found in C:\Program Files\MACREX\MX9\help, or in C:\Program Files (x86)\MACREX\MX9\help if you have a 64 bit operating system. The drive to select in both of these examples is C. Once you have selected the drive, MACREX will scan the existing index for hypertext references, to make sure it doesn’t duplicate any if you are updating an index. If you are making a new index it will say 0 hypertext references found in existing index.

Press <ENTER> and you will be transferred to the file selector screen. Now navigate to the folder containing the website. Once you are there, select any htm or html file (it doesn’t matter which) and press <ENTER>. You will then see the following menu:

+----------------------------------------------+
¦Scan this file only                           ¦
¦Scan all matching files in the same directory ¦
¦Scan a different file type                    ¦
+----------------------------------------------+

Usually you will want to select Scan all matching files in the same directory, since most websites (and help file groups) contain many files.
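As a rough illustration of what scanning "all matching files in the same directory" involves, here is a minimal Python sketch. It is not how MACREX is implemented: the function name matching_files is hypothetical, and it assumes that "matching" means sharing the file extension of the file you selected. The demonstration uses a throwaway folder in place of a real local copy of a site.

```python
import tempfile
from pathlib import Path


def matching_files(selected):
    """Return files in the selected file's folder that share its extension."""
    selected = Path(selected)
    wanted = selected.suffix.lower()
    return sorted(
        p for p in selected.parent.iterdir()
        if p.is_file() and p.suffix.lower() == wanted
    )


# Throwaway folder standing in for a local copy of a website.
with tempfile.TemporaryDirectory() as folder:
    for name in ("index.htm", "options.htm", "notes.txt"):
        (Path(folder) / name).write_text("<html></html>")
    found = [p.name for p in matching_files(Path(folder) / "index.htm")]
    print(found)
```

Under that assumption, selecting index.htm picks up options.htm as well but leaves notes.txt alone, which is why it does not matter which htm or html file you choose.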

MACREX will then scan the files and prepare the draft index in a temporary file called html.mbk. This will then be loaded into the current MACREX index, and you may get a number of errors, such as Unmatched {. If you do, correct these errors and possibly put a marker in the entry (like !!edit!!) so you can find the entries later and sort them out.

Once the file is loaded you will be taken to the MACREX edit screen, where you will see something like this:

1 add characters, [href]
2 add console font, [href]
3    installing dejavu fonts to enable arabic characters, [href]
4 adding entries, [href]
.


You can then edit this as with a normal MACREX index. The marker [href] shows that there is a hypertext link there which has been hidden. Links are hidden by default because they are very long and confusing, and you don’t normally need to change them. You will see that entries 5 and 6 are nonsense:

5    &lt;alt&gt;= or &lt;alt&gt;, [href]
6    &lt;alt&gt;l and &lt;alt&gt;&lt;shift&gt;l, [href].
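Part of what makes these two headings hard to read is that the angle brackets appear as HTML character entities (&lt; and &gt;). As a reading aid only, and not something MACREX is claimed to do, the short Python sketch below decodes the two headings with the standard library's html.unescape. The strings are copied from entries 5 and 6 without the trailing [href] marker.

```python
import html

# Headings as they appear in the draft index, entities still encoded.
raw_headings = [
    "&lt;alt&gt;= or &lt;alt&gt;",
    "&lt;alt&gt;l and &lt;alt&gt;&lt;shift&gt;l",
]

decoded = [html.unescape(text) for text in raw_headings]
for text in decoded:
    print(text)
```

Decoded, the headings turn out to be keystroke names, which still say nothing about the topic being indexed. That is why the entry needs to be checked against the page it points to.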

To find out what part of the file an entry actually refers to, press <CTRL>W while editing it. If you want to see the hyperlink, press <ALT>W to show the entry with the hyperlink in full. You can also have hyperlinks permanently displayed from General options Menu 2. Pressing <CTRL>W on entry 5 shows that it points to the use of <ALT>= or <ALT>, for a MACREX soft comma, so you can then modify the text to produce a more relevant entry.

Explanation of Options settings

E - Text to recognize link <a *href *=?*</a>
B - Pattern for middle of target ">
C - Pattern for end of target </a>
D - Text starting link <a href="

These options allow MACREX to recognise the link as it is coded in HTML. The asterisk is a wildcard which will match any number of any character(s). For example, the following link opens the help file authority_menu_help.htm. The text between the middle symbol "> and the end </a> is what appears as the link, and it is used as the heading in the draft index that MACREX produces.

<a href="authority_menu_help.htm">Authority / Autocomplete Options</a>
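To make the split between heading and locator concrete, the following minimal Python sketch pulls both parts out of the example link above. It uses Python's standard html.parser module rather than MACREX's own wildcard patterns, so it illustrates the idea and does not reproduce the program's method. The class name LinkCollector is hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (heading, href) pairs from <a href="...">text</a> links."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (heading, href) pairs
        self._href = None   # href of the link currently open, if any
        self._text = []     # pieces of text seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


link_collector = LinkCollector()
link_collector.feed(
    '<a href="authority_menu_help.htm">Authority / Autocomplete Options</a>'
)
print(link_collector.links)
```

Here the link text becomes the draft heading and the href value becomes the locator, matching the roles described for options B to E.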

A - Pattern for start of target <a: *name: *=: *"

This option allows MACREX to recognise the target of the link – the point to which the index entry takes you.
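In HTML of this style a target is typically a named anchor, which a link reaches by adding #name to the file name. The minimal Python sketch below collects such names using the standard html.parser module. It is an illustration rather than MACREX's own pattern matching, and both the class name TargetCollector and the sample markup (including the anchor name soft_comma) are hypothetical.

```python
from html.parser import HTMLParser


class TargetCollector(HTMLParser):
    """Collect the names of <a name="..."> anchors in a page."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            name = dict(attrs).get("name")
            if name:
                self.targets.append(name)


# Hypothetical page fragment: one named target and one ordinary link.
sample = (
    '<p><a name="soft_comma">Soft comma</a></p>'
    '<p><a href="other.htm">Other topic</a></p>'
)
target_collector = TargetCollector()
target_collector.feed(sample)
print(target_collector.targets)
```

Only the anchor carrying a name attribute is reported, since the ordinary link is a source of a reference and not a destination.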

F - Help

Shows this help file.

Last updated 7 December 2014