HTML Scanning

This section explains how Globalyzer scans HTML source and walks you through customizing the Included HTML Tags and the Ignored HTML Tags within an HTML Rule Set. Tag configuration tells Globalyzer how to detect display text within your HTML source, regardless of whether the HTML is embedded within a static HTML file or a jsp, asp or aspx file.

Scanning HTML Code

You can configure a Globalyzer HTML Rule Set to scan HTML code that exists within any web file. These include:
Scanning HTML Blocks for Display Text

Your purpose in using Globalyzer to scan blocks of HTML code is most likely to detect, and perhaps externalize to resource files, any display strings embedded in that code. You might also want to find the paths to image files, but this requires no special tweaking: just make sure all of the image file types you use in your application are included in the Static File References category in your Rule Set.

If you are scanning static HTML files, you might simply need to quantify the number of strings that will require translation within a body of static HTML. You might be considering converting the static HTML pages to jsp, asp, or aspx files so that you can place the display text in resource files and retrieve the strings at runtime according to the locale of the user. Or you may be scanning jsp, asp or aspx files that contain blocks of HTML with embedded display text.

Regardless of your purpose and regardless of the file type you are scanning, an HTML Rule Set is only concerned with the static blocks of HTML code within your source files. If your files include other in-line code that also contains display text, you will need to create a separate Rule Set for the appropriate programming language and scan the file separately with that Rule Set to detect the embedded strings.

For example, suppose you have a .jsp file that includes a combination of server-side Java code, static HTML code, and JavaScript, all containing display text. To scan, and later externalize, all of the display text embedded in this code sample, you will need to create a Java Rule Set, a JavaScript Rule Set and an HTML Rule Set, and then scan your source code with each of them. The reason is that each programming language is governed by a separate set of rules dictating what defines a "string" or a piece of text that may be displayed to a user.
When you scan the above sample of code for embedded strings using a default HTML Rule Set (no customization), you will see the following in the Embedded Strings Scan Results:
How does Globalyzer determine which of the strings embedded in the HTML blocks are displayed in the browser? The basics of Globalyzer's HTML string detection can be summed up in four short rules:
Configuring Included HTML Tags

In the previous section we explained that Globalyzer looks for the inner-most matching tags in the static HTML and grabs any text in between those. The exception to this rule is that Globalyzer is configured to include certain tags, so even a default HTML Rule Set will grab the following text as a single embedded text block:
That Globalyzer grabs this entire block of text at once is important for several reasons. First, the entire block should be externalized intact because the translator needs to be able to arrange the words in the order that suits the user's locale. Granted, the translator won't literally have the user's name, as this is a variable populated at runtime. In this case, the person internationalizing the code will need to do more than simply externalize this piece of display text.

The other two strings detected in the sample HTML can be externalized to a resource file as they are. This string needs a little refactoring before it can be externalized. That refactoring includes using Java's MessageFormat class, which allows the translator not only to translate the translatable elements of the phrase, but also to specify the order in which they are displayed on the page. Similar classes exist on other platforms, such as Microsoft's .NET.

Let's now focus on the fact that Globalyzer included the bold tags and the in-line Java in line 28 of our sample .jsp file. Globalyzer automatically includes style tags such as bold and italics tags. While creating or editing a Rule Set, you can view, edit, delete from and add to the list of tags that Globalyzer will include.
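The refactoring described above, in which a phrase containing a runtime value is externalized as one pattern with numbered placeholders, can be sketched as follows. This is only an illustration in Python using `str.format`, whose `{0}`, `{1}` placeholders play a role similar to those of Java's MessageFormat. The `MESSAGES` table, the `render_greeting` function, the message text and the `xx-reordered` pseudo-locale are all hypothetical and are not taken from the sample .jsp file.

```python
# Hypothetical resource table: one complete pattern per locale.
# "xx-reordered" is a made-up pseudo-locale that only demonstrates
# that a translator can move the placeholders around.
MESSAGES = {
    "en": "Welcome back, {0}! You have {1} new messages.",
    "xx-reordered": "{1} new messages are waiting for you, {0}.",
}


def render_greeting(locale, user_name, message_count):
    """Fill the locale's pattern with runtime values.

    Falls back to the English pattern when the locale is unknown.
    """
    pattern = MESSAGES.get(locale, MESSAGES["en"])
    # Positional placeholders let each pattern decide the word order.
    return pattern.format(user_name, message_count)


if __name__ == "__main__":
    print(render_greeting("en", "Ana", 3))
    print(render_greeting("xx-reordered", "Ana", 3))
```

Because the whole phrase is stored as a single pattern, the calling code stays the same while each locale's pattern controls where the user's name and the count appear.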
By included, we mean that Globalyzer sees these tags as plain text. The default settings in this category may be sufficient. But if you are using a third-party tag library that includes non-standard tags, some of which you want Globalyzer to include, you can add these to the list. On the other hand, you may consistently delineate cohesive blocks of text with a tag that Globalyzer is by default configured to include - such as the

Configuring Ignored HTML Tags

Globalyzer also has a list of Ignored HTML Tags. Ignored tags and all text in between are removed prior to scanning. The default settings should be sufficient, but you can view, edit, delete from and add to the list of tags that Globalyzer will ignore.
Note: If a tag is configured to be on both the Included list and the Ignored list, it will be ignored.
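To make the interaction between the two lists concrete, here is a toy model in Python. It is a regex-based sketch of the behavior described in this section, not Globalyzer's actual parser: the function name `extract_text_blocks`, the sample markup and the tag lists passed in are assumptions made for illustration, and the real default lists may differ.

```python
import re

# Matches any opening or closing tag and captures its name.
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def extract_text_blocks(html, included, ignored):
    """Toy model of the Included/Ignored tag rules (illustrative only)."""
    ignored = {t.lower() for t in ignored}
    # A tag on both lists is treated as ignored.
    included = {t.lower() for t in included} - ignored

    # Ignored tags and all text between them are removed before scanning.
    for tag in ignored:
        name = re.escape(tag)
        html = re.sub(
            rf"<{name}\b[^>]*>.*?</{name}\s*>",
            "",
            html,
            flags=re.IGNORECASE | re.DOTALL,
        )

    # Included tags are left in place as plain text; every other tag
    # ends the current text block.
    blocks, start = [], 0
    for match in TAG.finditer(html):
        if match.group(1).lower() in included:
            continue
        text = html[start:match.start()].strip()
        if text:
            blocks.append(text)
        start = match.end()
    tail = html[start:].strip()
    if tail:
        blocks.append(tail)
    return blocks


if __name__ == "__main__":
    sample = '<td>Welcome back, <b>Ana</b>!</td><script>var s = "x";</script><p>Sign out</p>'
    print(extract_text_blocks(sample, included={"b", "i"}, ignored={"script"}))
    # With "b" on both lists, the bold element and its text are dropped.
    print(extract_text_blocks(sample, included={"b"}, ignored={"b", "script"}))
```

In this sketch, keeping `b` on the included list yields one block that still contains the bold markup, while listing `b` as ignored as well removes the bold element and its text entirely, which mirrors the precedence stated in the note above.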