Unexpected diffs when rebuilding

Daniel Ruggeri

Hi, all;
   I am preparing to T&R 2.4.44 and am concerned about some of the diff output I see after rebuilding the docs. FYI: I've migrated from OpenJDK 8 to OpenJDK 11 since my previous rebuild of the docs (which meant I had to drop the Xbootclasspath argument).

The output I am seeing does not render properly in my terminal (I guess it doesn't support ISO-8859-1), but it seems the original file is 'correct'. However, when I rebuild the docs, these characters are written as HTML numeric character references rather than as raw ISO-8859-1 bytes. Is this expected? I've double-checked the README and nothing stands out. Perhaps it's related to the move to JDK 11 and ditching Xbootclasspath?


Example:

Index: manual/vhosts/name-based.html.en
===================================================================
--- manual/vhosts/name-based.html.en    (revision 1880272)
+++ manual/vhosts/name-based.html.en    (working copy)
@@ -25,10 +25,10 @@
 <div class="toplang">
 <p><span>Available Languages: </span><a href="../de/vhosts/name-based.html" hreflang="de" rel="alternate" title="Deutsch">&nbsp;de&nbsp;</a> |
 <a href="../en/vhosts/name-based.html" title="English">&nbsp;en&nbsp;</a> |
-<a href="../fr/vhosts/name-based.html" hreflang="fr" rel="alternate" title="Français">&nbsp;fr&nbsp;</a> |
+<a href="../fr/vhosts/name-based.html" hreflang="fr" rel="alternate" title="Fran&#231;ais">&nbsp;fr&nbsp;</a> |

<snip>
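For reference, the removed and added lines carry the same character. A minimal Python illustration (not part of the docs build): `ç` is code point U+00E7, decimal 231, which ISO-8859-1 stores as the single byte 0xE7, while `&#231;` is a pure-ASCII numeric character reference to the same code point. A lone 0xE7 byte followed by an ASCII letter is not valid UTF-8, which would likely explain why a UTF-8 terminal shows a garbled glyph for the old line.

```python
import html

# 'ç' is U+00E7, i.e. decimal 231, the number used in "&#231;".
raw_line = "Fran\u00e7ais"
entity_line = "Fran&#231;ais"

# In ISO-8859-1 the character is one byte, 0xE7. On its own that byte is not
# valid UTF-8, so a UTF-8 terminal cannot display it.
raw_bytes = raw_line.encode("iso-8859-1")
print(raw_bytes)

# The reference form is plain ASCII, so it survives any ASCII-compatible charset.
print(entity_line.encode("ascii"))

# Decoding the reference yields exactly the original text.
print(html.unescape(entity_line) == raw_line)
```

So the diff changes how the character is spelled in the file, not which character the browser ends up showing.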


Apologies if this was discussed already - I only stumbled upon it as I tried to T&R just now.

-- 
Daniel Ruggeri

Re: Unexpected diffs when rebuilding

Eric Covener

On Fri, Jul 24, 2020 at 1:25 PM Daniel Ruggeri <[hidden email]> wrote:
>
> Hi, all;
>    I am preparing to T&R 2.4.44 and am concerned about some of the diff output I see after rebuilding the docs. FYI: I've migrated from OpenJDK 8 to OpenJDK 11 since my previous rebuild of the docs (which meant I had to drop the Xbootclasspath argument).
>
> The output I am seeing does not render properly in my terminal (I guess it doesn't support ISO-8859-1), but it seems the original file is 'correct'. However, when I rebuild the docs, these characters are written as HTML numeric character references rather than as raw ISO-8859-1 bytes. Is this expected? I've double-checked the README and nothing stands out. Perhaps it's related to the move to JDK 11 and ditching Xbootclasspath?

I think this is the issue in the long-running thread on this list.
I personally think this change in anchor titles in *.en files is
harmless.
I am curious whether you also get manpage entries changed after the build.
Those would need scrutiny, I guess, although in that case English
is probably not an issue, only other languages where we might find
the wrong code page or HTML entities?

I also think we should drop -Xbootclasspath and bail out if Java < 11,
so people are less likely to waffle between the two flavors.
...And spot-check the output if someone can articulate a problem
with some encoding.
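One possible shape for the spot-check mentioned above: expand numeric character references on both sides and compare. If the texts then match, the rebuild changed only how characters are spelled, not what they are. This is a hypothetical helper, not something in build.sh. It assumes the file is ISO-8859-1 (as the .en files are described in the README), so translations stored in other charsets need the matching `charset` argument.

```python
import re

# Decimal (&#231;) and hexadecimal (&#xE7;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def expand_numeric_refs(text: str) -> str:
    """Replace numeric character references with the characters they denote.

    Named entities such as &nbsp; are left untouched; only numeric
    references are expanded.
    """
    def repl(m):
        hex_digits, dec_digits = m.group(1), m.group(2)
        return chr(int(hex_digits, 16) if hex_digits else int(dec_digits))

    return NUMERIC_REF.sub(repl, text)


def only_encoding_changed(old: bytes, new: bytes, charset: str = "iso-8859-1") -> bool:
    """True if old and new differ only in raw-character vs. &#N; spelling."""
    old_text = expand_numeric_refs(old.decode(charset))
    new_text = expand_numeric_refs(new.decode(charset))
    return old_text == new_text
```

The `old` bytes could come from the pristine copy (for example via `svn cat`) and `new` from the rebuilt working file. Files where this returns False are the ones worth reading by hand.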




--
Eric Covener
[hidden email]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Unexpected diffs when rebuilding

Daniel Ruggeri

On 7/24/2020 5:27 PM, Eric Covener wrote:
> On Fri, Jul 24, 2020 at 1:25 PM Daniel Ruggeri [hidden email] wrote:
>> Hi, all;
>>    I am preparing to T&R 2.4.44 and am concerned about some of the diff output I see after rebuilding the docs. FYI: I've migrated from OpenJDK 8 to OpenJDK 11 since my previous rebuild of the docs (which meant I had to drop the Xbootclasspath argument).
>>
>> The output I am seeing does not render properly in my terminal (I guess it doesn't support ISO-8859-1), but it seems the original file is 'correct'. However, when I rebuild the docs, these characters are written as HTML numeric character references rather than as raw ISO-8859-1 bytes. Is this expected? I've double-checked the README and nothing stands out. Perhaps it's related to the move to JDK 11 and ditching Xbootclasspath?
> I think this is the issue in the long-running thread on this list.
> I personally think this change in anchor titles in *.en files is
> harmless.

Many thanks, Eric

Yeah... I figured it was all related. For clarity, there are hundreds (maybe thousands) of changes, not only in the anchors in .en files but also in the contents of various other translations. All appear to be related to the "special" (non-ASCII) characters. I've attached the full output of svn diff after a rebuild of the docs in the 2.4.x branch.


> I am curious whether you also get manpage entries changed after the build.
> Those would need scrutiny, I guess, although in that case English
> is probably not an issue, only other languages where we might find
> the wrong code page or HTML entities?

This actually adds a bit of confusion, because I took a look at the README :-)
There haven't been any changes under docs/man, although the files were all clearly rebuilt. But yes, as you expected, all of the characters outside the ASCII range (which are valid in ISO-8859-1) have been HTML-encoded in docs/manual.


> I also think we should drop -Xbootclasspath and bail out if Java < 11,
> so people are less likely to waffle between the two flavors.
> ...And spot-check the output if someone can articulate a problem
> with some encoding.

Aye - I was toying with the idea of patching the script myself to check the Java version first and include or omit the Xbootclasspath argument based on that. Since I'm not terribly familiar with the details here, I was going to bring this up for discussion after the T&R was done. But... this huge number of changes threw a wrench into the machinery, so I wanted to ask what the expected behavior is.
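A sketch of the version check described above, written in Python only so it can be tested (build.sh itself is a shell script, so this would need porting). It parses the banner that `java -version` prints to stderr, handling both the pre-9 scheme (`1.8.0_262` gives 8) and the newer one (`11.0.8` gives 11). `legacy_flags` stands for whatever -Xbootclasspath argument build.sh passes today, which isn't quoted in this thread, and `cutoff=11` simply mirrors the JDK 8 vs. JDK 11 experience reported here; the exact version at which the flag must go is an assumption.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the major version from `java -version` output.

    Pre-9 JDKs report e.g. "1.8.0_262" (major 8); Java 9+ uses the
    JEP 223 scheme, e.g. "11.0.8" (major 11) or "16-ea" (major 16).
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if m is None:
        raise ValueError("unrecognised `java -version` output")
    first, second = int(m.group(1)), m.group(2)
    if first == 1 and second is not None:
        return int(second)  # legacy "1.x" scheme
    return first


def detect_java_major(java: str = "java") -> int:
    """Run `java -version`; the banner is written to stderr, not stdout."""
    result = subprocess.run([java, "-version"], capture_output=True, text=True, check=True)
    return java_major_version(result.stderr)


def extra_jvm_args(major: int, legacy_flags: list, cutoff: int = 11) -> list:
    """Hypothetical policy: keep the old -Xbootclasspath flags only below `cutoff`."""
    return list(legacy_flags) if major < cutoff else []
```

The stricter alternative proposed earlier in the thread would drop `extra_jvm_args` and instead exit when `detect_java_major()` is below 11, so only one output flavor is ever produced.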


That said... what is the expectation? The README next to build.sh is ambiguous about what we *want* to happen:

> ### UTF-8 vs. XML entities in foo.html.en
>
> Old JDK's happily put UTF-8 bytes into ISO8859-1 english files which seems wrong.
> Newer JDK's (w/o -Xbootclasspath? in build.sh?) will replace them with XML entities.
>
> Impact: XML entities break manpages (if checked in)
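Since the README warns that XML entities break man pages, a quick scan of docs/man could confirm none leaked in. roff has its own escapes for such characters (groff, for instance, accepts `\[u00E7]`) and presumably renders an entity like `&#231;` literally. The helper below is hypothetical. It decodes files as ISO-8859-1, which never fails on arbitrary bytes and is safe here because the pattern being searched for is pure ASCII.

```python
import re
from pathlib import Path

# Numeric (&#231;, &#xE7;) and named (&eacute;) entity references.
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def find_entities(text: str) -> list:
    """Return (line_number, entity) pairs for every entity reference in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        hits.extend((lineno, m.group(0)) for m in ENTITY.finditer(line))
    return hits


def scan_man_dir(man_dir: str) -> dict:
    """Map each file under man_dir to its entity hits; clean files are omitted."""
    report = {}
    for path in sorted(Path(man_dir).rglob("*")):
        if path.is_file():
            # ISO-8859-1 maps every byte to a character, so decoding cannot fail.
            hits = find_entities(path.read_text(encoding="iso-8859-1"))
            if hits:
                report[str(path)] = hits
    return report
```

An empty report from `scan_man_dir("docs/man")` would match what is observed in this thread: the man pages were rebuilt but not changed.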

So it seems I should commit the changes, since docs/manual has changed and docs/man has not? I also ask because if half the devs use newer JDKs that implement the "expected" behavior and half use JDKs with the old behavior, the committed files will constantly waffle back and forth between the two formats. That would leave the SVN history full of noise (though I guess that doesn't matter much for generated files?)

I'm happy to update build.sh with whatever we decide is "correct", but for now I want to move forward with the planned T&R once I understand what "correct" is.






Attachment: changes.diff.gz (442K)