
Quality tasks from Diff tool and special characters (UTF8)

dgoedh
dgoedh Posts: 63

Hi,

I am encountering problems with the Quality tasks from the Diff tool (XML mode, literal XML) when it parses special characters. For example, DOSSIERNAAM!@#$%^&amp;*()_+?{}|[]<>,./ is converted in the quality task to: Changed element text content value from the expected "DOSSIERNAAM?????????????????? ????X" to "DOSSIERNAAM������������������ ����X" in XPath ....

Why is it displaying question-mark diamonds? In the Differences tab these special characters are parsed/resolved correctly, so only the Quality task cannot cope with them.

Overall encoding is set to UTF-8 in Preferences/Misc

We are using SOAtest 2021.1 (10.5.1.202112172109)

Thanks in advance.

Regards.

Daniel Goedhart

Answers

  • benken_parasoft
    benken_parasoft Posts: 1,306 ✭✭✭

    Do you have steps to reproduce this issue?

    I just created a simple Diff tool in XML mode.

    I set the diff control to this:

    <foo>DOSSIERNAAM!@#$%^&amp;*()_+?{}|[]</foo>
    

    Then I set the diff input to this (one character difference):

    <foo>DOSSIERNAAM!@#$%^&amp;*()_+?{}|[]a</foo>
    

    The quality task shows as expected:

    Changed element text content value from the expected "DOSSIERNAAM!@#$%^&*()_+?{}|[]" to "DOSSIERNAAM!@#$%^&*()_+?{}|[]a" in XPath /foo/text()
    
  • dgoedh
    dgoedh Posts: 63

    First of all, thanks Benken for looking into this. Apologies for including the wrong snapshot of special characters.
    It should be this one:
    KLANTCATEGORIEÇüéâäàåçêëèïîìÄÅ;000000005;DOSSIERNAAMÉæÆôöòûùÿÖÜø£Øáíóú ªº¿®X;

    As can be seen, it has been parsed via the SOAtest CSV client into the Diff tool, using XML mode with literal XML validation. The CSV is originally generated on a Red Hat Linux app server and transferred via the SFTP client to the local file system.
    In the Differences tab these special characters are resolved as question marks, as stated before. In the quality task it gets a bit confused and you see the black diamonds displayed. In the Differences tab, the lines with these special characters are not highlighted as being a difference.

    Thanks in advance.

    Regards,

    Daniel

  • benken_parasoft
    benken_parasoft Posts: 1,306 ✭✭✭
    edited April 2022

    I'm still unable to reproduce the issue you describe. I've tried this in both 2021.1 and 2021.2.
    What OS are you using? Can you also describe how your diff tool is configured (mode, diff engine, etc.)? What does the control source look like (literal, file, etc.)? What is the input to the diff tool? Is the diff tool chained to the output of another test? Is the input a traffic message? If it is an HTTP traffic message, what character encoding is in the HTTP header and in the XML prolog?

    As can be seen it has been parsed in via the SOAtest CSV client into the Diff Tool

    I also created a CSV Client and echoed the text as UTF-8. I can chain a Diff to "Response Payload Converted to XML" and diff the XML as expected. If I modify the diff control then I see the expected quality task with no characters corrupted.

    Given that you see corrupted characters for both the expected and actual value in the error message, this suggests that both the diff input and the diff control source were read using the wrong encoding. If your control source is a file on disk, it will generally be decoded using the character set you specify in preferences under Parasoft > Misc, unless something else in the file indicates the character set, such as a Unicode byte order mark. If your Diff input is an HTTP traffic message and the characters are being decoded incorrectly, the character set in the response header may be wrong.
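
The effect described above can be reproduced outside SOAtest. The following is a minimal sketch, assuming the bytes were written in a single-byte code page (Windows-1252 is used purely as an example; the thread does not confirm the actual encoding). It is not SOAtest code, and `decode_as` is a hypothetical helper name.

```python
# Sketch: bytes written in a single-byte code page, then read as UTF-8.
# Assumption: cp1252 stands in for whatever legacy encoding produced the file.

def decode_as(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, substituting U+FFFD for anything that is invalid."""
    return raw.decode(encoding, errors="replace")

# Accented characters become single bytes >= 0x80 in cp1252.
raw = "KLANTCATEGORIEÇüé".encode("cp1252")

# Read with the encoding that was actually used: the text survives.
print(decode_as(raw, "cp1252"))

# Read as UTF-8: each stray high byte is an invalid sequence and is
# replaced by U+FFFD, which most fonts draw as a question mark in a diamond.
print(decode_as(raw, "utf-8"))
```

If both the control file and the input were read this way, both sides of the quality task message would show replacement characters, which matches the first report in this thread.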

  • dgoedh
    dgoedh Posts: 63

    Hi, we are running Parasoft SOAtest on Windows Server 2016 (1607). In Preferences/Misc we use UTF-8 encoding. The CSV file originates from a remote Linux (Red Hat) machine (I tried AIX as well, our previous remote, but this gives the same results).

    • On the remote, the original tabs have been replaced by semicolons, using a perl routine: perl -p -i -e 's/[\t]/;/g' *.csv
    • And also a sort routine ((head -n 1 bd_hlp_${Peil_Jaar}*_1_euro.csv && tail -n +2 bd_hlp${Peil_Jaar}*_1_euro.csv | sort -t";" -n -k22) > Sort_Euro${Run}.csv)
    • We use the SFTP-client to transfer the CSV file from remote to local filesystem.
    • Subsequently the Request Payload Convert CSV to XML client is used (Format: CSV, from native to XML; separator ; quote ").
    • XML -> Diff is hooked up with the following settings: Diff Mode: XML.
    • Regression Control: Mode: Literal XML. Regression Control Source: File.
    • Ignored Differences: Some timestamps xpaths have been listed in this section.
    • Options: Engine: XMLUnit (default). Only Ignore element order and Ignore difference in comments are checked.
    • File: UTF-8 XML file with Windows (CR LF) line endings on the local Windows machine.

    Traffic Viewer gives: KLANTCATEGORIE����������������;DOSSIERNAAM������������������ ����X; However, the question marks are displayed as squares within SOAtest.

    Finally, I updated the reference XML file to contain the special characters as well, instead of the question marks that had been copied in from the Differences tab. The quality task then gives something like this:

    Changed element text content value from the expected "KLANTCATEGORIEÇüéâäàåçêëèïîìÄÅ" to "KLANTCATEGORIE����������������" in XPath

    Hope this is enough information for you to reproduce this issue on your side.
    Thanks in advance.

    Daniel

  • benken_parasoft
    benken_parasoft Posts: 1,306 ✭✭✭
    edited April 2022

    Your CSV is either being corrupted or being written or read with the wrong character encoding somewhere. In your specific situation, it is difficult to say exactly where: it could have happened on the server or when the file was downloaded over SFTP. If SOAtest is reading the file as UTF-8 at a certain place, then perhaps the file wasn't saved as UTF-8, for example.
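
One way to narrow down where the encoding goes wrong is to inspect the raw bytes of the file at each stage (on the server, and again after the SFTP download). The sketch below is only an illustration in Python; `describe_encoding` is a hypothetical helper, and it can only tell whether the bytes are valid UTF-8, not which legacy code page was used otherwise.

```python
import codecs

def describe_encoding(data: bytes) -> str:
    """Classify raw bytes as BOM-marked UTF-8, ASCII, UTF-8, or neither."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    try:
        data.decode("utf-8")  # strict: raises on any invalid byte sequence
    except UnicodeDecodeError:
        return "not valid utf-8"
    # Pure ASCII is byte-for-byte identical in UTF-8 and most legacy code pages.
    return "ascii (also valid utf-8)" if data.isascii() else "utf-8"

# Usage with in-memory samples; for a real file, pass Path(name).read_bytes().
print(describe_encoding("KLANTCATEGORIE;000000005".encode("utf-8")))
print(describe_encoding("DOSSIERNAAMÉæÆ".encode("utf-8")))
print(describe_encoding("DOSSIERNAAMÉæÆ".encode("cp1252")))
```

Note that a file containing only ASCII characters passes as UTF-8 regardless of how it was written, so a wrong encoding only becomes visible once accented characters appear.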

  • dgoedh
    dgoedh Posts: 63
    edited April 2022

    Thanks Benken for all your effort. Indeed, the 'troublesome' file from the remote has somehow been saved as ANSI instead of UTF-8 while being transferred over SFTP. The other files, however, which have no special characters, get saved as UTF-8. I will ask the developers to look into this.

    Happy Easter in advance.

    Regards,
    Daniel

  • dgoedh
    dgoedh Posts: 63

    As an addition: as far as I can judge with Notepad++, the application on the AIX/Linux machine that handles these special characters apparently generates this particular CSV in ANSI format instead of UTF-8. This might have been a known issue for a while, and I have notified the developers to look into it.

    Regards,

    Daniel
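
Until the generating application is fixed, one possible workaround is to transcode the CSV to UTF-8 before SOAtest reads it. This is a sketch only: `transcode_to_utf8` is a hypothetical helper, and the default source encoding `cp1252` is an assumption (Notepad++ reports "ANSI" for several single-byte code pages, so the real encoding needs to be confirmed with the developers).

```python
from pathlib import Path

def transcode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Rewrite src as UTF-8 in dst, keeping line endings untouched.

    Decoding is strict, so a wrong source_encoding guess that hits an
    undefined byte raises UnicodeDecodeError instead of silently
    producing replacement characters.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```

A call would look like `transcode_to_utf8(Path("in.csv"), Path("out.csv"))`, where both file names are placeholders. Working on bytes avoids any newline translation, so CR LF or LF endings pass through as they are.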