1
What could cause the calculated field serial.InStockCode to have stray data?
Question asked by Stan Paulauskas - 4/6/2022 at 8:06 AM
Answered
I'm reviewing the data contained in the Serial table using Python and the ODBC driver. From the metadata table I expect that serial.InStockCode would take the following values: 

if( (InvNum = 0) and (SONum = 0) then StockCode else NULL)

In the cases where this evaluates to StockCode, I'm seeing additional characters insert into the record. The addition of the characters causes the StockCode length to be truncated. Here is one example record. 
I've tried to format the data in such a way that the hidden characters can be seen. 

I expected this field to just contain a StockCode but that doesn't seem to be the case. 

Here's a minimum working example of the code.

conn = pyodbc.connect("DSN=Atrex-RO")
atrex = conn.cursor()
statement = "SELECT * FROM serial WHERE InStockCode is not null"
atrex.execute(statement)

columns = [x[0] for x in atrex.description]
for row in atrex.fetchall():
    row = {k: v for k, v in zip(columns, row)}
    print(row)
    for k, v in row.items():
        if k in ('InStockCode', 'StockCode'):
            print(f"{k}: {v}")
            print("\tEncoded as UTF-8: ", v.encode())
            print("\tCharacters as Unicode Numbers: ", [ord(c) for c in v])
atrex.close()

4 Replies

Reply to Thread
0
Terry Swiers Replied
Employee Post
Don't know what to tell you on this one, but I suspect that this is something python specific.  I brute forced the following php code to test this using the current ODBC driver for Atrex....

';
  }

$Connection = odbc_connect('Test', 'administrator', 'EDBDefault');
if ($Connection == false) { die; }

$query = "select * from serial where InStockCode is not null and stockcode = 'ASY-PREAMP-1.0' range 1 to 1";
$recordset = odbc_exec($Connection, $query);

$stockcode = odbc_result($recordset, 'stockcode');
$instockcode = odbc_result($recordset, 'instockcode');

DumpString('StockCode', $stockcode);
DumpString('InStockCode', $instockcode);
if ($Connection != false) { odbc_close($Connection); }

I get identical results for both the StockCode and InStockCode fields:

StockCode: ASY-PREAMP-1.0 [ 65 83 89 45 80 82 69 65 77 80 45 49 46 48 ]
InStockCode: ASY-PREAMP-1.0 [ 65 83 89 45 80 82 69 65 77 80 45 49 46 48 ]
So at least as far as the ODBC driver and PHP are concerned, both fields return the same value with my test data.

0
Stan Paulauskas Replied
What's strange is that I'm only seeing the issue on this specific field. All other string fields are formatted without any problems at all. 

I've executed your php script (with the exception of DumpString) against our data and see no error. I've also used Microstrategy to pull the data for this table and again see no issue. This certainly seems to be an issue with how Python's dealing with this specific field.
0
Stan Paulauskas Replied
It seems that for some reason, python is treating the serial.InStockCode field as a SQL_WCHAR type. All the other string fields on the table pull back as a SQL_CHAR type. By default, it uses UTF-16LE to decode these data. This is causing the introduction of the special characters. 

From the PyODBC documentation this suggests the ODBC driver is delivering the data with the wrong format.  

When a buffer is being read, the driver will tell pyodbc whether it is a SQL_CHAR or SQL_WCHAR buffer. However, pyodbc can request the data be converted to either of these formats. Most of the time this parameter should not be supplied and the same type will be used, but this can be useful in the case where the driver reports SQL_WCHAR data but the data is actually in UTF-8.
 Attempting to override the SQL_WCHAR behavior with UTF-8 encoding leads to a UnicodeDecodeError: 

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 2: invalid start byte
Now, why python is sensitive to this and php/microstrategy are not is still a mystery to me.
0
Stan Paulauskas Replied
Marked As Answer
TL;DR: This is probably a bug with pyodbc in relationship to the Elevate DB ODBC driver. 

I've been able to side-step the problem by using the pypyodbc library instead of pyodbc. It seems that the two handle the ODBC driver's SQL_WCHAR data differently. This also explains why php and Microstrategy worked without issue.

Reply to Thread