9 Replies - 471 Views - Last Post: 09 October 2018 - 01:27 AM

#1 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6823
  • View blog
  • Posts: 28,272
  • Joined: 12-December 12

utf-8? encoding issue

Posted 03 October 2018 - 11:53 AM

I have a varbinary field which is base64 encoded data.

The data is returned by a stored procedure like this:

CONVERT(varchar(Max), CV.[Data], 0) 	 AS CompendiumHTML,


and displays as:

<html lang="en"><head><title></title>  <meta charset="UTF-8"></head>  <body contenteditable="true" style="FONT-FAMILY: arial" designmode="on">  <p>Лорем ипсум долор сит амет, ет про долорем фацилис сцаевола, солум фуиссет тациматес яуи ех, реяуе постеа репримияуе вис ад. Еи вис иуварет алиенум волуптариа, сале еуисмод цонсеяуат но дуо, ат алияуид сцрипторем при. Ид мел нонумы тинцидунт. Нисл плацерат ат меа, ест аперири адиписцинг ех, елецтрам реформиданс еа хас. Ут нец симул дебитис еффициантур.</p>  <p>  <p>Либер яуаерендум ан вим, яуи утинам муциус те, яуандо алтерум ад пер. Сед дицта аппетере вулпутате ин, инцоррупте садипсцинг цу еос. Алиа глориатур цум цу, промпта праесент ех ест. Еи нусяуам елаборарет волуптатибус дуо, адхуц доцтус ид сед.</p></body></html>


If I use an online base64 converter (https://www.base64decode.org/) I get the correct output:

<html lang="en"><head><title></title>
<meta charset="UTF-8"></head>
<body contenteditable="true" style="FONT-FAMILY: arial" designmode="on">
<p>Лорем ипсум долор сит амет, ет про долорем фацилис сцаевола, солум фуиссет тациматес яуи ех, реяуе постеа репримияуе вис ад. Еи вис иуварет алиенум волуптариа, сале еуисмод цонсеяуат но дуо, ат алияуид сцрипторем при. Ид мел нонумы тинцидунт. Нисл плацерат ат меа, ест аперири адиписцинг ех, елецтрам реформиданс еа хас. Ут нец симул дебитис еффициантур.</p>
<p>
<p>Либер яуаерендум ан вим, яуи утинам муциус те, яуандо алтерум ад пер. Сед дицта аппетере вулпутате ин, инцоррупте садипсцинг цу еос. Алиа глориатур цум цу, промпта праесент ех ест. Еи нусяуам елаборарет волуптатибус дуо, адхуц доцтус ид сед.</p></body></html>


What step am I missing in my stored procedure do get the correct output please?

Here is the original varbinary data:

0x3C68746D6C206C616E673D22656E223E3C686561643E3C7469746C653E3C2F7469746C653E0D0A3C6D65746120636861727365743D225554462D38223E3C2F686561643E0D0A3C626F647920636F6E74656E746564697461626C653D227472756522207374796C653D22464F4E542D46414D494C593A20617269616C222064657369676E6D6F64653D226F6E223E0D0A3C703ED09BD0BED180D0B5D0BC20D0B8D0BFD181D183D0BC20D0B4D0BED0BBD0BED18020D181D0B8D18220D0B0D0BCD0B5D1822C20D0B5D18220D0BFD180D0BE20D0B4D0BED0BBD0BED180D0B5D0BC20D184D0B0D186D0B8D0BBD0B8D18120D181D186D0B0D0B5D0B2D0BED0BBD0B02C20D181D0BED0BBD183D0BC20D184D183D0B8D181D181D0B5D18220D182D0B0D186D0B8D0BCD0B0D182D0B5D18120D18FD183D0B820D0B5D1852C20D180D0B5D18FD183D0B520D0BFD0BED181D182D0B5D0B020D180D0B5D0BFD180D0B8D0BCD0B8D18FD183D0B520D0B2D0B8D18120D0B0D0B42E20D095D0B820D0B2D0B8D18120D0B8D183D0B2D0B0D180D0B5D18220D0B0D0BBD0B8D0B5D0BDD183D0BC20D0B2D0BED0BBD183D0BFD182D0B0D180D0B8D0B02C20D181D0B0D0BBD0B520D0B5D183D0B8D181D0BCD0BED0B420D186D0BED0BDD181D0B5D18FD183D0B0D18220D0BDD0BE20D0B4D183D0BE2C20D0B0D18220D0B0D0BBD0B8D18FD183D0B8D0B420D181D186D180D0B8D0BFD182D0BED180D0B5D0BC20D0BFD180D0B82E20D098D0B420D0BCD0B5D0BB20D0BDD0BED0BDD183D0BCD18B20D182D0B8D0BDD186D0B8D0B4D183D0BDD1822E20D09DD0B8D181D0BB20D0BFD0BBD0B0D186D0B5D180D0B0D18220D0B0D18220D0BCD0B5D0B02C20D0B5D181D18220D0B0D0BFD0B5D180D0B8D180D0B820D0B0D0B4D0B8D0BFD0B8D181D186D0B8D0BDD0B320D0B5D1852C20D0B5D0BBD0B5D186D182D180D0B0D0BC20D180D0B5D184D0BED180D0BCD0B8D0B4D0B0D0BDD18120D0B5D0B020D185D0B0D1812E20D0A3D18220D0BDD0B5D18620D181D0B8D0BCD183D0BB20D0B4D0B5D0B1D0B8D182D0B8D18120D0B5D184D184D0B8D186D0B8D0B0D0BDD182D183D1802E3C2F703E0D0A3C703E0D0A3C703ED09BD0B8D0B1D0B5D18020D18FD183D0B0D0B5D180D0B5D0BDD0B4D183D0BC20D0B0D0BD20D0B2D0B8D0BC2C20D18FD183D0B820D183D182D0B8D0BDD0B0D0BC20D0BCD183D186D0B8D183D18120D182D0B52C20D18FD183D0B0D0BDD0B4D0BE20D0B0D0BBD182D0B5D180D183D0BC20D0B0D0B420D0BFD0B5D1802E20D0A1D0B5D0B420D0B4D0B8D186D182D0B020D0B0D0BFD0BFD0B5D182D0B5D180D0B520D0B2D183D0BBD0BFD183D182D0B0D182D0B520D0B8D0BD2C20D0B8D0BDD186D0BED180D180D183D0BFD182D0B520D181D0B0D0B4D0B8D0BFD181D186D0B8D0BDD0B320D186D18320D0B5D0BED1812E20D090D0BBD0B8D0B020D0B3D0BBD0BED180D0B8D0B0D182D183D18020D186D183D0BC20D186D1832C20D0BFD180D0BED0BCD0BFD182D0B020D0BFD180D0B0D0B5D181D0B5D0BDD18220D0B5D18520D0B5D181D1822E20D095D0B820D0BDD183D181D18FD183D0B0D0BC20D0B5D0BBD0B0D0B1D0BED180D0B0D180D0B5D18220D0B2D0BED0BBD183D0BFD182D0B0D182D0B8D0B1D183D18120D0B4D183D0BE2C20D0B0D0B4D185D183D18620D0B4D0BED186D182D183D18120D0B8D0B420D181D0B5D0B42E3C2F703E3C2F626F64793E3C2F68746D6C3E


Is This A Good Question/Topic? 0
  • +

Replies To: utf-8? encoding issue

#2 modi123_1   User is offline

  • Suitor #2
  • member icon



Reputation: 15356
  • View blog
  • Posts: 61,566
  • Joined: 12-June 08

Re: utf-8? encoding issue

Posted 03 October 2018 - 12:40 PM

Make a CLR procedure and access the .NET functionality?
Was This Post Helpful? 0
  • +
  • -

#3 astonecipher   User is offline

  • Senior Systems Engineer
  • member icon

Reputation: 2996
  • View blog
  • Posts: 11,539
  • Joined: 03-December 12

Re: utf-8? encoding issue

Posted 03 October 2018 - 01:16 PM

Why are you storing the base64 string as varbinary? Just a question.
Was This Post Helpful? 0
  • +
  • -

#4 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6823
  • View blog
  • Posts: 28,272
  • Joined: 12-December 12

Re: utf-8? encoding issue

Posted 04 October 2018 - 01:29 AM

Hello and thanks both.

I need to do something in T-SQL; it wasn't my decision about how it is stored ;)
Was This Post Helpful? 0
  • +
  • -

#5 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6823
  • View blog
  • Posts: 28,272
  • Joined: 12-December 12

Re: utf-8? encoding issue

Posted 04 October 2018 - 01:45 AM

I found a function here and I need to use it like this

SELECT dbo.DecodeUTF8String(CAST(N'' AS XML).value
('xs:base64Binary("2LPZhNin2YUg2KzbjNix2KfZhg==")','VARBINARY(MAX)'))


I'm struggling to embed the data, via a variable, into this expression:

DECLARE @thedata varchar(max)
SET @thedata = CONVERT(varchar(Max),'<html lang="en"><head><title></title>  <meta charset="UTF-8"></head>  <body contenteditable="true" style="FONT-FAMILY: arial" designmode="on">  <p>Лорем ипсум долор сит амет, ет про долорем фацилис сцаевола, солум фуиссет тациматес яуи ех, реяуе постеа репримияуе вис ад. Еи вис иуварет алиенум волуптариа, сале еуисмод цонсеяуат но дуо, ат алияуид сцрипторем при. Ид мел нонумы тинцидунт. Нисл плацерат ат меа, ест аперири адиписцинг ех, елецтрам реформиданс еа хас. Ут нец симул дебитис еффициантур.</p>  <p>  <p>Либер яуаерендум ан вим, яуи утинам муциус те, яуандо алтерум ад пер. Сед дицта аппетере вулпутате ин, инцоррупте садипсцинг цу еос. Алиа глориатур цум цу, промпта праесент ех ест. Еи нусяуам елаборарет волуптатибус дуо, адхуц доцтус ид сед.</p></body></html>',0)


select dbo.fnDecodeUTF8String(CAST(N'' AS XML).value ('xs:base64Binary("[sql:variable("@thedata")]")','VARBINARY(MAX)')) 

Was This Post Helpful? 0
  • +
  • -

#6 modi123_1   User is offline

  • Suitor #2
  • member icon



Reputation: 15356
  • View blog
  • Posts: 61,566
  • Joined: 12-June 08

Re: utf-8? encoding issue

Posted 04 October 2018 - 09:36 AM

I tried poking at it with the collate option.. no dice.
declare @foo varbinary(max)
set @foo = 0x3C68746D6C206C616E673D22656E223E3C686561643E3C7469746C653E3C2F7469746C653E0D0A3C6D65746120636861727365743D225554462D38223E3C2F686561643E0D0A3C626F647920636F6E74656E746564697461626C653D227472756522207374796C653D22464F4E542D46414D494C593A20617269616C222064657369676E6D6F64653D226F6E223E0D0A3C703ED09BD0BED180D0B5D0BC20D0B8D0BFD181D183D0BC20D0B4D0BED0BBD0BED18020D181D0B8D18220D0B0D0BCD0B5D1822C20D0B5D18220D0BFD180D0BE20D0B4D0BED0BBD0BED180D0B5D0BC20D184D0B0D186D0B8D0BBD0B8D18120D181D186D0B0D0B5D0B2D0BED0BBD0B02C20D181D0BED0BBD183D0BC20D184D183D0B8D181D181D0B5D18220D182D0B0D186D0B8D0BCD0B0D182D0B5D18120D18FD183D0B820D0B5D1852C20D180D0B5D18FD183D0B520D0BFD0BED181D182D0B5D0B020D180D0B5D0BFD180D0B8D0BCD0B8D18FD183D0B520D0B2D0B8D18120D0B0D0B42E20D095D0B820D0B2D0B8D18120D0B8D183D0B2D0B0D180D0B5D18220D0B0D0BBD0B8D0B5D0BDD183D0BC20D0B2D0BED0BBD183D0BFD182D0B0D180D0B8D0B02C20D181D0B0D0BBD0B520D0B5D183D0B8D181D0BCD0BED0B420D186D0BED0BDD181D0B5D18FD183D0B0D18220D0BDD0BE20D0B4D183D0BE2C20D0B0D18220D0B0D0BBD0B8D18FD183D0B8D0B420D181D186D180D0B8D0BFD182D0BED180D0B5D0BC20D0BFD180D0B82E20D098D0B420D0BCD0B5D0BB20D0BDD0BED0BDD183D0BCD18B20D182D0B8D0BDD186D0B8D0B4D183D0BDD1822E20D09DD0B8D181D0BB20D0BFD0BBD0B0D186D0B5D180D0B0D18220D0B0D18220D0BCD0B5D0B02C20D0B5D181D18220D0B0D0BFD0B5D180D0B8D180D0B820D0B0D0B4D0B8D0BFD0B8D181D186D0B8D0BDD0B320D0B5D1852C20D0B5D0BBD0B5D186D182D180D0B0D0BC20D180D0B5D184D0BED180D0BCD0B8D0B4D0B0D0BDD18120D0B5D0B020D185D0B0D1812E20D0A3D18220D0BDD0B5D18620D181D0B8D0BCD183D0BB20D0B4D0B5D0B1D0B8D182D0B8D18120D0B5D184D184D0B8D186D0B8D0B0D0BDD182D183D1802E3C2F703E0D0A3C703E0D0A3C703ED09BD0B8D0B1D0B5D18020D18FD183D0B0D0B5D180D0B5D0BDD0B4D183D0BC20D0B0D0BD20D0B2D0B8D0BC2C20D18FD183D0B820D183D182D0B8D0BDD0B0D0BC20D0BCD183D186D0B8D183D18120D182D0B52C20D18FD183D0B0D0BDD0B4D0BE20D0B0D0BBD182D0B5D180D183D0BC20D0B0D0B420D0BFD0B5D1802E20D0A1D0B5D0B420D0B4D0B8D186D182D0B020D0B0D0BFD0BFD0B5D182D0B5D180D0B520D0B2D183D0BBD0BFD183D182D0B0D182D0B520D0B8D0BD2C20D0B8D0BDD186D0BED180D180D183D0BFD182D0B520D181D0B0D0B4D0B8D0BFD181D186D0B8D0BDD0B320D186D18320D0B5D0BED1812E20D090D0BBD0B8D0B020D0B3D0BBD0BED180D0B8D0B0D182D183D18020D186D183D0BC20D186D1832C20D0BFD180D0BED0BCD0BFD182D0B020D0BFD180D0B0D0B5D181D0B5D0BDD18220D0B5D18520D0B5D181D1822E20D095D0B820D0BDD183D181D18FD183D0B0D0BC20D0B5D0BBD0B0D0B1D0BED180D0B0D180D0B5D18220D0B2D0BED0BBD183D0BFD182D0B0D182D0B8D0B1D183D18120D0B4D183D0BE2C20D0B0D0B4D185D183D18620D0B4D0BED186D182D183D18120D0B8D0B420D181D0B5D0B42E3C2F703E3C2F626F64793E3C2F68746D6C3E

select CONVERT(varchar(Max),@foo, 0) 

select CONVERT(varchar(Max),@foo, 0)  COLLATE SQL_Latin1_General_CP1253_CI_AI

 select CONVERT(varchar(Max),@foo, 0)  COLLATE SQL_Latin1_General_Cp850_BIN

  select CONVERT(varchar(Max),@foo, 0)  COLLATE SQL_Latin1_General_CP1_CI_AS

 
 -- select CONVERT(varchar(Max),@foo, 0)  COLLATE Kazakh_100_CI_AS 
 --select CONVERT(varchar(Max),@foo, 0)  COLLATE Cyrillic_General_CI_AS 
  
 
  select CONVERT(varchar(Max),@foo, 0)  COLLATE Cyrillic_General_BIN  

Was This Post Helpful? 1
  • +
  • -

#7 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6823
  • View blog
  • Posts: 28,272
  • Joined: 12-December 12

Re: utf-8? encoding issue

Posted 04 October 2018 - 10:03 AM

Thank you.

(It could be any language I believe.)

I got it working by stripping back on the conversions.

Just

SET @thedata = '... the content with funny characters ...'

I put it into a temp table (as varchar(max)) for testing, then

select dbo.fnDecodeUTF8String([data]) from #temp

I suppose it is letting SQL Server do some implicit conversions. Less is more sometimes ;)

So, for my stored procedure, I just need to call on the function before providing the final output.
Was This Post Helpful? 0
  • +
  • -

#8 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6823
  • View blog
  • Posts: 28,272
  • Joined: 12-December 12

Re: utf-8? encoding issue

Posted 08 October 2018 - 09:04 AM

Is there a way to check if the field contains foreign characters as above, needing to be converted? If the field is plain ascii then I wouldn't need to call the expensive conversion function.
Was This Post Helpful? 0
  • +
  • -

#9 modi123_1   User is offline

  • Suitor #2
  • member icon



Reputation: 15356
  • View blog
  • Posts: 61,566
  • Joined: 12-June 08

Re: utf-8? encoding issue

Posted 08 October 2018 - 10:37 AM

Outside of some silly regex.. it's rough. I was kicking the tires of this issue off and on all weekend without much luck given the parameters.
Was This Post Helpful? 1
  • +
  • -

#10 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6823
  • View blog
  • Posts: 28,272
  • Joined: 12-December 12

Re: utf-8? encoding issue

Posted 09 October 2018 - 01:27 AM

Thank you. I hope it wasn't too boring for you ;)

There is a second function below the one I used at the stackoverflow link which seems to be more efficient. This might prove acceptable.
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1