
subtitle, encoding problem

Terro_Girl
post 27 May, 2008 - 09:25 AM
Post #1



Hello everyone .. I'm trying to modify a text in which I want to replace some special characters with usual ones, but all I get are some other special characters (squares, question marks..)
My code looks like this:
CODE
  

// Needs: using System.IO; using System.Text; using System.Text.RegularExpressions;
void ApplyToFiles(string target)
{
    string[] fnames = Directory.GetFiles(".", target);
    foreach (string fname in fnames)
    {
        FileInfo fi = new FileInfo(fname);

        StreamReader sr = null;
        StreamWriter sw = null;
        try
        {
            sr = new StreamReader(new FileStream(fname, FileMode.Open, FileAccess.Read), Encoding.UTF8);
            sw = new StreamWriter(new FileStream("NEW" + fi.Name, FileMode.Create), Encoding.UTF8);

            string line;
            while ((line = sr.ReadLine()) != null)
            {
                // Chain the replacements so the second call keeps the result of the first
                string nLine = Regex.Replace(line, "º", "s");
                nLine = Regex.Replace(nLine, "ª", "S");

                // Write straight to the output file instead of redirecting Console.Out
                sw.WriteLine(nLine);
            }
        }
        finally
        {
            // Close only what was actually opened
            if (sr != null) sr.Close();
            if (sw != null) sw.Close();
        }
    }
}



The code works only on normal characters (for example, it replaces "a" with "b") .. I've tried all the possible encodings but still no result .. any ideas?

crcapps
post 28 May, 2008 - 12:01 PM
Post #2




I am guessing that you will need to use the Unicode values of the characters in question, rather than the characters themselves.
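
One way to find out which Unicode values the reader is really handing to the regex is to dump the code points of a decoded line. The sketch below is only an illustration: the `CodePointDump` class, its `Describe` helper, and the two sample bytes are made up for this example, and it assumes the file stores the character as the single byte 0xBA, as a legacy 8-bit encoding might.

```csharp
using System;
using System.Text;

static class CodePointDump
{
    // Hypothetical helper: lists each UTF-16 code unit of a string as U+XXXX.
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Assumed sample data: 't' followed by a lone 0xBA byte.
        byte[] raw = { 0x74, 0xBA };

        // A lone 0xBA is not valid UTF-8, so the decoder substitutes U+FFFD.
        string asUtf8 = Encoding.UTF8.GetString(raw);

        // ISO-8859-1 maps every byte to the code point with the same number.
        string asLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(raw);

        Console.WriteLine(Describe(asUtf8));   // U+0074 U+FFFD
        Console.WriteLine(Describe(asLatin1)); // U+0074 U+00BA
    }
}
```

If the dump of a real line shows U+FFFD where the special character should be, the pattern has nothing left to match, and the encoding passed to the StreamReader is the thing to revisit.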


Martyr2
post 28 May, 2008 - 10:20 PM
Post #3




crcapps is on the right track. You will have to use the Unicode value or its class as part of the regular expression pattern like so...

The symbol "¶" (the pilcrow, which marks a paragraph in formal writing) has the Unicode value 182 in decimal, or 00B6 in hexadecimal. To find these character values, look in any Unicode chart, such as the ones at Unicode.org. Once you have the hex value, you can specify it in the regular expression in the form "\u00B6". Here is an example of how this would look for an ordinary string...

csharp

// Requires: using System.Text.RegularExpressions; and using System.Windows.Forms;

// Create our regular expression object with the Unicode character 00B6, which
// is the value for the paragraph symbol
Regex r = new Regex("\u00B6");

// Replace() takes our string and the replacement text. It finds the paragraph
// character and replaces it with the letter "s", so the message box shows "test"
MessageBox.Show(r.Replace("te¶t","s"));


Our example locates the paragraph character and replaces it with the letter "s". To replace several characters like this, you could build an array of Unicode chars, or, if the characters belong to a certain set such as Arabic, you can use the "\p" escape sequence to match the whole set. Read more about that at the link below...

Regular Expressions Tutorial - Unicode Characters
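
Both ideas can be sketched in a few lines. This is a rough illustration rather than a finished solution: the `SpecialCharReplacer` class and its `Normalize` method are hypothetical names, and the mapping (U+00BA to "s", U+00AA to "S") is simply taken from the original question. The second call uses .NET's `\p{IsArabic}` named block as one example of matching a whole set.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SpecialCharReplacer
{
    // Assumed mapping, taken from the original question.
    static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        { '\u00BA', "s" },
        { '\u00AA', "S" },
    };

    // One character class covering every key in the map.
    static readonly Regex Pattern = new Regex(@"[\u00BA\u00AA]");

    // Replaces each mapped character with its plain substitute in a single pass.
    public static string Normalize(string line)
    {
        return Pattern.Replace(line, m => Map[m.Value[0]]);
    }

    static void Main()
    {
        Console.WriteLine(Normalize("\u00AAte\u00BAt")); // Stest

        // \p{IsArabic} matches any character in the Arabic block (U+0600 to U+06FF).
        string masked = Regex.Replace("abc \u0633\u0644\u0627\u0645", @"\p{IsArabic}", "?");
        Console.WriteLine(masked); // abc ????
    }
}
```

A single pass with a lookup table avoids chaining one Regex.Replace call per character, which gets awkward as the list of characters grows.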

Hope that helps you solve the problem.