Autogenerating text based on already analyzed text

 

KevinBT
21 Apr, 2007 - 05:19 PM
Post #1

Hi, all....

I'm posting here to see if anybody might be able to give me a push in the right direction. I have created an application that reads a text file and analyzes the statistics of the character combinations in it. I store these statistics in a HashMap, with each character combination as a key and the number of times it occurs as the value. So if a file contains the text "dentist", the HashMap stores de, en, nt, ti, and st as keys and sets the value of each to 1.

I'm not quite sure how I'm gonna tackle this task, and would really appreciate any ideas you might have.
And simpler is better, of course ;)

Thanks :)

EDIT: I now have the application working. It stores these character combinations as keys in the HashMap and updates the number of occurrences correctly. What I need tips on now is how to use these statistics to generate new text. I want the application to automatically generate text that resembles the language of the analyzed text(s). So if I analyze a text in, say, German, the generated text should look something like German, because it is generated from the statistical probability that a certain combination of characters will occur. Any ideas on how I can accomplish this?

This post has been edited by KevinBT: 22 Apr, 2007 - 04:38 AM

vasdueva
RE: Autogenerating Text Based On Already Analyzed Text
21 Apr, 2007 - 05:46 PM
Post #2

Are you always starting with the first two characters, then the next two, and so on? Do you count spaces? If not, try reading the file two characters at a time. In every iteration of that loop, run a nested loop through every character combo already stored, compare each one with the current pair, and update the count accordingly.

Oh, and I suppose spaces make no difference; I was just curious.

This post has been edited by vasdueva: 21 Apr, 2007 - 05:52 PM

KevinBT
RE: Autogenerating Text Based On Already Analyzed Text
22 Apr, 2007 - 02:52 AM
Post #3

Well, when the program runs, the user chooses how many characters each combination should contain. So if the user chooses 5 and analyzes the text "ABCDEFGHIJKLMNOP", the combinations stored in the HashMap as keys will be:

ABCDE
BCDEF
CDEFG
DEFGH
EFGHI
FGHIJ
GHIJK
HIJKL
IJKLM
JKLMN
KLMNO
LMNOP


So the analyzer only moves 1 character forward each time, no matter how many characters there are in each combination. Spaces are counted like any other character. Obviously the text needs to be pretty long, and the user would probably be better off choosing 3-letter combinations so that the generated text resembles the language that was analyzed.
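
For anyone following along, here is a minimal sketch of the sliding-window count described above. It assumes Java 8 or later (for Map.merge), and the names NGramCounter and count are made up for illustration; this is not KevinBT's actual code.

```java
import java.util.HashMap;
import java.util.Map;

class NGramCounter {
    // Counts every run of n consecutive characters in text.
    // The window moves forward one character at a time, and spaces
    // are treated like any other character.
    static Map<String, Integer> count(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            String combo = text.substring(i, i + n);
            // Insert 1 for a new combo, otherwise add 1 to the stored count.
            counts.merge(combo, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints the five two-character combos of "dentist", each with count 1
        // (HashMap does not guarantee any particular order).
        System.out.println(count("dentist", 2));
    }
}
```

Calling count("ABCDEFGHIJKLMNOP", 5) gives the 12 keys listed above, each with a count of 1. Because the HashMap looks a key up directly, no inner loop over the stored combos is needed.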



See EDIT in first post.....

This post has been edited by KevinBT: 22 Apr, 2007 - 04:39 AM

vasdueva
RE: Autogenerating Text Based On Already Analyzed Text
22 Apr, 2007 - 11:29 AM
Post #4

I suppose you could just pick the different combinations at random, weighted by how often each one occurs.

Let's say you have two different combinations, AB and CD, where AB occurs 25% of the time and CD occurs 75%. Generate a random number every time you want to produce a combo, write the chosen combo to the file, and repeat. I don't imagine it would look all that much like real text, but perhaps the more frequent combinations would end up next to their usual partners.
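
A rough sketch of that weighted pick, assuming the counts live in a Map<String, Integer> as in the first post and that every count is non-negative. The names WeightedPicker, pick, and generate are hypothetical, chosen only for this example.

```java
import java.util.Map;
import java.util.Random;

class WeightedPicker {
    // Returns one key, chosen with probability proportional to its count.
    // Assumes all counts are non-negative and at least one is positive.
    static String pick(Map<String, Integer> counts, Random rng) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("need at least one positive count");
        }
        int r = rng.nextInt(total); // uniform in [0, total)
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            r -= e.getValue();
            if (r < 0) {
                return e.getKey(); // r fell inside this key's share of the range
            }
        }
        throw new IllegalStateException("not reachable with non-negative counts");
    }

    // Concatenates `combos` independently chosen combinations.
    static String generate(Map<String, Integer> counts, int combos, Random rng) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < combos; i++) {
            out.append(pick(counts, rng));
        }
        return out.toString();
    }
}
```

With counts of AB=1 and CD=3, pick returns CD about 75% of the time, which matches the split in the example above. The result of generate could then be written to a file.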

The statistical occurrence of the combinations would be very close to the original text, whether or not they line up to form real words. Over the course of a longer file you would end up with a number of real words.
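
One variation that nobody in this thread has proposed, so treat it as a swapped-in idea: to make the combinations line up, chain them. Start with one weighted pick, then repeatedly look at the last n-1 characters written so far, gather the stored combos that begin with those characters, pick one by weight, and append only its last character. This is essentially a character-level Markov chain. The sketch below assumes every key in the map has length n and a positive count; ChainedGenerator and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class ChainedGenerator {
    // Weighted pick: probability proportional to each key's count.
    static String pick(Map<String, Integer> counts, Random rng) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("need at least one positive count");
        }
        int r = rng.nextInt(total);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            r -= e.getValue();
            if (r < 0) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("not reachable with non-negative counts");
    }

    // Builds `length` characters by chaining overlapping combos of size n.
    static String generate(Map<String, Integer> counts, int n, int length, Random rng) {
        StringBuilder out = new StringBuilder(pick(counts, rng));
        while (out.length() < length) {
            // The last n-1 characters decide which combos may come next.
            String prefix = out.substring(out.length() - (n - 1));
            Map<String, Integer> candidates = new HashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getKey().startsWith(prefix)) {
                    candidates.put(e.getKey(), e.getValue());
                }
            }
            if (candidates.isEmpty()) {
                // Dead end (the prefix only occurred at the end of the source text):
                // restart with a fresh weighted pick.
                out.append(pick(counts, rng));
            } else {
                out.append(pick(candidates, rng).charAt(n - 1));
            }
        }
        return out.substring(0, length);
    }
}
```

Scanning every key on each step keeps the sketch short but is slow for large tables; grouping the combos by their first n-1 characters up front would avoid the repeated scan.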

Other than that... dunno.