News:

As a consequence of the forum being updated and repaired, the chatbox has been lost.
However, you can still come say hi on our Discord server!

Main Menu

Text compression

Started by approxies, 30, October, 2014, 01:35:53 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Daddy Poi's Oily Gorillas

#20
Okay. I just want to say that I did finish the compression in C# enough to make my own compression possible... and seems to match with GS's original compression... (I might still make some more minor changes to it, though...) (See mine here http://pastebin.com/WNTPhv2t , and it also puts the data in a ROM. The end-goal is to have the new compressed data put in the same spot as the original data, though.... Which requires a code scanner to detect all the offsets and modify them accordingly... Not that difficult.) (My src data is supposed to be formatted the same way GSToolkit puts the uncompressed data into the ROM, there is a possibility in my changing that in the future, though. At least for compatibility to add more lines of text... when I get there, maybe.. - Speaking of that, I would also like to try an experiment that finds all text indexes in the code... but not sure how well that will go.)
It takes maybe around 0.02~0.03 seconds to compress all text, which is about ~10 times faster than my decompression routine. (Which I plan to optimize later by generating some tables from the char data before decompressing.)

Anyway, I did look at your Haskell program, but unfortunately, I'm pretty new with it, so I might need to do some research... Did you ever find the problem though? (Probably not, since your github wasn't updated?) And how long does it take your program to compress?


---
For my editor, I can see the possibility of having checks to detecting if ROM was modified by:
- Atrius's editor (If master file table doesn't start with 08000000)
  - Since all the text would get decompressed on loading the ROM in the editor, I wouldn't need to fix that area, but I would need to scan for edited sprites, and the Innate Psynergies.
- GSToolkit. (Do checks on the code it inserts.)
  - So we can load the text as already uncompressed... and patch the code to re-allow compression.
- I am not sure, but I might review your Haskell program some time to see if I could make my editor compatible with ROMs modified with it. (My guess is that I wouldn't need to do anything.)
Golden Sun Docs: Broken Seal - The Lost Age - Dark Dawn | Mario Sports Docs: Mario Golf & Mario Tennis | Misc. Docs
Refer to Yoshi's Lighthouse for any M&L hacking needs...

Sometimes I like to compare apples to oranges. (Figuratively) ... They are both fruits, but which one would you eat more? (If taken literally, I'd probably choose apples.)
Maybe it is over-analyzing, but it doesn't mean the information is useless.


The only GS Discord servers with significance are:
Golden Sun Hacking Community
GS Speedrunning
/r/Golden Sun
GS United Nations
Temple of Kraden

Can you believe how small the Golden Sun Community is?

2+2=5 Don't believe me? Those are rounded decimal numbers. Take that, flat earth theorists! :)

approxies

Quote from: Fox on 11, August, 2015, 06:34:20 AM

Anyway, I did look at your Haskell program, but unfortunately, I'm pretty new with it, so I might need to do some research... Did you ever find the problem though? (Probably not, since your github wasn't updated?) And how long does it take your program to compress?


I have found the issue - my compression works better now for a dosen of bytes, than original. The problem was in a way I've put nodes and leafs with the same weights: there are 4 ways to organize trees with the same efficiency. The trick was that each messages is stored with 8-bits alignment (as pointers are in bytes, of course). So, some of ways for tree build is more effective for exact this script, and some are not. I just found the way, which gave me even better result, than Atlus had.

Now I'm writing interface code for compression, I don't have much time, so hope to finish it this week.

For speed, Haskell is pretty slow: on my 1.6 GHz it takes about 3-4 seconds to compress the script, it also takes much more, than your code to decompress. That all due to code nature: it doesn't work with byte arrays or something: it parses binary to higher algebraic abstractions for programmer's conveniency and then takes a bunch of time to manipulate these items. It's very visual, easy to debug and error-proof, but pretty slow. I don't plan to optimize it - it will hurt code clarity.

Daddy Poi's Oily Gorillas

#22
Quoteon my 1.6 GHz
Mine is:
Processor: Intel(R) Core (TM) i5-4200U CPU @ 1.60GHz  2.30 GHz

I also rewritten my decompression routine... It's around ~0.1 secs now. (Rounding figure... it may be somewhat lower.) -As mentioned previously, the difference is that this time I scanned the char tables to generate bitcode data.

(By the way, with Visual Studio, there are two Build Modes.... Release Build works faster than Debug Builds.)

In the original GS2 game, the largeest bitcode is 16 bits... since my newer code has a limit on how long bitcodes can be, I might keep both versions of the decompression for now... Maybe
        public byte[] decompText(byte[] src) { //, int srcInd) { // int srcPos) {
            DateTime c = DateTime.Now;
            int[] bitcode = new int[0x10000];
            byte[] bitLen = new byte[0x10000];
            short[] bitChar = new short[0x10000];
            //Scan char data to generate data for faster decompression than old method.
            int asmpchar = Bits.getInt32(src, 0x38578) - 0x8000000;
            int asmptext = Bits.getInt32(src, 0x385DC) - 0x8000000;
            int chardata = Bits.getInt32(src, asmpchar) - 0x08000000;
            int charpntrs = Bits.getInt32(src, asmpchar + 4) - 0x08000000;
            for (int char1 = 0; char1 < 0x100; char1++) {
                if (charpntrs == asmpchar) { break; }
                if (Bits.getInt16(src, charpntrs) == 0x8000) { charpntrs += 2; continue; }
                int charTree = (chardata + Bits.getInt16(src, charpntrs)) << 3; charpntrs += 2;
                int charSlot = charTree - 12;
                byte bits = 0; int bitC = 0; int entry = (char1 << 8);
                do {
                    while (((src[charTree >> 3] >> (charTree++ & 7)) & 1) == 0) { bits++; }
                    bitChar[entry] = (short)((Bits.getInt16(src, charSlot >> 3) >> (charSlot & 7)) & 0xFFF); charSlot -= 12;
                    bitLen[entry] = bits; if (bits >= 24) { return decompTextOld(src); }
                    bitcode[entry] = bitC;
                    while (((bitC >> (bits - 1)) & 1) == 1) { bits -= 1; bitC ^= 1 << bits; }
                    bitC |= 1 << (bits - 1);
                    entry += 1;
                } while (bits > 0);
            }
            //Console.WriteLine(DateTime.Now - c);
            //c = DateTime.Now;
            int textTree = 0, textLenAddr = 0;
            byte[] des = new byte[0x800000]; int desEntry = 0, desPos = 0xC300;
            for (int srcI = 0; srcI < 12461; srcI++) {
                Bits.setInt32(des, desEntry, desPos - 0xC300); desEntry += 4;
                int srcInd = srcI;
                if ((srcInd & 0xFF) == 0) {
                    textTree = Bits.getInt32(src, asmptext + ((srcInd >> 8) << 3)) - 0x08000000;
                    textLenAddr = Bits.getInt32(src, asmptext + ((srcInd >> 8) << 3) + 4) - 0x08000000;
                } else {
                    int cLen;
                    do {
                        cLen = src[textLenAddr++];
                        textTree += cLen;
                    } while (cLen == 0xFF);
                }
                int initChar = 0, bitnum = 0, val = 0, textTree2 = textTree;
                do {
                    while (bitnum < 24) { val |= (int)src[textTree2++] << bitnum; bitnum += 8; }
                    int entry = initChar << 8;
                    while ((val & ((1 << bitLen[entry]) - 1)) != bitcode[entry]) { entry++; }
                    initChar = bitChar[entry]; val >>= bitLen[entry]; bitnum -= bitLen[entry];
                    des[desPos++] = (byte)initChar;
                    //if (desPos >= 0x10000) { break; }
                } while (initChar != 0);
            }
            Console.WriteLine(DateTime.Now - c + " (Text Decompression)");
            return des;
        }



---
Wasn't really planning to link this at first, but I do have an early version that works... (My open-source program is here... But you can go to bin>Release to find the program. Once it's up, and ROM is loaded, then click Editors button.)
https://drive.google.com/file/d/0B-uU7QfJXcL9Q0FweEhxQTNmOGc/view?usp=sharing (Don't think it includes the rewritten decompression routine shown above, though.)
It's only a "Bug fix" because I gave someone a previous version... and then I fixed the minor bugs afterward.. (The bugs were not in the decompression or compression, though.)
Note that the reason the form takes a second or two to load is not b/c of decompression... It's because of the listbox. -- Otherwise, I likely would have worked on that Find feature sooner... and Listbox items don't update unless you close/open window.)
Basically once you hit "Edit", the text will be compressed to the ROM @ 0x08FA0000. But you still must save from the main form to write it back to your ROM file.
Golden Sun Docs: Broken Seal - The Lost Age - Dark Dawn | Mario Sports Docs: Mario Golf & Mario Tennis | Misc. Docs
Refer to Yoshi's Lighthouse for any M&L hacking needs...

Sometimes I like to compare apples to oranges. (Figuratively) ... They are both fruits, but which one would you eat more? (If taken literally, I'd probably choose apples.)
Maybe it is over-analyzing, but it doesn't mean the information is useless.


The only GS Discord servers with significance are:
Golden Sun Hacking Community
GS Speedrunning
/r/Golden Sun
GS United Nations
Temple of Kraden

Can you believe how small the Golden Sun Community is?

2+2=5 Don't believe me? Those are rounded decimal numbers. Take that, flat earth theorists! :)

approxies

Wow, that looks impressive.
Be warned, that I get unhandled exception when I push Editors.
Check details here.

Daddy Poi's Oily Gorillas

#24
Thanks.
*Tests that particular version through the EXE itself.* - I don't seem to be getting any errors?
Since it is an index out of range, I wonder which ROM you may be using? (Esp. with the foreign text in that link. Russian or something?- Don't know that language.)
GS2 (UE) is the only one with text supported in this version.
Looking back at my code, it looks like I may have map viewer compatibility for other versions. (Including GS1, and the Golf game.) Hope you like the instant map loading speed. :)
I go for English first, obviously. (Haven't really checked for this particular data, but I do know some GS2 ROMs may have some of their data in the same place.) 

---
Anyway, the plan for the editor some day is to support all versions of GS1, GS2, and GS3.. including translations if it's a mostly simple process... and perhaps the same for the Tennis and Golf games...? But these are my hopes, whether I get to them is the bigger question.

---
Looks like I might have done a booby.... If you open a ROM that is incompatible with map loading... (i.e. Tennis game, I think.) then the editor will have an error on loading it again... I guess go to C:\Users\<username>\appdata\Local ... and delete the editor directory (gsmagic) when that happens. - The only thing I store there is the ROM last loaded for convenience. (Well there, because that's where Visual Studio stores the settings.)


---[spoiler]
Thoughts about planning Editor file in ROM and/or as external file... Formatting idea / Work-In-Progress. (Thoughts/Ideas as in... none of this has been implemented yet. But I'd like to come up with an approach that doesn't take a lot of space. With the possibility of editor-related text being an exception, but not sure.)
Header:
ARM instruction (Because instruction at 08000000 may point here.... and this may be to point back to wherever code execution would go...)
Magic Number? (To verify that the pointed location was for the Editor File. Might go before the Arm Instruction??)
Sub-file header:
Identifier/magic number & may include size of this sub-file(?) (32-bit) = Whatever this is determines how/when file is read/used. (All these identifiers from each sub-file may be read before main form pops up.  - I thought this idea would help with re-ordering tables as well. - Oh, and this might just be a number, and not in ASCII. With a possiblity of certain ranges having flags for different stuff/etc.)
Size of table in ROM. (32-bit if 7 bits go unused... otherwise 25-bit, but could be determined by Identifier; Like if size in measured by number of 32-bits in some cases (i.e. code), max 23-bit could do.) = Size so that inserting new tables is simple enough... and when you check the next file, you'll add this to the address. ; Depending on what comes to mind, not sure if this could be combined with Identifier in some cases.
Data stuff. (What is here may depend on the Identifier value. - Some Identifiers may not need any data here.)

Identifier Ideas:
Code = All code gets scanned when things get repointed...
Unsupported = Basically data that is skipped over...
Free = When tables are resized, this helps to know how much free space is available.
Text Editor = May have two data values for main tables of Char and Text strings. (Depends on approach.)
Graphic = Well, there are some uncompressed graphics, at least.
Custom = There may be several custom identifiers... and the data is to list the datatypes/bit-lengths of the data and stuff...
Possible idea for Custom Types: (Not sure if any of these will share the same identifier or not.)
- Plain complex data tables using a constant length per entry. (Think items, abilities, enemies... enemy groups, etc... data)
- Object tables (Where the start of the list is a pointer list... Pretty much implying variable length entries. Sort of like the Master File Table(?) I don't think there are any others in that type of format, though?)
- When pointers need updating for repointering stuff?
- Simple stuff like when there is one data type used? / Or one kind of bitlength (aka: each entry could hold one piece of data that is either 8-bit, 16-bit, or 32-bit.)... (I mean, so simple that we don't need any data in the data portion of this editor subfile? Dunno, because labeling...)
Etc.


Most of the above should be taken with a grain of salt, I guess.... For all I know, a different strategy could be taken into account... i.e. I could look into storing only table size data, and everything else be done in the editor code.... with the expectation that the only thing needed is table resizing, and no compatibility with adding additional tables/changing their order... No idea. Never hurts to start simple, though.
[/spoiler]
Golden Sun Docs: Broken Seal - The Lost Age - Dark Dawn | Mario Sports Docs: Mario Golf & Mario Tennis | Misc. Docs
Refer to Yoshi's Lighthouse for any M&L hacking needs...

Sometimes I like to compare apples to oranges. (Figuratively) ... They are both fruits, but which one would you eat more? (If taken literally, I'd probably choose apples.)
Maybe it is over-analyzing, but it doesn't mean the information is useless.


The only GS Discord servers with significance are:
Golden Sun Hacking Community
GS Speedrunning
/r/Golden Sun
GS United Nations
Temple of Kraden

Can you believe how small the Golden Sun Community is?

2+2=5 Don't believe me? Those are rounded decimal numbers. Take that, flat earth theorists! :)

approxies

Ah, yes indeed I've tried GS1 version, sorry for misinformation.
Anyhow, I've finished my tool, that's mainly for translation purposes.
It supports GS1 and GS2, decoding and encoding. It can output as text or binary, but encoding is only for binary.*.bat file also inserts the data to ROM and warns, if data doesn't fit in place.