Home
| Calendar
| Mail Lists
| List Archives
| Desktop SIG
| Hardware Hacking SIG
Wiki | Flickr | PicasaWeb | Video | Maps & Directions | Installfests | Keysignings Linux Cafe | Meeting Notes | Linux Links | Bling | About BLU |
Derek Martin wrote: >>* I have a list of several million strings, but I know that there are only >>about 100k unique ones. I want to make myself a frequency table that >>tells me how often each string occurred. Write me a data structure to do >>it. Everybody makes a hashmap, which is fine, but most people end up >>creating several million Integer objects when it can be done by only >>creating 100k. > > > Not knowing Java, I couldn't write the code to do it; but the obvious > solution seems to be to create a hash with the string as a key, and > the frequency would be the value. I'm having trouble imagining an > approach which would produce 1M hash objects and provide a correct > answer... What's the deal? What makes it tricky is two things about Java and its standard libraries: 1. You can only put objects in Java collections, not primitive data types. Java provides "wrapper" object types for the primitive types (Boolean, Byte, Character, Double, Float, Integer, Long, Short, String), largely for that reason. In current versions of Java, this is a pain, because you have to explicitly box the primitive types when you put them in the collection (using a constructor), and cast (since Java has only generic collections) and unbox (using a method of Integer or whatever) when you take them out. Java 1.5 introduces templated collections, like C++, and automatic boxing and unboxing of primitive types; that will simplify the code, but the efficiency cost remains. 2. The wrapper objects provided by Java are immutable. You can set their values only in their constructors. If you put Integers in your hash table, you will have to create a new object each time you change the value. If you want to avoid that overhead, you instead have to create a new class of object, say MutableInteger, that has a set() method or has the int as a public instance variable. In a real application, this comes more naturally; you will usually want to store other data in addition to the count of each string, so you create a class to hold all of it. Immutability of objects is sometimes a good thing. For instance, if you have a class, all of the members of which are immutable objects, you can safely do shallow rather than deep copies of that object. If the member objects can be modified, you have to do a deep copy if you clone your object. Here, however, immutability gets in your way, forcing you to create an extra class rather than using one from the library.
BLU is a member of BostonUserGroups | |
We also thank MIT for the use of their facilities. |