I'm looking to process a large Unicode text file that is over 6 GB. I need to count the frequency of each unique word. I'm currently using Data.Map to track the counts, but it's taking far too much time and memory. Here's the code:
import Data.Text.Lazy (Text)
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as TI
import Data.Map.Strict (Map, fromListWith)
import Data.Word (Word16)

-- Pair every word with 1 and merge duplicates by summing
dictionate :: [Text] -> Map Text Word16
dictionate = fromListWith (+) . (`zip` [1,1..])

main :: IO ()
main = do
    contents <- TI.getContents
    print . dictionate . T.words $ contents
I've tried using Data.HashMap.Strict, and it's a bit better. What am I doing wrong here? What's the best way to do this in terms of time and memory?
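For reference, the Data.HashMap.Strict variant I experimented with looks roughly like this — a sketch, not my exact code. I switched to a strict left fold with insertWith so each word is counted as it is seen, and to Int counts, since on a 6 GB file a common word can easily exceed the Word16 maximum of 65535 (countWords and bump are my own names):

{-# LANGUAGE BangPatterns #-}
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as TI
import qualified Data.HashMap.Strict as HM
import Data.List (foldl')

-- Fold over the word stream, bumping each word's count in place,
-- so the full word list is never materialized at once
countWords :: T.Text -> HM.HashMap T.Text Int
countWords = foldl' bump HM.empty . T.words
  where
    bump !m w = HM.insertWith (+) w 1 m

main :: IO ()
main = do
    contents <- TI.getContents
    print . HM.size . countWords $ contents

It was faster than the fromListWith version, but memory still grows with the number of distinct words, which is what I'd expect from any in-memory map.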