Rename files in Hadoop/Spark

Tags:
Big Data
Hadoop
Spark
A friend of mine (he would ask this question himself, but his computer is down) has an input folder that contains over 100,000 files. He wants to do a batch rename operation and is trying to use Spark, but when he tried this piece of code:
    final org.apache.hadoop.fs.FileSystem ghfs =
            org.apache.hadoop.fs.FileSystem.get(new java.net.URI(args[0]),
                    new org.apache.hadoop.conf.Configuration());
    org.apache.hadoop.fs.FileStatus[] paths =
            ghfs.listStatus(new org.apache.hadoop.fs.Path(args[0]));
    List<String> pathsList = new ArrayList<>();
    for (FileStatus path : paths) {
        pathsList.add(path.getPath().toString());
    }
    JavaRDD<String> rddPaths = sc.parallelize(pathsList);

    rddPaths.foreach(new VoidFunction<String>() {
        @Override
        public void call(String path) throws Exception {
            Path origPath = new Path(path);
            Path newPath = new Path(path.replace("taboola", "customer"));
            ghfs.rename(origPath, newPath);
        }
    });
He gets an error saying that org.apache.hadoop.fs.FileSystem is not Serializable. Is there another way to do this?
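The error happens because the foreach closure captures the driver-side ghfs handle, and Spark must serialize everything a closure captures before shipping it to the executors; FileSystem does not implement Serializable. The usual fix is to ship only the (serializable) path strings and open a FileSystem on the executor side instead. Below is a sketch of that approach, assuming Spark with Java 8 lambdas and the Hadoop client on the classpath; foreachPartition is used so FileSystem.get runs once per partition rather than once per file. The class name RenameFiles is made up for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RenameFiles {
    public static void main(String[] args) throws Exception {
        JavaSparkContext sc = new JavaSparkContext();
        // A plain String is serializable, so it is safe to capture.
        String rootUri = args[0];

        // List the files on the driver, as in the original code.
        FileSystem driverFs = FileSystem.get(new URI(rootUri), new Configuration());
        List<String> pathsList = new ArrayList<>();
        for (FileStatus status : driverFs.listStatus(new Path(rootUri))) {
            pathsList.add(status.getPath().toString());
        }

        // Only path strings are shipped to the executors.
        JavaRDD<String> rddPaths = sc.parallelize(pathsList);

        // Re-open a FileSystem inside the closure, once per partition,
        // instead of capturing the driver's non-serializable handle.
        rddPaths.foreachPartition(paths -> {
            FileSystem fs = FileSystem.get(new URI(rootUri), new Configuration());
            while (paths.hasNext()) {
                String p = paths.next();
                fs.rename(new Path(p), new Path(p.replace("taboola", "customer")));
            }
        });

        sc.stop();
    }
}
```

Note that rename on HDFS is a cheap metadata operation, so for 100,000 files a plain single-threaded loop over listStatus on the driver (no Spark at all) may also be fast enough and is simpler to reason about.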