Friday, September 15, 2017

Hadoop Commands


Each entry below gives the command, what it does, its usage, and examples.

cat
What it does: Copies source paths to stdout.
Usage: hdfs dfs -cat URI [URI …]
Examples:
hdfs dfs -cat hdfs://<path>/file1
hdfs dfs -cat file:///file2 /user/hadoop/file3

chgrp
What it does: Changes the group association of files. With -R, makes the change recursively by way of the directory structure. The user must be the file owner or the superuser.
Usage: hdfs dfs -chgrp [-R] GROUP URI [URI …]
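Example (a sketch; assumes a group named analysts already exists on the cluster):
hdfs dfs -chgrp -R analysts /user/hadoop/dir1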

chmod
What it does: Changes the permissions of files. With -R, makes the change recursively by way of the directory structure. The user must be the file owner or the superuser.
Usage: hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI …]
Example: hdfs dfs -chmod 777 test/data1.txt

chown
What it does: Changes the owner of files. With -R, makes the change recursively by way of the directory structure. The user must be the superuser.
Usage: hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI …]
Example: hdfs dfs -chown -R hduser2 /opt/hadoop/logs

copyFromLocal
What it does: Works similarly to the put command, except that the source is restricted to a local file reference.
Usage: hdfs dfs -copyFromLocal <localsrc> URI
Example: hdfs dfs -copyFromLocal input/docs/data2.txt hdfs://localhost/user/rosemary/data2.txt

copyToLocal
What it does: Works similarly to the get command, except that the destination is restricted to a local file reference.
Usage: hdfs dfs -copyToLocal [-ignorecrc] [-crc] URI <localdst>
Example: hdfs dfs -copyToLocal data2.txt data2.copy.txt

count
What it does: Counts the number of directories, files, and bytes under the paths that match the specified file pattern.
Usage: hdfs dfs -count [-q] <paths>
Example: hdfs dfs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2

cp
What it does: Copies one or more files from a specified source to a specified destination. If you specify multiple sources, the specified destination must be a directory.
Usage: hdfs dfs -cp URI [URI …] <dest>
Example: hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir

du
What it does: Displays the size of the specified file, or the sizes of files and directories that are contained in the specified directory. If you specify the -s option, displays an aggregate summary of file sizes rather than individual file sizes. If you specify the -h option, formats the file sizes in a "human-readable" way.
Usage: hdfs dfs -du [-s] [-h] URI [URI …]
Example: hdfs dfs -du /user/hadoop/dir1 /user/hadoop/file1

dus
What it does: Displays a summary of file sizes; equivalent to hdfs dfs -du -s.
Usage: hdfs dfs -dus <args>
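Example (any HDFS path works; this one reuses the dir1 path from the du entry):
hdfs dfs -dus /user/hadoop/dir1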

expunge
What it does: Empties the trash. When you delete a file, it isn't removed immediately from HDFS, but is renamed to a file in the /trash directory. As long as the file remains there, you can undelete it if you change your mind, though only the latest copy of the deleted file can be restored.
Usage: hdfs dfs -expunge
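Example (the command takes no path arguments):
hdfs dfs -expunge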

get
What it does: Copies files to the local file system. Files that fail a cyclic redundancy check (CRC) can still be copied if you specify the -ignorecrc option. The CRC is a common technique for detecting data transmission errors. CRC checksum files have the .crc extension and are used to verify the data integrity of another file. These files are copied if you specify the -crc option.
Usage: hdfs dfs -get [-ignorecrc] [-crc] <src> <localdst>
Example: hdfs dfs -get /user/hadoop/file3 localfile

getmerge
What it does: Concatenates the files in src and writes the result to the specified local destination file. To add a newline character at the end of each file, specify the addnl option.
Usage: hdfs dfs -getmerge <src> <localdst> [addnl]
Example: hdfs dfs -getmerge /user/hadoop/mydir/ ~/result_file addnl

ls
What it does: Returns statistics for the specified files or directories.
Usage: hdfs dfs -ls <args>
Example: hdfs dfs -ls /user/hadoop/file1

lsr
What it does: Serves as the recursive version of ls; similar to the Unix command ls -R.
Usage: hdfs dfs -lsr <args>
Example: hdfs dfs -lsr /user/hadoop

mkdir
What it does: Creates directories on one or more specified paths. Its behavior is similar to the Unix mkdir -p command, which creates all directories that lead up to the specified directory if they don't exist already.
Usage: hdfs dfs -mkdir <paths>
Example: hdfs dfs -mkdir /user/hadoop/dir5/temp

moveFromLocal
What it does: Works similarly to the put command, except that the source is deleted after it is copied.
Usage: hdfs dfs -moveFromLocal <localsrc> <dest>
Example: hdfs dfs -moveFromLocal localfile1 localfile2 /user/hadoop/hadoopdir

mv
What it does: Moves one or more files from a specified source to a specified destination. If you specify multiple sources, the specified destination must be a directory. Moving files across file systems isn't permitted.
Usage: hdfs dfs -mv URI [URI …] <dest>
Example: hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2

put
What it does: Copies files from the local file system to the destination file system. This command can also read input from stdin and write to the destination file system.
Usage: hdfs dfs -put <localsrc> ... <dest>
Examples:
hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir
hdfs dfs -put - /user/hadoop/hadoopdir (reads input from stdin)

rm
What it does: Deletes one or more specified files. This command doesn't delete empty directories or files. To bypass the trash (if it's enabled) and delete the specified files immediately, specify the -skipTrash option.
Usage: hdfs dfs -rm [-skipTrash] URI [URI …]
Example: hdfs dfs -rm hdfs://nn.example.com/file9

rmr
What it does: Serves as the recursive version of -rm.
Usage: hdfs dfs -rmr [-skipTrash] URI [URI …]
Example: hdfs dfs -rmr /user/hadoop/dir

setrep
What it does: Changes the replication factor for a specified file or directory. With -R, makes the change recursively by way of the directory structure.
Usage: hdfs dfs -setrep <rep> [-R] <path>
Example: hdfs dfs -setrep 3 -R /user/hadoop/dir1

stat
What it does: Displays information about the specified path.
Usage: hdfs dfs -stat URI [URI …]
Example: hdfs dfs -stat /user/hadoop/dir1

tail
What it does: Displays the last kilobyte of a specified file to stdout. The syntax supports the Unix -f option, which enables the specified file to be monitored. As new lines are added to the file by another process, tail updates the display.
Usage: hdfs dfs -tail [-f] URI
Example: hdfs dfs -tail /user/hadoop/dir1

test
What it does: Returns attributes of the specified file or directory. Specify -e to determine whether the file or directory exists; -z to determine whether the file or directory is empty; and -d to determine whether the URI is a directory.
Usage: hdfs dfs -test -[ezd] URI
Example: hdfs dfs -test -e /user/hadoop/dir1
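The test command reports its result through the exit status rather than through printed output, so in a script you check $? after running it (a sketch; the path is illustrative):
hdfs dfs -test -d /user/hadoop/dir1
echo $? (prints 0 if dir1 exists and is a directory, 1 otherwise)
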
text
What it does: Outputs a specified source file in text format. Valid input file formats are zip and TextRecordInputStream.
Usage: hdfs dfs -text <src>
Example: hdfs dfs -text /user/hadoop/file8.zip

touchz
What it does: Creates a new, empty file of size 0 in the specified path.
Usage: hdfs dfs -touchz <path>
Example: hdfs dfs -touchz /user/hadoop/file12
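
Putting a few of these commands together: the following is a minimal sketch of a typical session, assuming a running HDFS cluster, a home directory at /user/hadoop, and a local file named sales.csv (the file name and the demo directory are illustrative, not taken from the table above).

hdfs dfs -mkdir /user/hadoop/demo (creates the working directory, along with any missing parents)
hdfs dfs -put sales.csv /user/hadoop/demo (copies the local file into HDFS)
hdfs dfs -ls /user/hadoop/demo (confirms the upload)
hdfs dfs -du -h /user/hadoop/demo (shows its size in human-readable form)
hdfs dfs -cat /user/hadoop/demo/sales.csv (prints the file to stdout)
hdfs dfs -get /user/hadoop/demo/sales.csv sales.copy.csv (copies it back to the local file system)
hdfs dfs -rmr -skipTrash /user/hadoop/demo (removes the working directory, bypassing the trash)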