Here are a few basic commands (too basic for anyone who has ever worked in Linux) and installation instructions for a couple of useful tools that were more or less prerequisites for setting up my working environment.
- View beginning of a file:
head -n3 file.txt
It displays the first 3 lines of the file.
- Finding the number of threads possible:
lscpu
It displays a lot of details: #threads = #cpu (“sockets”) * #cores_per_cpu (“Cores per socket”) * #threads_per_core (“Threads per core”).
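The product above can be computed directly from the lscpu output. This is a minimal sketch, assuming the usual English field labels (Socket(s), Core(s) per socket, Thread(s) per core); count_threads is a hypothetical helper name, not a standard tool:

```shell
# count_threads (hypothetical helper): reads lscpu-style text on stdin and
# prints sockets * cores-per-socket * threads-per-core.
count_threads() {
  awk -F: '
    /^ *Socket\(s\):/          { sockets = $2 }
    /^ *Core\(s\) per socket:/ { cores   = $2 }
    /^ *Thread\(s\) per core:/ { threads = $2 }
    END { print sockets * cores * threads }
  '
}

# Only call lscpu when it is installed; LC_ALL=C keeps the labels in English.
if command -v lscpu >/dev/null 2>&1; then
  LC_ALL=C lscpu | count_threads
fi
```

If only the final number is needed, nproc prints the number of processing units available to the current process.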
- View a file:
cat file.txt
It displays the whole file in one go. To read it one screenful at a time, one option is to use an editor like nano. Another option, better in my opinion, is to use less:
less file.txt
Use CTRL+F or CTRL+B to move forward or backward by one screen. There are many other useful navigation options. Use :q to quit (plain q works too).
- Viewing content of an archived/compressed folder without extracting:
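tar -tvf archive.tar.gz
It lists the files inside the archive (with permissions, sizes, and dates) without extracting anything. Here archive.tar.gz stands for whichever archive you want to inspect.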
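To try the listing on a throwaway archive, here is a minimal sketch (the demo directory and file names are made up for illustration):

```shell
# Build a small throwaway archive so the listing has something to show.
workdir=$(mktemp -d)
mkdir "$workdir/demo"
echo "hello" > "$workdir/demo/a.txt"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" demo

# -t lists the contents, -v adds permissions/size/date, -f names the archive.
# Nothing is extracted; GNU tar detects the gzip compression by itself.
listing=$(tar -tvf "$workdir/demo.tar.gz")
echo "$listing"
```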
- Viewing summary/file-sizes of files in a directory:
ls -sh .
It lists the files in the current directory along with their sizes (-s flag) in human-readable units (-h flag).
- Viewing and killing a process:
ps -a
It displays the running processes with their pid.
kill pid
will kill a running process with the given pid.
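The same flow can be tried safely on a process you started yourself. A small sketch, using sleep as a stand-in for a real long-running command:

```shell
# Start a long-running process in the background; $! holds its pid.
sleep 30 &
pid=$!

# Signal 0 sends nothing; it only checks that the pid exists.
kill -0 "$pid" && echo "process $pid is running"

# kill sends SIGTERM by default, which asks the process to exit.
kill "$pid"

# Reap the child; wait returns non-zero because sleep was terminated.
wait "$pid" 2>/dev/null || true

if kill -0 "$pid" 2>/dev/null; then state="running"; else state="gone"; fi
echo "process $pid is $state"
```

If a process ignores SIGTERM, kill -9 pid sends SIGKILL, which cannot be caught or ignored; it is best kept as a last resort.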
- Viewing space taken by a directory:
du -h .
It displays the human-readable (-h flag) disk-space consumed by the current directory and its sub-directories. Use the -s flag to see only the total and -a to include files as well. Use --max-depth n to see space used by sub-directories up to level n.
- Creating sym-links of all the files in one directory:
ln -s path/to/src/dir/* path/to/dest/dir
Note that it will not create sym-links to the hidden files.
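A quick way to see this, sketched with two temporary directories (the file names are made up). mktemp returns absolute paths, which matters here: a relative source path is stored in the link as-is and resolved relative to the link's own location.

```shell
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/a.txt" "$src/b.txt" "$src/.hidden"

# The * glob skips names starting with a dot, so .hidden gets no link.
ln -s "$src"/* "$dest"
first=$(ls -A "$dest")
echo "$first"

# A second pattern for dot-files links those as well; .[!.]* skips . and ..
ln -s "$src"/.[!.]* "$dest"
ls -A "$dest"
```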
- Writing output to files in addition to the std output:
some-command | tee file.txt
It will write the output of the program to the terminal (stdout) as well as to file.txt.
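Note that tee only sees what comes through the pipe, which is stdout. A small sketch of the difference, where noisy is a hypothetical stand-in for some-command:

```shell
log=$(mktemp)

# noisy (hypothetical helper): writes one line to each stream.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Only stdout goes through the pipe, so the file gets one line.
noisy | tee "$log"
stdout_only=$(cat "$log")

# 2>&1 merges stderr into stdout before the pipe, so tee sees both.
noisy 2>&1 | tee "$log"
both=$(cat "$log")
```

tee overwrites the file by default; tee -a appends to it.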
- Redirection:
some-command > file.txt
“>” is for redirecting stdout. The file will be overwritten. Use “>>” for appending instead of overwriting.
some-command 2> file.txt
“2>” is for redirecting stderr (the file descriptor for stderr is 2, stdout is 1, and stdin is 0).
some-command > file.txt 2>&1
“2>&1” is for redirecting stderr to stdout. Here, stdout is redirected to file.txt. Therefore, both stderr and stdout will be written to file.txt.
- Moving the whole directory along with the hidden files:
mv /sourcedir/{,.[^.]}* /destdir/
It will exclude . (the directory itself) and .. (the parent directory), as is usually desired.
- Disassociate the running process from the current terminal without stopping it:
As has happened to me several times, I start a command (like wget) in a terminal, then have to close the terminal before the command has finished, and I would rather not restart the command from scratch using nohup. This is how we can proceed:
Hit ctrl+z to suspend the current foreground process (and place it in the job list as a stopped job). It will return something like
[1]+ Stopped
where 1 is the Job number.
Run bg %1 to resume Job number 1 in the background.
Run disown to remove the Job from the current shell's job list (so that SIGHUP is not passed on to the process when the shell closes). Without a job number, disown acts on the current job, i.e. the process just ‘backgrounded’.
Some assembly related commands
- To count number of reads in .fastq.gz:
parallel "echo {} && gunzip -c {} | wc -l | awk '{d=\$1; print d/4;}'" ::: *.gz
- To count number of reads in .fasta:
grep -c '^>' file.txt
- To check first two lines of .fastq.gz:
zcat name_R1.fastq.gz | head -2
- To count length of each line:
awk '{print length($0)}' file.txt
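On a machine without GNU parallel, the read count can be done with a plain loop. A minimal sketch, assuming every FASTQ record is exactly 4 lines (the same assumption as the command above); the two reads and the count_fastq_reads helper are made up for illustration:

```shell
workdir=$(mktemp -d)

# Two made-up reads, purely for illustration (4 lines per FASTQ record).
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIIII\n' | gzip > "$workdir/demo.fastq.gz"

# count_fastq_reads (hypothetical helper): line count divided by 4.
count_fastq_reads() {
  gunzip -c "$1" | awk 'END { print NR / 4 }'
}

for f in "$workdir"/*.fastq.gz; do
  echo "$(basename "$f"): $(count_fastq_reads "$f")"
done

# Length of each sequence line only (every 4th line, starting at line 2).
gunzip -c "$workdir/demo.fastq.gz" | awk 'NR % 4 == 2 { print length($0) }'
```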
GCC Issue
Installing a newer version of gcc locally: follow this for configuration and then add the following to ~/.bashrc:
export PATH=~/gcc-8.2.0/bin:$PATH
export LD_LIBRARY_PATH=~/gcc-8.2.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=~/gcc-8.2.0/lib64:$LD_LIBRARY_PATH
Trouble: cmake still takes the older version.
Quick hack (gcc-8.2.0 is the latest version I installed locally):
export CC=~/gcc-8.2.0/bin/gcc
export CXX=~/gcc-8.2.0/bin/g++
Or, to make this change permanent, add the above lines to ~/.bash_profile.
PSCP
Securely copying files from linux to windows. Manual is here
Install pscp. In the command prompt, run the following (to copy all files with extension “.ext” from the source directory into the current directory):
pscp -unsafe username@servername:/path/to/source/directory/*.ext .
I used -unsafe because I trust my server and it doesn’t support sftp.
Other Resources:
List of helpful Linux commands to process FASTQ files from NGS experiments