日常code相关总结 archieved

Posted by Hux Blog on August 6, 2018

Linux command

把文本分成多个文件 split

  • 比如万级别图片展示的网页展示
  • split -l 1000 no_bbx.html no_bbx_

查看文本编码方式

  • file 命令

多线程下载

  • cat query.gbk.res.urls | xargs -n 1 -P 64 wget -q -t 1 --timeout 1 -P test_data

awk

  • 优点
    • 相比较屏幕处理的优点,在处理庞大文件时不会出现内存溢出或是处理缓慢的问题,通常用来格式化文本信息

      获取linux命令pdf

  • man -t
    • Use /usr/bin/groff -Tps -mandoc -c to format the manual page, passing the output to stdout. The default output format of /usr/bin/groff -Tps -mandoc -c is Postscript, refer to the manual page of /usr/bin/groff -Tps -mandoc -c for ways to pick an alternate format.
    • 简而言之转成ps,且输出到stdout
    • 也可以用pipe输出为pdf。man -t awk | ps2pdf - awk.pdf
    • mac安装ps2pdf:brew info ghostscript

manual语法协议(conventions)

  • 看stackoverflow了解到有一种语言叫Brainfuck。

  • The list below shows conventional or suggested sections. Most manual pages should include at least the highlighted sections. Arrange a new manual page so that sections are placed in the order shown in the list.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Sections within a manual page

*NAME*
*SYNOPSIS*
CONFIGURATION      [Normally only in Section 4]
*DESCRIPTION*
OPTIONS            [Normally only in Sections 1, 8]
EXIT STATUS        [Normally only in Sections 1, 8]
RETURN VALUE       [Normally only in Sections 2, 3](如socket)
ERRORS             [Typically only in Sections 2, 3]
ENVIRONMENT
FILES
VERSIONS           [Normally only in Sections 2, 3]
CONFORMING TO
NOTES
BUGS
EXAMPLE
*SEE ALSO*
  • 各种符号
    • Brackets ([ ]) surround optional arguments.You can choose one or more items or no items. Do not type the square brackets themselves in the command line(hhh).
    • brackets (< >) indicate that the enclosed element (parameter, value, or information) is mandatory(强制的).
    • vertical bars (|) separate choices
    • ellipses (…) can be repeated(比如说cp、mv输入可以是多个文件,就会用到ellipses)
    • hyphen-minus
      • In Unix-like systems, the ASCII hyphen–minus is commonly used to specify options.
      • An argument that is a single hyphen–minus by itself without any letters usually specifies that a program should handle data coming from the standard input or send data to the standard output.
      • Two hyphen–minus characters ( – ) are used on some programs to specify “long options” where more descriptive option names are used.

代码规范(常用)

  • 所有函数名(lowcase) get_sth_from()
  • 所有变量(lowcase) num_channels
  • 所有两目、三目加空格;(i, j)也要加空格
  • for if等均要加大括号;且所有要前后空格。eg:for () {}

jupyter

  • 查看本机运行的server
    • jupyter notebook list
  • brew 安装特定版本

brew opencv多版本共存

cmake find_package 的路径设置。 cmake find_package 的搜索路径顺序

c++ 相对路径在CLION(其他软件中的使用)

在所有行中查找 字符串 出现的次数 :%s/字符串/&/gn

5,查找与替换

:s(substitute)命令用来查找和替换字符串。语法如下:

:{作用范围}s/{目标}/{替换}/{替换标志} 例如:%s/foo/bar/g会在全局范围(%)查找foo并替换为bar,所有出现都会被替换(g)

python的argv:python a.py ‘a’ ‘b’ argv[0]==a.py

Git

  • log图形化
    • git log --graph
    • 示例:How to read
    • 如图
    • The asterisks show where something was committed: e1193f8, 5a09431 and 30e367c were committed to the left branch (yielding a | on the right branch) whereas 420eac9 was committed to the right branch (yielding a | on the left branch). And that is what 420eac9 does different from the rest: it’s the only commit to the right branch.
  • 查看历史提交排名
    • git shortlog
      • summarize git log output
      • 具体参数:
      • git shortlog -nes
  • 查看所有分支
    • 列出所有分支
      • git branch -r (远程remote)
      • ` git branch -a (所有all)`

pip

  • upgrade命令
    • pip install –upgrade
    • Upgrade all specified packages to the newest available version. The handling of dependencies depends on the upgrade-strategy used.
  • 查看版本
    • pip -V
    • 一般软件查看版本:-v -V --version
  • 搜索库里某软件的所有版本(trick)
    • pip install tensorflow==
    • 如图

vim注意事项

  • vim 直接会把全部内容加入到内存中,如果文件超大,会oom。做批量操作,要确定不会爆。

拷贝到剪贴板

  • cat ~/.ssh/id_rsa.pub | pbcopy
  • pbcopy
    • pbcopy takes the standard input and places it in the specifie pasteboard. If no pasteboard is specified, the general pasteboard will be used by default.
    • usage
      • pbcopy < file
      • find . -name "*.png" | pbcopy

tar的版本

  • linux 一般是GNU的tar
  • mac 一般是BSD的tar
  • 两者不一致,会产生解压出error或者warning情况,brew install gnu-tar,然后mac使用gtar即可,命令一致。

split分割大的文件,便于传输

  • split命令
    • split a file into pieces
    • split -b 500m data.tar "data.tar.part",生成的文件会自动确定个数,然后命名为”data.tar.partaa”之类的。
  • 用cat合并
    • cat home.tar.bz2.parta* > backup.tar.gz.joined
  • 获取文件夹下所有文件夹的名字
    • 不像文件,可以直接find。需求是只要文件夹的名字,不是目录。目的是
      1
      2
      
      ll | grep '^d' | awk '{print $10}' > dir_list # 为了剔除文件
      for req in $(cat dir_list); do mkdir $req/end; done #批量建文件夹
      

tar

  • 压缩选项compress(-j)
    • (c mode only) Compress the resulting archive with bzip2(1). In extract or list modes, this option is ignored. Note that, unlike other tar implementations, this implementation recognizes bzip2 compression automatically when reading archives. 因此,我们会看到如果tar -jc那么命名一般是a.tar.bz,以此来标志是用-j压缩。
    • 若仅仅简单打包,直接用tar -cvf file.tar file/即可,不要加-j,否则很慢。
  • 压缩选项compress(-z)
    • (c mode only) Compress the resulting archive with gzip(1). In extract or list modes, this option is ignored. Note that, unlike other tar implementations, this implementation recognizes gzip compression automatically when reading archives.

mkdir

  • 递归新建目录(-p)
    • Create intermediate directories as required. If this option is not specified, the full path prefix of each operand must already exist. On the other hand, with this option specified, no error will be reported if a directory given as an operand already exists. Intermediate directories are created with permission bits of rwxrwxrwx (0777) as modified by the current umask, plus write and search permission for the owner.
    • mkdir -p path/to/your/nonexist/path

Web

图片快速生成html

  • 利用shell命令
1
2
find ./ -name '*.jpg' > list
for req in $(cat list); do echo "<img src='$req' width="40%">" >> index.html; done

上述代码中for req in $(cat list)会根据空格split,因此如果文件名中有空格,结果错误。下述代码,是以行为单位的。

1
cat list| while read req; do echo "<img src='$req'  width="40%">" >> index.html; done

PYTHON OS

os.makedirs 递归建立文件夹

  • 类似于linux的mkdir -p
  • os.makedirs
    • Super-mkdir; create a leaf directory and all intermediate ones. Works like mkdir, except that any intermediate path segment (not just the rightmost) will be created if it does not exist. This is recursive.
  • os.mkdir
    • Create a directory。父级目录必须存在。
  • os.walk 递归遍历文件夹
    • dirpath, dirnames, filenames = os.walk(‘root_path’)
    • 返回的是三元组,分别为当前目录路径、当前目录下文件夹(剔除.与..)、当前目录下文件
    • 为了获取完整路径,可使用os.path.join。
    • 该函数为自动递归,直到最底层,即无文件夹可递归。
    • 可以修改dirnames, filenames,比如筛除某些文件、给文件排。如果修改dirnames,那么接下来的递归只会进入dirnames剩余的目录。即可实现递归特定的文件夹,或者按照某种顺序递归文件夹。
    • 例子
    1
    2
    3
    4
    5
    6
    7
    8
    
      import os
      from os.path import join, getsize
      for root, dirs, files in os.walk('python/Lib/email'):
          print root, "consumes",
          print sum([getsize(join(root, name)) for name in files]),
          print "bytes in", len(files), "non-directory files"
          if 'CVS' in dirs:
              dirs.remove('CVS')  # don't visit CVS directories
    

PYTHON numpy

  • np.random.shuffle 随机不重复数组
    • np.random.rand可生成给定shape的从[0,1)抽样的float数组。 - np.random.randint可生成指定范围的整数,也可在此基础上生成数组,但是无法保证不重复。
    • np.random.randn返回数或者数组,从标准高斯抽样,通过反归一化可生成任意高斯分布抽样。
    • 很多情况下,不重复的数组实际上可转化为shuffle数据集,因为不重复的数组往往是为了shuffle。
    • np.random.shuffle(x), in-place 修改。

PYTHON

  • ipython、jupyter-notebook都是code,不是bin
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
ipython code
#!/anaconda2/bin/python

# -*- coding: utf-8 -*-
import re
import sys

from IPython import start_ipython

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(start_ipython())
# re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])表示将sys.argv[0]中的'-script.pyw?或者'.exe'用''代替
# 也就是说去掉'-script.pyw?或者'.exe'
# sys.argv[0]是指第一个参数,比如'python a.py'指'a.py'
# C中的argc,在python中可由len(sys.argv)取得。
# C中的argv[0]是'./a.out'
  • python -m package
    • Searches sys.path for the named module and runs the corresponding .py file as a script.
    • 好处在于肯定用当前的python去执行,而不是*.py顶部指定的python路径。
    • example
      • python -m pip
      • python -m IPython

bbx的边界处理,np.clip

  • np.clip(a, a_min, a_max, out=None)
    • Given an interval, values outside the interval are clipped to the interval edges. For example, if an interval of [0, 1] is specified, values smaller than 0 become 0, and values larger than 1 become 1.
    • 一般先归一化到[0,1],然后为了鲁棒或者bbx真的在图外,用np.clip处理一下。

random.seed

  • 产生新的随机状态
    • random.seed(a=None)
    • None or no argument seeds from current time or from an operating system specific randomness source if available. If a is not None or an int or long, hash(a) is used instead.
    • eg: random.seed(42) random.shuffle(list)

直接return

  • 直接return,用res is None判断。

    eval字符串

  • ast.literal_eval
    • 某些情况下,我们的数据结构会被引号包裹,变为字符串,此时需要去掉字符串。
    • eg a='[3,4,5,6]',为了得到list,a_list = ast.literal_eval(a)即可。
    • Safely evaluate an expression node or a string containing a Python expression.

pickle

  • dump
    1
    2
    
    with open('filename','w') as f:
      pickle.dump(temp, f, pickle.HIGHEST_PROTOCOL)
    
  • load
    1
    2
    3
    
    fr = open('filename')
    inf = pickle.load(fr)
    fr.close()
    

list[int] 转 list(str)

  • 直接用str(list),整体会变为字符串。因此要使用map函数,注意举一反三。
    1
    2
    3
    
    a = [1,2,3,4]
    a = map(str, a)
    ' '.join(a) # 合成一个字符串'1 2 3 4'而不是'[1,2,3,4]'
    

file 与 base64转换

  • file2base64
    1
    2
    3
    4
    5
    6
    
    def img_file_encode_base64(img_path):
      file = open(img_path)
      imgfile_data = file.read()
      imgfile_base64 = base64.b64encode(imgfile_data)
      file.close()
      return imgfile_base64
    
  • cvimg2filebase64
    1
    2
    3
    4
    5
    
    def cvimg_encode_to_filebase64(img):
      #  cv::img to img_file_base64
      imgfile_data = cv2.imencode('.jpg', img)[1]  # [0] is bool [1] is data.  
      imgfile_base64 = base64.b64encode(imgfile_data)
      return imgfile_base64
    
  • filebase64tocvimg
    1
    2
    3
    4
    5
    6
    7
    
    def cvimg_decode_from_filebase64(imgfile_base64):
      # img_file_base64 to cv::img
      imgfile_data = base64.b64decode(imgfile_base64)
      # 内存中的数据,内存并不知道是什么,所以要用dytpe=np.uint8 分段读成数组
      npimg = np.frombuffer(imgfile_data, dtype=np.uint8);
      source_img = cv2.imdecode(npimg, 1)
      return source_img
    

机器学习

val与test的一个区别

  • validation是为了评价模型是否收敛,test才是实际中使用的。因此减均值这种操作,只用在validation里面,不用再test里面。由于没有使用减均值,阈值肯定会变化,因此test需要重新调阈值。

opencv过一遍

  • 无法解决:图片残缺,opencv只会报warning,无法catch

    skimage过一遍

  • 解决图片残缺问题 https://github.com/zeakey/tools/blob/master/python/rm_bad_imgs.py

NVIDIA 单卡监控

  • nvidia-smi dmon -i 15 -s -u