Python Programming Definitive Guide: Quick Start - 同乐学堂

I. String in and not in operators and slicing

spam = 'hello world!'

fizz = spam[0:5]   # characters at indexes 0 through 4

print(fizz)
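
# The heading above mentions the in and not in operators, but the snippet only shows slicing.
# The following is a minimal sketch of both; the variable names other than spam are
# illustrative and not part of the original.

```python
spam = 'hello world!'

# `in` tests whether the left string occurs anywhere inside the right one.
has_hello = 'hello' in spam          # substring is present
has_upper = 'HELLO' in spam          # the test is case-sensitive
missing_cats = 'cats' not in spam    # `not in` is the negated form

# Slice bounds may be omitted: [:5] is the first five characters,
# [6:] is everything from index 6 to the end.
head = spam[:5]
tail = spam[6:]

print(has_hello, has_upper, missing_cats)
print(head, tail)
```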

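II. Useful string methods

# 1. The join() and split() string methods

# join() is useful when you have a list of strings that need to be joined into a single string.
# It is called on a string, takes a list of strings as its argument, and returns a string.
# Note that the string join() is called on is inserted between each string of the list argument.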

print(', '.join(['cats', 'rats', 'bats']))

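# split() does the opposite: it is called on a string and returns a list of strings.
# By default, 'My name is Simon' is split on whitespace characters such as spaces, tabs, and newlines.
# These whitespace characters are not included in the strings of the returned list.
# You can also pass a delimiter string to split() to split on a different string.

print('My name is Simon'.split())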
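
# The text mentions passing a delimiter string to split() but does not show it.
# A short sketch, using made-up sample strings:

```python
# Split on a custom separator instead of whitespace.
parts = 'MyABCnameABCisABCSimon'.split('ABC')
print(parts)

# Splitting on '\n' breaks a multiline string into its lines.
lines = 'Dear Alice,\nHow are you?\nBob'.split('\n')
print(lines)

# join() undoes split() when the same separator is used.
rejoined = 'ABC'.join(parts)
print(rejoined)
```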

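# 2. The rjust(), ljust(), and center() string methods return a padded version of the string they are called on, inserting spaces (or an optional fill character) to align text.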

print('Hello'.rjust(10))

print('Hello'.ljust(10))

print('Hello'.center(20, '='))
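
# These methods are most useful when printing tabular data. The sketch below lines up
# two columns; format_table and the sample data are illustrative names, not part of the original.

```python
def format_table(items, left_width, right_width):
    """Return table rows: names padded with dots, counts right-aligned."""
    rows = ['PICNIC ITEMS'.center(left_width + right_width, '-')]
    for name, count in items.items():
        rows.append(name.ljust(left_width, '.') + str(count).rjust(right_width))
    return rows

picnic_items = {'sandwiches': 4, 'apples': 12}
table = format_table(picnic_items, 12, 5)
print('\n'.join(table))
```

# Every row has the same total width, so the columns stay aligned regardless of how long each name is
# (as long as it fits within left_width).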

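# 3. The startswith() and endswith() string methods

# These methods return True if the string they are called on begins or ends (respectively) with the string passed to the method; otherwise, they return False.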

print('Hello world!'.startswith('Hello'))
print('Hello world!'.endswith('world!'))

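# 4. Removing whitespace with strip(), rstrip(), and lstrip()

# Sometimes you want to remove whitespace characters (spaces, tabs, and newlines) from the left side, the right side, or both sides of a string.

# The strip() string method returns a new string without any whitespace characters at the beginning or end.

print(' Hello World '.strip())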
    

# lstrip() and rstrip() remove whitespace characters from the left and right ends, respectively.

print(' Hello World '.lstrip())

print(' Hello World '.rstrip())

# An optional string argument specifies which characters should be stripped from the ends.

print('SpamSpamBaconSpamEggsSpamSpam'.strip('ampS'))
    

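# Passing 'ampS' to strip() tells it to strip occurrences of a, m, p, and capital S from both ends of the string.

# The order of the characters does not matter: strip('ampS') does the same thing as strip('mapS') or strip('Spam').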
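
# A small check of the claim above: the argument is treated as a set of characters,
# not as a prefix or suffix, and characters in the middle of the string are left alone.
# The variable names are illustrative.

```python
text = 'SpamSpamBaconSpamEggsSpamSpam'

stripped_a = text.strip('ampS')
stripped_b = text.strip('Spam')   # same characters, different order

# Stripping stops at the first character not in the set ('B' on the left,
# the lowercase 's' of 'Eggs' on the right), so the inner 'Spam' survives.
print(stripped_a)
print(stripped_a == stripped_b)

# lstrip() and rstrip() accept the same optional argument.
left_only = 'xxhixx'.lstrip('x')
print(left_only)
```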

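# 5. The isX string methods

# isalnum()
# Returns True if all characters in the string are alphanumeric and there is at least one character, False otherwise.

# isalpha()
# Returns True if all characters in the string are alphabetic and there is at least one character, False otherwise.

# isdecimal()
# Returns True if all characters in the string are decimal characters and there is at least one character, False otherwise.

# isdigit()
# Returns True if all characters in the string are digits and there is at least one character, False otherwise.

# isidentifier()
# Returns True if the string is a valid identifier according to the language definition (the "Identifiers and keywords" section).
# Use keyword.iskeyword() to test for reserved identifiers such as def and class.

# islower()
# Returns True if all cased characters in the string are lowercase and there is at least one cased character, False otherwise.

# isnumeric()
# Returns True if all characters in the string are numeric characters and there is at least one character, False otherwise.

# isprintable()
# Returns True if all characters in the string are printable or the string is empty, False otherwise.

# isspace()
# Returns True if there are only whitespace characters in the string and there is at least one character, False otherwise.

# istitle()
# Returns True if the string is a titlecased string and there is at least one character; for example, uppercase characters may only follow uncased characters, and lowercase characters only cased ones. Returns False otherwise.

# isupper()
# Returns True if all cased characters in the string are uppercase and there is at least one cased character, False otherwise.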
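
# The descriptions above are easier to tell apart with concrete inputs. The sketch below
# tries a few of the methods on sample strings chosen for illustration; the three number
# tests differ only for non-ASCII characters such as the superscript two and the one-half sign.

```python
import keyword

results = {
    'alpha': 'hello'.isalpha(),            # letters only
    'alpha_mixed': 'hello123'.isalpha(),   # digits make isalpha() fail
    'alnum_mixed': 'hello123'.isalnum(),   # ...but isalnum() accepts them
    'space': ' \t\n'.isspace(),
    'title': 'This Is Title Case'.istitle(),
    'lower_no_cased': '12345'.islower(),   # no cased characters at all
    'empty_alpha': ''.isalpha(),           # empty strings fail most isX tests
    'empty_printable': ''.isprintable(),   # isprintable() is the exception
}

# isdecimal() is the strictest of the number tests, isnumeric() the loosest.
number_tests = {
    ch: (ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    for ch in ['7', '\u00b2', '\u00bd']    # '7', superscript two, one half
}

# isidentifier() does not exclude keywords; keyword.iskeyword() does that.
class_is_identifier = 'class'.isidentifier()
class_is_keyword = keyword.iskeyword('class')

for name, value in results.items():
    print(name, value)
print(number_tests)
print(class_is_identifier, class_is_keyword)
```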

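# The isX string methods are helpful when you need to validate user input. For example, the following program
# repeatedly asks users for their age and a password until they provide valid input. Open a new file editor
# window, enter the following program, and save it as validateInput.py: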

while True:
    print('Enter your age:')
    age = input()
    if age.isdecimal():
        break
    print('Please enter a number for your age.')

while True:
    print('Select a new password (letters and numbers only):')
    password = input()
    if password.isalnum():
        break
    print('Passwords can only have letters and numbers.')

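# In the first while loop, we ask the user for their age and store the input in age.

# If age is a valid (decimal) value, we break out of the first while loop and move on to the second, which asks for a password.

# Otherwise, we tell the user that they need to enter a number and ask for their age again.

# In the second while loop, we ask for a password and store the user's input in password.

# If the input is alphanumeric, we break out of the loop.

# If it is not, we tell the user that the password must consist of letters and numbers only and ask for it again.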
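
# The two checks can also be pulled out into small functions so they can be tried without
# typing at a prompt. This is only a sketch: is_valid_age and is_valid_password are
# hypothetical helper names, not part of validateInput.py.

```python
def is_valid_age(text):
    """Accept only non-empty strings made entirely of decimal digits."""
    return text.isdecimal()

def is_valid_password(text):
    """Accept only non-empty strings made entirely of letters and digits."""
    return text.isalnum()

# isdecimal() rejects signs, decimal points, words, and the empty string.
for candidate in ['42', 'forty two', '-5', '4.5', '']:
    print(repr(candidate), is_valid_age(candidate))

# isalnum() rejects punctuation and spaces.
for candidate in ['secr3t', 'secr3t!', 'two words']:
    print(repr(candidate), is_valid_password(candidate))
```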

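# 6. Copying and pasting strings with the pyperclip module

# The pyperclip module has copy() and paste() functions that can send text to and receive text from your computer's clipboard.
# Sending a program's output to the clipboard makes it easy to paste into an email, a word processor, or other software.
# pyperclip does not come with Python. To install it, follow the instructions for installing third-party modules in Appendix A.

# After installing pyperclip, enter the following into the interactive shell:

# If you are using IDEA, simply press Alt+Enter.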

import pyperclip
pyperclip.copy('Hello world!')
pyperclip.paste()

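III. Adding bullets to wiki markup

# When editing a Wikipedia article, you can create a bulleted list by putting each list item on its own line and placing a star in front of it.

# But suppose you have a really large list that you want to add bullet points to.

# You could type those stars at the beginning of each line, one by one. Or you could automate the task with a short Python script.

# The script adds Wikipedia bullet points to the start
# of each line of text on the clipboard.

# For example, if the clipboard contains:

# Lists of animals
# Lists of aquarium life
# Lists of biologists by author abbreviation
# Lists of cultivars

# then after the script runs, the clipboard contains:

# * Lists of animals
# * Lists of aquarium life
# * Lists of biologists by author abbreviation
# * Lists of cultivars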

import pyperclip
text = pyperclip.paste()

# Separate lines and add stars.
lines = text.split('\n')
for i in range(len(lines)):  # loop through all indexes in the "lines" list
    lines[i] = '* ' + lines[i]  # add star to each string in the "lines" list
text = '\n'.join(lines)
pyperclip.copy(text)
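
# The same transformation can be tried without touching the clipboard. The sketch below
# wraps the split/prefix/join steps in a function; add_bullets is an illustrative name,
# not part of the original script.

```python
def add_bullets(text):
    """Prefix every line of text with '* ' and return the result."""
    lines = text.split('\n')
    return '\n'.join('* ' + line for line in lines)

sample = 'Lists of animals\nLists of aquarium life\nLists of cultivars'
bulleted = add_bullets(sample)
print(bulleted)
```

# In the clipboard version, pyperclip.paste() would supply the input and pyperclip.copy() would receive the output.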

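IV. Pattern matching with regular expressions

1. Finding patterns of text with regular expressions

import re

# Passing a string value representing your regular expression to re.compile()
# returns a Regex pattern object (or simply, a Regex object).

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')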

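# mo is just a generic variable name for a Match object.
mo = phoneNumRegex.search('My number is 415-555-4242.')

# A Match object has a group() method that returns the actual matched text from the searched string (groups are explained shortly).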
print('Phone number found: ' + mo.group())

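# Review of regular expression matching

# 1. Import the regex module with import re.
# 2. Create a Regex object with the re.compile() function (remember to use a raw string).
# 3. Pass the string you want to search into the Regex object's search() method. This returns a Match object, or None if nothing matches.
# 4. Call the Match object's group() method to return a string of the actual matched text.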
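
# The four steps above can be sketched as a small function. Because search() returns None
# when nothing matches, calling group() unconditionally would raise AttributeError, so the
# sketch checks for None first. find_phone is an illustrative name.

```python
import re                                                  # step 1

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')        # step 2 (raw string)

def find_phone(text):
    """Return the first phone number found in text, or None."""
    mo = phone_regex.search(text)                          # step 3
    if mo is None:
        return None
    return mo.group()                                      # step 4

found = find_phone('My number is 415-555-4242.')
not_found = find_phone('I have no phone.')
print(found, not_found)
```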

2. More pattern matching

# 2.1 Grouping with parentheses

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')

print(mo.group(0) )
print(mo.group(1))
print(mo.group(2))

# To retrieve all the groups at once, use the groups() method; note the plural form of the name.

print(mo.groups())

areaCode, mainNumber = mo.groups()

print(areaCode)
print(mainNumber)

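# Parentheses have a special meaning in regular expressions, but what do you do if you need to match a parenthesis in your text?

# For instance, the phone numbers you are trying to match may have the area code set in parentheses.

# In this case, you need to escape the ( and ) characters with a backslash.

phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')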
mo = phoneNumRegex.search('My phone number is (415) 555-4242.')
print(mo.group(1))
print(mo.group(2))

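# 2.2 Matching multiple groups with the pipe

# The | character is called a pipe. You can use it anywhere you want to match one of many expressions.

# For example, the regular expression r'Batman|Tina Fey' will match either 'Batman' or 'Tina Fey'.

# When both Batman and Tina Fey occur in the searched string, the first occurrence of matching text
# is returned as the Match object.

heroRegex = re.compile(r'Batman|Tina Fey')
mo1 = heroRegex.search('Batman and Tina Fey.')

print(mo1.group())  # same as mo1.group(0)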

mo2 = heroRegex.search('Tina Fey and Batman.')

print(mo2.group())

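# Say you want to match any of the strings 'Batman', 'Batmobile', 'Batcopter', and 'Batbat'.

# Since all of these strings start with Bat, it would be nice to specify that prefix only once. This can be done with parentheses.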

batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')
print(mo.group())
print(mo.group(1))

# If you need to match an actual pipe character, escape it with a backslash, like \|.
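
# A quick sketch of the difference between an escaped and an unescaped pipe, using made-up
# patterns and sample strings:

```python
import re

# Escaped: \| matches a literal pipe character, so the whole text 'a|b' is required.
literal_pipe = re.compile(r'a\|b')
literal_hit = literal_pipe.search('choose a|b here')
literal_miss = literal_pipe.search('only a here')

# Unescaped: | means "either side", so a lone 'b' is enough.
either = re.compile(r'a|b')
either_hit = either.search('xbx')

print(literal_hit.group(), literal_miss, either_hit.group())
```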

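# 2.3 Optional matching with the question mark

# Sometimes the pattern you want to match is optional. That is, the regex should find a match whether or not that bit of text is there.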

batRegex = re.compile(r'Bat(wo)?man')

mo1 = batRegex.search('The Adventures of Batman')

print(mo1.group())

mo2 = batRegex.search('The Adventures of Batwoman')

print(mo2.group())

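# The (wo)? part of the regular expression means that the pattern wo is an optional group. The regex will match text that has zero or one instance of wo in it.
# This is why the regex matches both 'Batwoman' and 'Batman'.

phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')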

mo1 = phoneRegex.search('My number is 415-555-4242')

print(mo1.group())

mo2 = phoneRegex.search('My number is 555-4242')

print(mo2.group())  # If you need to match an actual question mark character, escape it with \?.

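# 2.4 Matching zero or more with the star

# The * (called the star or asterisk) means "match zero or more": the group that precedes the star can occur any number of times in the text.

# It can be completely absent or repeated over and over again. Let's look at the Batman example again.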

batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')
print(mo1.group())

mo2 = batRegex.search('The Adventures of Batwoman')

print(mo2.group())
mo3 = batRegex.search('The Adventures of Batwowowowoman')

print(mo3.group())

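# For 'Batman', the (wo)* part of the regex matches zero instances of wo. For 'Batwoman',
# (wo)* matches one instance of wo. For 'Batwowowowoman', (wo)* matches four instances of wo.

# If you need to match an actual star character, prefix the star in the regular expression with a backslash, \*.

# 2.5 Matching one or more with the plus

# The + (plus) means "match one or more": unlike the star, the group preceding a plus must appear at least once.

batRegex = re.compile(r'Bat(wo)+man')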
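
# The regex above is compiled but never used. A short sketch of how it behaves, reusing the
# sample strings from the star example (the snake_case names are illustrative):

```python
import re

bat_regex = re.compile(r'Bat(wo)+man')

mo1 = bat_regex.search('The Adventures of Batwoman')
mo2 = bat_regex.search('The Adventures of Batwowowowoman')
mo3 = bat_regex.search('The Adventures of Batman')

print(mo1.group())
print(mo2.group())
# 'Batman' contains no 'wo' at all, so (wo)+ cannot match and search() returns None.
print(mo3 is None)
```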
    

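# 2.6 Matching specific repetitions with curly brackets

# If you have a group that you want to repeat a specific number of times, follow the group in your regex with a number in curly brackets.

# Instead of one number, you can specify a range by writing a minimum, a comma, and
# a maximum in between the curly brackets. For example, the regex (Ha){3,5} will match 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'.

# (Ha){3,} will match three or more instances, and (Ha){,5} will match zero to five instances. Curly brackets can make your regular expressions shorter.

# For example, (Ha){3} and (Ha)(Ha)(Ha) match the same pattern.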

haRegex = re.compile(r'(Ha){3}')

mo1 = haRegex.search('HaHaHa')

print(mo1.group())

mo2 = haRegex.search('Ha')

print(mo2)  # Here, (Ha){3} matches 'HaHaHa' but not 'Ha'. Since it doesn't match 'Ha', search() returns None.
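
# The range forms described above can be checked the same way. This is a sketch with
# made-up test strings; seven_ha is an illustrative name.

```python
import re

seven_ha = 'Ha' * 7

# {3,5}: between three and five repetitions.
bounded = re.compile(r'(Ha){3,5}').search('HaHaHaHa').group()

# {3,}: three or more, with no upper limit.
open_ended = re.compile(r'(Ha){3,}').search(seven_ha).group()

# {,5}: zero to five, so at most five of the seven repetitions are consumed...
capped = re.compile(r'(Ha){,5}').search(seven_ha).group()

# ...and zero repetitions are allowed, which matches the empty string.
empty = re.compile(r'(Ha){,5}').search('xyz').group()

# {3} is shorthand for writing the group out three times.
same = (re.compile(r'(Ha){3}').search(seven_ha).group()
        == re.compile(r'(Ha)(Ha)(Ha)').search(seven_ha).group())

print(bounded, open_ended, capped, repr(empty), same)
```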

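# 3. Greedy and nongreedy matching

# Since (Ha){3,5} can match three, four, or five instances of Ha in the string 'HaHaHaHaHa',
# you may wonder why the Match object's group() call returns 'HaHaHaHaHa'
# instead of one of the shorter possibilities. After all, 'HaHaHa' and 'HaHaHaHa' are also valid matches of (Ha){3,5}.

# Python's regular expressions are greedy by default, which means that in ambiguous situations they match the longest string possible.

# The nongreedy version of the curly brackets, which matches the shortest string possible, has the closing curly bracket followed by a question mark.

# The difference between the greedy and nongreedy forms of the curly brackets:

# Greedy form: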
greedyHaRegex = re.compile(r'(Ha){3,5}')
mo1 = greedyHaRegex.search('HaHaHaHaHa')
print(mo1.group())
# Nongreedy form (note the question mark after the closing curly bracket):
nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
mo2 = nongreedyHaRegex.search('HaHaHaHaHa')

print(mo2.group())

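# 4. The findall() method

# In addition to search(), Regex objects also have a findall() method.
# search() returns a Match object for the first matched text in the searched string,
# while findall() returns the strings of every match in the searched string.

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('Cell: 415-555-9999 Work: 212-555-0000')
print(mo.group())

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')  # has no groups

# As long as there are no groups in the regular expression, findall() returns a list of strings rather than a Match object.
# If there are groups in the regular expression, findall() returns a list of tuples.
# Each tuple represents a found match, and its items are the matched strings for each group in the regex.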

print(phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000'))
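
# The example above covers only the no-groups case. A sketch of the grouped case, using the
# same sample text (grouped_regex is an illustrative name):

```python
import re

# Three groups: area code, first three digits, last four digits.
grouped_regex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')

tuples = grouped_regex.findall('Cell: 415-555-9999 Work: 212-555-0000')
print(tuples)

# Each tuple holds one string per group, so the parts can be unpacked directly.
for area_code, prefix, line_number in tuples:
    print(area_code, prefix, line_number)
```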

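# 5. Making your own character classes

# Sometimes you want to match a set of characters, but the shorthand character classes (\d, \w, \s, and so on) are too broad. You can define your own character class using square brackets.

vowelRegex = re.compile(r'[aeiouAEIOU]')

print(vowelRegex.findall('RoboCop eats baby food. BABY FOOD.'))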

# 6. The wildcard character

# The . (dot) character in a regular expression is called a wildcard. It matches any character except a newline.
atRegex = re.compile(r'.at')
print( atRegex.findall('The cat in the hat sat on the flat mat.') )

# Matching everything with dot-star: sometimes you want to match any text at all.

nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
mo = nameRegex.search('First Name: Al Last Name: Sweigart')
print(mo.group(1))
print(mo.group(2))
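# Dot-star uses greedy mode: it always tries to match as much text as possible.
# To match any and all text in a nongreedy fashion, use the dot, star, and question mark (.*?).

# Nongreedy mode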
nongreedyRegex = re.compile(r'<.*?>')
mo = nongreedyRegex.search('<To serve man> for dinner.>')

print(mo.group())

# Greedy mode
greedyRegex = re.compile(r'<.*>')

mo = greedyRegex.search('<To serve man> for dinner.>')
print(mo.group())

# Matching newlines with the dot character
# Dot-star matches everything except a newline. By passing re.DOTALL as the second argument
# to re.compile(), you can make the dot character match all characters, including the newline character.

noNewlineRegex = re.compile('.*')

print(noNewlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group())

newlineRegex = re.compile('.*', re.DOTALL)

print(newlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group())

# Case-insensitive matching

# To make a regex case-insensitive, pass re.IGNORECASE or re.I as the second argument to re.compile().

robocop = re.compile(r'robocop', re.I)

print(robocop.search('RoboCop is part man, part machine, all cop.').group())

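# 7. Substituting strings with the sub() method

# Regular expressions can not only find text patterns but can also substitute new text in place of those patterns.

# The sub() method of Regex objects takes two arguments. The first is a string that replaces any matches found.

# The second is the string to search. sub() returns the string with the substitutions applied.

namesRegex = re.compile(r'Agent \w+')

print(namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.'))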

# 8. Project: phone number and email address extractor

# Say you have the boring task of finding every phone number and email address in a long web page or document.

import pyperclip
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # area code
    (\s|-|\.)?                        # separator
    (\d{3})                           # first 3 digits
    (\s|-|\.)                         # separator
    (\d{4})                           # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # extension
    )''', re.VERBOSE)

# Create email regex.
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4}){1,2} # dot-something
    )''', re.VERBOSE)

# Find matches in clipboard text.
text = str(pyperclip.paste())

matches = []
for groups in phoneRegex.findall(text):
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    matches.append(groups[0])

# Copy results to the clipboard.
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')
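
# To try the two regexes without the clipboard, the matching loop can be wrapped in a function
# that takes the text directly. This is a sketch under that assumption: extract_contacts,
# phone_regex, email_regex, and the sample sentence are illustrative and not part of the original program.

```python
import re

phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # area code
    (\s|-|\.)?                        # separator
    (\d{3})                           # first 3 digits
    (\s|-|\.)                         # separator
    (\d{4})                           # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # extension
    )''', re.VERBOSE)

email_regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+        # username
    @                        # @ symbol
    [a-zA-Z0-9.-]+           # domain name
    (\.[a-zA-Z]{2,4}){1,2}   # dot-something
    )''', re.VERBOSE)

def extract_contacts(text):
    """Return normalized phone numbers, then email addresses, found in text."""
    matches = []
    for groups in phone_regex.findall(text):
        # groups[1], [3], [5] are area code, first 3 digits, last 4 digits.
        phone_num = '-'.join([groups[1], groups[3], groups[5]])
        if groups[8] != '':           # groups[8] holds the extension digits
            phone_num += ' x' + groups[8]
        matches.append(phone_num)
    for groups in email_regex.findall(text):
        matches.append(groups[0])     # groups[0] is the whole address
    return matches

sample = 'Call 800.420.7240 or 415-863-9900 ext 42. Email info@nostarch.com.'
contacts = extract_contacts(sample)
print('\n'.join(contacts))
```

# Because both regexes contain groups, findall() returns tuples, which is why the loop indexes into groups instead of using the match directly.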

###############################################################################################

# Output after running the program:

# 800-420-7240
# 415-863-9900
# 415-863-9950
# info@nostarch.com
# media@nostarch.com
# academic@nostarch.com
# info@nostarch.com

############################################################################

# Web page text to process:

# Skip to main content

# Home

# Search form

# Search

# Catalog

# Media

# Write for Us

# About Us

# Topics

# Arduino

# Art & Design

# General Computing

# Hacking & Computer Security

# Hardware / DIY

# JavaScript

# Kids

# LEGO®

# LEGO® MINDSTORMS®

# Linux & BSD

# Manga

# Minecraft

# Programming

# Python

# Science & Math

# Scratch

# System Administration

# Early Access

# Gift Certificates

# Free ebook edition with every print book purchased from nostarch.com!

# Shopping cart

# 0 Items Total: $0.00

# User login

# Log in

# Create account

# Contact Us

# No Starch Press, Inc.

# 245 8th Street

# San Francisco, CA 94103 USA

# Phone: 800.420.7240 or +1 415.863.9900 (9 a.m. to 5 p.m., M-F, PST)

# Fax: +1 415.863.9950

# Reach Us by Email

# General inquiries: info@nostarch.com

# Media requests: media@nostarch.com

# Academic requests: academic@nostarch.com (Please see this page for academic review requests)

# Help with your order: info@nostarch.com

# Reach Us on Social Media

# Twitter

# Facebook

# Navigation

# My account

# Want sweet deals?

# Sign up for our newsletter.

Reference: Automate the Boring Stuff with Python (Python编程快速上手让繁琐工作自动化)
