Base Crawler

pcapkit.vendor.default contains Vendor only, which is the base class for all vendor crawlers.

Vendor Crawler

class pcapkit.vendor.default.Vendor[source]

Bases: object

Default vendor generator.

Subclass this class and provide the FLAG and LINK attributes, among others, to implement a new vendor generator.

Return type

Vendor
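A minimal subclass might look like the sketch below. The attribute values and the registry URL are illustrative, and the Vendor base is stubbed out as a plain class so the snippet stands alone (in real use the class would inherit pcapkit.vendor.default.Vendor):

```python
# Hypothetical subclass sketch -- attribute names follow the Vendor
# contract documented here; the FLAG expression and LINK URL are
# illustrative, not pcapkit's actual definitions.
class TransType:  # in real use: class TransType(Vendor)
    #: Name of constant enumeration.
    NAME = 'TransType'
    #: Docstring of constant enumeration.
    DOCS = 'Transport layer protocol numbers'
    #: Value limit checker (a Python expression over ``value``).
    FLAG = 'isinstance(value, int) and 0 <= value <= 255'
    #: Link to registry (CSV source).
    LINK = 'https://www.iana.org/assignments/protocol-numbers/protocol-numbers-1.csv'
```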

static __new__(cls)[source]

Subclassing checkpoint.

Raises

VendorNotImplemented – If cls is not a subclass of Vendor.

Return type

Vendor

__init__()[source]

Generate new constant files.

Return type

None

static wrap_comment(text)[source]

Wrap long text into shorter comment lines.

Parameters

text (str) – Source text.

Return type

str

Returns

Wrapped comments.
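The behaviour can be sketched with the standard library's textwrap module; the wrap width and prefix handling here are assumptions, not pcapkit's exact implementation:

```python
import textwrap


def wrap_comment(text: str, width: int = 76) -> str:
    # Illustrative sketch: wrap the source text so each resulting
    # comment line stays within ``width`` characters.
    return '\n'.join(textwrap.wrap(text.strip(), width))
```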

safe_name(name)[source]

Convert an enumeration name into an enum.Enum-friendly form.

Parameters

name (str) – Original enumeration name.

Return type

str

Returns

Converted enumeration name.
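A sketch of such a conversion, assuming the usual constraints on enum.Enum member names (valid Python identifier, no keyword clash); this is not pcapkit's exact algorithm:

```python
import keyword
import re


def safe_name(name: str) -> str:
    # Replace characters that are invalid in Python identifiers.
    name = re.sub(r'\W+', '_', name.strip()).strip('_')
    # Identifiers may not start with a digit ...
    if name and name[0].isdigit():
        name = f'_{name}'
    # ... and may not shadow a Python keyword.
    if keyword.iskeyword(name):
        name = f'{name}_'
    return name
```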

rename(name, code, *, original=None)[source]

Rename duplicated fields.

Parameters
  • name (str) – Field name.

  • code (str) – Field code.

  • original (Optional[str]) – Original field name (extracted from CSV records).

Return type

str

Returns

Revised field name.

Example

If name has multiple occurrences in the source registry, the field name will be sanitised as ${name}_${code}.

Otherwise, the plain name will be returned.
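The described behaviour can be sketched as follows; the explicit counter parameter is a simplification, since the real method consults the source registry internally:

```python
from collections import Counter


def rename(name: str, code: str, counter: Counter) -> str:
    # A name that occurs more than once in the registry is suffixed
    # with its code; unique names pass through unchanged.
    if counter[name] > 1:
        return f'{name}_{code}'
    return name
```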

process(data)[source]

Process CSV data.

Parameters

data (list[str]) – CSV data.

Returns

Enumeration fields and missing fields.

Return type

tuple[list[str], list[str]]
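A simplified sketch of the split into enumeration fields and missing fields, assuming a two-column ``Code,Name`` layout (real IANA registries have more columns, and the real method also handles ranges and name sanitisation):

```python
import csv
import io


def process(data):
    reader = csv.reader(io.StringIO('\n'.join(data)))
    next(reader)  # skip the header row
    enum, miss = [], []
    for code, name in reader:
        if name.lower() == 'unassigned':
            miss.append(code)  # left to the missing-value handler
        else:
            enum.append(f'{name} = {code}')
    return enum, miss
```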

count(data)[source]

Count field records.

Parameters

data (list[str]) – CSV data.

Returns

Field name counts.

Return type

Counter[str]
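The counting step can be sketched with collections.Counter; the two-column layout is again an assumption:

```python
import csv
import io
from collections import Counter


def count(data):
    # Tally how often each field name appears in the name column.
    reader = csv.reader(io.StringIO('\n'.join(data)))
    next(reader)  # skip the header row
    return Counter(name for _, name in reader)
```

The resulting Counter is what a rename step would consult to decide which names need disambiguation.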

context(data)[source]

Generate constant context.

Parameters

data (list[str]) – CSV data.

Returns

Constant context.

Return type

str

request(text=None)[source]

Fetch CSV file.

Parameters

text (Optional[str]) – Text content fetched from LINK.

Returns

CSV data.

Return type

list[str]
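In outline (a sketch, not the real implementation): when text is given, it is split into CSV rows; when it is None, the real method defers to the low-level _request() described next:

```python
from typing import Optional


def request(text: Optional[str] = None) -> list:
    if text is None:
        # The real method would fall back to self._request() here.
        raise NotImplementedError('would fall back to self._request()')
    return text.strip().splitlines()
```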

_request()[source]

Fetch CSV data from LINK.

This is the low-level call of request().

If LINK is None, it directly calls the upper-level request() with no arguments.

The method will first try to GET the content of LINK. Should any exception be raised, it will retry with the proxy settings from get_proxies().

Note

Since some LINK links are from Wikipedia, etc., they might not be available in certain areas, e.g. the amazing PRC :)

Should the proxies fail as well, it will prompt for user intervention: it will use webbrowser.open() to open the page in a browser for you, so that you can manually load that page and save the HTML source at the location it provides.

Returns

CSV data.

Warns

VendorRequestWarning – If the connection fails, with or without proxies.

Return type

list[str]

See also

request()
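The retry ladder described above (plain GET, then GET through proxies, then manual intervention) can be sketched as follows. The ``fetch`` callable stands in for the HTTP client and is an assumption, not pcapkit's API:

```python
import webbrowser


def fetch_with_fallback(link, fetch, get_proxies, open_browser=webbrowser.open):
    try:
        return fetch(link, proxies=None)  # plain GET first
    except Exception:
        try:
            # Retry through the configured proxies.
            return fetch(link, proxies=get_proxies())
        except Exception:
            # Last resort: let the user load the page manually.
            open_browser(link)
            raise
```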

NAME: str

Name of constant enumeration.

Type

str

DOCS: str

Docstring of constant enumeration.

Type

str

FLAG: str

Value limit checker.

Type

str

LINK: str

Link to registry.

Type

str

Crawler Template

pcapkit.vendor.default.LINE(NAME, DOCS, FLAG, ENUM, MISS, MODL)

Default constant template of enumeration registry from IANA CSV.

Parameters
  • NAME (str) – name of the constant enumeration class

  • DOCS (str) – docstring for the constant enumeration class

  • FLAG (str) – threshold value validator (range of valid values)

  • ENUM (str) – enumeration data (class attributes)

  • MISS (str) – missing value handler (default value)

  • MODL (str) – module name of the constant enumeration class

Return type

str
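The overall shape of the template can be sketched as below; the real template emits a complete, importable module with considerably more boilerplate, so treat every line here as illustrative:

```python
def LINE(NAME, DOCS, FLAG, ENUM, MISS, MODL):
    # Assemble the pieces into (a simplified form of) the generated
    # constant module's source code.
    return (
        f'"""{DOCS}"""\n\n'
        f'# destined for module: {MODL}\n\n'
        f'class {NAME}(IntEnum):\n'
        f'    {ENUM}\n\n'
        f'    @classmethod\n'
        f'    def _missing_(cls, value):\n'
        f'        if not ({FLAG}):\n'
        f'            raise ValueError(value)\n'
        f'        {MISS}\n'
    )
```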

Crawler Proxy

pcapkit.vendor.default.get_proxies()[source]

Get proxy for blocked sites.

The function reads the PCAPKIT_HTTP_PROXY and PCAPKIT_HTTPS_PROXY environment variables, if set, for the proxy settings of requests.

Returns

Proxy settings for requests.

Return type

dict[str, str]
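A sketch of this lookup; the returned keys follow the ``proxies`` mapping convention of the requests library:

```python
import os


def get_proxies():
    # Collect proxy settings from the two environment variables.
    proxies = {}
    if os.environ.get('PCAPKIT_HTTP_PROXY'):
        proxies['http'] = os.environ['PCAPKIT_HTTP_PROXY']
    if os.environ.get('PCAPKIT_HTTPS_PROXY'):
        proxies['https'] = os.environ['PCAPKIT_HTTPS_PROXY']
    return proxies
```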