Mirror of https://github.com/FEX-Emu/linux.git (synced 2025-01-24 19:44:55 +00:00)
net-timestamp: expand documentation
Expand Documentation/networking/timestamping.txt with new interfaces and bytestream timestamping. Also minor cleanup of the other text.

Import txtimestamp.c test of the new features.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent c5a65680b3
commit 8fe2f761ca
@@ -1,102 +1,307 @@
-The existing interfaces for getting network packages time stamped are:
+
+1. Control Interfaces
+
+The interfaces for receiving network packet timestamps are:

 * SO_TIMESTAMP
-Generate time stamp for each incoming packet using the (not necessarily
-monotonous!) system time. Result is returned via recv_msg() in a
-control message as timeval (usec resolution).
+Generates a timestamp for each incoming packet in (not necessarily
+monotonic) system time. Reports the timestamp via recvmsg() in a
+control message as struct timeval (usec resolution).

 * SO_TIMESTAMPNS
-Same time stamping mechanism as SO_TIMESTAMP, but returns result as
-timespec (nsec resolution).
+Same timestamping mechanism as SO_TIMESTAMP, but reports the
+timestamp as struct timespec (nsec resolution).

 * IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
-Only for multicasts: approximate send time stamp by receiving the looped
-packet and using its receive time stamp.
+Only for multicast: approximate transmit timestamp obtained by
+reading the looped packet receive timestamp.

-The following interface complements the existing ones: receive time
-stamps can be generated and returned for arbitrary packets and much
-closer to the point where the packet is really sent. Time stamps can
-be generated in software (as before) or in hardware (if the hardware
-has such a feature).
+* SO_TIMESTAMPING
+Generates timestamps on reception, transmission or both. Supports
+multiple timestamp sources, including hardware. Supports generating
+timestamps for stream sockets.

-SO_TIMESTAMPING:

-Instructs the socket layer which kind of information should be collected
-and/or reported. The parameter is an integer with some of the following
-bits set. Setting other bits is an error and doesn't change the current
-state.
+1.1 SO_TIMESTAMP:

-Four of the bits are requests to the stack to try to generate
-timestamps. Any combination of them is valid.
+This socket option enables timestamping of datagrams on the reception
+path. Because the destination socket, if any, is not known early in
+the network stack, the feature has to be enabled for all packets. The
+same is true for all early receive timestamp options.

-SOF_TIMESTAMPING_TX_HARDWARE: try to obtain send time stamps in hardware
-SOF_TIMESTAMPING_TX_SOFTWARE: try to obtain send time stamps in software
-SOF_TIMESTAMPING_RX_HARDWARE: try to obtain receive time stamps in hardware
-SOF_TIMESTAMPING_RX_SOFTWARE: try to obtain receive time stamps in software
+For interface details, see `man 7 socket`.
+
+
+1.2 SO_TIMESTAMPNS:

+This option is identical to SO_TIMESTAMP except for the returned data type.
+Its struct timespec allows for higher resolution (ns) timestamps than the
+timeval of SO_TIMESTAMP (us).
+
+
+1.3 SO_TIMESTAMPING:
+
+Supports multiple types of timestamp requests. As a result, this
+socket option takes a bitmap of flags, not a boolean. In
+
+  err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, (void *) &val, sizeof(val));
+
+val is an integer with any of the following bits set. Setting other
+bits returns EINVAL and does not change the current state.
+
+
+1.3.1 Timestamp Generation
+
+Some bits are requests to the stack to try to generate timestamps. Any
+combination of them is valid. Changes to these bits apply to newly
+created packets, not to packets already in the stack. As a result, it
+is possible to selectively request timestamps for a subset of packets
+(e.g., for sampling) by embedding a send() call within two setsockopt
+calls, one to enable timestamp generation and one to disable it.
+Timestamps may also be generated for reasons other than being
+requested by a particular socket, such as when receive timestamping is
+enabled system-wide, as explained earlier.
+
+SOF_TIMESTAMPING_RX_HARDWARE:
+Request rx timestamps generated by the network adapter.
+
+SOF_TIMESTAMPING_RX_SOFTWARE:
+Request rx timestamps when data enters the kernel. These timestamps
+are generated just after a device driver hands a packet to the
+kernel receive stack.
+
+SOF_TIMESTAMPING_TX_HARDWARE:
+Request tx timestamps generated by the network adapter.
+
+SOF_TIMESTAMPING_TX_SOFTWARE:
+Request tx timestamps when data leaves the kernel. These timestamps
+are generated in the device driver as close as possible to, but always
+prior to, passing the packet to the network interface. Hence, they
+require driver support and may not be available for all devices.
+
+SOF_TIMESTAMPING_TX_SCHED:
+Request tx timestamps prior to entering the packet scheduler. Kernel
+transmit latency is, if long, often dominated by queuing delay. The
+difference between this timestamp and one taken at
+SOF_TIMESTAMPING_TX_SOFTWARE will expose this latency independent
+of protocol processing. The latency incurred in protocol
+processing, if any, can be computed by subtracting a userspace
+timestamp taken immediately before send() from this timestamp. On
+machines with virtual devices where a transmitted packet travels
+through multiple devices and, hence, multiple packet schedulers,
+a timestamp is generated at each layer. This allows for fine
+grained measurement of queuing delay.
+
+SOF_TIMESTAMPING_TX_ACK:
+Request tx timestamps when all data in the send buffer has been
+acknowledged. This only makes sense for reliable protocols. It is
+currently only implemented for TCP. For that protocol, it may
+over-report measurement, because the timestamp is generated when all
+data up to and including the buffer at send() was acknowledged: the
+cumulative acknowledgment. The mechanism ignores SACK and FACK.
+
+
+1.3.2 Timestamp Reporting
+
 The other three bits control which timestamps will be reported in a
-generated control message. If none of these bits are set or if none of
-the set bits correspond to data that is available, then the control
-message will not be generated:
+generated control message. Changes to the bits take immediate
+effect at the timestamp reporting locations in the stack. Timestamps
+are only reported for packets that also have the relevant timestamp
+generation request set.

-SOF_TIMESTAMPING_SOFTWARE: report systime if available
-SOF_TIMESTAMPING_SYS_HARDWARE: report hwtimetrans if available (deprecated)
-SOF_TIMESTAMPING_RAW_HARDWARE: report hwtimeraw if available
+SOF_TIMESTAMPING_SOFTWARE:
+Report any software timestamps when available.

-It is worth noting that timestamps may be collected for reasons other
-than being requested by a particular socket with
-SOF_TIMESTAMPING_[TR]X_(HARD|SOFT)WARE. For example, most drivers that
-can generate hardware receive timestamps ignore
-SOF_TIMESTAMPING_RX_HARDWARE. It is still a good idea to set that flag
-in case future drivers pay attention.
+SOF_TIMESTAMPING_SYS_HARDWARE:
+This option is deprecated and ignored.

-If timestamps are reported, they will appear in a control message with
-cmsg_level==SOL_SOCKET, cmsg_type==SO_TIMESTAMPING, and a payload like
-this:
+SOF_TIMESTAMPING_RAW_HARDWARE:
+Report hardware timestamps as generated by
+SOF_TIMESTAMPING_TX_HARDWARE or SOF_TIMESTAMPING_RX_HARDWARE when available.
+
+
+1.3.3 Timestamp Options
+
+The interface supports one option:
+
+SOF_TIMESTAMPING_OPT_ID:
+
+Generate a unique identifier along with each packet. A process can
+have multiple concurrent timestamping requests outstanding. Packets
+can be reordered in the transmit path, for instance in the packet
+scheduler. In that case timestamps will be queued onto the error
+queue out of order from the original send() calls. This option
+embeds a counter that is incremented at send() time, to order
+timestamps within a flow.
+
+This option is implemented only for transmit timestamps. There, the
+timestamp is always looped along with a struct sock_extended_err.
+The option modifies field ee_data to pass an id that is unique
+among all possibly concurrently outstanding timestamp requests for
+that socket. In practice, it is a monotonically increasing u32
+(that wraps).
+
+In datagram sockets, the counter increments on each send call. In
+stream sockets, it increments with every byte.
+
+
+1.4 Bytestream Timestamps
+
+The SO_TIMESTAMPING interface supports timestamping of bytes in a
+bytestream. Each request is interpreted as a request for when the
+entire contents of the buffer have passed a timestamping point. That
+is, for streams option SOF_TIMESTAMPING_TX_SOFTWARE will record
+when all bytes have reached the device driver, regardless of how
+many packets the data has been converted into.
+
+In general, bytestreams have no natural delimiters and therefore
+correlating a timestamp with data is non-trivial. A range of bytes
+may be split across segments, any segments may be merged (possibly
+coalescing sections of previously segmented buffers associated with
+independent send() calls). Segments can be reordered and the same
+byte range can coexist in multiple segments for protocols that
+implement retransmissions.
+
+It is essential that all timestamps implement the same semantics,
+regardless of these possible transformations, as otherwise they are
+incomparable. Handling "rare" corner cases differently from the
+simple case (a 1:1 mapping from buffer to skb) is insufficient
+because performance debugging often needs to focus on such outliers.
+
+In practice, timestamps can be correlated with segments of a
+bytestream consistently, if both the semantics of the timestamp and the
+timing of measurement are chosen correctly. This challenge is no
+different from deciding on a strategy for IP fragmentation. There, the
+definition is that only the first fragment is timestamped. For
+bytestreams, we chose that a timestamp is generated only when all
+bytes have passed a point. SOF_TIMESTAMPING_TX_ACK as defined is easy to
+implement and reason about. An implementation that has to take into
+account SACK would be more complex due to possible transmission holes
+and out of order arrival.
+
+On the host, TCP can also break the simple 1:1 mapping from buffer to
+skbuff as a result of Nagle, cork, autocork, segmentation and GSO. The
+implementation ensures correctness in all cases by tracking the
+individual last byte passed to send(), even if it is no longer the
+last byte after an skbuff extend or merge operation. It stores the
+relevant sequence number in skb_shinfo(skb)->tskey. Because an skbuff
+has only one such field, only one timestamp can be generated.
+
+In rare cases, a timestamp request can be missed if two requests are
+collapsed onto the same skb. A process can detect this situation by
+enabling SOF_TIMESTAMPING_OPT_ID and comparing the byte offset at
+send time with the value returned for each timestamp. It can prevent
+the situation by always flushing the TCP stack in between requests,
+for instance by enabling TCP_NODELAY and disabling TCP_CORK and
+autocork.
+
+These precautions ensure that the timestamp is generated only when all
+bytes have passed a timestamp point, assuming that the network stack
+itself does not reorder the segments. The stack indeed tries to avoid
+reordering. The one exception is under administrator control: it is
+possible to construct a packet scheduler configuration that delays
+segments from the same stream differently. Such a setup would be
+unusual.
+
+
+2 Data Interfaces
+
+Timestamps are read using the ancillary data feature of recvmsg().
+See `man 3 cmsg` for details of this interface. The socket manual
+page (`man 7 socket`) describes how timestamps generated with
+SO_TIMESTAMP and SO_TIMESTAMPNS can be retrieved.
+
+
+2.1 SCM_TIMESTAMPING records
+
+These timestamps are returned in a control message with cmsg_level
+SOL_SOCKET, cmsg_type SCM_TIMESTAMPING, and payload of type

   struct scm_timestamping {
-    struct timespec systime;
-    struct timespec hwtimetrans;
-    struct timespec hwtimeraw;
+    struct timespec ts[3];
   };

-recvmsg() can be used to get this control message for regular incoming
-packets. For send time stamps the outgoing packet is looped back to
-the socket's error queue with the send time stamp(s) attached. It can
-be received with recvmsg(flags=MSG_ERRQUEUE). The call returns the
-original outgoing packet data including all headers preprended down to
-and including the link layer, the scm_timestamping control message and
-a sock_extended_err control message with ee_errno==ENOMSG and
-ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending
-bounced packet is ready for reading as far as select() is concerned.
-If the outgoing packet has to be fragmented, then only the first
-fragment is time stamped and returned to the sending socket.
+The structure can return up to three timestamps. This is a legacy
+feature. Only one field is non-zero at any time. Most timestamps
+are passed in ts[0]. Hardware timestamps are passed in ts[2].

-All three values correspond to the same event in time, but were
-generated in different ways. Each of these values may be empty (= all
-zero), in which case no such value was available. If the application
-is not interested in some of these values, they can be left blank to
-avoid the potential overhead of calculating them.
+ts[1] used to hold hardware timestamps converted to system time.
+Instead, expose the hardware clock device on the NIC directly as
+a HW PTP clock source, to allow time conversion in userspace and
+optionally synchronize system time with a userspace PTP stack such
+as linuxptp. For the PTP clock API, see Documentation/ptp/ptp.txt.

-systime is the value of the system time at that moment. This
-corresponds to the value also returned via SO_TIMESTAMP[NS]. If the
-time stamp was generated by hardware, then this field is
-empty. Otherwise it is filled in if SOF_TIMESTAMPING_SOFTWARE is
-set.
+2.1.1 Transmit timestamps with MSG_ERRQUEUE

-hwtimeraw is the original hardware time stamp. Filled in if
-SOF_TIMESTAMPING_RAW_HARDWARE is set. No assumptions about its
-relation to system time should be made.
+For transmit timestamps the outgoing packet is looped back to the
+socket's error queue with the send timestamp(s) attached. A process
+receives the timestamps by calling recvmsg() with flag MSG_ERRQUEUE
+set and with a msg_control buffer sufficiently large to receive the
+relevant metadata structures. The recvmsg call returns the original
+outgoing data packet with two ancillary messages attached.

-hwtimetrans is always zero. This field is deprecated. It used to hold
-hw timestamps converted to system time. Instead, expose the hardware
-clock device on the NIC directly as a HW PTP clock source, to allow
-time conversion in userspace and optionally synchronize system time
-with a userspace PTP stack such as linuxptp. For the PTP clock API,
-see Documentation/ptp/ptp.txt.
+A message of cmsg_level SOL_IP(V6) and cmsg_type IP(V6)_RECVERR
+embeds a struct sock_extended_err. This defines the error type. For
+timestamps, the ee_errno field is ENOMSG. The other ancillary message
+will have cmsg_level SOL_SOCKET and cmsg_type SCM_TIMESTAMPING. This
+embeds the struct scm_timestamping.


-SIOCSHWTSTAMP, SIOCGHWTSTAMP:
+2.1.1.2 Timestamp types
+
+The semantics of the three struct timespec are defined by field
+ee_info in the extended error structure. It contains a value of
+type SCM_TSTAMP_* to define the actual timestamp passed in
+scm_timestamping.
+
+The SCM_TSTAMP_* types are 1:1 matches to the SOF_TIMESTAMPING_*
+control fields discussed previously, with one exception. For legacy
+reasons, SCM_TSTAMP_SND is equal to zero and can be set for both
+SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE. It
+is the first if ts[2] is non-zero, the second otherwise, in which
+case the timestamp is stored in ts[0].
+
+
+2.1.1.3 Fragmentation
+
+Fragmentation of outgoing datagrams is rare, but is possible, e.g., by
+explicitly disabling PMTU discovery. If an outgoing packet is fragmented,
+then only the first fragment is timestamped and returned to the sending
+socket.
+
+
+2.1.1.4 Packet Payload
+
+The calling application is often not interested in receiving the whole
+packet payload that it passed to the stack originally: the socket
+error queue mechanism is just a method to piggyback the timestamp on.
+In this case, the application can choose to read datagrams with a
+smaller buffer, possibly even of length 0. The payload is truncated
+accordingly. Until the process calls recvmsg() on the error queue,
+however, the full packet is queued, taking up budget from SO_RCVBUF.
+
+
+2.1.1.5 Blocking Read
+
+Reading from the error queue is always a non-blocking operation. To
+block waiting on a timestamp, use poll or select. poll() will return
+POLLERR in pollfd.revents if any data is ready on the error queue.
+There is no need to pass this flag in pollfd.events. The flag is
+ignored if it is set there. See also `man 2 poll`.
+
+
+2.1.2 Receive timestamps
+
+On reception, there is no reason to read from the socket error queue.
+The SCM_TIMESTAMPING ancillary data is sent along with the packet data
+on a normal recvmsg(). Since this is not a socket error, it is not
+accompanied by a message SOL_IP(V6)/IP(V6)_RECVERR. In this case,
+the meaning of the three fields in struct scm_timestamping is
+implicitly defined. ts[0] holds a software timestamp if set, ts[1]
+is again deprecated and ts[2] holds a hardware timestamp if set.
+
+
+3. Hardware Timestamping configuration: SIOCSHWTSTAMP and SIOCGHWTSTAMP

 Hardware time stamping must also be initialized for each device driver
 that is expected to do hardware time stamping. The parameter is defined in
@@ -167,8 +372,7 @@ enum {
 */
 };

-
-DEVICE IMPLEMENTATION
+3.1 Hardware Timestamping Implementation: Device Drivers

 A driver which supports hardware time stamping must support the
 SIOCSHWTSTAMP ioctl and update the supplied struct hwtstamp_config with
|
@@ -1,14 +1,20 @@
+# To compile, from the source root
+#
+#    make headers_install
+#    make M=documentation
+
 # kbuild trick to avoid linker error. Can be omitted if a module is built.
 obj- := dummy.o

 # List of programs to build
-hostprogs-y := timestamping hwtstamp_config
+hostprogs-y := timestamping txtimestamp hwtstamp_config

 # Tell kbuild to always build the programs
 always := $(hostprogs-y)

 HOSTCFLAGS_timestamping.o += -I$(objtree)/usr/include
+HOSTCFLAGS_txtimestamp.o += -I$(objtree)/usr/include
 HOSTCFLAGS_hwtstamp_config.o += -I$(objtree)/usr/include

 clean:
-	rm -f timestamping hwtstamp_config
+	rm -f timestamping txtimestamp hwtstamp_config

Documentation/networking/timestamping/txtimestamp.c (Normal file, 470 lines)
@@ -0,0 +1,470 @@
+/*
+ * Copyright 2014 Google Inc.
+ * Author: willemb@google.com (Willem de Bruijn)
+ *
+ * Test software tx timestamping, including
+ *
+ * - SCHED, SND and ACK timestamps
+ * - RAW, UDP and TCP
+ * - IPv4 and IPv6
+ * - various packet sizes (to test GSO and TSO)
+ *
+ * Consult the command line arguments for help on running
+ * the various testcases.
+ *
+ * This test requires a dummy TCP server.
+ * A simple `nc6 [-u] -l -p $DESTPORT` will do
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <arpa/inet.h>
+#include <asm/types.h>
+#include <error.h>
+#include <errno.h>
+#include <linux/errqueue.h>
+#include <linux/if_ether.h>
+#include <linux/net_tstamp.h>
+#include <netdb.h>
+#include <net/if.h>
+#include <netinet/in.h>
+#include <netinet/ip.h>
+#include <netinet/udp.h>
+#include <netinet/tcp.h>
+#include <netpacket/packet.h>
+#include <poll.h>
+#include <stdarg.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/select.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+
+/* command line parameters */
+static int cfg_proto = SOCK_STREAM;
+static int cfg_ipproto = IPPROTO_TCP;
+static int cfg_num_pkts = 4;
+static int do_ipv4 = 1;
+static int do_ipv6 = 1;
+static int cfg_payload_len = 10;
+static uint16_t dest_port = 9000;
+
+static struct sockaddr_in daddr;
+static struct sockaddr_in6 daddr6;
+static struct timespec ts_prev;
+
+static void __print_timestamp(const char *name, struct timespec *cur,
+			      uint32_t key, int payload_len)
+{
+	if (!(cur->tv_sec | cur->tv_nsec))
+		return;
+
+	fprintf(stderr, " %s: %ld s %ld us (seq=%u, len=%d)",
+		name, (long) cur->tv_sec, (long) (cur->tv_nsec / 1000),
+		key, payload_len);
+
+	if ((ts_prev.tv_sec | ts_prev.tv_nsec)) {
+		int64_t cur_us, prev_us;
+
+		/* 64-bit math avoids overflow where long is 32 bits */
+		cur_us = (int64_t) cur->tv_sec * 1000 * 1000;
+		cur_us += cur->tv_nsec / 1000;
+
+		prev_us = (int64_t) ts_prev.tv_sec * 1000 * 1000;
+		prev_us += ts_prev.tv_nsec / 1000;
+
+		fprintf(stderr, " (%+ld us)", (long) (cur_us - prev_us));
+	}
+
+	ts_prev = *cur;
+	fprintf(stderr, "\n");
+}
+
+static void print_timestamp_usr(void)
+{
+	struct timespec ts;
+	struct timeval tv;	/* avoid dependency on -lrt */
+
+	gettimeofday(&tv, NULL);
+	ts.tv_sec = tv.tv_sec;
+	ts.tv_nsec = tv.tv_usec * 1000;
+
+	__print_timestamp(" USR", &ts, 0, 0);
+}
+
+static void print_timestamp(struct scm_timestamping *tss, int tstype,
+			    int tskey, int payload_len)
+{
+	const char *tsname;
+
+	switch (tstype) {
+	case SCM_TSTAMP_SCHED:
+		tsname = " ENQ";
+		break;
+	case SCM_TSTAMP_SND:
+		tsname = " SND";
+		break;
+	case SCM_TSTAMP_ACK:
+		tsname = " ACK";
+		break;
+	default:
+		error(1, 0, "unknown timestamp type: %u",
+		      tstype);
+	}
+	__print_timestamp(tsname, &tss->ts[0], tskey, payload_len);
+}
+
+static void __poll(int fd)
+{
+	struct pollfd pollfd;
+	int ret;
+
+	memset(&pollfd, 0, sizeof(pollfd));
+	pollfd.fd = fd;
+	ret = poll(&pollfd, 1, 100);
+	if (ret != 1)
+		error(1, errno, "poll");
+}
+
+static void __recv_errmsg_cmsg(struct msghdr *msg, int payload_len)
+{
+	struct sock_extended_err *serr = NULL;
+	struct scm_timestamping *tss = NULL;
+	struct cmsghdr *cm;
+
+	for (cm = CMSG_FIRSTHDR(msg);
+	     cm && cm->cmsg_len;
+	     cm = CMSG_NXTHDR(msg, cm)) {
+		if (cm->cmsg_level == SOL_SOCKET &&
+		    cm->cmsg_type == SCM_TIMESTAMPING) {
+			tss = (void *) CMSG_DATA(cm);
+		} else if ((cm->cmsg_level == SOL_IP &&
+			    cm->cmsg_type == IP_RECVERR) ||
+			   (cm->cmsg_level == SOL_IPV6 &&
+			    cm->cmsg_type == IPV6_RECVERR)) {
+
+			serr = (void *) CMSG_DATA(cm);
+			if (serr->ee_errno != ENOMSG ||
+			    serr->ee_origin != SO_EE_ORIGIN_TIMESTAMPING) {
+				fprintf(stderr, "unknown ip error %d %d\n",
+					serr->ee_errno,
+					serr->ee_origin);
+				serr = NULL;
+			}
+		} else
+			fprintf(stderr, "unknown cmsg %d,%d\n",
+				cm->cmsg_level, cm->cmsg_type);
+	}
+
+	if (serr && tss)
+		print_timestamp(tss, serr->ee_info, serr->ee_data, payload_len);
+}
+
+static int recv_errmsg(int fd)
+{
+	static char ctrl[1024 /* overprovision*/];
+	static struct msghdr msg;
+	struct iovec entry;
+	static char *data;
+	int ret = 0;
+
+	data = malloc(cfg_payload_len);
+	if (!data)
+		error(1, 0, "malloc");
+
+	memset(&msg, 0, sizeof(msg));
+	memset(&entry, 0, sizeof(entry));
+	memset(ctrl, 0, sizeof(ctrl));
+	memset(data, 0, cfg_payload_len);	/* whole buffer, not sizeof(pointer) */
+
+	entry.iov_base = data;
+	entry.iov_len = cfg_payload_len;
+	msg.msg_iov = &entry;
+	msg.msg_iovlen = 1;
+	msg.msg_name = NULL;
+	msg.msg_namelen = 0;
+	msg.msg_control = ctrl;
+	msg.msg_controllen = sizeof(ctrl);
+
+	ret = recvmsg(fd, &msg, MSG_ERRQUEUE);
+	if (ret == -1 && errno != EAGAIN)
+		error(1, errno, "recvmsg");
+
+	__recv_errmsg_cmsg(&msg, ret);
+
+	free(data);
+	return ret == -1;
+}
+
||||||
|
static void do_test(int family, unsigned int opt)
{
	char *buf;
	int fd, i, val, total_len;

	if (family == PF_INET6 && cfg_proto != SOCK_STREAM) {
		/* due to lack of checksum generation code */
		fprintf(stderr, "test: skipping datagram over IPv6\n");
		return;
	}

	total_len = cfg_payload_len;
	if (cfg_proto == SOCK_RAW) {
		total_len += sizeof(struct udphdr);
		if (cfg_ipproto == IPPROTO_RAW)
			total_len += sizeof(struct iphdr);
	}

	buf = malloc(total_len);
	if (!buf)
		error(1, 0, "malloc");

	fd = socket(family, cfg_proto, cfg_ipproto);
	if (fd < 0)
		error(1, errno, "socket");

	if (cfg_proto == SOCK_STREAM) {
		val = 1;
		if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
			       (char*) &val, sizeof(val)))
			error(1, errno, "setsockopt no nagle");

		if (family == PF_INET) {
			if (connect(fd, (void *) &daddr, sizeof(daddr)))
				error(1, errno, "connect ipv4");
		} else {
			if (connect(fd, (void *) &daddr6, sizeof(daddr6)))
				error(1, errno, "connect ipv6");
		}
	}

	opt |= SOF_TIMESTAMPING_SOFTWARE |
	       SOF_TIMESTAMPING_OPT_ID;
	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
		       (char *) &opt, sizeof(opt)))
		error(1, errno, "setsockopt timestamping");

	for (i = 0; i < cfg_num_pkts; i++) {
		memset(&ts_prev, 0, sizeof(ts_prev));
		memset(buf, 'a' + i, total_len);
		buf[total_len - 2] = '\n';
		buf[total_len - 1] = '\0';

		if (cfg_proto == SOCK_RAW) {
			struct udphdr *udph;
			int off = 0;

			if (cfg_ipproto == IPPROTO_RAW) {
				struct iphdr *iph = (void *) buf;

				memset(iph, 0, sizeof(*iph));
				iph->ihl = 5;
				iph->version = 4;
				iph->ttl = 2;
				iph->daddr = daddr.sin_addr.s_addr;
				iph->protocol = IPPROTO_UDP;
				/* kernel writes saddr, csum, len */

				off = sizeof(*iph);
			}

			udph = (void *) (buf + off);
			udph->source = htons(9000);	/* arbitrary (spoofed) source port */
			udph->dest = htons(dest_port);
			udph->len = htons(sizeof(*udph) + cfg_payload_len);
			udph->check = 0;	/* zero means "no checksum": not allowed for IPv6 */
		}

		print_timestamp_usr();
		if (cfg_proto != SOCK_STREAM) {
			if (family == PF_INET)
				val = sendto(fd, buf, total_len, 0, (void *) &daddr, sizeof(daddr));
			else
				val = sendto(fd, buf, total_len, 0, (void *) &daddr6, sizeof(daddr6));
		} else {
			val = send(fd, buf, cfg_payload_len, 0);
		}
		if (val != total_len)
			error(1, errno, "send");

		/* wait for all errors to be queued, else ACKs arrive OOO */
		usleep(50 * 1000);

		__poll(fd);

		while (!recv_errmsg(fd)) {}
	}

	if (close(fd))
		error(1, errno, "close");

	free(buf);
	usleep(400 * 1000);
}

static void __attribute__((noreturn)) usage(const char *filepath)
{
	fprintf(stderr, "\nUsage: %s [options] hostname\n"
			"\nwhere options are:\n"
			" -4: only IPv4\n"
			" -6: only IPv6\n"
			" -h: show this message\n"
			" -l N: send N bytes at a time\n"
			" -r: use raw\n"
			" -R: use raw (IP_HDRINCL)\n"
			" -p N: connect to port N\n"
			" -u: use udp\n",
			filepath);
	exit(1);
}

static void parse_opt(int argc, char **argv)
{
	int proto_count = 0;
	int c;	/* getopt() returns int; a char may not hold -1 */

	while ((c = getopt(argc, argv, "46hl:p:rRu")) != -1) {
		switch (c) {
		case '4':
			do_ipv6 = 0;
			break;
		case '6':
			do_ipv4 = 0;
			break;
		case 'r':
			proto_count++;
			cfg_proto = SOCK_RAW;
			cfg_ipproto = IPPROTO_UDP;
			break;
		case 'R':
			proto_count++;
			cfg_proto = SOCK_RAW;
			cfg_ipproto = IPPROTO_RAW;
			break;
		case 'u':
			proto_count++;
			cfg_proto = SOCK_DGRAM;
			cfg_ipproto = IPPROTO_UDP;
			break;
		case 'l':
			cfg_payload_len = strtoul(optarg, NULL, 10);
			break;
		case 'p':
			dest_port = strtoul(optarg, NULL, 10);
			break;
		case 'h':
		default:
			usage(argv[0]);
		}
	}

	if (!cfg_payload_len)
		error(1, 0, "payload may not be zero");
	if (cfg_proto != SOCK_STREAM && cfg_payload_len > 1472)
		error(1, 0, "udp packet might exceed expected MTU");
	if (!do_ipv4 && !do_ipv6)
		error(1, 0, "pass -4 or -6, not both");
	if (proto_count > 1)
		error(1, 0, "pass -r, -R or -u, not multiple");

	if (optind != argc - 1)
		error(1, 0, "missing required hostname argument");
}

static void resolve_hostname(const char *hostname)
{
	struct addrinfo *addrs, *cur;
	int have_ipv4 = 0, have_ipv6 = 0;
	int ret;

	/* getaddrinfo() reports failure through its return value, not errno */
	ret = getaddrinfo(hostname, NULL, NULL, &addrs);
	if (ret)
		error(1, 0, "getaddrinfo: %s", gai_strerror(ret));

	/* keep the first IPv4 and the first IPv6 address, if any */
	cur = addrs;
	while (cur && (!have_ipv4 || !have_ipv6)) {
		if (!have_ipv4 && cur->ai_family == AF_INET) {
			memcpy(&daddr, cur->ai_addr, sizeof(daddr));
			daddr.sin_port = htons(dest_port);
			have_ipv4 = 1;
		} else if (!have_ipv6 && cur->ai_family == AF_INET6) {
			memcpy(&daddr6, cur->ai_addr, sizeof(daddr6));
			daddr6.sin6_port = htons(dest_port);
			have_ipv6 = 1;
		}
		cur = cur->ai_next;
	}
	if (addrs)
		freeaddrinfo(addrs);

	do_ipv4 &= have_ipv4;
	do_ipv6 &= have_ipv6;
}

static void do_main(int family)
{
	fprintf(stderr, "family: %s\n",
			family == PF_INET ? "INET" : "INET6");

	fprintf(stderr, "test SND\n");
	do_test(family, SOF_TIMESTAMPING_TX_SOFTWARE);

	fprintf(stderr, "test ENQ\n");
	do_test(family, SOF_TIMESTAMPING_TX_SCHED);

	fprintf(stderr, "test ENQ + SND\n");
	do_test(family, SOF_TIMESTAMPING_TX_SCHED |
			SOF_TIMESTAMPING_TX_SOFTWARE);

	if (cfg_proto == SOCK_STREAM) {
		fprintf(stderr, "\ntest ACK\n");
		do_test(family, SOF_TIMESTAMPING_TX_ACK);

		fprintf(stderr, "\ntest SND + ACK\n");
		do_test(family, SOF_TIMESTAMPING_TX_SOFTWARE |
				SOF_TIMESTAMPING_TX_ACK);

		fprintf(stderr, "\ntest ENQ + SND + ACK\n");
		do_test(family, SOF_TIMESTAMPING_TX_SCHED |
				SOF_TIMESTAMPING_TX_SOFTWARE |
				SOF_TIMESTAMPING_TX_ACK);
	}
}

const char *sock_names[] = { NULL, "TCP", "UDP", "RAW" };

int main(int argc, char **argv)
{
	if (argc == 1)
		usage(argv[0]);

	parse_opt(argc, argv);
	resolve_hostname(argv[argc - 1]);

	fprintf(stderr, "protocol: %s\n", sock_names[cfg_proto]);
	fprintf(stderr, "payload: %u\n", cfg_payload_len);
	fprintf(stderr, "server port: %u\n", dest_port);
	fprintf(stderr, "\n");

	if (do_ipv4)
		do_main(PF_INET);
	if (do_ipv6)
		do_main(PF_INET6);

	return 0;
}
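recv_errmsg() hands each MSG_ERRQUEUE result to __recv_errmsg_cmsg(), which is defined elsewhere in txtimestamp.c and not shown in this part of the diff. As a rough sketch of what such a parser has to do (not the file's actual implementation), the code below walks the control messages of one report: an SOL_SOCKET/SCM_TIMESTAMPING cmsg carrying struct scm_timestamping (software timestamp in ts[0]), and an IP_RECVERR (IPv4) or IPV6_RECVERR (IPv6) cmsg carrying struct sock_extended_err, whose ee_info names the timestamp point and whose ee_data holds the SOF_TIMESTAMPING_OPT_ID counter. The names struct tx_report, parse_tx_report() and fake_tx_report() are hypothetical; fake_tx_report() only fabricates such a message so the parser can be exercised without a network, and the sketch assumes the Linux uapi header <linux/errqueue.h> is installed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <linux/errqueue.h>

/* Hypothetical summary of one MSG_ERRQUEUE timestamp report. */
struct tx_report {
	int have_ts;		/* saw SCM_TIMESTAMPING */
	int have_err;		/* saw a timestamping sock_extended_err */
	struct timespec sw_ts;	/* software timestamp, ts[0] */
	uint32_t type;		/* ee_info: SCM_TSTAMP_SND, _SCHED or _ACK */
	uint32_t key;		/* ee_data: counter from SOF_TIMESTAMPING_OPT_ID */
};

/* Walk the control messages returned by recvmsg(fd, msg, MSG_ERRQUEUE). */
static void parse_tx_report(struct msghdr *msg, struct tx_report *rep)
{
	struct cmsghdr *cm;

	memset(rep, 0, sizeof(*rep));
	for (cm = CMSG_FIRSTHDR(msg); cm; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == SOL_SOCKET &&
		    cm->cmsg_type == SCM_TIMESTAMPING &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(struct scm_timestamping))) {
			struct scm_timestamping tss;

			memcpy(&tss, CMSG_DATA(cm), sizeof(tss));
			rep->sw_ts = tss.ts[0];
			rep->have_ts = 1;
		} else if (((cm->cmsg_level == SOL_IP && cm->cmsg_type == IP_RECVERR) ||
			    (cm->cmsg_level == SOL_IPV6 && cm->cmsg_type == IPV6_RECVERR)) &&
			   cm->cmsg_len >= CMSG_LEN(sizeof(struct sock_extended_err))) {
			struct sock_extended_err serr;

			memcpy(&serr, CMSG_DATA(cm), sizeof(serr));
			if (serr.ee_errno == ENOMSG &&
			    serr.ee_origin == SO_EE_ORIGIN_TIMESTAMPING) {
				rep->type = serr.ee_info;
				rep->key = serr.ee_data;
				rep->have_err = 1;
			}
		}
	}
}

/* Hypothetical test helper: lay out the two cmsgs of an IPv4 report in buf. */
static void fake_tx_report(struct msghdr *msg, char *buf, size_t len,
			   uint32_t type, uint32_t key, time_t sec)
{
	struct scm_timestamping tss;
	struct sock_extended_err serr;
	struct cmsghdr *cm;
	size_t need = CMSG_SPACE(sizeof(tss)) + CMSG_SPACE(sizeof(serr));

	assert(len >= need);
	memset(msg, 0, sizeof(*msg));
	memset(buf, 0, len);
	memset(&tss, 0, sizeof(tss));
	memset(&serr, 0, sizeof(serr));
	msg->msg_control = buf;
	msg->msg_controllen = need;

	tss.ts[0].tv_sec = sec;
	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_TIMESTAMPING;
	cm->cmsg_len = CMSG_LEN(sizeof(tss));
	memcpy(CMSG_DATA(cm), &tss, sizeof(tss));

	serr.ee_errno = ENOMSG;
	serr.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
	serr.ee_info = type;
	serr.ee_data = key;
	cm = CMSG_NXTHDR(msg, cm);
	assert(cm != NULL);
	cm->cmsg_level = SOL_IP;
	cm->cmsg_type = IP_RECVERR;
	cm->cmsg_len = CMSG_LEN(sizeof(serr));
	memcpy(CMSG_DATA(cm), &serr, sizeof(serr));
}
```

The control buffer passed to fake_tx_report() must be suitably aligned for struct cmsghdr, for example `union { char buf[128]; struct cmsghdr align; } u;`. Copying the payload out with memcpy() rather than casting CMSG_DATA() avoids assuming anything about the alignment of the data area.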
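do_main() names each test run after the SOF_TIMESTAMPING_TX_* flags it requests: ENQ for TX_SCHED, SND for TX_SOFTWARE and ACK for TX_ACK (the last only for TCP). A minimal sketch of that mapping follows, assuming the Linux uapi headers <linux/errqueue.h> (SCM_TSTAMP_* values reported in ee_info) and <linux/net_tstamp.h> (SOF_TIMESTAMPING_* flags). The helpers tstamp_label() and tx_points_requested() are hypothetical and do not appear in txtimestamp.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <linux/errqueue.h>	/* SCM_TSTAMP_SND, _SCHED, _ACK */
#include <linux/net_tstamp.h>	/* SOF_TIMESTAMPING_* flags */

/* Label that do_main() uses for each report type found in ee_info. */
static const char *tstamp_label(uint32_t ee_info)
{
	switch (ee_info) {
	case SCM_TSTAMP_SCHED:
		return "ENQ";	/* before entering the packet scheduler */
	case SCM_TSTAMP_SND:
		return "SND";	/* when the driver hands the packet to the NIC */
	case SCM_TSTAMP_ACK:
		return "ACK";	/* when the peer acknowledged all data (TCP) */
	default:
		return "unknown";
	}
}

/* Count the transmit timestamp points requested in an SO_TIMESTAMPING mask. */
static int tx_points_requested(unsigned int opt)
{
	const unsigned int tx_flags[] = {
		SOF_TIMESTAMPING_TX_SCHED,
		SOF_TIMESTAMPING_TX_SOFTWARE,
		SOF_TIMESTAMPING_TX_ACK,
	};
	size_t i;
	int n = 0;

	for (i = 0; i < sizeof(tx_flags) / sizeof(tx_flags[0]); i++)
		if (opt & tx_flags[i])
			n++;
	return n;
}
```

For the "test ENQ + SND + ACK" run, tx_points_requested() returns 3. Assuming each send() leaves as a single packet, that is roughly how many reports the `while (!recv_errmsg(fd))` loop in do_test() should drain before recvmsg() returns EAGAIN. The reporting flags SOF_TIMESTAMPING_SOFTWARE and SOF_TIMESTAMPING_OPT_ID that do_test() adds do not request a timestamp point of their own, so they are not counted.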