Since
commit 1dacc76d00
Author: Johannes Berg <johannes@sipsolutions.net>
Date: Wed Jul 1 11:26:02 2009 +0000
net/compat/wext: send different messages to compat tasks
we had a race condition when setting and then
restoring frag_list. Eric attempted to fix it,
but the fix created even worse problems.
However, the original motivation I had when I
added the code that turned out to be racy is
no longer clear to me, since we only copy up
to skb->len to userspace, which doesn't include
the frag_list length. As a result, not doing
any frag_list clearing and restoring avoids
the race condition, while not introducing any
other problems.
Additionally, while preparing this patch I found
that since none of the remaining netlink code is
really aware of the frag_list, we need to use the
original skb's information for packet information
and credentials. This fixes, for example, the
group information received by compat tasks.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@kernel.org [2.6.31+, for 2.6.35 revert 1235f504aa]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
>Xin Xiaohui wrote:
> I looked into the code dev_gro_receive(), found the code here:
> if the frags[0] is pulled to 0, then the page will be released,
> and memmove() frags left.
> Is that right? I'm not sure if memmove do right or not, but
> frags[0].size is never set after memove at least. what I think
> a simple way is not to do anything if we found frags[0].size == 0.
> The patch is as followed.
...
This version of the patch fixes the bug directly in memmove.
Reported-by: "Xin, Xiaohui" <xiaohui.xin@intel.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We leak at least 32bits of kernel memory to user land in tc dump,
because we dont init all fields (capab ?) of the dumped structure.
Use C99 initializers so that holes and non explicit fields are zeroed.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 24b36f019 (netfilter: {ip,ip6,arp}_tables: dont block
bottom half more than necessary), lockdep can raise a warning
because we attempt to lock a spinlock with BH enabled, while
the same lock is usually locked by another cpu in a softirq context.
Disable again BH to avoid these lockdep warnings.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Diagnosed-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 1235f504aa.
It causes regressions worse than the problem it was trying
to fix. Eric will try to solve the problem another way.
Signed-off-by: David S. Miller <davem@davemloft.net>
sysctl output ipv6 gc_elasticity and min_adv_mss as values divided by
HZ. However, they are not in unit of jiffies, since ip6_rt_min_advmss
refers to packet size and ip6_rt_fc_elasticity is used as scaler as in
expire>>ip6_rt_gc_elasticity, so replace the jiffies conversion
handler will regular handler for them.
This has impact on scripts that are currently working assuming the
divide by HZ, will yield different results with this patch in place.
Signed-off-by: Min Zhang <mzhang@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As xfrm_compile_policy runs within a read_lock, we cannot use
GFP_KERNEL for memory allocations.
Reported-by: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a limit for nframes as the number of frames in TX_SETUP and
RX_SETUP are derived from a single byte multiplex value by default.
Use-cases that would require to send/filter more than 256 CAN frames should
be implemented in userspace for complexity reasons anyway.
Additionally the assignments of unsigned values from userspace to signed
values in kernelspace and vice versa are fixed by using unsigned values in
kernelspace consistently.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Reported-by: Ben Hawkes <hawkes@google.com>
Acked-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Require qdisc class ops .walk and .leaf for classful qdisc in
register_qdisc(). The checks could be done later insted, but these
ops are really needed and used by most of classful qdiscs.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sch_sfq as a classful qdisc needs the .leaf handler. Otherwise, there
is an oops possible in tc_modify_qdisc()/check_loop().
Fixes commit 7d2681a6ff
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Headroom size for control channel must be at least 48 bytes in some scenarios.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
remote_tx_win is intended to be set on receipt of an L2CAP
configuration request. The value is used to determine the size of the
transmit window on the remote side of an ERTM connection, so L2CAP
can stop sending frames when that remote window is full.
An incorrect remote_tx_win value will cause the stack to not fully
utilize the tx window (performance impact), or to overfill the remote
tx window (causing dropped frames or a disconnect).
This patch removes an extra setting of remote_tx_win when a
configuration response is received. The transmit window has a
different meaning in a response - it is an informational value
less than or equal to the local tx_win.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Incoming configuration values must be converted to native CPU order
before use. This fixes a bug where a little-endian MPS value is
compared to a native CPU value. On big-endian processors, this
can cause ERTM and streaming mode segmentation to produce PDUs
that are larger than the remote stack is expecting, or that would
produce fragmented skbs that the current FCS code cannot handle.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This is based on work originally done by Patric McHardy.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Verify in register_qdisc() some basic qdisc class handlers are present.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add dummy .unbind_tcf and .put qdisc class ops for easier verification.
(All other schedulers have it like this.)
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Accesses to "wdev->current_bss" must be
locked with the wdev lock, which action
frame transmission is missing.
Cc: stable@kernel.org [2.6.33+]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since there was added ->tcf_chain() method without ->bind_tcf() to
sch_sfq class options, there is oops when a filter is added with
the classid parameter.
Fixes commit 7d2681a6ff
netdev thread: null pointer at cls_api.c
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Reported-by: Franchoze Eric <franchoze@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although netif_rx() isn't expected to be called in process context with
preemption enabled, it'd better handle this case. And this is why get_cpu()
is used in the non-RPS #ifdef branch. If tree RCU is selected,
rcu_read_lock() won't disable preemption, so preempt_disable() should be
called explictly.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_parse_md5sig_option doesn't check md5sig option (TCPOPT_MD5SIG)
length, but tcp_v[46]_inbound_md5_hash assume that it's at least 16
bytes long.
Signed-off-by: Dmitry Popov <dp@highloadlab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netpoll_rx_on() check in __napi_gro_receive() skips part of the
"common" GRO_NORMAL path, especially "pull:" in dev_gro_receive(),
where at least eth header should be copied for entirely paged skbs.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PPP channel ops structure should be const.
Cleanup the declarations to use standard C99 format.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RxRPC can potentially deadlock as rxrpc_resend_time_expired() wants to get
call->state_lock so that it can alter the state of an RxRPC call. However, its
caller (call_timer_fn()) has an apparent lock on the timer struct.
The problem is that rxrpc_resend_time_expired() isn't permitted to lock
call->state_lock as this could cause a deadlock against rxrpc_send_abort() as
that takes state_lock and then attempts to delete the resend timer by calling
del_timer_sync().
The deadlock can occur because del_timer_sync() will sit there forever waiting
for rxrpc_resend_time_expired() to return, but the latter may then wait for
call->state_lock, which rxrpc_send_abort() holds around del_timer_sync()...
This leads to a warning appearing in the kernel log that looks something like
the attached.
It should be sufficient to simply dispense with the locks. It doesn't matter
if we set the resend timer expired event bit and queue the event processor
whilst we're changing state to one where the resend timer is irrelevant as the
event can just be ignored by the processor thereafter.
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.35-rc3-cachefs+ #115
-------------------------------------------------------
swapper/0 is trying to acquire lock:
(&call->state_lock){++--..}, at: [<ffffffffa00200d4>] rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc]
but task is already holding lock:
(&call->resend_timer){+.-...}, at: [<ffffffff8103b675>] run_timer_softirq+0x182/0x2a5
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&call->resend_timer){+.-...}:
[<ffffffff810560bc>] __lock_acquire+0x889/0x8fa
[<ffffffff81056184>] lock_acquire+0x57/0x6d
[<ffffffff8103bb9c>] del_timer_sync+0x3c/0x86
[<ffffffffa002bb7a>] rxrpc_send_abort+0x50/0x97 [af_rxrpc]
[<ffffffffa002bdd9>] rxrpc_kernel_abort_call+0xa1/0xdd [af_rxrpc]
[<ffffffffa0061588>] afs_deliver_to_call+0x129/0x368 [kafs]
[<ffffffffa006181b>] afs_process_async_call+0x54/0xff [kafs]
[<ffffffff8104261d>] worker_thread+0x1ef/0x2e2
[<ffffffff81045f47>] kthread+0x7a/0x82
[<ffffffff81002cd4>] kernel_thread_helper+0x4/0x10
-> #0 (&call->state_lock){++--..}:
[<ffffffff81055237>] validate_chain+0x727/0xd23
[<ffffffff810560bc>] __lock_acquire+0x889/0x8fa
[<ffffffff81056184>] lock_acquire+0x57/0x6d
[<ffffffff813e6b69>] _raw_read_lock_bh+0x34/0x43
[<ffffffffa00200d4>] rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc]
[<ffffffff8103b6e6>] run_timer_softirq+0x1f3/0x2a5
[<ffffffff81036828>] __do_softirq+0xa2/0x13e
[<ffffffff81002dcc>] call_softirq+0x1c/0x28
[<ffffffff810049f0>] do_softirq+0x38/0x80
[<ffffffff810361a2>] irq_exit+0x45/0x47
[<ffffffff81018fb3>] smp_apic_timer_interrupt+0x88/0x96
[<ffffffff81002893>] apic_timer_interrupt+0x13/0x20
[<ffffffff810011ac>] cpu_idle+0x4d/0x83
[<ffffffff813e06f3>] start_secondary+0x1bd/0x1c1
other info that might help us debug this:
1 lock held by swapper/0:
#0: (&call->resend_timer){+.-...}, at: [<ffffffff8103b675>] run_timer_softirq+0x182/0x2a5
stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.35-rc3-cachefs+ #115
Call Trace:
<IRQ> [<ffffffff81054414>] print_circular_bug+0xae/0xbd
[<ffffffff81055237>] validate_chain+0x727/0xd23
[<ffffffff810560bc>] __lock_acquire+0x889/0x8fa
[<ffffffff810539a7>] ? mark_lock+0x42f/0x51f
[<ffffffff81056184>] lock_acquire+0x57/0x6d
[<ffffffffa00200d4>] ? rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc]
[<ffffffff813e6b69>] _raw_read_lock_bh+0x34/0x43
[<ffffffffa00200d4>] ? rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc]
[<ffffffffa00200d4>] rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc]
[<ffffffff8103b6e6>] run_timer_softirq+0x1f3/0x2a5
[<ffffffff8103b675>] ? run_timer_softirq+0x182/0x2a5
[<ffffffffa002007e>] ? rxrpc_resend_time_expired+0x0/0x96 [af_rxrpc]
[<ffffffff810367ef>] ? __do_softirq+0x69/0x13e
[<ffffffff81036828>] __do_softirq+0xa2/0x13e
[<ffffffff81002dcc>] call_softirq+0x1c/0x28
[<ffffffff810049f0>] do_softirq+0x38/0x80
[<ffffffff810361a2>] irq_exit+0x45/0x47
[<ffffffff81018fb3>] smp_apic_timer_interrupt+0x88/0x96
[<ffffffff81002893>] apic_timer_interrupt+0x13/0x20
<EOI> [<ffffffff81049de1>] ? __atomic_notifier_call_chain+0x0/0x86
[<ffffffff8100955b>] ? mwait_idle+0x6e/0x78
[<ffffffff81009552>] ? mwait_idle+0x65/0x78
[<ffffffff810011ac>] cpu_idle+0x4d/0x83
[<ffffffff813e06f3>] start_secondary+0x1bd/0x1c1
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The packet length should be checked before the packet data is dereferenced.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The packet length should be checked before the packet data is dereferenced.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The packet length should be checked before the packet data is dereferenced.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the TX path, skb->data points to the ethernet header, not the network
header. So when validating the packet length for accessing we should
take the ethernet header into account.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The warning is:
net/mac80211/main.c:688: warning: label ‘fail_ifa’ defined but not used
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Releasing the scan mutex while starting scans
can lead to unexpected things happening, so
we shouldn't do that. Fix that and hold the
mutex across the scan triggering.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
phy/marvell: add 88ec048 support
igb: Program MDICNFG register prior to PHY init
e1000e: correct MAC-PHY interconnect register offset for 82579
hso: Add new product ID
can: Add driver for esd CAN-USB/2 device
l2tp: fix export of header file for userspace
can-raw: Fix skb_orphan_try handling
Revert "net: remove zap_completion_queue"
net: cleanup inclusion
phy/marvell: add 88e1121 interface mode support
u32: negative offset fix
net: Fix a typo from "dev" to "ndev"
igb: Use irq_synchronize per vector when using MSI-X
ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
e1000e: Fix irq_synchronize in MSI-X case
e1000e: register pm_qos request on hardware activation
ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
net: Add getsockopt support for TCP thin-streams
cxgb4: update driver version
cxgb4: add new PCI IDs
...
Manually fix up conflicts in:
- drivers/net/e1000e/netdev.c: due to pm_qos registration
infrastructure changes
- drivers/net/phy/marvell.c: conflict between adding 88ec048 support
and cleaning up the IDs
- drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
conflict (registration change vs marking it static)
Check result code of L2CAP information response. Otherwise
it would read invalid feature mask and access invalid memory.
Signed-off-by: Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If the remote side doesn't support Enhanced Retransmission Mode neither
Streaming Mode, we shall not send the RFC option.
Some devices that only supports Basic Mode do not understanding the RFC
option. This patch fixes the regression found with these devices.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Commit fc6055a5ba (net: Introduce
skb_orphan_try()) allows an early orphan of the skb and takes care on
tx timestamping, which needs the sk-reference in the skb on driver level.
So does the can-raw socket, which has not been taken into account here.
The patch below adds a 'prevent_sk_orphan' bit in the skb tx shared info,
which fixes the problem discovered by Matthias Fuchs here:
http://marc.info/?t=128030411900003&r=1&w=2
Even if it's not a primary tx timestamp topic it fits well into some skb
shared tx context. Or should be find a different place for the information to
protect the sk reference until it reaches the driver level?
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 15e83ed788.
As explained by Johannes Berg, the optimization made here is
invalid. Or, at best, incomplete.
Not only destructor invocation, but conntract entry releasing
must be executed outside of hw IRQ context.
So just checking "skb->destructor" is insufficient.
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit ab95bfe01f replaces bridge and macvlan
hooks in __netif_receive_skb(), so dev.c doesn't need to include their headers.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was possible to use a negative offset in a u32 match to reference
the ethernet header or other parts of the link layer header.
This fixes the regression caused by:
commit fbc2e7d9cf
Author: Changli Gao <xiaosuo@gmail.com>
Date: Wed Jun 2 07:32:42 2010 -0700
cls_u32: use skb_header_pointer() to dereference data safely
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6c79bf0f24 subtracts PPPOE_SES_HLEN from mtu at
the front of ip_fragment(). So the later subtraction should be removed. The
MTU of 802.1q is also 1500, so MTU should not be changed.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Bart De Schuymer <bdschuym@pandora.bo>
----
net/ipv4/ip_output.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
Signed-off-by: Bart De Schuymer <bdschuym@pandora.bo>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initial TCP thin-stream commit did not add getsockopt support for the new
socket options: TCP_THIN_LINEAR_TIMEOUTS and TCP_THIN_DUPACK. This adds support
for them.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Tested-by: Andreas Petlund <apetlund@simula.no>
Acked-by: Andreas Petlund <apetlund@simula.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
9P spec says:
"It is correct to consider remove to be a clunk with the
side effect of removing the file if permissions allow. "
So even if remove fails we need to destroy the fid.
Without this patch an rmdir on a directory with contents leave
the new cloned directory fid fid attached to fidlist. On umount
we dump the fids on the fidlist
~# rmdir /mnt2/test4/
rmdir: failed to remove `/mnt2/test4/': Directory not empty
~# umount /mnt2/
~# dmesg
[ 228.474323] Found fid 3 not clunked
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
TXATTRCREATE: Prepare a fid for setting xattr value on a file system object.
size[4] TXATTRCREATE tag[2] fid[4] name[s] attr_size[8] flags[4]
size[4] RXATTRCREATE tag[2]
txattrcreate gets a fid pointing to xattr. This fid can later be
used to set the xattr value.
flag value is derived from set Linux setxattr. The manpage says
"The flags parameter can be used to refine the semantics of the operation.
XATTR_CREATE specifies a pure create, which fails if the named attribute
exists already. XATTR_REPLACE specifies a pure replace operation, which
fails if the named attribute does not already exist. By default (no flags),
the extended attribute will be created if need be, or will simply replace
the value if the attribute exists."
The actual setxattr operation happens when the fid is clunked. At that point
the written byte count and the attr_size specified in TXATTRCREATE should be
same otherwise an error will be returned.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
TXATTRWALK: Descend a ATTR namespace
size[4] TXATTRWALK tag[2] fid[4] newfid[4] name[s]
size[4] RXATTRWALK tag[2] size[8]
txattrwalk gets a fid pointing to xattr. This fid can later be
used to read the xattr value. If name is NULL the fid returned
can be used to get the list of extended attribute associated to
the file system object.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>