Commit Graph

34116 Commits

Author SHA1 Message Date
Eric Dumazet
44a0873d52 net: Introduce unregister_netdevice_queue()
This patchs adds an unreg_list anchor to struct net_device, and
introduces an unregister_netdevice_queue() function, able to queue
a net_device to a list instead of immediately unregister it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28 02:22:06 -07:00
Kalle Valo
e36e49f733 mac80211: add ieee80211_rx_ni()
ieee80211_rx() must be called with bottom halves disabled. To simplify
driver development implement ieee80211_rx_ni() which disables BH. This
function must be used when in process context.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-27 16:48:21 -04:00
Holger Schurig
ce470613cd cfg80211: no cookies in cfg80211_send_XXX()
Get rid of cookies in cfg80211_send_XXX() functions.

Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-27 16:48:16 -04:00
Michael Buesch
8b45499ccb ssb: Put host pointers into a union
This slightly shrinks the structure.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-27 16:47:55 -04:00
David S. Miller
cfadf853f6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/sh_eth.c
2009-10-27 01:03:26 -07:00
Eric Dumazet
05423b2413 vlan: allow null VLAN ID to be used
We currently use a 16 bit field (vlan_tci) to store VLAN ID/PRIO on a skb.

Null value is used as a special value, meaning vlan tagging not enabled.
This forbids use of null vlan ID.

As pointed by David, some drivers use the 3 high order bits (PRIO)

As VLAN ID is 12 bits, we can use the remaining bit (CFI) as a flag, and
allow null VLAN ID.

In case future code really wants to use VLAN_CFI_MASK, we'll have to use
a bit outside of vlan_tci.

#define VLAN_PRIO_MASK         0xe000 /* Priority Code Point */
#define VLAN_PRIO_SHIFT        13
#define VLAN_CFI_MASK          0x1000 /* Canonical Format Indicator */
#define VLAN_TAG_PRESENT       VLAN_CFI_MASK
#define VLAN_VID_MASK          0x0fff /* VLAN Identifier */

Reported-by: Gertjan Hofman <gertjan_hofman@yahoo.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-27 01:02:33 -07:00
Eric Dumazet
7c28bd0b8e rtnetlink: speedup rtnl_dump_ifinfo()
When handling large number of netdevice, rtnl_dump_ifinfo()
is very slow because it has O(N^2) complexity.

Instead of scanning one single list, we can use the 256 sub lists
of the dev_index hash table.

This considerably speedups "ip link" operations

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-24 06:13:17 -07:00
Eric Dumazet
ef9a9d1183 ipv6 sit: RCU conversion phase I
SIT tunnels use one rwlock to protect their prl entries.

This first patch adds RCU locking for prl management,
with standard call_rcu() calls.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-24 06:07:55 -07:00
jamal
1c55d62e77 pkt_sched: skbedit add support for setting mark
This adds support for setting the skb mark.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-22 21:56:42 -07:00
Krishna Kumar
ea94ff3b55 net: Fix for dst_negative_advice
dst_negative_advice() should check for changed dst and reset
sk_tx_queue_mapping accordingly. Pass sock to the callers of
dst_negative_advice.

(sk_reset_txq is defined just for use by dst_negative_advice. The
only way I could find to get around this is to move dst_negative_()
from dst.h to dst.c, include sock.h in dst.c, etc)

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 18:55:46 -07:00
Krishna Kumar
e022f0b4a0 net: Introduce sk_tx_queue_mapping
Introduce sk_tx_queue_mapping; and functions that set, test and
get this value. Reset sk_tx_queue_mapping to -1 whenever the dst
cache is set/reset, and in socket alloc. Setting txq to -1 and
using valid txq=<0 to n-1> allows the tx path to use the value
of sk_tx_queue_mapping directly instead of subtracting 1 on every
tx.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 18:55:45 -07:00
Eric Dumazet
abf90cca97 net: Fix struct inet_timewait_sock bitfield annotation
commit 9e337b0f (net: annotate inet_timewait_sock bitfields)
added 4/8 bytes in struct inet_timewait_sock.

Fix this by declaring tw_ipv6_offset in the 'flags' bitfield
The 14 bits hole is named tw_pad to make it cleary apparent.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 01:13:26 -07:00
Arnaldo Carvalho de Melo
748879776e net: Avoid compiler warning for mmsghdr when CONFIG_COMPAT is not selected
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 01:09:17 -07:00
Eric Dumazet
d19742fb1c filter: Add SKF_AD_QUEUE instruction
It can help being able to filter packets on their queue_mapping.

If filter performance is not good, we could add a "numqueue" field
in struct packet_type, so that netif_nit_deliver() and other functions
can directly ignore packets with not expected queue number.

Lets experiment this simple filter extension first.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 01:06:22 -07:00
Wolfgang Grandegger
7b6856a029 can: provide library functions for skb allocation
This patch makes the private functions alloc_can_skb() and
alloc_can_err_skb() of the at91_can driver public and adapts all
drivers to use these. While making the patch I realized, that
the skb's are *not* setup consistently. It's now done as shown
below:

  skb->protocol = htons(ETH_P_CAN);
  skb->pkt_type = PACKET_BROADCAST;
  skb->ip_summed = CHECKSUM_UNNECESSARY;
  *cf = (struct can_frame *)skb_put(skb, sizeof(struct can_frame));
  memset(*cf, 0, sizeof(struct can_frame));

The frame is zeroed out to avoid uninitialized data to be passed to
user space. Some drivers or library code did not set "pkt_type" or
"ip_summed". Also,  "__constant_htons()" should not be used for
runtime invocations, as pointed out by David Miller.

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 00:08:01 -07:00
jamal
7e75f93eda pkt_sched: ingress socket filter by mark
Allow bpf to set a filter to drop packets that dont
match a specific mark

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-19 23:22:49 -07:00
Inaky Perez-Gonzalez
c2315b4ea9 wimax/i2400m: clarify and fix i2400m->{ready,updown}
The i2400m driver uses two different bits to distinguish how much the
driver is up. i2400m->ready is used to denote that the infrastructure
to communicate with the device is up and running. i2400m->updown is
used to indicate if 'ready' and the device is up and running, ready to
take control and data traffic.

However, all this was pretty dirty and not clear, with many open spots
where race conditions were present.

This commit cleans up the situation by:

 - documenting the usage of both bits

 - setting them only in specific, well controlled places
   (i2400m_dev_start, i2400m_dev_stop)

 - ensuring the i2400m workqueue can't get in the middle of the
   setting by flushing it when i2400m->ready is set to zero. This
   allows the report hook not having to check again for the bit to be
   set [rx.c:i2400m_report_hook_work()].

 - using i2400m->updown to determine if the device is up and running
   instead of the wimax state in i2400m_dev_reset_handle().

 - not loosing missed messages sent by the hardware before
   i2400m->ready is set. In rx.c, whatever the device sends can be
   sent to user space over the message pipes as soon as the wimax
   device is registered, so don't wait for i2400m->ready to be set.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
2009-10-19 15:56:07 +09:00
Dirk Brandewie
7329012e67 wimax/i6x50: add Intel WiFi/WiMAX Link 6050 Series support
Add support for the WiMAX device in the Intel WiFi/WiMAX Link 6050
Series; this involves:

 - adding the device ID to bind to and an endpoint mapping for the
   driver to use.

 - at probe() time, some things are set depending on the device id:

   + the list of firmware names to try

   + mapping of endpoints

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
2009-10-19 15:55:59 +09:00
Cindy H Kao
f8fc329557 wimax/iwmc3200: add new sdio device ID to support iwmc3200 2.5GHz sku
Different sdio device IDs are designated to support different intel
wimax silicon sku. The new macro SDIO_DEVICE_ID_IWMC3200_WIMAX_2G5(0x1407)
is added to support iwmc3200 2.5GHz sku.  The existing
SDIO_DEVICE_ID_IWMC3200_WIMAX(0x1402) is for iwmc3200 general sku.

Signed-off-by: Cindy H Kao <cindy.h.kao@intel.com>
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
2009-10-19 15:55:59 +09:00
Inaky Perez-Gonzalez
923d708fed wimax/i2400m: fix reboot echo/ack barker deadlock
The i2400m based devices can get in a sort of a deadlock some times;
when they boot, they send a reboot "barker" (a magic number) and then
the driver has to echo that same barker to ack reception
(echo/ack). Then the device does a final ack by sending an ACK barker.

The first time this happens, we don't know ahead of time with barker
the device is going to send, as different device models and SKUs will
send different barker depending on the EEPROM programming.

If the device has sent the barker before the driver has been able to
read it, the driver looses, as it doesn't know which barker it has to
echo/ack back. With older devices, we tried a couple of combinations
and that always worked; but now, with adding support for more, in
which we have an unlimited number of new barkers, that is not an
option.

So we rework said case so that when the device gets stuck, we just
cycle through all the known types until one forces the device to send
an ack. Otherwise, the driver gives up and aborts.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
2009-10-19 15:55:55 +09:00
Inaky Perez-Gonzalez
32742e6158 wimax/i2400m: decide properly if using signed vs non-signed firmware loading
The i2400m based devices can boot two main types of firmware images:
signed and non-signed. Signed images have signature data included that
must match that of a certificate stored in the device.

Currently the code is making the decission on what type of firmware
load (signed vs non-signed) is going to be loaded based on a hardcoded
decission in __i2400m_ack_verify(), based on the barker the device
sent upon boot.

This is not flexible enough as future hardware will emit more barkers;
thus the bit has to be set in a place where there is better knowledge
of what is going on. This will be done in follow-up commits -- however
this patch paves the way for it.

So the querying of the mode is packed into i2400m_boot_is_signed();
the main changes are just using i2400m_boot_is_signed() to determine
the method to follow and setting i2400m->sboot in
i2400m_is_boot_barker(). The modifications in i2400m_dnload_init() and
i2400m_dnload_finalize() are just reorganizing the order of the if
blocks and thus look larger than they really are.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
2009-10-19 15:55:52 +09:00
Inaky Perez-Gonzalez
4c2b1a1164 wimax: allow specifying debug levels as command line option
Add "debug" module options to all the wimax modules (including
drivers) so that the debug levels can be set upon kernel boot or
module load time.

This is needed as currently there was a limitation where the debug
levels could only be set when a device was succesfully
enumerated. This made it difficult to debug issues that made a device
not probe properly.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
2009-10-19 15:55:50 +09:00
Steffen Klassert
eb2ff967a5 xfrm: remove skb_icv_walk
The last users of skb_icv_walk are converted to ahash now,
so skb_icv_walk is unused and can be removed.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18 21:32:01 -07:00
Steffen Klassert
2ad9afbf5c ah: Remove obsolete code
ah4 and ah6 are converted to ahash now, so we can remove the
code for the obsolete hash algorithm.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18 21:32:00 -07:00
Steffen Klassert
49cbf95248 ah: Add struct crypto_ahash to ah_data
To support for ahash algorithms, we add a pointer to a
crypto_ahash to ah_data.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18 21:31:58 -07:00
Eric Dumazet
c720c7e838 inet: rename some inet_sock fields
In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

Goal is to transfert fields used at lookup time in the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path)

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18 18:52:53 -07:00
Rémi Denis-Courmont
f062f41d06 Phonet: routing table Netlink interface
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-14 15:04:17 -07:00
Rémi Denis-Courmont
55748ac046 Phonet: routing table backend
The Phonet "universe" only has 64 addresses, so we keep a trivial flat
routing table.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-14 15:04:16 -07:00
Rémi Denis-Courmont
f14001fcd7 Phonet: deliver broadcast packets to broadcast sockets
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-14 15:04:15 -07:00
David S. Miller
421355de87 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-10-13 12:55:20 -07:00
Manuel Lauss
aace495933 net: smsc911x: allow platform_data to specify mac address
Extend the driver to accept a MAC address specified in platform_data.

Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 11:48:32 -07:00
David S. Miller
417c5233db Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-10-13 11:41:34 -07:00
Wolfgang Grandegger
a6e4bc5304 can: make the number of echo skb's configurable
This patch allows the CAN controller driver to define the number of echo
skb's used for the local loopback (echo), as suggested by Kurt Van
Dijck, with the function:

  struct net_device *alloc_candev(int sizeof_priv,
                                  unsigned int echo_skb_max);

The CAN drivers have been adapted accordingly. For the ems_usb driver,
as suggested by Sebastian Haas, the number of echo skb's has been
increased to 10, which improves the transmission performance a lot.

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 03:44:04 -07:00
Eric Dumazet
61321bbd62 net: Add netdev_alloc_skb_ip_align() helper
Instead of hardcoding NET_IP_ALIGN stuff in various network drivers,
we can add a helper around netdev_alloc_skb()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 03:44:03 -07:00
Eric Dumazet
f373b53b5f tcp: replace ehash_size by ehash_mask
Storing the mask (size - 1) instead of the size allows fast path to be
a bit faster.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 03:44:02 -07:00
Arnaldo Carvalho de Melo
a2e2725541 net: Introduce recvmmsg socket syscall
Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.

Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.

This takes into account comments made by:

. Paul Moore: sock_recvmsg is called only for the first datagram,
  sock_recvmsg_nosec is used for the rest.

. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
  works in the same fashion as the ppoll one.

  If the underlying protocol returns a datagram with MSG_OOB set, this
  will make recvmmsg return right away with as many datagrams (+ the OOB
  one) it has received so far.

. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
  datagrams and then recvmsg returns an error, recvmmsg will return
  the successfully received datagrams, store the error and return it
  in the next call.

This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 23:40:10 -07:00
Neil Horman
3b885787ea net: Generalize socket rx gap / receive queue overflow cmsg
Create a new socket level option to report number of queue overflows

Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames.  This value was
exported via a SOL_PACKET level cmsg.  AFter I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option.  As such I've created this patch, It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
overflowed between any two given frames.  It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count).  Tested
successfully by me.

Notes:

1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.

2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if thats
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me.  This also saves us having
to code in a per-protocol opt in mechanism.

3) This applies cleanly to net-next assuming that commit
977750076d (my af packet cmsg patch) is reverted

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 13:26:31 -07:00
Johannes Berg
d20ef63d32 mac80211: document ieee80211_rx() context requirement
ieee80211_rx() must be called with softirqs disabled
since the networking stack requires this for netif_rx()
and some code in mac80211 can assume that it can not
be processing its own tasklet and this call at the same
time.

It may be possible to remove this requirement after a
careful audit of mac80211 and doing any needed locking
improvements in it along with disabling softirqs around
netif_rx(). An alternative might be to push all packet
processing to process context in mac80211, instead of
to the tasklet, and add other synchronisation.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-12 15:55:53 -04:00
David S. Miller
d5e63bded6 Revert "af_packet: add interframe drop cmsg (v6)"
This reverts commit 977750076d.

Neil is reimplementing this generically, outside of AF_PACKET.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 03:00:31 -07:00
David S. Miller
7fe13c5733 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-10-11 23:15:47 -07:00
Eric Dumazet
5fdb9973c1 net: Fix struct sock bitfield annotation
Since commit a98b65a3 (net: annotate struct sock bitfield), we lost
8 bytes in struct sock on 64bit arches because of
kmemcheck_bitfield_end(flags) misplacement.

Fix this by putting together sk_shutdown, sk_no_check, sk_userlocks,
sk_protocol and sk_type in the 'flags' 32bits bitfield

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-11 23:03:52 -07:00
David S. Miller
8aa0f64ac3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-10-09 14:40:09 -07:00
Jin Dongming
e0e6f55d29 ipv6: Fix the size overflow of addrconf_sysctl array
(This patch fixes bug of commit f7734fdf61
 title "make TLLAO option for NA packets configurable")

When the IPV6 conf is used, the function sysctl_set_parent is called and the
array addrconf_sysctl is used as a parameter of the function.

The above patch added new conf "force_tllao" into the array addrconf_sysctl,
but the size of the array was not modified, the static allocated size is
DEVCONF_MAX + 1 but the real size is DEVCONF_MAX + 2, so the problem is
that the function sysctl_set_parent accessed wrong address.

I got the following information.
Call Trace:
    [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
    [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
    [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
    [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
    [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
    [<ffffffff810622d5>] __register_sysctl_paths+0xde/0x272
    [<ffffffff8110892d>] ? __kmalloc_track_caller+0x16e/0x180
    [<ffffffffa00cfac3>] ? __addrconf_sysctl_register+0xc5/0x144 [ipv6]
    [<ffffffff8141f2c9>] register_net_sysctl_table+0x48/0x4b
    [<ffffffffa00cfaf5>] __addrconf_sysctl_register+0xf7/0x144 [ipv6]
    [<ffffffffa00cfc16>] addrconf_init_net+0xd4/0x104 [ipv6]
    [<ffffffff8139195f>] setup_net+0x35/0x82
    [<ffffffff81391f6c>] copy_net_ns+0x76/0xe0
    [<ffffffff8107ad60>] create_new_namespaces+0xf0/0x16e
    [<ffffffff8107afee>] copy_namespaces+0x65/0x9f
    [<ffffffff81056dff>] copy_process+0xb2c/0x12c3
    [<ffffffff810576e1>] do_fork+0x14b/0x2d2
    [<ffffffff8107ac4e>] ? up_read+0xe/0x10
    [<ffffffff81438e73>] ? do_page_fault+0x27a/0x2aa
    [<ffffffff8101044b>] sys_clone+0x28/0x2a
    [<ffffffff81011fb3>] stub_clone+0x13/0x20
    [<ffffffff81011c72>] ? system_call_fastpath+0x16/0x1b

And the information of IPV6 in .config is as following.
IPV6 in .config:
    CONFIG_IPV6=m
    CONFIG_IPV6_PRIVACY=y
    CONFIG_IPV6_ROUTER_PREF=y
    CONFIG_IPV6_ROUTE_INFO=y
    CONFIG_IPV6_OPTIMISTIC_DAD=y
    CONFIG_IPV6_MIP6=m
    CONFIG_IPV6_SIT=m
    # CONFIG_IPV6_SIT_6RD is not set
    CONFIG_IPV6_NDISC_NODETYPE=y
    CONFIG_IPV6_TUNNEL=m
    CONFIG_IPV6_MULTIPLE_TABLES=y
    CONFIG_IPV6_SUBTREES=y
    CONFIG_IPV6_MROUTE=y
    CONFIG_IPV6_PIMSM_V2=y
    # CONFIG_IP_VS_IPV6 is not set
    CONFIG_NF_CONNTRACK_IPV6=m
    CONFIG_IP6_NF_MATCH_IPV6HEADER=m

I confirmed this patch fixes this problem.

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-08 22:44:47 -07:00
Anant Gole
3758bf25db can: add TI CAN (HECC) driver
TI HECC (High End CAN Controller) module is found on many TI devices. It
has 32 hardware mailboxes with full implementation of CAN protocol 2.0B
with bus speeds up to 1Mbps. Specifications of the module are available
on TI web <http://www.ti.com>

Signed-off-by: Anant Gole <anantgole@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 22:02:53 -07:00
Eric Dumazet
f86dcc5aa8 udp: dynamically size hash tables at boot time
UDP_HTABLE_SIZE was initialy defined to 128, which is a bit small for
several setups.

4000 active UDP sockets -> 32 sockets per chain in average. An
incoming frame has to lookup all sockets to find best match, so long
chains hurt latency.

Instead of a fixed size hash table that cant be perfect for every
needs, let UDP stack choose its table size at boot time like tcp/ip
route, using alloc_large_system_hash() helper

Add an optional boot parameter, uhash_entries=x so that an admin can
force a size between 256 and 65536 if needed, like thash_entries and
rhash_entries.

dmesg logs two new lines :
[    0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
[    0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)

Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non
debugging spinlocks.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 22:00:22 -07:00
Brian Haley
125a77ed9f IPv6: Fix 6RD build error
Fix build error introduced in commit fa857afcf - ipv6 sit: 6rd
(IPv6 Rapid Deployment) Support.  Struct in6_addr is the issue.
I'm only seeing this on x86_64 systems, not on 32-bit with same
IPv6 config options, so it could be there's a missing forward
declaration somewhere, but including the correct header file
fixes the problem too.

  CC [M]  net/ipv6/ip6_tunnel.o
In file included from net/ipv6/ip6_tunnel.c:31:
include/linux/if_tunnel.h:59: error: field ‘prefix’ has incomplete type
make[2]: *** [net/ipv6/ip6_tunnel.o] Error 1
make[1]: *** [net/ipv6] Error 2

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 13:57:10 -07:00
Wolfram Sang
d308e38fa5 include/linux/netdevice.h: fix nanodoc mismatch
nanodoc was missing an ndo_-prefix.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 13:53:11 -07:00
Kalle Valo
dfce95f51f cfg80211: add firmware and hardware version to wiphy
It's useful to provide firmware and hardware version to user space and have a
generic interface to retrieve them. Users can provide the version information
in bug reports etc.

Add fields for firmware and hardware version to struct wiphy.

(Dropped nl80211 bits for now and modified remaining bits in favor of
ethtool. -- JWL)

Cc: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:46 -04:00
Johannes Berg
3d23e349d8 wext: refactor
Refactor wext to
 * split out iwpriv handling
 * split out iwspy handling
 * split out procfs support
 * allow cfg80211 to have wireless extensions compat code
   w/o CONFIG_WIRELESS_EXT

After this, drivers need to
 - select WIRELESS_EXT	- for wext support
 - select WEXT_PRIV	- for iwpriv support
 - select WEXT_SPY	- for iwspy support

except cfg80211 -- which gets new hooks in wext-core.c
and can then get wext handlers without CONFIG_WIRELESS_EXT.

Wireless extensions procfs support is auto-selected
based on PROC_FS and anything that requires the wext core
(i.e. WIRELESS_EXT or CFG80211_WEXT).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:43 -04:00
Holger Schurig
7c89606e24 nl80211: report age of scan results
Linux keeps scan results up to 15 seconds. This can be a problem for fast
moving clients: they get back stale data. But if the kernel reports the age
of the BSS items, then user-space can simply weed out old entries by itself.

Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:42 -04:00
Stephen Hemminger
3295354322 dcb: data center bridging ops should be r/o
The data center bridging ops structure can be const

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:10:47 -07:00
Stephen Hemminger
ec1b4cf74c net: mark net_proto_ops as const
All usages of structure net_proto_ops should be declared const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:10:46 -07:00
Octavian Purdila
f7734fdf61 make TLLAO option for NA packets configurable
On Friday 02 October 2009 20:53:51 you wrote:

> This is good although I would have shortened the name.

Ah, I knew I forgot something :) Here is v4.

tavi

>From 24d96d825b9fa832b22878cc6c990d5711968734 Mon Sep 17 00:00:00 2001
From: Octavian Purdila <opurdila@ixiacom.com>
Date: Fri, 2 Oct 2009 00:51:15 +0300
Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs

Neighbor advertisements responding to unicast neighbor solicitations
did not include the target link-layer address option. This patch adds
a new sysctl option (disabled by default) which controls whether this
option should be sent even with unicast NAs.

The need for this arose because certain routers expect the TLLAO in
some situations even as a response to unicast NS packets.

Moreover, RFC 2461 recommends sending this to avoid a race condition
(section 4.4, Target link-layer address)

Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:10:45 -07:00
Ben Hutchings
d73d3a8cb4 ethtool: Add reset operation
After updating firmware stored in flash, users may wish to reset the
relevant hardware and start the new firmware immediately.  This should
not be completely automatic as it may be disruptive.

A selective reset may also be useful for debugging or diagnostics.

This adds a separate reset operation which takes flags indicating the
components to be reset.  Drivers are allowed to reset only a subset of
those requested, and must indicate the actual subset.  This allows the
use of generic component masks and some future expansion.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:10:44 -07:00
Eric Dumazet
d250a5f90e pkt_sched: gen_estimator: Dont report fake rate estimators
Jarek Poplawski a écrit :
>
>
> Hmm... So you made me to do some "real" work here, and guess what?:
> there is one serious checkpatch warning! ;-) Plus, this new parameter
> should be added to the function description. Otherwise:
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>
> Thanks,
> Jarek P.
>
> PS: I guess full "Don't" would show we really mean it...

Okay :) Here is the last round, before the night !

Thanks again

[RFC] pkt_sched: gen_estimator: Don't report fake rate estimators

We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
is running.

# tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0

User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
one (because no estimator is active)

After this patch, tc command output is :
$ tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

We add a parameter to gnet_stats_copy_rate_est() function so that
it can use gen_estimator_active(bstats, r), as suggested by Jarek.

This parameter can be NULL if check is not necessary, (htb for
example has a mandatory rate estimator)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:07:42 -07:00
YOSHIFUJI Hideaki / 吉藤英明
fa857afcf7 ipv6 sit: 6rd (IPv6 Rapid Deployment) Support.
IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
deploy IPv6 unicast service to IPv4 sites to which it provides
customer premise equipment.  Like 6to4, it utilizes stateless IPv6 in
IPv4 encapsulation in order to transit IPv4-only network
infrastructure.  Unlike 6to4, a 6rd service provider uses an IPv6
prefix of its own in place of the fixed 6to4 prefix.

With this option enabled, the SIT driver offers 6rd functionality by
providing additional ioctl API to configure the IPv6 Prefix for in
stead of static 2002::/16 for 6to4.

Original patch was done by Alexandre Cassen <acassen@freebox.fr>
based on old Internet-Draft.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:07:37 -07:00
Ilia K
ee5e81f000 add vif using local interface index instead of IP
When routing daemon wants to enable forwarding of multicast traffic it
performs something like:

       struct vifctl vc = {
               .vifc_vifi  = 1,
               .vifc_flags = 0,
               .vifc_threshold = 1,
               .vifc_rate_limit = 0,
               .vifc_lcl_addr = ip, /* <--- ip address of physical
interface, e.g. eth0 */
               .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
         };
       setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));

This leads (in the kernel) to calling  vif_add() function call which
search the (physical) device using assigned IP address:
       dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);

The current API (struct vifctl) does not allow to specify an
interface other way than using it's IP, and if there are more than a
single interface with specified IP only the first one will be found.

The attached patch (against 2.6.30.4) allows to specify an interface
by its index, instead of IP address:

       struct vifctl vc = {
               .vifc_vifi  = 1,
               .vifc_flags = VIFF_USE_IFINDEX,   /* NEW */
               .vifc_threshold = 1,
               .vifc_rate_limit = 0,
               .vifc_lcl_ifindex = if_nametoindex("eth0"),   /* NEW */
               .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
         };
       setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));

Signed-off-by: Ilia K. <mail4ilia@gmail.com>

=== modified file 'include/linux/mroute.h'
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 00:48:41 -07:00
David S. Miller
7ecc59c1b7 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-10-06 22:43:16 -07:00
Eric Dumazet
bcdce7195e net: speedup sk_wake_async()
An incoming datagram must bring into cpu cache *lot* of cache lines,
in particular : (other parts omitted (hash chains, ip route cache...))

On 32bit arches :

offsetof(struct sock, sk_rcvbuf)       =0x30    (read)
offsetof(struct sock, sk_lock)         =0x34   (rw)

offsetof(struct sock, sk_sleep)        =0x50 (read)
offsetof(struct sock, sk_rmem_alloc)   =0x64   (rw)
offsetof(struct sock, sk_receive_queue)=0x74   (rw)

offsetof(struct sock, sk_forward_alloc)=0x98   (rw)

offsetof(struct sock, sk_callback_lock)=0xcc    (rw)
offsetof(struct sock, sk_drops)        =0xd8 (read if we add dropcount support, rw if frame dropped)
offsetof(struct sock, sk_filter)       =0xf8    (read)

offsetof(struct sock, sk_socket)       =0x138 (read)

offsetof(struct sock, sk_data_ready)   =0x15c   (read)


We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
with no fasync() structures. (socket->fasync_list ptr is probably already in cache
because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)

This avoids one cache line load per incoming packet for common cases (no fasync())

We can leave (or even move in a future patch) sk->sk_socket in a cold location

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-06 17:28:29 -07:00
Johannes Berg
7ffbe3fdac net: introduce NETDEV_POST_INIT notifier
For various purposes including a wireless extensions
bugfix, we need to hook into the netdev creation before
before netdev_register_kobject(). This will also ease
doing the dev type assignment that Marcel was working
on for cfg80211 drivers w/o touching them all.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:43:34 -07:00
Marcel Holtmann
e1e499eef2 usbnet: Use wwan%d interface name for mobile broadband devices
Add support for usbnet based devices like CDC-Ether to indicate that they
are actually mobile broadband devices. In that case use wwan%d as default
interface name.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:43:33 -07:00
Ben Hutchings
9c501935a3 net: Support inclusion of <linux/socket.h> before <sys/socket.h>
The following user-space program fails to compile:

    #include <linux/socket.h>
    #include <sys/socket.h>
    int main() { return 0; }

The reason is that <linux/socket.h> tests __GLIBC__ to decide whether it
should define various structures and macros that are now defined for
user-space by <sys/socket.h>, but __GLIBC__ is not defined if no libc
headers have yet been included.

It seems safe to drop support for libc 5 now.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Bastian Blank <waldi@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:24:36 -07:00
Eric Dumazet
0bfbedb14a tunnels: Optimize tx path
We currently dirty a cache line to update tunnel device stats
(tx_packets/tx_bytes). We better use the txq->tx_bytes/tx_packets
counters that already are present in cpu cache, in the cache
line shared with txq->_xmit_lock

This patch extends IPTUNNEL_XMIT() macro to use txq pointer
provided by the caller.

Also &tunnel->dev->stats can be replaced by &dev->stats

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:21:57 -07:00
Stephen Hemminger
16c6cf8bb4 ipv4: fib table algorithm performance improvement
The FIB algorithim for IPV4 is set at compile time, but kernel goes through
the overhead of function call indirection at runtime. Save some
cycles by turning the indirect calls to direct calls to either
hash or trie code.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:21:56 -07:00
Neil Horman
977750076d af_packet: add interframe drop cmsg (v6)
Add Ancilliary data to better represent loss information

I've had a few requests recently to provide more detail regarding frame loss
during an AF_PACKET packet capture session.  Specifically the requestors want to
see where in a packet sequence frames were lost, i.e. they want to see that 40
frames were lost between frames 302 and 303 in a packet capture file.  In order
to do this we need:

1) The kernel to export this data to user space
2) The applications to make use of it

This patch addresses item (1).  It does this by doing the following:

A) Anytime we drop a frame for which we would increment po->stats.tp_drops, we
also no increment a stats called po->stats.tp_gap.

B) Every time we successfully enqueue a frame to sk_receive_queue, we record the
value of po->stats.tp_gap in skb->mark.  skb->cb would nominally be the place to
record this, but since all the space there is used up, we're overloading
skb->mark.  Its safe to do since any enqueued packet is guaranteed to be
unshared at this point, and skb->mark isn't used for anything else in the rx
path to the application.  After we record tp_gap in the skb, we zero
po->stats.tp_gap.  This allows us to keep a counter of the number of frames lost
between any two enqueued packets

C) When the application goes to dequeue a frame from the packet socket, we look
at skb->mark for that frame.  If it is non-zero, we add a cmsg chunk to the
msghdr of level SOL_PACKET and type PACKET_GAPDATA.  Its a 32 bit integer that
represents the number of frames lost between this packet and the last previous
frame received.

Note there is a chance that if there is frame loss after a receive, and then the
socket is closed, some gap data might be lost.  This is covered by the use of
the PACKET_AUXDATA socket option, which gives total loss data.  With a bit of
math, the final gap can be determined that way.

I've tested this patch myself, and it works well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

 include/linux/if_packet.h |    2 ++
 net/packet/af_packet.c    |   33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:21:55 -07:00
Ben Hutchings
a9828ec6bc ethtool: Remove support for obsolete string query operations
The in-tree implementations have all been converted to
get_sset_count().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:10:11 -07:00
Alexey Dobriyan
a99bbaf5ee headers: remove sched.h from poll.h
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-04 15:05:10 -07:00
Linus Torvalds
58e57fbd1c Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (41 commits)
  Revert "Seperate read and write statistics of in_flight requests"
  cfq-iosched: don't delay async queue if it hasn't dispatched at all
  block: Topology ioctls
  cfq-iosched: use assigned slice sync value, not default
  cfq-iosched: rename 'desktop' sysfs entry to 'low_latency'
  cfq-iosched: implement slower async initiate and queue ramp up
  cfq-iosched: delay async IO dispatch, if sync IO was just done
  cfq-iosched: add a knob for desktop interactiveness
  Add a tracepoint for block request remapping
  block: allow large discard requests
  block: use normal I/O path for discard requests
  swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL
  fs/bio.c: move EXPORT* macros to line after function
  Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
  cciss: fix build when !PROC_FS
  block: Do not clamp max_hw_sectors for stacking devices
  block: Set max_sectors correctly for stacking devices
  cciss: cciss_host_attr_groups should be const
  cciss: Dynamically allocate the drive_info_struct for each logical drive.
  cciss: Add usage_count attribute to each logical drive in /sys
  ...
2009-10-04 12:39:14 -07:00
Jens Axboe
0f78ab9899 Revert "Seperate read and write statistics of in_flight requests"
This reverts commit a9327cac44.

Corrado Zoccolo <czoccolo@gmail.com> reports:

"with 2.6.32-rc1 I started getting the following strange output from
"iostat -kx 2":
Linux 2.6.31bisect (et2) 	04/10/2009 	_i686_	(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10,70    0,00    3,16   15,75    0,00   70,38

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda              18,22     0,00    0,67    0,01    14,77     0,02
43,94     0,01   10,53 39043915,03 2629219,87
sdb              60,89     9,68   50,79    3,04  1724,43    50,52
65,95     0,70   13,06 488437,47 2629219,87

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2,72    0,00    0,74    0,00    0,00   96,53

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6,68    0,00    0,99    0,00    0,00   92,33

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,40    0,00    0,73    1,47    0,00   93,40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     4,00    0,00    3,00     0,00    28,00
18,67     0,06   19,50 333,33 100,00

Global values for service time and utilization are garbage. For
interval values, utilization is always 100%, and service time is
higher than normal.

I bisected it down to:
[a9327cac44] Seperate read and write
statistics of in_flight requests
and verified that reverting just that commit indeed solves the issue
on 2.6.32-rc1."

So until this is debugged, revert the bad commit.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-04 21:04:38 +02:00
Martin K. Petersen
ac481c20ef block: Topology ioctls
Not all users of the topology information want to use libblkid.  Provide
the topology information through bdev ioctls.

Also clarify sector size comments for existing BLK ioctls.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03 20:52:01 +02:00
Jens Axboe
8e29675555 cfq-iosched: implement slower async initiate and queue ramp up
This slowly ramps up the async queue depth based on the time
passed since the sync IO, and doesn't allow async at all until
a sync slice period has passed.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03 16:27:13 +02:00
Linus Torvalds
90d5ffc729 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits)
  cnic: Fix NETDEV_UP event processing.
  uvesafb/connector: Disallow unpliviged users to send netlink packets
  pohmelfs/connector: Disallow unpliviged users to configure pohmelfs
  dst/connector: Disallow unpliviged users to configure dst
  dm/connector: Only process connector packages from privileged processes
  connector: Removed the destruct_data callback since it is always kfree_skb()
  connector/dm: Fixed a compilation warning
  connector: Provide the sender's credentials to the callback
  connector: Keep the skb in cn_callback_data
  e1000e/igb/ixgbe: Don't report an error if devices don't support AER
  net: Fix wrong sizeof
  net: splice() from tcp to pipe should take into account O_NONBLOCK
  net: Use sk_mark for routing lookup in more places
  sky2: irqname based on pci address
  skge: use unique IRQ name
  IPv4 TCP fails to send window scale option when window scale is zero
  net/ipv4/tcp.c: fix min() type mismatch warning
  Kconfig: STRIP: Remove stale bits of STRIP help text
  NET: mkiss: Fix typo
  tg3: Remove prev_vlan_tag from struct tx_ring_info
  ...
2009-10-02 13:37:18 -07:00
Philipp Reisner
f1489cfb17 connector: Removed the destruct_data callback since it is always kfree_skb()
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-02 10:54:05 -07:00
Philipp Reisner
7069331dbe connector: Provide the sender's credentials to the callback
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-02 10:54:01 -07:00
Philipp Reisner
293500a23f connector: Keep the skb in cn_callback_data
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-02 10:53:58 -07:00
KAMEZAWA Hiroyuki
4e649152cb memcg: some modification to softlimit under hierarchical memory reclaim.
This patch clean up/fixes for memcg's uncharge soft limit path.

Problems:
  Now, res_counter_charge()/uncharge() handles softlimit information at
  charge/uncharge and softlimit-check is done when event counter per memcg
  goes over limit. Now, event counter per memcg is updated only when
  memory usage is over soft limit. Here, considering hierarchical memcg
  management, ancesotors should be taken care of.

  Now, ancerstors(hierarchy) are handled in charge() but not in uncharge().
  This is not good.

  Prolems:
  1. memcg's event counter incremented only when softlimit hits. That's bad.
     It makes event counter hard to be reused for other purpose.

  2. At uncharge, only the lowest level rescounter is handled. This is bug.
     Because ancesotor's event counter is not incremented, children should
     take care of them.

  3. res_counter_uncharge()'s 3rd argument is NULL in most case.
     ops under res_counter->lock should be small. No "if" sentense is better.

Fixes:
  * Removed soft_limit_xx poitner and checks in charge and uncharge.
    Do-check-only-when-necessary scheme works enough well without them.

  * make event-counter of memcg incremented at every charge/uncharge.
    (per-cpu area will be accessed soon anyway)

  * All ancestors are checked at soft-limit-check. This is necessary because
    ancesotor's event counter may never be modified. Then, they should be
    checked at the same time.

Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:13 -07:00
Mike Frysinger
b3db4a8ad1 asm-generic/gpio.h: pull in linux/kernel.h for might_sleep()
The asm-generic/gpio.h header uses the might_sleep() macro but doesn't
include the header for it, so any source code that might include
linux/gpio.h before linux/kernel.h can easily lead to a build failure.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:11 -07:00
Alexey Dobriyan
828c09509b const: constify remaining file_operations
[akpm@linux-foundation.org: fix KVM]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:11 -07:00
Jun'ichi Nomura
b0da3f0dad Add a tracepoint for block request remapping
Since 2.6.31 now has request-based device-mapper, it's useful to have
a tracepoint for request-remapping as well as bio-remapping.
This patch adds a tracepoint for request-remapping, trace_block_rq_remap().

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01 21:19:34 +02:00
Christoph Hellwig
67efc92580 block: allow large discard requests
Currently we set the bio size to the byte equivalent of the blocks to
be trimmed when submitting the initial DISCARD ioctl.  That means it
is subject to the max_hw_sectors limitation of the HBA which is
much lower than the size of a DISCARD request we can support.
Add a separate max_discard_sectors tunable to limit the size for discard
requests.

We limit the max discard request size in bytes to 32bit as that is the
limit for bio->bi_size.  This could be much larger if we had a way to pass
that information through the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01 21:19:34 +02:00
Christoph Hellwig
c15227de13 block: use normal I/O path for discard requests
prepare_discard_fn() was being called in a place where memory allocation
was effectively impossible.  This makes it inappropriate for all but
the most trivial translations of Linux's DISCARD operation to the block
command set.  Additionally adding a payload there makes the ownership
of the bio backing unclear as it's now allocated by the device driver
and not the submitter as usual.

It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether
the queue supports discard operations or not.  blkdev_issue_discard now
allocates a one-page, sector-length payload which is the right thing
for the common ATA and SCSI implementations.

The mtd implementation of prepare_discard_fn() is replaced with simply
checking for the request being a discard.

Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx>
which did the prepare_discard_fn but not the different payload allocation
yet.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01 21:19:30 +02:00
Zdenek Kabelac
48c0d4d4c0 Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
introduced in commit 1d54ad6da9.
Release kobject also in case the request_fn is NULL.

Problem was noticed via kmemleak backtrace when some sysfs entries were
note properly destroyed during  device removal:

unreferenced object 0xffff88001aa76640 (size 80):
  comm "lvcreate", pid 2120, jiffies 4294885144
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff  .........e......
    90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff  .f........S.....
  backtrace:
    [<ffffffff813f9cc6>] kmemleak_alloc+0x26/0x60
    [<ffffffff8111d693>] kmem_cache_alloc+0x133/0x1c0
    [<ffffffff81195891>] sysfs_new_dirent+0x41/0x120
    [<ffffffff81194b0c>] sysfs_add_file_mode+0x3c/0xb0
    [<ffffffff81197c81>] internal_create_group+0xc1/0x1a0
    [<ffffffff81197d93>] sysfs_create_group+0x13/0x20
    [<ffffffff810d8004>] blk_trace_init_sysfs+0x14/0x20
    [<ffffffff8123f45c>] blk_register_queue+0x3c/0xf0
    [<ffffffff812447e4>] add_disk+0x94/0x160
    [<ffffffffa00d8b08>] dm_create+0x598/0x6e0 [dm_mod]
    [<ffffffffa00de951>] dev_create+0x51/0x350 [dm_mod]
    [<ffffffffa00de823>] ctl_ioctl+0x1a3/0x240 [dm_mod]
    [<ffffffffa00de8f2>] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod]
    [<ffffffff81177bfd>] compat_sys_ioctl+0xcd/0x4f0
    [<ffffffff81036ed8>] sysenter_dispatch+0x7/0x2c
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01 21:15:46 +02:00
Linus Torvalds
817b33d38f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  ax25: Fix possible oops in ax25_make_new
  net: restore tx timestamping for accelerated vlans
  Phonet: fix mutex imbalance
  sit: fix off-by-one in ipip6_tunnel_get_prl
  net: Fix sock_wfree() race
  net: Make setsockopt() optlen be unsigned.
2009-09-30 17:36:45 -07:00
David S. Miller
b7058842c9 net: Make setsockopt() optlen be unsigned.
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30 16:12:20 -07:00
Linus Torvalds
e399835c34 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  MIPS: Avoid spurious make includecheck message
  MIPS: VPE: Get rid of BKL.
  MIPS: VPE: Fix build after the credential changes a while ago.
  MIPS: Excite: Get rid of BKL.
  MIPS: Sibyte: Get rid of BKL.
  MIPS: BCM63xx: Add PCMCIA & Cardbus support.
  MIPS: MSP71xx: request_irq() failure ignored in msp_pcibios_config_access()
  MIPS: Decrease size of au1xxx_dbdma_pm_regs[][]
  MIPS: SMP: Inline arch_send_call_function_{single_ipi,ipi_mask}
  MIPS: SMP: Fix build.
  MIPS: MIPSxx SC: Avoid destructive invalidation on partial L2 cachelines.
  MIPS: Sibyte: Fix compilation error.
  MIPS: BCM1480: Re-apply patch lost due to bad resolution of merge conflict.
  MIPS: BCM63xx: Add serial driver for bcm63xx integrated UART.
  MIPS: Loongson2: Fix typo "enalbe" -> "enable"
  MIPS: SMTC: Remove duplicate structure field initialization
  MIPS: Remove duplicated #include
  MIPS: BCM63xx: Remove duplicated #include
2009-09-30 13:46:56 -07:00
Maxime Bizon
9fcd66e572 MIPS: BCM63xx: Add serial driver for bcm63xx integrated UART.
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2009-09-30 21:46:59 +02:00
Linus Torvalds
9f44fdc518 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Fix time encoding with extra epoch bits
  ext4: Add a stub for mpage_da_data in the trace header
  jbd2: Use tracepoints for history file
  ext4: Use tracepoints for mb_history trace file
  ext4, jbd2: Drop unneeded printks at mount and unmount time
  ext4: Handle nested ext4_journal_start/stop calls without a journal
  ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode
  ext4: Avoid updating the inode table bh twice in no journal mode
  ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first
  ext4: async direct IO for holes and fallocate support
  ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O
  ext4: Split uninitialized extents for direct I/O
  ext4: release reserved quota when block reservation for delalloc retry
  ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
  ext4: Fix hueristic which avoids group preallocation for closed files
  ext4: Use ext4_msg() for ext4_da_writepage() errors
  ext4: Update documentation about quota mount options
2009-09-30 09:32:30 -07:00
Linus Torvalds
9c1fe834c1 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM / yenta: Fix cardbus suspend/resume regression
  PM / PCMCIA: Drop second argument of pcmcia_socket_dev_suspend()
2009-09-30 08:11:24 -07:00
Linus Torvalds
5a4c8d75f4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
  sony-laptop: re-read the rfkill state when resuming from suspend
  sony-laptop: check for rfkill hard block at load time
  wext: add back wireless/ dir in sysfs for cfg80211 interfaces
  wext: Add bound checks for copy_from_user
  mac80211: improve/fix mlme messages
  cfg80211: always get BSS
  iwlwifi: fix 3945 ucode info retrieval after failure
  iwlwifi: fix memory leak in command queue handling
  iwlwifi: fix debugfs buffer handling
  cfg80211: don't set privacy w/o key
  cfg80211: wext: don't display BSSID unless associated
  net: Add explicit bound checks in net/socket.c
  bridge: Fix double-free in br_add_if.
  isdn: fix netjet/isdnhdlc build errors
  atm: dereference of he_dev->rbps_virt in he_init_group()
  ax25: Add missing dev_put in ax25_setsockopt
  Revert "sit: stateless autoconf for isatap"
  net: fix double skb free in dcbnl
  net: fix nlmsg len size for skb when error bit is set.
  net: fix vlan_get_size to include vlan_flags size
  ...
2009-09-30 08:07:12 -07:00
Linus Torvalds
e15daf6cdf Merge branch 'drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (25 commits)
  drm/radeon/kms: Convert R520 to new init path and associated cleanup
  drm/radeon/kms: Convert RV515 to new init path and associated cleanup
  drm: fix radeon DRM warnings when !CONFIG_DEBUG_FS
  drm: fix drm_fb_helper warning when !CONFIG_MAGIC_SYSRQ
  drm/r600: fix memory leak introduced with 64k malloc avoidance fix.
  drm/kms: make fb helper work for all drivers.
  drm/radeon/r600: fix offset handling in CS parser
  drm/radeon/kms/r600: fix forcing pci mode on agp cards
  drm/radeon/kms: fix for the extra pages copying.
  drm/radeon/kms/r600: add support for vline relocs
  drm/radeon/kms: fix some bugs in vline reloc
  drm/radeon/kms/r600: clamp vram to aperture size
  drm/kms: protect against fb helper not being created.
  drm/r600: get values from the passed in IB not the copy.
  drm: create gitignore file for radeon
  drm/radeon/kms: remove unneeded master create/destroy functions.
  drm/kms: start adding command line interface using fb.
  fb: change rules for global rules match.
  drm/radeon/kms: don't require up to 64k allocations. (v2)
  drm/radeon/kms: enable dac load detection by default.
  ...

Trivial conflicts in drivers/gpu/drm/radeon/radeon_asic.h due to adding
'->vga_set_state' function pointers.
2009-09-30 08:03:00 -07:00
Josh Stone
0ef1224940 ext4: Add a stub for mpage_da_data in the trace header
The tracepoint ext4_da_write_pages has a struct mpage_da_data*
parameter, but that struct is only defined in fs/ext4/ext4.h.  This
patch adds a forward declaration for that struct, so this tracepoint
header can still be used by tools like SystemTap.

This is a continuation of the fix in commit 3661d286.

http://sourceware.org/bugzilla/show_bug.cgi?id=10703

Signed-off-by: Josh Stone <jistone@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-30 00:51:22 -04:00
Theodore Ts'o
bf6993276f jbd2: Use tracepoints for history file
The /proc/fs/jbd2/<dev>/history was maintained manually; by using
tracepoints, we can get all of the existing functionality of the /proc
file plus extra capabilities thanks to the ftrace infrastructure.  We
save memory as a bonus.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-30 00:32:06 -04:00
Theodore Ts'o
296c355cd6 ext4: Use tracepoints for mb_history trace file
The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.  

By ripping out the mb_history code and replacing it with ftrace
tracepoints, and we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-30 00:32:42 -04:00
Rafael J. Wysocki
827b4649d4 PM / PCMCIA: Drop second argument of pcmcia_socket_dev_suspend()
pcmcia_socket_dev_suspend() doesn't use its second argument, so it
may be dropped safely.

This change is necessary for the subsequent yenta suspend/resume fix.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org
2009-09-29 00:10:41 +02:00
Johannes Berg
8f1546cadf wext: add back wireless/ dir in sysfs for cfg80211 interfaces
The move away from having drivers assign wireless handlers,
in favour of making cfg80211 assign them, broke the sysfs
registration (the wireless/ dir went missing) because the
handlers are now assigned only after registration, which is
too late.

Fix this by special-casing cfg80211-based devices, all
of which are required to have an ieee80211_ptr, in the
sysfs code, and also using get_wireless_stats() to have
the same values reported as in procfs.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Tested-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-28 16:55:07 -04:00
Theodore Ts'o
55138e0bc2 ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
Work around problems in the writeback code to force out writebacks in
larger chunks than just 4mb, which is just too small.  This also works
around limitations in the ext4 block allocator, which can't allocate
more than 2048 blocks at a time.  So we need to defeat the round-robin
characteristics of the writeback code and try to write out as many
blocks in one inode before allowing the writeback code to move on to
another inode.  We add a a new per-filesystem tunable,
max_writeback_mb_bump, which caps this to a default of 128mb per
inode.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-29 13:31:31 -04:00
Dave Airlie
74bf2ad508 drm/kms: make fb helper work for all drivers.
This initialises the fb helper with the connector helper,
so that the fb cmdline code works for intel as well.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-09-28 15:31:10 +10:00
Dave Young
f278a2f7bb tty: Fix regressions caused by commit b50989dc
The following commit made console open fails while booting:

	commit b50989dc44
	Author: Alan Cox <alan@linux.intel.com>
	Date:   Sat Sep 19 13:13:22 2009 -0700

	tty: make the kref destructor occur asynchronously

Due to tty release routines run in a workqueue now, error like the
following will be reported while booting:

INIT open /dev/console Input/output error

It also causes hibernation regression to appear as reported at
http://bugzilla.kernel.org/show_bug.cgi?id=14229

The reason is that now there's latency issue with closing, but when
we open a "closing not finished" tty, -EIO will be returned.

Fix it as per the following Alan's suggestion:

  Fun but it's actually not a bug and the fix is wrong in itself as
  the port may be closing but not yet being destructed, in which case
  it seems to do the wrong thing.  Opening a tty that is closing (and
  could be closing for long periods) is supposed to return -EIO.

  I suspect a better way to deal with this and keep the old console
  timing is to split tty->shutdown into two functions.

  tty->shutdown() - called synchronously just before we dump the tty
  onto the waitqueue for destruction

  tty->cleanup() - called when the destructor runs.

  We would then do the shutdown part which can occur in IRQ context
  fine, before queueing the rest of the release (from tty->magic = 0
  ...  the end) to occur asynchronously

  The USB update in -next would then need a call like

       if (tty->cleanup)
               tty->cleanup(tty);

  at the top of the async function and the USB shutdown to be split
  between shutdown and cleanup as the USB resource cleanup and final
  tidy cannot occur synchronously as it needs to sleep.

  In other words the logic becomes

       final kref put
               make object unfindable

       async
               clean it up

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
[ rjw: Rebased on top of 2.6.31-git, reworked the changelog. ]
Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
[ Changed serial naming to match new rules, dropped tty_shutdown as per
  comments from Alan Stern  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-27 13:35:16 -07:00
Alexey Dobriyan
f0f37e2f77 const: mark struct vm_struct_operations
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-27 11:39:25 -07:00
Sascha Hlusiak
d1f8297a96 Revert "sit: stateless autoconf for isatap"
This reverts commit 645069299a.

While the code does not actually break anything, it does not completely follow
RFC5214 yet. After talking back with Fred L. Templin, I agree that completing the
ISATAP specific RS/RA code, would pollute the kernel a lot with code that is better
implemented in userspace.

The kernel should not send RS packages for ISATAP at all.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Acked-by: Fred L. Templin <Fred.L.Templin@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-26 20:28:07 -07:00
Linus Torvalds
cce1d9f232 Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds:
  leds: move leds-clevo-mail's probe function to .devinit.text
  leds: Fix indentation in LEDS_LP3944 Kconfig entry
  leds: Fix LED names 
  leds: Fix leds-pca9532 whitespace issues
  leds: fix coding style in worker thread code for ledtrig-gpio.
  leds: gpio-leds: fix typographics fault
  leds: Add WM831x status LED driver
2009-09-26 10:50:47 -07:00
Linus Torvalds
d910fc7860 Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlight
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlight:
  backlight: new driver for ADP5520/ADP5501 MFD PMICs
  backlight: extend event support to also support poll()
  backlight/eeepc-laptop: Update the backlight state when we change brightness
  backlight/acpi: Update the backlight state when we change brightness
  backlight: Allow drivers to update the core, and generate events on changes
  backlight: switch to da903x driver to dev_pm_ops
  backlight: Add support for the Avionic Design Xanthos backlight device.
  backlight: spi driver for LMS283GF05 LCD
  backlight: move hp680-bl's probe function to .devinit.text
  backlight: Add support for new Apple machines.
  backlight: mbp_nvidia_bl: add support for MacBookAir 1,1
  backlight: Add WM831x backlight driver

Trivial conflicts due to '#ifdef CONFIG_PM' differences in
drivers/video/backlight/da903x_bl.c
2009-09-26 10:49:42 -07:00
Alexey Dobriyan
1d1764c398 headers: kref.h redux
* remove asm/atomic.h inclusion from kref.h -- not needed, linux/types.h
  is enough for atomic_t
* remove linux/kref.h inclusion from files which do not need it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-26 10:17:19 -07:00
Linus Torvalds
49e70dda35 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tools: Dont use openat()
  perf tools: Fix buffer allocation
  perf tools: .gitignore += perf*.html
  perf tools: Handle relative paths while loading module symbols
  perf tools: Fix module symbol loading bug
  perf_event, x86: Fix 'perf sched record' crashing the machine
  perf_event: Update PERF_EVENT_FORK header definition
  perf stat: Fix zero total printouts
2009-09-26 10:15:33 -07:00
Linus Torvalds
4187e7e9f1 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  modules, tracing: Remove stale struct marker signature from module_layout()
  tracing/workqueue: Use %pf in workqueue trace events
  tracing: Fix a comment and a trivial format issue in tracepoint.h
  tracing: Fix failure path in ftrace_regex_open()
  tracing: Fix failure path in ftrace_graph_write()
  tracing: Check the return value of trace_get_user()
  tracing: Fix off-by-one in trace_get_user()
2009-09-26 10:13:54 -07:00
Linus Torvalds
76e0134f41 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (32 commits)
  ACPI: i2c-scmi: don't use acpi_device_uid()
  ACPI: simplify building device HID/CID list
  ACPI: remove acpi_device_uid() and related stuff
  ACPI: remove acpi_device.flags.hardware_id
  ACPI: remove acpi_device.flags.compatible_ids
  ACPI: maintain a single list of _HID and _CID IDs
  ACPI: make sure every acpi_device has an ID
  ACPI: use acpi_device_hid() when possible
  ACPI: fix synthetic HID for \_SB_
  ACPI: handle re-enumeration, when acpi_devices might already exist
  ACPI: factor out device type and status checking
  ACPI: add acpi_bus_get_status_handle()
  ACPI: use acpi_walk_namespace() to enumerate devices
  ACPI: identify device tree root by null parent pointer, not ACPI_BUS_TYPE
  ACPI: enumerate namespace before adding functional fixed hardware devices
  ACPI: convert acpi_bus_scan() to operate on an acpi_handle
  ACPI: add acpi_bus_get_parent() and remove "parent" arguments
  ACPI: remove unnecessary argument checking
  ACPI: remove redundant "type" arguments
  ACPI: remove acpi_device_set_context() "type" argument
  ...
2009-09-26 10:12:03 -07:00
Jens Axboe
a72bfd4dea writeback: pass in super_block to bdi_start_writeback()
Sometimes we only want to write pages from a specific super_block,
so allow that to be passed in.

This fixes a problem with commit 56a131dcf7
causing writeback on all super_blocks on a bdi, where we only really
want to sync a specific sb from writeback_inodes_sb().

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-26 00:10:40 +02:00
Bjorn Helgaas
6622d8cee7 ACPI: remove acpi_device_uid() and related stuff
Nobody uses acpi_device_uid(), so this patch removes it.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-09-25 15:09:49 -04:00
Bjorn Helgaas
1131b938f0 ACPI: remove acpi_device.flags.hardware_id
Every acpi_device has at least one ID (if there's no _HID or _CID, we
give it a synthetic or default ID).  So there's no longer a need to
check whether an ID exists; we can just use it.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-09-25 15:09:48 -04:00
Bjorn Helgaas
b2972f8750 ACPI: remove acpi_device.flags.compatible_ids
We now keep a single list of IDs that includes both the _HID and any
_CIDs.  We no longer need to keep track of whether the device has a _CID.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-09-25 15:09:47 -04:00
Bjorn Helgaas
7f47fa6c2f ACPI: maintain a single list of _HID and _CID IDs
There's no need to treat _HID and _CID differently.  Keeping them in
a single list makes code that uses the IDs a little simpler because it
can just traverse the list rather than checking "do we have a HID?",
"do we have any CIDs?"

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Reviewed-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-09-25 15:09:31 -04:00
Bjorn Helgaas
402ac53614 ACPI: add acpi_bus_get_status_handle()
Add acpi_bus_get_status_handle() so we can get the status of a namespace
object before building a struct acpi_device.

This removes a use of "device->flags.dynamic_status", a cached indicator of
whether _STA exists.  It seems simpler and more reliable to just evaluate
_STA and catch AE_NOT_FOUND errors.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-09-25 14:24:30 -04:00
Bjorn Helgaas
859ac9a4be ACPI: identify device tree root by null parent pointer, not ACPI_BUS_TYPE
We can identify the root of the ACPI device tree by the fact that it
has no parent.  This is simpler than passing around ACPI_BUS_TYPE_SYSTEM
and will help remove special treatment of the device tree root.

Currently, we add the root by hand with ACPI_BUS_TYPE_SYSTEM.  If we
traverse the tree treating the root as just another device and use
acpi_get_type(), the root shows up as ACPI_TYPE_DEVICE.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-09-25 14:24:29 -04:00
Bjorn Helgaas
caaa6efb3d ACPI: save device_type in acpi_device
Most uses of the ACPI bus device_type (ACPI_BUS_TYPE_DEVICE,
ACPI_BUS_TYPE_POWER, etc) are during device initialization, but
we do need it later for notify handler installation, since that
is different for fixed hardware devices vs. namespace devices.

This patch saves the device_type in the acpi_device structure,
so we can check that rather than comparing against the _HID string.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-09-25 14:24:25 -04:00
Linus Torvalds
5c3cc2084d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (94 commits)
  genetlink: fix netns vs. netlink table locking (2)
  3c59x: Get rid of "Trying to free already-free IRQ"
  tunnel: eliminate recursion field
  ems_pci: fix size of CAN controllers BAR mapping for CPC-PCI v2
  net: fix htmldocs sunrpc, clnt.c
  Phonet: error on broadcast sending (unimplemented)
  Phonet: fix race for port number in concurrent bind()
  pktgen: better scheduler friendliness
  pktgen: T_TERMINATE flag is unused
  ipv4: check optlen for IP_MULTICAST_IF option
  ath9k: Initialize txgain and rxgain for newer AR9287 chipsets.
  iwlagn: fix panic in iwl{5000,4965}_rx_reply_tx
  ath9k: Fix RFKILL bugs
  drivers/net/wireless: Use usb_endpoint_dir_out
  cfg80211: don't overwrite privacy setting
  wl12xx: fix kconfig/link errors
  rt2x00: fix the definition of rt2x00crypto_rx_insert_iv
  iwlwifi: reduce noise when skb allocation fails
  iwlwifi: do not send sync command while holding spinlock
  mac80211: fix DTIM setting
  ...
2009-09-25 07:22:11 -07:00
Dave Airlie
d50ba256b5 drm/kms: start adding command line interface using fb.
[note this requires an fb patch posted to linux-fbdev-devel already]

This uses the normal video= command line option to control the kms
output setup at boot time. It is used to override the autodetection
done by kms.

video= normally takes a framebuffer as the first parameter, in kms
it will take a connector name, DVI-I-1, or LVDS-1 etc. If no output
connector is specified the mode string will apply to all connectors.

The mode specification used will match down the probed modes, and if
no mode is found it will add a CVT mode that matches.

video=1024x768 - all connectors match a 1024x768 mode or add a CVT on
video=VGA-1:1024x768, VGA-1 connector gets mode only.

The same strings as used in current fb modedb.c are used, except I've
added three more letters, e, D, d, e = enable, D = enable Digital,
d = disable, which allow a connector to be forced into a certain state.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-09-25 13:08:20 +10:00
David Howells
934831d060 NOMMU: Fallback for is_vmalloc_or_module_addr() should be inline
The NOMMU fallback for is_vmalloc_or_module_addr() should be static inline,
not just static, in linux/mm.h.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 17:20:20 -07:00
Tim Abbott
1b2086227c Optimize the ordering of sections in RW_DATA_SECTION.
The old RW_DATA_SECTION had INIT_TASK_DATA (which was
more-than-PAGE_SIZE-aligned), followed by a bunch of small alignment
stuff, followed by more PAGE_SIZE-aligned stuff, so you wasted memory
in the middle of .data re-aligning back up to PAGE_SIZE.

This patch sorts the sections by alignment requirements, which should
pack them essentially optimally.

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 17:16:21 -07:00
Andrew Morton
e9ea0e2d1d hugetlb_file_setup(): use C, not cpp
Why macros are always wrong:

  mm/mmap.c: In function 'do_mmap_pgoff':
  mm/mmap.c:953: warning: unused variable 'user'

also, move a couple of struct forward-decls outside `#ifdef
CONFIG_HUGETLB_PAGE' - it's pointless and frequently harmful to make these
conditional (eg, this patch needed `struct user_struct').

Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 17:11:24 -07:00
Johannes Berg
b8273570f8 genetlink: fix netns vs. netlink table locking (2)
Similar to commit d136f1bd36,
there's a bug when unregistering a generic netlink family,
which is caught by the might_sleep() added in that commit:

    BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183
    in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod
    2 locks held by rmmod/1510:
     #0:  (genl_mutex){+.+.+.}, at: [<ffffffff8138283b>] genl_unregister_family+0x2b/0x130
     #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120
    Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444
    Call Trace:
     [<ffffffff81044ff9>] __might_sleep+0x119/0x150
     [<ffffffff81380501>] netlink_table_grab+0x21/0x100
     [<ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60
     [<ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120
     [<ffffffff81382866>] genl_unregister_family+0x56/0x130
     [<ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211]
     [<ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211]

Fix in the same way by grabbing the netlink table lock
before doing rcu_read_lock().

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-24 15:44:05 -07:00
Eric Dumazet
a43912ab19 tunnel: eliminate recursion field
It seems recursion field from "struct ip_tunnel" is not anymore needed.
recursion prevention is done at the upper level (in dev_queue_xmit()),
since we use HARD_TX_LOCK protection for tunnels.

This avoids a cache line ping pong on "struct ip_tunnel" : This structure
should be now mostly read on xmit and receive paths.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-24 15:39:22 -07:00
Rémi Denis-Courmont
18a1166de9 Phonet: error on broadcast sending (unimplemented)
If we ever implement this, then we can stop returning an error.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-24 15:38:57 -07:00
David S. Miller
8b3f6af863 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
	drivers/staging/Kconfig
	drivers/staging/Makefile
	drivers/staging/cpc-usb/TODO
	drivers/staging/cpc-usb/cpc-usb_drv.c
	drivers/staging/cpc-usb/cpc.h
	drivers/staging/cpc-usb/cpc_int.h
	drivers/staging/cpc-usb/cpcusb.h
2009-09-24 15:13:11 -07:00
Russell King
baea7b946f Merge branch 'origin' into for-linus
Conflicts:
	MAINTAINERS
2009-09-24 21:22:33 +01:00
Linus Torvalds
94e0fb086f Merge branch 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel
* 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (57 commits)
  drm/i915: Handle ERESTARTSYS during page fault
  drm/i915: Warn before mmaping a purgeable buffer.
  drm/i915: Track purged state.
  drm/i915: Remove eviction debug spam
  drm/i915: Immediately discard any backing storage for uneeded objects
  drm/i915: Do not mis-classify clean objects as purgeable
  drm/i915: Whitespace correction for madv
  drm/i915: BUG_ON page refleak during unbind
  drm/i915: Search harder for a reusable object
  drm/i915: Clean up evict from list.
  drm/i915: Add tracepoints
  drm/i915: framebuffer compression for GM45+
  drm/i915: split display functions by chip type
  drm/i915: Skip the sanity checks if the current relocation is valid
  drm/i915: Check that the relocation points to within the target
  drm/i915: correct FBC update when pipe base update occurs
  drm/i915: blacklist Acer AspireOne lid status
  ACPI: make ACPI button funcs no-ops if not built in
  drm/i915: prevent FIFO calculation overflows on 32 bits with high dotclocks
  drm/i915: intel_display.c handle latency variable efficiently
  ...

Fix up trivial conflicts in drivers/gpu/drm/i915/{i915_dma.c|i915_drv.h}
2009-09-24 10:30:41 -07:00
Linus Torvalds
b7f21bb2e2 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits)
  x86/PCI: make 32 bit NUMA node array int, not unsigned char
  x86/PCI: default pcibus cpumask to all cpus if it lacks affinity
  MAINTAINTERS: remove hotplug driver entries
  PCI: pciehp: remove slot capabilities definitions
  PCI: pciehp: remove error message definitions
  PCI: pciehp: remove number field
  PCI: pciehp: remove hpc_ops
  PCI: pciehp: remove pci_dev field
  PCI: pciehp: remove crit_sect mutex
  PCI: pciehp: remove slot_bus field
  PCI: pciehp: remove first_slot field
  PCI: pciehp: remove slot_device_offset field
  PCI: pciehp: remove hp_slot field
  PCI: pciehp: remove device field
  PCI: pciehp: remove bus field
  PCI: pciehp: remove slot_num_inc field
  PCI: pciehp: remove num_slots field
  PCI: pciehp: remove slot_list field
  PCI: fix VGA arbiter header file
  PCI: Disable AER with pci=nomsi
  ...

Fixed up trivial conflicts in MAINTAINERS
2009-09-24 09:57:08 -07:00
Linus Torvalds
2c9871de0a Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  module: don't call percpu_modfree on NULL pointer.
  module: fix memory leak when load fails after srcversion/version allocated
  module: preferred way to use MODULE_AUTHOR
  param: allow whitespace as kernel parameter separator
  module: reduce string table for loaded modules (v2)
  module: reduce symbol table for loaded modules (v2)
2009-09-24 09:01:05 -07:00
Linus Torvalds
6c5daf012c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  truncate: use new helpers
  truncate: new helpers
  fs: fix overflow in sys_mount() for in-kernel calls
  fs: Make unload_nls() NULL pointer safe
  freeze_bdev: grab active reference to frozen superblocks
  freeze_bdev: kill bd_mount_sem
  exofs: remove BKL from super operations
  fs/romfs: correct error-handling code
  vfs: seq_file: add helpers for data filling
  vfs: remove redundant position check in do_sendfile
  vfs: change sb->s_maxbytes to a loff_t
  vfs: explicitly cast s_maxbytes in fiemap_check_ranges
  libfs: return error code on failed attr set
  seq_file: return a negative error code when seq_path_root() fails.
  vfs: optimize touch_time() too
  vfs: optimization for touch_atime()
  vfs: split generic_forget_inode() so that hugetlbfs does not have to copy it
  fs/inode.c: add dev-id and inode number for debugging in init_special_inode()
  libfs: make simple_read_from_buffer conventional
2009-09-24 08:32:11 -07:00
Johannes Berg
1d7015caa0 module: preferred way to use MODULE_AUTHOR
For the longest time now we've been using multiple MODULE_AUTHOR()
statements when a module has more than one author, but the comment here
disagrees.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Luciano Coelho <luciano.coelho@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25 00:32:58 +09:30
Jan Beulich
554bdfe5ac module: reduce string table for loaded modules (v2)
Also remove all parts of the string table (referenced by the symbol
table) that are not needed for kallsyms use (i.e. which were only
referenced by symbols discarded by the previous patch, or not
referenced at all for whatever reason).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25 00:32:57 +09:30
Jan Beulich
4a4962263f module: reduce symbol table for loaded modules (v2)
Discard all symbols not interesting for kallsyms use: absolute,
section, and in the common case (!KALLSYMS_ALL) data ones.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25 00:32:57 +09:30
Linus Torvalds
a487b6705a Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md: (97 commits)
  md: raid-1/10: fix RW bits manipulation
  md: remove unnecessary memset from multipath.
  md: report device as congested when suspended
  md: Improve name of threads created by md_register_thread
  md: remove sparse warnings about lock context.
  md: remove sparse waring "symbol xxx shadows an earlier one"
  async_tx/raid6: add missing dma_unmap calls to the async fail case
  ioat3: fix uninitialized var warnings
  drivers/dma/ioat/dma_v2.c: fix warnings
  raid6test: fix stack overflow
  ioat2: clarify ring size limits
  md/raid6: cleanup ops_run_compute6_2
  md/raid6: eliminate BUG_ON with side effect
  dca: module load should not be an error message
  ioat: driver version 4.0
  dca: registering requesters in multiple dca domains
  async_tx: remove HIGHMEM64G restriction
  dmaengine: sh: Add Support SuperH DMA Engine driver
  dmaengine: Move all map_sg/unmap_sg for slave channel to its client
  fsldma: Add DMA_SLAVE support
  ...
2009-09-24 07:55:29 -07:00
Linus Torvalds
db16826367 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
  HWPOISON: Enable error_remove_page on btrfs
  HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
  HWPOISON: Add madvise() based injector for hardware poisoned pages v4
  HWPOISON: Enable error_remove_page for NFS
  HWPOISON: Enable .remove_error_page for migration aware file systems
  HWPOISON: The high level memory error handler in the VM v7
  HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
  HWPOISON: shmem: call set_page_dirty() with locked page
  HWPOISON: Define a new error_remove_page address space op for async truncation
  HWPOISON: Add invalidate_inode_page
  HWPOISON: Refactor truncate to allow direct truncating of page v2
  HWPOISON: check and isolate corrupted free pages v2
  HWPOISON: Handle hardware poisoned pages in try_to_unmap
  HWPOISON: Use bitmask/action code for try_to_unmap behaviour
  HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
  HWPOISON: Add poison check to page fault handling
  HWPOISON: Add basic support for poisoned pages in fault handler v3
  HWPOISON: Add new SIGBUS error codes for hardware poison signals
  HWPOISON: Add support for poison swap entries v2
  HWPOISON: Export some rmap vma locking to outside world
  ...
2009-09-24 07:53:22 -07:00
Hiroshi Shimamoto
801460d0cf task_struct cleanup: move binfmt field to mm_struct
Because the binfmt is not different between threads in the same process,
it can be moved from task_struct to mm_struct.  And binfmt moudle is
handled per mm_struct instead of task_struct.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:05 -07:00
Albin Tonnerre
2fa4341074 include/linux/unaligned/{l,b}e_byteshift.h: fix usage for compressed kernels
When unaligned accesses are required for uncompressing a kernel (such as
for LZO decompression on ARM in a patch that follows), including
<linux/kernel.h> causes issues as it brings in a lot of things that are
not available in the decompression environment.

linux/kernel.h brings at least:
extern int console_printk[];
extern const char hex_asc[];
which causes errors at link-time as they are not available when
compiling the pre-boot environement. There are also a few others:

  arch/arm/boot/compressed/misc.o: In function `valid_user_regs':
   arch/arm/include/asm/ptrace.h:158: undefined reference to `elf_hwcap'
  arch/arm/boot/compressed/misc.o: In function `console_silent':
   include/linux/kernel.h:292: undefined reference to `console_printk'
  arch/arm/boot/compressed/misc.o: In function `console_verbose':
   include/linux/kernel.h:297: undefined reference to `console_printk'
  arch/arm/boot/compressed/misc.o: In function `pack_hex_byte':
   include/linux/kernel.h:360: undefined reference to `hex_asc'
  arch/arm/boot/compressed/misc.o: In function `hweight_long':
   include/linux/bitops.h:45: undefined reference to `hweight32'
  arch/arm/boot/compressed/misc.o: In function `__cmpxchg_local_generic':
   include/asm-generic/cmpxchg-local.h:21: undefined reference to `wrong_size_cmpxchg'
   include/asm-generic/cmpxchg-local.h:42: undefined reference to `wrong_size_cmpxchg'
  arch/arm/boot/compressed/misc.o: In function `__xchg':
   arch/arm/include/asm/system.h:309: undefined reference to `__bad_xchg'

However, those files apparently use nothing from <linux/kernel.h>, all
they need is the declaration of types such as u32 or u64, so
<linux/types.h> should be enough

Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:05 -07:00
Alexey Dobriyan
858f09930b aio: ifdef fields in mm_struct
->ioctx_lock and ->ioctx_list are used only under CONFIG_AIO.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:05 -07:00
Mike Frysinger
9064a6787a linux/futex.h: place kernel types behind __KERNEL__
The forward decls for some kernel types are only needed by the code behind
__KERNEL__, so don't bleed these types to userspace.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:04 -07:00
Alexey Dobriyan
8d65af789f sysctl: remove "struct file *" argument of ->proc_handler
It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:04 -07:00
Roland McGrath
d9588725e5 signals: inline __fatal_signal_pending
__fatal_signal_pending inlines to one instruction on x86, probably two
instructions on other machines.  It takes two longer x86 instructions just
to call it and test its return value, not to mention the function itself.

On my random x86_64 config, this saved 70 bytes of text (59 of those being
__fatal_signal_pending itself).

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:01 -07:00
Peter Zijlstra
ba0a6c9f6f fcntl: add F_[SG]ETOWN_EX
In order to direct the SIGIO signal to a particular thread of a
multi-threaded application we cannot, like suggested by the manpage, put a
TID into the regular fcntl(F_SETOWN) call.  It will still be send to the
whole process of which that thread is part.

Since people do want to properly direct SIGIO we introduce F_SETOWN_EX.

The need to direct SIGIO comes from self-monitoring profiling such as with
perf-counters.  Perf-counters uses SIGIO to notify that new sample data is
available.  If the signal is delivered to the same task that generated the
new sample it can augment that data by inspecting the task's user-space
state right after it returns from the kernel.  This is esp.  convenient
for interpreted or virtual machine driven environments.

Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex
as argument:

struct f_owner_ex {
	int   type;
	pid_t pid;
};

Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: stephane eranian <eranian@googlemail.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:01 -07:00
Oleg Nesterov
4a30debfb7 signals: introduce do_send_sig_info() helper
Introduce do_send_sig_info() and convert group_send_sig_info(),
send_sig_info(), do_send_specific() to use this helper.

Hopefully it will have more users soon, it allows to specify
specific/group behaviour via "bool group" argument.

Shaves 80 bytes from .text.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:01 -07:00
Oleg Nesterov
964ee7df90 exec: fix set_binfmt() vs sys_delete_module() race
sys_delete_module() can set MODULE_STATE_GOING after
search_binary_handler() does try_module_get().  In this case
set_binfmt()->try_module_get() fails but since none of the callers
check the returned error, the task will run with the wrong old
->binfmt.

The proper fix should change all ->load_binary() methods, but we can
rely on fact that the caller must hold a reference to binfmt->module
and use __module_get() which never fails.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:01 -07:00
Roland McGrath
ae6d2ed7bb signals: tracehook_notify_jctl change
This changes tracehook_notify_jctl() so it's called with the siglock held,
and changes its argument and return value definition.  These clean-ups
make it a better fit for what new tracing hooks need to check.

Tracing needs the siglock here, held from the time TASK_STOPPED was set,
to avoid potential SIGCONT races if it wants to allow any blocking in its
tracing hooks.

This also folds the finish_stop() function into its caller
do_signal_stop().  The function is short, called only once and only
unconditionally.  It aids readability to fold it in.

[oleg@redhat.com: do not call tracehook_notify_jctl() in TASK_STOPPED state]
[oleg@redhat.com: introduce tracehook_finish_jctl() helper]
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:00 -07:00
Oleg Nesterov
a7f0765edf ptrace: __ptrace_detach: do __wake_up_parent() if we reap the tracee
The bug is old, it wasn't cause by recent changes.

Test case:

	static void *tfunc(void *arg)
	{
		int pid = (long)arg;

		assert(ptrace(PTRACE_ATTACH, pid, NULL, NULL) == 0);
		kill(pid, SIGKILL);

		sleep(1);
		return NULL;
	}

	int main(void)
	{
		pthread_t th;
		long pid = fork();

		if (!pid)
			pause();

		signal(SIGCHLD, SIG_IGN);
		assert(pthread_create(&th, NULL, tfunc, (void*)pid) == 0);

		int r = waitpid(-1, NULL, __WNOTHREAD);
		printf("waitpid: %d %m\n", r);

		return 0;
	}

Before the patch this program hangs, after this patch waitpid() correctly
fails with errno == -ECHILD.

The problem is, __ptrace_detach() reaps the EXIT_ZOMBIE tracee if its
->real_parent is our sub-thread and we ignore SIGCHLD.  But in this case
we should wake up other threads which can sleep in do_wait().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:20:59 -07:00
Balbir Singh
4e41695356 memory controller: soft limit reclaim on contention
Implement reclaim from groups over their soft limit

Permit reclaim from memory cgroups on contention (via the direct reclaim
path).

memory cgroup soft limit reclaim finds the group that exceeds its soft
limit by the largest number of pages and reclaims pages from it and then
reinserts the cgroup into its correct place in the rbtree.

Add additional checks to mem_cgroup_hierarchical_reclaim() to detect long
loops in case all swap is turned off.  The code has been refactored and
the loop check (loop < 2) has been enhanced for soft limits.  For soft
limits, we try to do more targetted reclaim.  Instead of bailing out after
two loops, the routine now reclaims memory proportional to the size by
which the soft limit is exceeded.  The proportion has been empirically
determined.

[akpm@linux-foundation.org: build fix]
[kamezawa.hiroyu@jp.fujitsu.com: fix softlimit css refcnt handling]
[nishimura@mxp.nes.nec.co.jp: refcount of the "victim" should be decremented before exiting the loop]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:20:59 -07:00
Balbir Singh
f64c3f5494 memory controller: soft limit organize cgroups
Organize cgroups over soft limit in a RB-Tree

Introduce an RB-Tree for storing memory cgroups that are over their soft
limit.  The overall goal is to

1. Add a memory cgroup to the RB-Tree when the soft limit is exceeded.
   We are careful about updates, updates take place only after a particular
   time interval has passed
2. We remove the node from the RB-Tree when the usage goes below the soft
   limit

The next set of patches will exploit the RB-Tree to get the group that is
over its soft limit by the largest amount and reclaim from it, when we
face memory contention.

[hugh.dickins@tiscali.co.uk: CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_PREEMPT=y fails to boot]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:20:59 -07:00
Balbir Singh
296c81d89f memory controller: soft limit interface
Add an interface to allow get/set of soft limits.  Soft limits for memory
plus swap controller (memsw) is currently not supported.  Resource
counters have been enhanced to support soft limits and new type
RES_SOFT_LIMIT has been added.  Unlike hard limits, soft limits can be
directly set and do not need any reclaim or checks before setting them to
a newer value.

Kamezawa-San raised a question as to whether soft limit should belong to
res_counter.  Since all resources understand the basic concepts of hard
and soft limits, it is justified to add soft limits here.  Soft limits are
a generic resource usage feature, even file system quotas support soft
limits.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:20:59 -07:00
Balbir Singh
4b3bde4c98 memcg: remove the overhead associated with the root cgroup
Change the memory cgroup to remove the overhead associated with accounting
all pages in the root cgroup.  As a side-effect, we can no longer set a
memory hard limit in the root cgroup.

A new flag to track whether the page has been accounted or not has been
added as well.  Flags are now set atomically for page_cgroup,
pcg_default_flags is now obsolete and removed.

[akpm@linux-foundation.org: fix a few documentation glitches]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:20:58 -07:00
Ben Blum
be367d0992 cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time
Alter the ss->can_attach and ss->attach functions to be able to deal with
a whole threadgroup at a time, for use in cgroup_attach_proc.  (This is a
pre-patch to cgroup-procs-writable.patch.)

Currently, new mode of the attach function can only tell the subsystem
about the old cgroup of the threadgroup leader.  No subsystem currently
needs that information for each thread that's being moved, but if one were
to be added (for example, one that counts tasks within a group) this bit
would need to be reworked a bit to tell the subsystem the right
information.

[hidave.darkstar@gmail.com: fix build]
Signed-off-by: Ben Blum <bblum@google.com>
Signed-off-by: Paul Menage <menage@google.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:20:58 -07:00
Ben Blum
c378369d8b cgroups: change css_set freeing mechanism to be under RCU
Changes css_set freeing mechanism to be under RCU

This is a prepatch for making the procs file writable. In order to free the
old css_sets for each task to be moved as they're being moved, the freeing
mechanism must be RCU-protected, or else we would have to have a call to
synchronize_rcu() for each task before freeing its old css_set.

Signed-off-by: Ben Blum <bblum@google.com>
Signed-off-by: Paul Menage <menage@google.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:20:58 -07:00